Session 12 : Operating System Programming Concepts : Lab 12 - AWK & SED


Lab 12 : Awk & Sed
Some useful AWK built-in functions (string manipulation)

length(string) → it gives you the number of characters in string
match(string,regexp)
it searches the string, “string”, for the longest, leftmost substring matched by the regular expression, “regexp”
it returns the character position (index) where that substring begins (1 if it starts at the beginning of string), or 0 if no match is found. It also sets the built-in variables RSTART and RLENGTH to the start and length of the match.
split(string, array, field_separator)
this divides “string” into pieces separated by “field_separator”, stores the pieces in “array” (array[1], array[2], ...), and returns the number of pieces.
sprintf(format, expression1, ...)
this returns (without printing) the string that printf would have printed out with the same arguments.
sub(regexp, replacement, target)
the sub function alters the value of target.
it searches this value, which should be a string, for the leftmost substring matched by the regular expression, regexp, extending this match as far as possible. Then the entire string is changed by replacing the matched text with replacement. If target is omitted, $0 is used.
gsub(regexp, replacement, target)
this is similar to the sub function, except gsub replaces all the longest, leftmost, nonoverlapping matching substrings it can find, and returns the number of substitutions made.
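A minimal shell sketch that calls each of the functions above once inside a single awk program. The sample line and the variable names (pos, n, parts, s, t, c) are made up for illustration; it assumes any POSIX awk (gawk, mawk or BSD awk).

```shell
#!/bin/sh
# Run each string built-in once on a made-up sample line.
echo "alpha beta gamma beta" | awk '{
    # length: number of characters in the whole record ($0)
    print "length:", length($0)

    # match: start of the leftmost-longest match (0 if none);
    # it also sets RSTART and RLENGTH
    pos = match($0, /be+ta/)
    print "match:", pos, RSTART, RLENGTH

    # split: stores the pieces in parts[1..n] and returns n
    n = split($0, parts, " ")
    print "split:", n, parts[1], parts[n]

    # sprintf: formats like printf but returns the string instead of printing it
    s = sprintf("%-6s|%3d", parts[2], n)
    print "sprintf:", s

    # sub: replaces only the first match, changing the variable t in place
    t = $0
    sub(/beta/, "BETA", t)
    print "sub:", t

    # gsub: replaces every non-overlapping match and returns how many it made
    t = $0
    c = gsub(/beta/, "BETA", t)
    print "gsub:", c, t
}'
```

Note that sub and gsub modify their third argument, which is why the sketch copies $0 into t first instead of changing the record itself.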


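Q? How do you redirect output to files
/>
print items > output-file
/> prints the items to “output-file”; the file is emptied when awk first opens it, and later prints to it in the same program are added after the earlier ones
print items >> output-file
/> appends the items to the end of “output-file”, keeping its existing contents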

Examples :
awk '{ print $0 > "fileout.txt" }' filein.txt
awk '{ print $0 >> "fileout.txt" }' filein.txt
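A small sketch of the difference between the two redirections, running each awk program twice in a throwaway directory. The two-line filein.txt and the name appended.txt are made up for this demo.

```shell
#!/bin/sh
# Compare ">" and ">>" inside awk by running each program twice.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'line one\nline two\n' > filein.txt

# ">" empties fileout.txt when awk first opens it during a run,
# so running the program again starts the file over.
awk '{ print $0 > "fileout.txt" }' filein.txt
awk '{ print $0 > "fileout.txt" }' filein.txt
wc -l < fileout.txt     # 2 lines

# ">>" never empties the file, so every run adds to the end.
awk '{ print $0 >> "appended.txt" }' filein.txt
awk '{ print $0 >> "appended.txt" }' filein.txt
wc -l < appended.txt    # 4 lines

cd / && rm -rf "$dir"
```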



Q? What is sed
/> It stands for stream editor
/> it is used to perform text transformations on an input stream (a file or input from a pipeline)
/> its ability to filter text in a pipeline is what particularly distinguishes it from other types of editors
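A tiny sketch of sed used as one stage of a pipeline: it reads the stream line by line, applies its script, and writes the result to standard output for the next command. The sample text is made up.

```shell
#!/bin/sh
# sed sits between printf and sort, rewriting each line as it passes through.
printf 'apple 3\nbanana 5\ncherry 7\n' | sed 's/ /: /' | sort -r
# cherry: 7
# banana: 5
# apple: 3
```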

Q? How do you select lines with sed
/> number – specifying a line number will match only that line in the input
/> number1,number2 – specifying a range from line number1 to line number2 (inclusive)
/> /Pattern1/,/Pattern2/ – specifying a range from a line that matches Pattern1 to the next line that matches Pattern2
/> /Pattern/ – selecting only lines that match Pattern
/> first~step – this GNU extension matches every step-th line starting with line first
/> $ – matches the last line of input
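A sketch applying each address form to the numbers 1 to 10. It assumes seq is available (GNU coreutils and BSD both ship it) and GNU sed for the first~step form; -n turns off automatic printing so only the lines selected by p appear.

```shell
#!/bin/sh
# Outputs are listed on one line here; sed prints one number per line.
seq 10 | sed -n '3p'          # 3
seq 10 | sed -n '2,4p'        # 2 3 4
seq 10 | sed -n '/4/,/6/p'    # 4 5 6
seq 10 | sed -n '/1/p'        # 1 10   (every line containing "1")
seq 10 | sed -n '1~3p'        # 1 4 7 10   (GNU: every 3rd line from line 1)
seq 10 | sed -n '$p'          # 10
```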

Some of the most common sed commands :
s – substitution :
syntax: 's/regexp/replacement/flags'
/> the portion of the pattern space (the current line) that matches regexp is replaced with replacement
/> the replacement might include the following (the case-conversion escapes are GNU extensions):
& : refers to the whole portion that matches regexp
\L : turn the replacement to lowercase until a \U or \E is found
\l : turn the next character to lowercase
\U : turn the replacement to uppercase until a \L or \E is found
\u : turn the next character to uppercase
\E : stop case conversion started by \L or \U

Example:
echo 'abbbb' | sed -e 's/^.*$/\u&/'
echo 'abbbb' | sed -e 's/^.*$/\U&\E/'
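A sketch of the escapes not used in the examples above (&, \L, \l and \E), assuming GNU sed. Single quotes keep the shell from interpreting $, \ and &; the sample strings are made up.

```shell
#!/bin/sh
echo 'cat' | sed 's/cat/[&]/'                               # [cat]
echo 'HELLO World' | sed 's/.*/\L&/'                        # hello world
echo 'HELLO' | sed 's/.*/\l&/'                              # hELLO
# \E ends the uppercasing started by \U, so the second group is unchanged.
echo 'make me loud' | sed 's/\(.*\) \(.*\)/\U\1\E \2/'      # MAKE ME loud
```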

s – substitution flags
g : apply the replacement to all matches of the regexp, not just the first
number : replace only the number-th match of the regexp
p : if a substitution was made, print the new pattern space


Example :
echo HEADER HEADER | sed -e 's/HEADER//'
echo HEADER HEADER | sed -e 's/HEADER//g'
echo HEADER HEADER | sed -n -e 's/HEADER//'
echo HEADER HEADER | sed -n -e 's/HEADER//p'
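The examples above do not show the number flag, so here is a short sketch of it on a made-up string. Combining a number with g (replace the number-th match and every later one) is GNU sed behaviour; POSIX leaves that combination unspecified.

```shell
#!/bin/sh
echo 'a-a-a-a' | sed 's/a/X/2'      # a-X-a-a   (only the 2nd match)
echo 'a-a-a-a' | sed 's/a/X/2g'     # a-X-X-X   (GNU: 2nd match onward)
echo 'a-a-a-a' | sed -n 's/b/X/p'   # no output: nothing was substituted
```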
d – delete
: deletes every line that matches the address
Example :
sed -e '1,5d' filein.txt
sed -e '/HEADER/d' filein.txt
sed -e '/^HEADER$/d' filein.txt
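A sketch that runs delete commands like the ones above on a small made-up filein.txt in a temporary directory. The file has only four lines, so the range example uses 1,2d instead of 1,5d.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
printf 'HEADER\nHEADER 2\nbody\nHEADER\n' > "$dir/filein.txt"

sed -e '1,2d' "$dir/filein.txt"          # body, HEADER
sed -e '/HEADER/d' "$dir/filein.txt"     # body   (any line containing HEADER)
sed -e '/^HEADER$/d' "$dir/filein.txt"   # HEADER 2, body   (exact lines only)

rm -rf "$dir"
```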


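p – print
: prints every line that matches the address
Example :
sed -n -e '1,5p' filein.txt
sed -e '1,5p' filein.txt
With -n, only lines 1-5 are printed. Without -n, sed also prints every line automatically, so lines 1-5 appear twice.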
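A quick sketch of the effect of -n with p, using seq 3 as a stand-in for filein.txt.

```shell
#!/bin/sh
seq 3 | sed -n -e '1,2p'   # 1 2          (only the selected lines)
seq 3 | sed -e '1,2p'      # 1 1 2 2 3    (selected lines twice, plus auto-print)
```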

Q? What is a regular expression
/> A regular expression is a pattern that is matched against a subject string from left to right
/> it lets us include alternatives and repetitions in patterns. The operators below use GNU sed's basic regular expression syntax (\+, \? and \| are GNU extensions)
char : a single ordinary character matches itself
* : matches a sequence of zero or more instances of matches for the preceding regular expression
\+ : as *, but matches one or more
\? : as *, but only matches zero or one
\{i\} : as *, but matches exactly i sequences
\{i,j\} : matches between i and j sequences, inclusive
\{i,\} : matches i or more sequences
\(regexp\) : groups the inner regexp
. : matches any character, including newline
^ : matches the null string at the beginning of the line
$ : matches the null string at the end of the line
[list] : matches any single character in list
[^list] : a leading ^ reverses the meaning of list, so that it matches any single character not in list
regexp1\|regexp2 : matches either regexp1 or regexp2 (use \(...\) to group complex alternatives)
\digit : matches the text captured by the digit-th \(...\) parenthesized subexpression in the regular expression (a back-reference)
\n : matches the newline character

Example:
echo 'HEADER HEADER33456' | sed -e 's/ HEADER[ 0-9]*/ -/'
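A few more one-line sketches of the operators listed above, assuming GNU sed (\+, \? and \| are GNU extensions). The sample strings are made up.

```shell
#!/bin/sh
echo 'aaa b' | sed 's/a\+/X/'                 # X b      (one or more)
echo 'color colour' | sed 's/colou\?r/C/g'    # C C      (zero or one)
echo '1234567' | sed 's/[0-9]\{2,3\}/N/'      # N4567    (leftmost, longest: 3 digits)
echo 'cat dog' | sed 's/cat\|dog/pet/g'       # pet pet  (alternation)
echo 'abab' | sed 's/\(ab\)\1/<\1>/'          # <ab>     (back-reference)
echo 'x1y22' | sed 's/[^0-9]/_/g'             # _1_22    (negated list)
```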
