Session 12 : Operating System Programming Concepts : Lab 12 - AWK & SED
Lab 12 : Awk & Sed
Some useful awk built-in functions (string manipulation)
length(string)
→ returns the number of characters in string
match(string, regexp)
→ searches string for the longest, leftmost substring matched by the regular expression regexp
→ returns the character position (index) where that substring begins (1 if it starts at the beginning of string), or 0 if no match is found
split(string, array, field_separator)
→ divides string into pieces separated by field_separator, stores the pieces in array, and returns the number of pieces
sprintf(format, expression1, ...)
→ returns (without printing) the string that printf would have printed with the same arguments
sub(regexp, replacement, target)
→ alters the value of target
→ searches this value, which should be a string, for the leftmost substring matched by the regular expression regexp, extending the match as far as possible, then replaces the matched text with replacement
gsub(regexp, replacement, target)
→ similar to sub, except that gsub replaces all of the longest, leftmost, non-overlapping matching substrings it can find
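The string functions above can be tried from the command line with one-line awk programs. This is only a small sketch: the sample strings and the shell variable names (len, pos and so on) are made up for illustration, and each variable simply captures what awk prints.

```shell
# length: number of characters in the string
len=$(awk 'BEGIN { print length("operating") }')                               # 9

# match: index where the first run of digits begins
pos=$(awk 'BEGIN { print match("lab12-awk", /[0-9]+/) }')                      # 4

# split: cut "a:b:c" at each ":" and report the piece count and the second piece
pieces=$(awk 'BEGIN { n = split("a:b:c", parts, ":"); print n, parts[2] }')    # 3 b

# sprintf: build a formatted string without printing it directly
label=$(awk 'BEGIN { s = sprintf("%s-%03d", "lab", 12); print s }')            # lab-012

# sub: replace only the leftmost, longest match
once=$(awk 'BEGIN { t = "aaa bbb aaa"; sub(/a+/, "X", t); print t }')          # X bbb aaa

# gsub: replace every match and return how many replacements were made
every=$(awk 'BEGIN { t = "aaa bbb aaa"; n = gsub(/a+/, "X", t); print n, t }') # 2 X bbb X

printf '%s\n' "$len" "$pos" "$pieces" "$label" "$once" "$every"
```

Note that sub and gsub change the variable passed as target, so the result is read back from t rather than from the return value.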
Q? How do you redirect output to files
print items > output-file
→ writes the items to "output-file"
print items >> output-file
→ appends the items to "output-file"
Examples :
awk '{print $0 > "fileout.txt"}' filein.txt
awk '{print $0 >> "fileout.txt"}' filein.txt
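The difference between the two redirections shows up when the same command is run more than once. The sketch below assumes a scratch directory created with mktemp and a three-line filein.txt invented for the demonstration; overwrite_count and append_count are illustrative variable names.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\ntwo\nthree\n' > filein.txt

# ">" empties fileout.txt when awk first opens it, so running the
# command twice still leaves only three lines in the file.
awk '{ print $0 > "fileout.txt" }' filein.txt
awk '{ print $0 > "fileout.txt" }' filein.txt
overwrite_count=$(awk 'END { print NR }' fileout.txt)   # 3

# ">>" never empties the file, so one more run adds three more lines.
awk '{ print $0 >> "fileout.txt" }' filein.txt
append_count=$(awk 'END { print NR }' fileout.txt)      # 6

echo "$overwrite_count $append_count"
```

Within a single awk run, ">" truncates the file only once, when it is first opened; later print statements in the same run keep adding to it.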
Q? What is sed
→ It stands for stream editor
→ It is used to perform text transformations on an input stream (a file or input from a pipeline)
→ Its ability to filter text in a pipeline is what particularly distinguishes it from other types of editors
Q? How do you select lines with sed
→ number - matches only that line of the input
→ number1,number2 - matches the range from line number1 to line number2 (inclusive)
→ /Pattern1/,/Pattern2/ - matches the range from a line that matches Pattern1 to the next line that matches Pattern2
→ /Pattern/ - selects only the lines that match Pattern
→ first~step - this GNU extension matches every step-th line starting with line first
→ $ - matches the last line of input
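Each kind of address can be checked by combining it with -n and the p command, so that only the selected lines are printed. This is a rough sketch: the six sample lines and the helper function pick are invented for illustration, and the first~step form assumes GNU sed.

```shell
# Six sample lines: alpha, BEGIN, beta, gamma, END, delta (lines 1 to 6).
lines=$(printf '%s\n' alpha BEGIN beta gamma END delta)

# pick (hypothetical helper): print only the lines selected by the given
# sed script, joined onto one line so the result is easy to read.
pick() {
    printf '%s\n' "$lines" | sed -n "$1" | paste -s -d ' ' -
}

pick '2p'               # BEGIN                  (a single line number)
pick '2,4p'             # BEGIN beta gamma       (a line range, inclusive)
pick '/BEGIN/,/END/p'   # BEGIN beta gamma END   (a pattern range)
pick '/a$/p'            # alpha beta gamma delta (lines matching a pattern)
pick '1~2p'             # alpha beta END         (GNU only: every 2nd line from line 1)
pick '$p'               # delta                  (the last line)
```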
Some of the most common sed commands :
s - substitution :
syntax: 's/regexp/replacement/flags'
→ the portion of the pattern space (the current line) that matches regexp is replaced with replacement
→ the replacement may include (the case-conversion escapes are GNU sed extensions) :
& : refers to the whole portion that matched regexp
\L : turn the replacement to lowercase until a \U or \E is found
\l : turn the next character to lowercase
\U : turn the replacement to uppercase until a \L or \E is found
\u : turn the next character to uppercase
\E : stop case conversion started by \L or \U
Example:
echo "abbbb" | sed -e 's/^.*$/\u&/'
echo "abbbb" | sed -e 's/^.*$/\U&\E/'
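A few more replacements, shown as a sketch on made-up input, illustrate how & and the case-conversion escapes combine. The variable names are illustrative only, and the \u, \U, \l, \L and \E escapes assume GNU sed.

```shell
# & stands for the whole matched text
wrapped=$(echo "hello" | sed -e 's/ll/[&]/')                                # he[ll]o

# \u uppercases only the next character, \U everything that follows
upper_first=$(echo "hello world" | sed -e 's/^.*$/\u&/')                    # Hello world
upper_all=$(echo "hello world" | sed -e 's/^.*$/\U&/')                      # HELLO WORLD

# \E ends the conversion, so only the first group is uppercased
upper_part=$(echo "hello world" | sed -e 's/\(hello\) \(world\)/\U\1\E \2/')  # HELLO world

# \l lowercases only the next character, \L everything that follows
lower_first=$(echo "HELLO" | sed -e 's/.*/\l&/')                            # hELLO
lower_all=$(echo "HELLO" | sed -e 's/.*/\L&/')                              # hello

printf '%s\n' "$wrapped" "$upper_first" "$upper_all" "$upper_part" "$lower_first" "$lower_all"
```

Single quotes around the sed script keep the shell from touching the backslashes and the $ sign.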
s - substitution flags
g : apply the replacement to all matches of the regexp, not just the first
number : replace only the number-th match of the regexp
p : if the substitution was made, print the resulting line
Example :
echo HEADER HEADER | sed -e 's/HEADER//'
echo HEADER HEADER | sed -e 's/HEADER//g'
echo HEADER HEADER | sed -n -e 's/HEADER//'
echo HEADER HEADER | sed -n -e 's/HEADER//p'
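To make the effect of each flag visible, the sketch below replaces HEADER with X instead of deleting it. The input text and the variable names are made up for illustration.

```shell
text="HEADER HEADER HEADER"

first_only=$(echo "$text" | sed -e 's/HEADER/X/')     # X HEADER HEADER
all=$(echo "$text" | sed -e 's/HEADER/X/g')           # X X X
second_only=$(echo "$text" | sed -e 's/HEADER/X/2')   # HEADER X HEADER

# With -n nothing is printed automatically, so only lines on which the
# substitution succeeded come out, thanks to the p flag.
changed=$(printf 'HEADER one\nplain two\n' | sed -n -e 's/HEADER/X/p')   # X one

# With -n and no p flag, nothing is printed at all.
silent=$(printf 'HEADER one\nplain two\n' | sed -n -e 's/HEADER/X/')     # (empty)

printf '%s\n' "$first_only" "$all" "$second_only" "$changed"
```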
d - delete : deletes each whole line selected by the address
Example :
sed -e '1,5d' filein.txt
sed -e '/HEADER/d' filein.txt
sed -e '/^HEADER$/d' filein.txt
p - print : prints each whole line selected by the address
Example :
sed -n -e '1,5p' filein.txt
sed -e '1,5p' filein.txt
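The d and p commands can be compared side by side on a small file. This sketch assumes a four-line temporary file created with mktemp; the file contents and variable names are invented, and the output lines are joined with commas only to keep the results readable.

```shell
sample=$(mktemp)
printf '%s\n' 'HEADER' 'row1' 'row2 HEADER' 'row3' > "$sample"

# d removes every line containing HEADER anywhere
deleted_any=$(sed -e '/HEADER/d' "$sample" | paste -s -d ',' -)       # row1,row3

# anchoring with ^ and $ removes only lines that are exactly HEADER
deleted_exact=$(sed -e '/^HEADER$/d' "$sample" | paste -s -d ',' -)   # row1,row2 HEADER,row3

# with -n, p prints just the selected lines
printed_quiet=$(sed -n -e '1,2p' "$sample" | paste -s -d ',' -)       # HEADER,row1

# without -n, the selected lines appear twice: once from p and once
# from sed printing every line by default
printed_twice=$(sed -e '1,2p' "$sample" | paste -s -d ',' -)          # HEADER,HEADER,row1,row1,row2 HEADER,row3

printf '%s\n' "$deleted_any" "$deleted_exact" "$printed_quiet" "$printed_twice"
rm -f "$sample"
```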
Q? What is a regular expression
→ A regular expression is a pattern that is matched against a subject string from left to right
→ It lets us include alternatives and repetitions in the patterns
Char : a single ordinary character matches itself
* : matches a sequence of zero or more instances of matches for the preceding regular expression
\+ : as *, but matches one or more
\? : as *, but matches only zero or one
\{i\} : as *, but matches exactly i sequences
\{i,j\} : matches between i and j, inclusive, sequences
\{i,\} : matches i or more sequences
\(regexp\) : groups the inner regexp
. : matches any character, including newline
^ : matches the null string at the beginning of the line
$ : matches the null string at the end of the line
[list] : matches any single character in list
[^list] : a leading ^ reverses the meaning of list, so that it matches any single character not in list
regexp1\|regexp2 : matches either regexp1 or regexp2 (use parentheses for complex alternatives)
\digit : matches the digit-th \(...\) parenthesized subexpression in the regular expression
\n : matches the newline character
Example:
echo "HEADER HEADER33456" | sed -e 's/ HEADER[ 0-9]*/ -/'
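Several of the operators listed above can be tried one at a time with the s command. This is only a sketch on made-up inputs with illustrative variable names; the \| alternation assumes GNU sed.

```shell
# * also matches zero instances, so the empty string at the start is replaced
star=$(echo "aaa bbb" | sed -e 's/a*/X/')                    # X bbb
star_empty=$(echo "bbb" | sed -e 's/a*/X/')                  # Xbbb

# \{2,3\} takes as many repetitions as allowed, here three
interval=$(echo "aaaaa" | sed -e 's/a\{2,3\}/X/')            # Xaa

# \(ab\) groups, and \1 refers back to what the group matched
backref=$(echo "abab cd" | sed -e 's/\(ab\)\1/[\1]/')        # [ab] cd

# [list] and [^list]
digits_masked=$(echo "HEADER33456" | sed -e 's/[0-9]/#/g')   # HEADER#####
digits_only=$(echo "HEADER33456" | sed -e 's/[^0-9]//g')     # 33456

# ^ and $ match the empty string at the start and end of the line
anchored=$(echo "text" | sed -e 's/^/> /' -e 's/$/ </')      # > text <

# \| chooses between alternatives (GNU sed)
either=$(echo "cat dog" | sed -e 's/cat\|dog/pet/g')         # pet pet

printf '%s\n' "$star" "$star_empty" "$interval" "$backref" \
    "$digits_masked" "$digits_only" "$anchored" "$either"
```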