Session 10: Operating System Programming Concepts: Lab 10 - POSIX Threads & AWK



Lab 10: POSIX Threads & AWK

Q? What is a thread
/> It is a sequence of control within a process.
/> Creating a new thread is different from creating a new process.
/> When we create a new thread in a process, the new thread of execution gets its own stack (and hence its own local variables) but shares the following with the other threads of the process:
global variables, file descriptors, signal handlers, and the current directory
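The difference between shared globals and per-thread locals can be sketched as follows. This is a minimal illustration, not part of the original lab: the names shared_total, add_to_total and run_shared_demo are hypothetical, and it relies on pthread_create and pthread_join, which are described under the next question. A mutex is added because two threads update the same global.

```c
#include <pthread.h>

/* Global: one copy, visible to every thread of the process. */
static int shared_total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical thread function: adds its argument to the shared total. */
static void *add_to_total(void *arg)
{
    int local_step = *(int *)arg;      /* local: lives on this thread's own stack */

    pthread_mutex_lock(&total_lock);   /* protect the shared global */
    shared_total += local_step;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Runs two threads and returns the shared total, or -1 on failure. */
int run_shared_demo(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;

    shared_total = 0;
    if (pthread_create(&t1, NULL, add_to_total, &a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, add_to_total, &b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_total;               /* both threads wrote to the same variable */
}
```

Each thread has its own local_step, while both see the same shared_total. With gcc or clang the code is typically compiled with the -pthread option.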

Q? How do we use POSIX threads
/> First, a thread is created:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
void *(*start_routine)(void *) simply says that we must pass the address of a function that takes a pointer to void as its parameter and returns a pointer to void. attr may be NULL to request the default attributes, and arg is the value passed to start_routine.
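As a minimal sketch of that signature in use, the code below starts one thread and passes it a pointer to a structure. The names greeting_args, greet and start_and_wait are hypothetical, chosen only for illustration; the thread is joined before returning so that the structure is still valid while the thread uses it.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical argument bundle handed to the thread through void *arg. */
struct greeting_args {
    int id;
    int visited;   /* set to 1 by the thread so the caller can see it ran */
};

/* Matches void *(*start_routine)(void *): takes void *, returns void *. */
static void *greet(void *arg)
{
    struct greeting_args *g = arg;   /* void * converts implicitly in C */

    printf("hello from thread %d\n", g->id);
    g->visited = 1;
    return NULL;
}

/* Creates the thread with default attributes and waits for it.
 * Returns 0 on success, or the error number reported by pthreads. */
int start_and_wait(struct greeting_args *g)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, greet, g);

    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Note that pthread_create reports failure through its return value (an error number) rather than through errno.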

/> Second, we join the created threads, i.e. we wait for each one to terminate:
int pthread_join(pthread_t th, void **thread_return);

/> Lastly, the thread exits:
void pthread_exit(void *retval);
The function terminates the calling thread and makes the pointer retval available to any thread that joins it with pthread_join.
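The create, join, exit sequence can be sketched as below. This is an illustrative example under assumptions: square and square_in_thread are hypothetical names, and the result is allocated on the heap because a pointer to a local variable of the thread would no longer be valid after the thread has exited.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thread function: returns a heap-allocated n * n. */
static void *square(void *arg)
{
    int n = *(int *)arg;
    int *result = malloc(sizeof *result);   /* must outlive this thread's stack */

    if (result == NULL)
        pthread_exit(NULL);
    *result = n * n;
    pthread_exit(result);   /* here this has the same effect as: return result; */
}

/* Runs square() in a new thread. Returns 0 and stores n * n in *out
 * on success, or -1 on failure. */
int square_in_thread(int n, int *out)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, square, &n) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)        /* ret receives the exit pointer */
        return -1;
    if (ret == NULL)
        return -1;
    *out = *(int *)ret;
    free(ret);
    return 0;
}
```

Passing &n is safe in this sketch only because the caller joins the thread before n goes out of scope.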


Q? What is awk
/> It is a utility that interprets a special-purpose programming language, which makes simple data-reformatting jobs easy to handle.
Ex: make changes in text files when certain patterns appear
extract data from parts of certain lines while discarding the rest
/> In practice, it searches for lines that contain certain patterns.
/> An awk program is a list of rules. Each rule gives one pattern to search for and one action to perform when that pattern is found:
pattern1 { action1 }
pattern2 { action2 }

Note: * awk keeps processing input lines in this way, testing each line against every pattern, until the end of the input is reached

Q? How do we run awk programs
/> If the program is short, we run it as follows:
awk 'program' input-file1 input-file2 ...
Note: the program consists of rules in the format described above
/> If the program is long, it is usually more convenient to put it in a file and run it with a command like this:
awk -f program-file.awk input-file1 input-file2 ...


Q? How do we read input files
a/> All input can be read from the standard input (the keyboard or a pipe from another command).
b/> Or we can read from files whose names are specified on the awk command line.
/> When reading from files, awk reads them in order, reading all the data from one file before going on to the next (awk reads its input in units called records and processes one record at a time).
/> In rare cases we use the getline command, which can read input explicitly from any number of files.

Q? How is input split into records
/> The awk language divides its input into records and fields. Records are separated by a character called the record separator (RS).
/> By default RS is the newline character (\n), i.e. one record is one line.
/> We can also set the built-in variable RS to use a different character to separate our records.

Q? What are fields
/> Records are automatically parsed into chunks called fields.
/> By default fields are separated by whitespace (like words in a line).
/> $1 refers to the first field, $2 to the second, and so on.
/> $NF refers to the last field (NF is a built-in variable whose value is the number of fields in the current record).

Q? How are fields separated
/> This is controlled by the field separator FS (a built-in variable).
/> FS is either a single character or a regular expression.

Q? How do we print output
/> We use the print statement.
/> We can specify the strings or numbers to be printed in a list separated by commas:
print item1, item2, ...

Example:
awk '{ if (NF < 4) printf "line content is %s\n", $0 }' file.in

Note: * the syntax of conditionals is the same as in the C language

awk '/user/' /etc/passwd → prints every line that contains the string user
awk 'length($0) > 80' file.in → prints every line longer than 80 characters
awk 'NF > 0' file.in → prints every line that has at least one field
awk '{print $1}' file.in → prints the first field of every line of the file

Example script (simple_script.awk):
/string1/ { if ($3 > 0) print $1 }    # rule 1
/string2/ { if ($4 > 10) print $NF }  # rule 2

Execution:
awk -f simple_script.awk inputFile



Learning goals: in this laboratory activity you will practice writing multithreaded C applications using the pthread library.
You will learn how to write simple AWK scripts and you will also improve your Bash scripting skills.
Exercise 1
Write a concurrent program that sorts data files using threads, as follows.
Each input file contains:
on the first line, the total number of integer values;
on the following lines, one number per line.
For example, a file could be:
5
3
45
76
9
11
The program receives N command-line parameters, which are taken in pairs:
the first parameter of each pair names an input file;
the second names the corresponding output file.
The program then creates N/2 threads.
Each thread:
reads the corresponding input file;
sorts the corresponding integer vector in ascending order;
stores the result in the corresponding output file.
For example:
./thread_sort.exe file1.in file1.out file2.in file2.out file3.in file3.out
creates 3 threads:
Thread 1 sorts file1.in and stores the result in file1.out
Thread 2 sorts file2.in and stores the result in file2.out
Thread 3 sorts file3.in and stores the result in file3.out
Hint:
Each thread calls 3 functions:
1. ReadFileIn
2. Sort
3. WriteFileOut
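As one possible starting point for the Sort step, the sketch below sorts an integer vector in ascending order with the standard library function qsort. The names compare_ints and sort_ascending are hypothetical, and the exercise does not prescribe any particular sorting method.

```c
#include <stdlib.h>

/* Comparison callback for qsort: negative, zero or positive result.
 * (x > y) - (x < y) avoids the overflow that x - y could cause. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Hypothetical Sort step: sorts count integers in ascending order, in place. */
void sort_ascending(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Applied to the sample file above (3, 45, 76, 9, 11), this yields 3, 9, 11, 45, 76.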
The program has to implement the following precedence graph (example with 4 parameters → 2 threads):
[Figure: precedence graph; image not available]

M1, M2: main begin and main end, respectively
R1, R2: reading the input files
O1, O2: sorting the vectors
W1, W2: writing the output files


Exercise 2
Write a Bash script that takes a file name as its first command-line parameter and a word as its second command-line parameter.
The file contains a list of directories. For each directory, the script finds all the text files (ending in “.txt”) and, for each text file, generates statistics in two files:
1. The first file has the same name as the text file but ends in “.stat”; it contains the file statistics: number of lines, number of characters, number of words, and the length of the longest line.
2. The second file has the same name but ends in “.graph”; it contains a histogram (made of “+” and “-” symbols) representing the occurrences of the word (second parameter) in the text file. For each line of the text file, the graph file contains the line number followed by a “+” symbol for each occurrence of the word, or a “-” symbol if the word does not appear in that line.
At the end, both the “.stat” file and the “.graph” file are stored in a new directory with the same name as the original directory but ending in “_stats”.
Hint: use the basename command to remove the extension of a file name (man basename)
Optional: the script also creates a compressed archive of each stats directory and moves it to a directory named “backup”.


Exercise 3
Using only AWK, perform the following tasks:
1. Print the name of the initialization process, i.e. the first process executed, which has PID 1;
2. Print the name and PID of the processes whose status is R or R+;
Hint: use ps -el to list the processes and pipe the output to the awk command


Summary
At the end of this laboratory activity you should understand how to use threads to write multithreaded applications. You should also have improved your understanding of writing Bash and AWK scripts.











