CISC3130: Lab Assignment 2

This lab focuses on regular expressions and the grep family of commands, sed, and filter commands such as sort, cut, paste, tr, and uniq.

Things to keep in mind: Read the requirements carefully, and ask the instructor to clarify anything that is not clear.

1. Extract phone numbers from text file

People write phone numbers in various formats:
  1. a plain 10-digit number: for example, 7188174484
  2. using hyphens to separate the digits: 718-817-4484
  3. enclosing the area code in parentheses and inserting a comma before the last four digits: (718)817,4484
  4. omitting the area code, e.g., 817-4484, 817,4484, or 8174484
  5. including the country code, e.g., (01)718,817,4484 or 01,718-817-4484
Write a script that searches for all possible telephone numbers in a set of files. Consider using the -f option of the grep family of commands to read the patterns from a file.

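Solution: Use the command grep -f phone.grep file.txt to match phone numbers in file.txt. The file phone.grep (see here) stores the list of patterns, one per line, as follows. The patterns are basic regular expressions, in which \{ and \} delimit a repetition count and an unescaped ( or ) matches a literal parenthesis.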

[^0-9][0-9]\{10\}$
[^0-9][0-9]\{10\}[^0-9]
[^0-9][0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}$
[^0-9][0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}[^0-9]
[^0-9][0-9]\{3\}\,[0-9]\{4\}$
[^0-9][0-9]\{3\}\,[0-9]\{4\}[^0-9]
[^0-9][0-9]\{3\}\-[0-9]\{4\}$
[^0-9][0-9]\{3\}\-[0-9]\{4\}[^0-9]
[^0-9]*\([0-9]\{2\}\)\([0-9]\{3\}\)[0-9]\{3\}\,[0-9]\{4\}$
[^0-9]*\([0-9]\{2\}\)\([0-9]\{3\}\)[0-9]\{3\}\,[0-9]\{4\}[^0-9]
[^0-9]*\([0-9]\{2\}\)\([0-9]\{3\}\))?[0-9]\{3\}\-[0-9]\{4\}$
[^0-9]*\([0-9]\{2\}\)\([0-9]\{3\}\))?[0-9]\{3\}\-[0-9]\{4\}[^0-9]
[^0-9]*[0-9]\{2\}\,[0-9]\{3\}\,[0-9]\{3\}\,[0-9]\{4\}$
[^0-9]*[0-9]\{2\}\,[0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}[^0-9]
[^0-9]*[0-9]\{2\}\,[0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}$
[^0-9]*[0-9]\{2\}\,[0-9]\{3\}\,[0-9]\{3\}\,[0-9]\{4\}[^0-9]
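
For comparison, the same idea can be sketched with extended regular expressions (grep -E). There, (^|[^0-9]) and ($|[^0-9]) mean "start or end of line, or a non-digit", so one pattern per format is enough and a number at the very beginning of a line is also found. The file names phone.ere and sample.txt, the function name find_phones, and the sample text are made up for illustration; this is one possible variant, not the lab's official solution.

```shell
#!/bin/bash
# Hypothetical pattern file: one extended regular expression per line.
cat > phone.ere <<'EOF'
(^|[^0-9])[0-9]{10}($|[^0-9])
(^|[^0-9])[0-9]{3}-[0-9]{3}-[0-9]{4}($|[^0-9])
(^|[^0-9])\([0-9]{3}\)[0-9]{3},[0-9]{4}($|[^0-9])
(^|[^0-9])[0-9]{3}[-,]?[0-9]{4}($|[^0-9])
(^|[^0-9])\([0-9]{2}\)[0-9]{3},[0-9]{3},[0-9]{4}($|[^0-9])
(^|[^0-9])[0-9]{2},[0-9]{3}-[0-9]{3}-[0-9]{4}($|[^0-9])
EOF

# Print the lines of the given files that contain a phone number.
find_phones() {
    grep -E -f phone.ere "$@"
}

# Made-up sample text to try the patterns on.
cat > sample.txt <<'EOF'
7188174484 is at the start of this line
call 718-817-4484 today
office: (718)817,4484
room 12 is on floor 3
EOF

find_phones sample.txt
```

On this sample, the first three lines are printed and the last one is not. Note that in extended regular expressions the roles are reversed compared with the basic ones: \( and \) match literal parentheses, while unescaped ( and ) group.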

2. Fun with words

Download a word list from the Internet, or from the following link, using the wget command as demonstrated below:
wget http://storm.cis.fordham.edu/~zhang/cs3130/Codes/wordlist.txt
The wget command retrieves the resource (wordlist.txt) specified by a URL and stores it in your current directory. Write a bash script that finds the words matching the following patterns (one grep command per pattern):
  1. All palindromes that are 6 letters long. (Note: backreferences do not seem to work on this large file; you can create a small file containing some made-up words to test your command.)

    Solutions:

    echo "6-letter palindromes"
    grep '^\(.\)\(.\)\(.\)\3\2\1$' wordlist.txt
    
  2. All words that have no more than five letters. (Note: grep has an option that inverts the match, i.e., displays the lines that do not contain the pattern.) Solutions:
    echo "Words no more than five letters long"
    grep -v '......' wordlist.txt
    
    Or
    grep -E '^.{1,5}$' wordlist.txt
    
  3. All words that contain the letters c, a, t, in this order (possibly separated by other letters), e.g., chat, catch, .... Solutions:
    echo "Words that contain the letters c, a, t, in this order (possibly separated by other letters), e.g., chat, catch, ...."
    grep 'c[a-z]*a[a-z]*t' wordlist.txt
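
Since item 1 suggests testing on a small file, the following sketch builds a made-up word list and runs one grep per pattern on it. The file name testwords.txt, the function names, and the words themselves are assumptions chosen for illustration. The palindrome and length patterns are anchored with ^ and $ so that the whole word has to match, not just a part of it.

```shell
#!/bin/bash
# Small made-up word list (one word per line) for trying out the patterns.
printf '%s\n' redder abccba hannah banana cat chat catch at tact cart predder > testwords.txt

# 6-letter palindromes: three captured letters followed by the same letters reversed.
palindromes() { grep '^\(.\)\(.\)\(.\)\3\2\1$' "$1"; }

# Words of at most five letters.
short_words() { grep -E '^.{1,5}$' "$1"; }

# Words containing c, a, t in that order, possibly with other letters in between.
cat_words() { grep 'c[a-z]*a[a-z]*t' "$1"; }

echo "6-letter palindromes:"
palindromes testwords.txt
echo "At most five letters:"
short_words testwords.txt
echo "c, a, t in order:"
cat_words testwords.txt
```

Here predder contains the palindrome redder but is seven letters long, so the anchored pattern skips it; tact is skipped by the last pattern because no a follows its c.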
    

3. Batch Editing using sed

  1. Write a sed script file that removes all one-line comments from C/C++ source code files. Such comments start with // and end at the end of the line. Take care of the cases where // appears inside double quotes or single quotes; in these cases, what comes after // is not a comment.

Solution: We create a sed script named rmcnt.sed as follows, and make the file executable:

#!/bin/sed -f
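## remove one-line comments from C/C++ code

## Only touch lines that have no quote character before the first //,
## so that a // inside a string or character literal is left alone.
/^[^'"]*\/\// s/\/\/.*$/ /
And then we use the following command to remove comments from C/C++ code:
./rmcnt.sed sample.cpp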
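
To see what this rule does and does not handle, here is a small sketch that applies the same sed command to a made-up C++ fragment. The file names rmcnt_demo.sed and demo.cpp, the function name strip_comments, and the fragment itself are assumptions for illustration.

```shell
#!/bin/bash
# Made-up C++ fragment: two plain comments, one comment after a string,
# and one // that is inside a string literal.
cat > demo.cpp <<'EOF'
int x = 0; // counter
// a whole-line comment
printf("see http://example.com"); // trailing note
char *s = "a//b";
EOF

# The same rule as above, stored in a hypothetical script file.
cat > rmcnt_demo.sed <<'EOF'
/^[^'"]*\/\// s/\/\/.*$/ /
EOF

# Apply the rule to the given file and print the result.
strip_comments() {
    sed -f rmcnt_demo.sed "$1"
}

strip_comments demo.cpp
```

The comments on the first two lines are removed, and the // inside "a//b" is preserved. The third line is left completely unchanged: because a quote appears before the first //, the rule skips the line, so the real comment after the string stays as well. The rule is therefore conservative rather than complete.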

4. Collecting/formatting data using cut, join, tr, sort, cat

  1. Write a command line (possibly using a pipeline to connect multiple commands) that lists all files under the current directory, showing the file name and the file size. (Note: we tried this in class; because the fields in ls -l output can be separated by multiple spaces, simply using cut on the output does not work properly. Try to fix the problem. Hint: use tr.)

    Solution:

    ls -l |   ## get the long listing
    tr -s ' ' |   ## squeeze multiple spaces into a single space
    cut -d ' ' -f 5,9   ## using space as the field delimiter, choose the 5th and 9th fields (size, name)
    
  2. Suppose you want to count how many lines of code you have written for a class, assuming all the source code is under the current directory. How do you count the number of lines in these source code files (.cc and .cpp files)?

    Solution:

    cat *.cc *.cpp | wc -l
    
  3. Collect information about a user. Write a shell script that takes a parameter (a user id) and prints out information as below:
    $ checkUser zhang
    zhang 2 login window; login shell is /bin/bash
    
    You can find the number of times the user is logged in from the output of the who command. You can find the user's login shell by looking it up in the file /etc/passwd. Hint: You need to figure out how to cut and compose the above output.

    Solution: The following is the CheckUser script:

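    #!/bin/bash
    # CheckUser: check how many login windows a user has, and the shell the user uses
    
    if [ $# -ne 1 ]
    then
       echo "Usage: CheckUser userid"
       exit 1
    fi
    
    echo -n "$1 "
    # count the lines of who output whose first field is exactly the given user id
    echo -n $(who | grep "^$1 " | wc -l)
    echo -n " login window; "
    echo -n "login shell is "
    # the login shell is the 7th colon-separated field of the user's /etc/passwd entry
    grep "^$1:" /etc/passwd | cut -d ':' -f 7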
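
The field extraction used in items 1 and 3 can be tried on fixed text, which makes the effect of tr -s easy to see. In the sketch below, the two listing lines and the passwd-style entry are made up for illustration, and the function names size_and_name and login_shell are hypothetical.

```shell
#!/bin/bash
# Made-up lines in the style of ls -l output; note the runs of spaces.
listing='-rw-r--r-- 1 zhang staff   120 Jan  5 10:02 a.cc
-rw-r--r-- 1 zhang staff 20480 Jan 15 09:30 lab.cpp'

# Squeeze repeated spaces, then pick the size (5th) and name (9th) fields.
size_and_name() { tr -s ' ' | cut -d ' ' -f 5,9; }

# Made-up /etc/passwd style entry; the login shell is the 7th ':' field.
entry='zhang:x:1001:1001:Example User:/home/zhang:/bin/bash'
login_shell() { cut -d ':' -f 7; }

printf '%s\n' "$listing" | size_and_name
printf '%s\n' "$entry" | login_shell
```

Without the tr -s step, every single space counts as a delimiter for cut, so the extra spaces before 120 would produce empty fields and shift the field numbers on that line.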
    

What to submit: Please submit only your scripts. Use the cat command to merge all scripts into one file named lab3.txt, and then follow the instructions here to submit the lab3.txt file to me.