Wednesday, September 04, 2013

Bash & Sed - Display Unix Directory Structure as a Tree

Recently, I wanted to display a directory structure as a tree. However, I did not want to write a program in a high-level language like Java or Ruby to walk and parse the file structure. I wanted to use something like Bash.

So, here it goes:
find . -name '*' | sed -e 's/^/|-/' -e 's/[^-][^\/]*\//|   /g' -e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/'

The output of this command will be like this:
|   +--a.out
|   +--arraydecl.c
|   +--coverage
|   |   +--a.out
|   |   +--cov.c
|   +--interposn
|   |   +--cmain.c
|   |   +--cparts.c
|   |   +--cparts.o
|   |
|   |   +--main
|   |   +--main.c
|   |   +--main.o
|   |   +--withso.c
|   +--IPC
|   |   +--pipe.c
|   |   +--pipe_impl.c
|   +--KandR
|   |   +--detab.c
|   |   +--entab.c
|   |   +--Plan.txt
|   |   +--Plan.txt1
|   +--notbitand.c
|   +--ptrjoin.c
|   +--telewords.c

Now to explain this:
First use the find command:
find . -name '*'
This lists every file and directory under the current directory. Since -name '*' matches every name, a plain find . produces the same list. Remember that find prints each path relative to the current directory and, by default, does not follow symbolic links.
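
To see what the find step feeds into sed, here is a minimal sketch that builds a throwaway directory and lists it. The file names are made up for the illustration, and the sort is added only to make the order predictable; it is not part of the original command.

```shell
# Build a small throwaway tree (names are hypothetical, chosen for this sketch).
tmp=$(mktemp -d)
mkdir -p "$tmp/coverage"
touch "$tmp/a.out" "$tmp/coverage/cov.c"

# Run find from inside the tree; sort only to make the order predictable.
listing=$(cd "$tmp" && find . -name '*' | LC_ALL=C sort)
printf '%s\n' "$listing"

rm -rf "$tmp"
```

Every path starts with ./ because find reports paths relative to the starting point it was given.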

Next, we need to split the sed command to understand it better.
-e 's/^/|-/'
This inserts the characters |- at the beginning of every line.

Next we have a more complex looking sed expression.
-e 's/[^-][^\/]*\//|   /g'
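This replaces every directory component of the path (each run of characters up to and including a /) with a | followed by three spaces, so that only the basename is left as text. The first match on each line also swallows the |- prefix added by the previous expression together with the leading ./ from find.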

Finally, the expression
-e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/'
This takes the first | and three spaces that are followed by a letter, digit, underscore or dot, and puts a "+--" in front of that character. This gives the "+--" in the structure.
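
The three expressions can also be tried one at a time. The following is a small sketch that feeds two made-up paths (standing in for find output) through each stage in turn, so no real directory is needed.

```shell
# Two hypothetical lines of find output: one top-level file, one nested file.
paths='./a.out
./coverage/cov.c'

# Stage 1: prefix every line with |-
step1=$(printf '%s\n' "$paths" | sed -e 's/^/|-/')
# Stage 2: turn every directory component into a | and three spaces
step2=$(printf '%s\n' "$step1" | sed -e 's/[^-][^\/]*\//|   /g')
# Stage 3: put +-- in front of the basename
step3=$(printf '%s\n' "$step2" | sed -e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/')

printf '%s\n' "--- stage 1" "$step1" "--- stage 2" "$step2" "--- stage 3" "$step3"
```

After stage 2 the nested file reads "|   |   cov.c", and stage 3 turns it into "|   |   +--cov.c".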

Sunday, August 04, 2013


Scripts are text files that we want to run as if they were executables. One way to do this is to give the name of the interpreter followed by the name of the script, like this:
$ inter path/to/file
Here, inter should be replaced by the name of the interpreter, that is, the executable for the language in question, such as ruby, perl or bash. But what if you don't want to expose which language the script is in?
The first thing that you need to do is give the script execute permission (for example, with chmod +x path/to/file). Once this is done, the script becomes eligible to be executed. In other words, it can be executed as follows:

$ path/to/file

A simplistic explanation of what happens is as follows [1]:
  1. The current process is forked, creating a copy of itself.
  2. In the copy, the appropriate interpreter is exec'ed to run the script.

But how will the OS [1] know the correct interpreter to use?

Enter Sha-bang!

Sha-bang is the term for the character pair #!. It has to be given as the first two characters of the script and must be followed, on the same line, by the path to the interpreter. Some examples (typical paths, which can differ between systems) are as follows:

#!/bin/bash
#!/usr/bin/perl
#!/usr/bin/ruby

The OS [1] takes whatever is given after #! till the end of the line and does the exec using that.
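
Here is a minimal sketch of the whole sequence: write a script whose first line is a sha-bang, give it execute permission, and run it by path alone. The file name and the message are made up, and the sketch assumes that /bin/sh exists and that the temporary directory allows execution.

```shell
tmp=$(mktemp -d)
script="$tmp/hello"          # hypothetical file name

# The first line is the sha-bang; the rest is an ordinary sh script.
cat > "$script" <<'EOF'
#!/bin/sh
echo "hello from a directly executed script"
EOF

chmod +x "$script"           # make the script eligible for execution
output=$("$script")          # no interpreter is named on the command line
printf '%s\n' "$output"

rm -rf "$tmp"
```

The caller never mentions sh; the first line of the file alone decides which interpreter runs it.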

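It is important to remember that the entire rest of the line is used. This means that you can add a space after the interpreter path and pass an argument to the interpreter. (On Linux, everything after the interpreter path is handed over as one single argument, so it is safest to pass only one option.) For example, -d can be passed to perl like this: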

#!/usr/bin/perl -d

This would interpret the script as a Perl script and execute it in debug mode.
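
Since perl -d starts an interactive debugger, the same idea is easier to try with a flag that has a visible, non-interactive effect. The sketch below is an assumption-laden stand-in: it passes -e to /bin/sh (exit on the first failing command) and uses a made-up file name.

```shell
tmp=$(mktemp -d)
script="$tmp/stop_on_error"   # hypothetical file name

# The -e on the sha-bang line is handed to sh as an argument.
cat > "$script" <<'EOF'
#!/bin/sh -e
false
echo "printed only if -e was ignored"
EOF
chmod +x "$script"

# Capture both the output and the exit status of the direct execution.
if output=$("$script"); then status=0; else status=$?; fi
printf 'output=[%s] status=%s\n' "$output" "$status"

rm -rf "$tmp"
```

The script stops at the failing command, so nothing is printed and the exit status is non-zero, which shows that the flag reached the interpreter.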

There is one catch in this method. What if the interpreter is installed in a different location from what is specified in the sha-bang?

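The env command is used for this purpose. Here is the usage:

#!/usr/bin/env perl

This makes env search the directories listed in the environment variable $PATH for perl and run the script with the interpreter it finds. Be careful when adding options here: on Linux, a line such as #!/usr/bin/env perl -d hands "perl -d" to env as a single argument, so env looks for a program with that literal name and fails. Newer versions of GNU env (coreutils 8.30 and later) and the BSD env accept -S to split such a string, as in #!/usr/bin/env -S perl -d.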
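
The $PATH lookup can be observed with a stand-in interpreter. In this sketch, myinterp is a made-up name for a tiny script that only reports which file it was asked to run; it is placed in a temporary directory that is added to $PATH for a single command. The sketch assumes /usr/bin/env and /bin/sh exist.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/bin"

# A stand-in "interpreter" (hypothetical name) that only reports what it was given.
cat > "$tmp/bin/myinterp" <<'EOF'
#!/bin/sh
echo "myinterp asked to run ${1##*/}"
EOF

# A script whose sha-bang names the interpreter without any path.
cat > "$tmp/demo" <<'EOF'
#!/usr/bin/env myinterp
this body is never read by a shell
EOF

chmod +x "$tmp/bin/myinterp" "$tmp/demo"

# env finds myinterp only because its directory is on PATH for this one command.
output=$(PATH="$tmp/bin:$PATH" "$tmp/demo")
printf '%s\n' "$output"

rm -rf "$tmp"
```

The sha-bang line contains no path to myinterp at all; moving the interpreter to another directory only requires that directory to be on $PATH.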

[1] Please note that I am not trying to be technically correct. I am trying to explain things simply. For a proper understanding of this, I would suggest a book like Richard Stevens' Unix Network Programming.

Saturday, August 03, 2013

Ruby - Find Phone Numbers corresponding to Words

Remember the 1-800 numbers, part of which spell out a name, like "1-800 walmart"?

I wanted to write a program to convert a name into the corresponding number. I decided to try this first with Ruby. So, here is version 1:

#!/usr/bin/env ruby

print "Enter the name: "
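name = gets.chomp
downname = name.downcase

number = downname.gsub(/[abc]/,"2").gsub(/[def]/,"3")
                 .gsub(/[ghi]/,"4").gsub(/[jkl]/,"5")
                 .gsub(/[mno]/,"6").gsub(/[pqrs]/,"7")
                 .gsub(/[tuv]/,"8").gsub(/[wxyz]/,"9")
                 .gsub(/ /,"0")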

puts "The Number corresponding #{name} is #{number}"

The output for this is as follows:
$ ./name_2_number.rb 
Enter the name: Karthick
The Number corresponding Karthick is 52784425
$ ./name_2_number.rb 
Enter the name: 1800-walmart
The Number corresponding 1800-walmart is 1800-9256278

To explain the program:
  1. Line number 1 is the sha-bang. I will put in a separate post to explain that one.
  2. Line number 3 prints the message "Enter the name: ".
  3. Line number 4 accepts the input using gets and removes the \n at the end of the accepted input by using chomp. Though not essential in this case, it is generally a good practice to chomp the inputs obtained from the user.
  4. Line number 5 converts the name into lower case characters. For this purpose, I am using the downcase method of the String class in ruby. Changing the text into lower case helps in simplifying the regular expression (regex, for short) in the next line.
  5. Line number 7 is the one that contains the core logic. In this line, I use the gsub method of the String class to replace occurrences of each of the letters with the corresponding numbers. Note that the first argument of gsub is a regex, while the second argument is a string. The sequence of gsub calls replaces all occurrences of letters and spaces with the corresponding numbers.
  6. Line number 13 prints the message containing the original name given and the corresponding number. I have used puts here because I want a newline to be appended to the end of the message. This is the difference between print and puts in ruby. Also, I had created the variable downname so that I can still use the original name in this display.
As you can see, this program works fine but there are some basic issues with this script:

  1. Line number 7 is not efficient. The multiple calls to gsub are the culprit: each call scans the whole string again.
  2. Line number 7 is long and unwieldy.
  3. The maintainer of this code must understand the regex. Based on what I have seen, a surprising number of software engineers are not good with regex.

So, here is version 2:
#!/usr/bin/env ruby

print "Enter the name: "
name = gets.chomp
downname = name.downcase

repl = {'a' => '2', 'b' => '2', 'c' => '2',
        'd' => '3', 'e' => '3', 'f' => '3',
        'g' => '4', 'h' => '4', 'i' => '4',
        'j' => '5', 'k' => '5', 'l' => '5',
        'm' => '6', 'n' => '6', 'o' => '6',
        'p' => '7', 'q' => '7', 'r' => '7', 's' => '7',
        't' => '8', 'u' => '8', 'v' => '8',
        'w' => '9', 'x' => '9', 'y' => '9', 'z' => '9',
        ' ' => '0'}

number = downname.gsub(/[a-z ]/) { |m| repl[m] }

puts "The Number corresponding #{name} is #{number}"

Two parts have changed from the original script. Let me explain these two alone:
  1. Line number 7 declares a hash that maps each letter to its corresponding number. Note that this includes a blank space as one of the characters and it is mapped to 0.
  2. The line that computes number now calls gsub only once, with a block that looks up each matched character in the hash declared above.
This version works fine in ruby 1.8 and 1.9. However, ruby 1.9 has a shortcut for that gsub call in version 2. Here is the modified script (version 3):
#!/usr/bin/env ruby

# Works with ruby 1.9 and later

print "Enter the name: "
name = gets.chomp
downname = name.downcase

repl = {'a' => '2', 'b' => '2', 'c' => '2',
        'd' => '3', 'e' => '3', 'f' => '3',
        'g' => '4', 'h' => '4', 'i' => '4',
        'j' => '5', 'k' => '5', 'l' => '5',
        'm' => '6', 'n' => '6', 'o' => '6',
        'p' => '7', 'q' => '7', 'r' => '7', 's' => '7',
        't' => '8', 'u' => '8', 'v' => '8',
        'w' => '9', 'x' => '9', 'y' => '9', 'z' => '9',
        ' ' => '0'}

number = downname.gsub(/[a-z ]/, repl)

puts "The Number corresponding #{name} is #{number}"
In ruby 1.9 and later, gsub also accepts a hash as its second argument. Each piece of text matched by the regular expression in the first argument is looked up in the hash and replaced with the corresponding value.
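
As a quick cross-check of the mapping outside Ruby, the same letter-to-digit table can be expressed with the tr command. This is a different tool from the gsub approach used in the scripts, shown only as a sketch; the function name to_number is made up.

```shell
# Map a name to keypad digits with tr (not the Ruby gsub technique used in the scripts).
to_number() {
    printf '%s' "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr 'abcdefghijklmnopqrstuvwxyz ' '222333444555666777788899990'
}

first=$(to_number 'Karthick')
second=$(to_number '1800-walmart')
printf '%s\n' "$first" "$second"
```

The two results, 52784425 and 1800-9256278, match the numbers printed by the Ruby script for the same inputs.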

Karthick S.