Wednesday, September 04, 2013

Bash & Sed - Display Unix Directory Structure as a Tree

Recently, I wanted to display a directory structure as a tree. However, I did not want to write a program in a high level language like Java or Ruby to walk and parse the file structure. I wanted to use something like Bash.

So, here it goes:
find . -name '*' | sed -e 's/^/|-/' -e 's/[^-][^\/]*\//|   /g' -e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/'

The output of this command will be like this:
|-.
|   +--a.out
|   +--arraydecl.c
|   +--coverage
|   |   +--a.out
|   |   +--cov.c
|   +--interposn
|   |   +--cmain.c
|   |   +--cparts.c
|   |   +--cparts.o
|   |   +--libwithso.so
|   |   +--main
|   |   +--main.c
|   |   +--main.o
|   |   +--withso.c
|   +--IPC
|   |   +--pipe.c
|   |   +--pipe_impl.c
|   +--KandR
|   |   +--detab.c
|   |   +--entab.c
|   |   +--Plan.txt
|   |   +--Plan.txt1
|   +--notbitand.c
|   +--ptrjoin.c
|   +--telewords.c

Now to explain this:
First use the find command:
find . -name '*'
This lists every file and directory under the current directory, one path per line. The -name '*' test matches every name, so a plain find . prints the same list. Remember that find prints each path relative to the starting directory and, by default, does not follow symbolic links.

Next, we need to split the sed command to understand it better.
-e 's/^/|-/'
This prepends the characters |- to every line.

Next we have a more complex looking sed expression.
-e 's/[^-][^\/]*\//|   /g'
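This replaces every element in the path except the basename (that is, everything up to and including each /) with a | followed by three spaces. The [^-] at the start lets the first match swallow the |-. prefix as well.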

Finally, the expression
-e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/'
This matches a | and three spaces followed by a letter/digit/underscore/dot, and inserts "+--" in front of that character. Only the last | on a line is followed by a name, so this gives the "+--" just before the file or directory name.
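
To see the three substitutions one step at a time, here is a minimal Python sketch that applies the same regular expressions to a single path. It is only an illustration of the sed expressions above, not a replacement for the one-liner; the function name tree_line is made up for this example.

```python
import re

def tree_line(path):
    """Apply the three sed substitutions from the post to one path."""
    # -e 's/^/|-/' : prepend "|-" to the line
    line = "|-" + path
    # -e 's/[^-][^\/]*\//|   /g' : turn every "component/" into "|   "
    line = re.sub(r"[^-][^/]*/", "|   ", line)
    # -e 's/|   \([A-Za-z0-9_.]\)/|   +--\1/' : first match only, like sed without g
    line = re.sub(r"\|   ([A-Za-z0-9_.])", r"|   +--\1", line, count=1)
    return line

for p in [".", "./a.out", "./coverage/cov.c"]:
    print(tree_line(p))
```

The top-level "." has no slash in it, so only the first substitution touches it and it stays as |-.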

Wednesday, August 14, 2013

Arrays in Java are not equivalent to pointers+malloc in C

Based on some of the comments, I feel that I did not make my intentions clear with this post. So, here is the back story for this post:

While discussing something with my friend, he mentioned that 'Arrays in Java are like Pointers in C/C++ and that the best way for a C/C++ programmer to understand it is by taking the declaration as a declaration of a pointer and the allocation of memory as a malloc statement. In fact, the syntax is an exact replica with different symbols.'

My intention was to show that they were different and show that there are subtle but important differences in the syntax. Hope this makes the rest of the post clear.

In Java, is there a difference between declaring arrays as in line 1 and as in line 2 below?
int [] arr1, arr2;
int arr3[], arr4[];
To test this, I wrote the following program:
public class ArrayDecl
{
    public static void main(String[] args)
    {
        int[] arr1, arr2;
        int arr3[], arr4[];

        arr1 = new int[]{0, 1, 2, 3, 4};
        arr2 = new int[]{5, 6, 7, 8, 9};
        arr3 = new int[]{10, 11, 12, 13, 14};
        arr4 = new int[]{15, 16, 17, 18, 19};

        System.out.println(arr1.getClass());
        System.out.println(arr2.getClass());
        System.out.println(arr3.getClass());
        System.out.println(arr4.getClass());
    }
}
Turns out, there is no difference at all:
class [I
class [I
class [I
class [I 
But why did I think about this? It was because of the following program in C:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *ptr1, ptr2;
    int *ptr3, *ptr4;

    ptr1 = (int*)malloc(5 * sizeof (int));
    ptr2 = (int*)malloc(5 * sizeof (int));
    ptr3 = (int*)malloc(5 * sizeof (int));
    ptr4 = (int*)malloc(5 * sizeof (int));

    return 0;
}
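This does not compile cleanly. In int *ptr1, ptr2; only ptr1 is a pointer, while ptr2 is a plain int, so gcc complains about the assignment to ptr2: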
$ gcc arraydecl.c 
arraydecl.c: In function ‘main’:
arraydecl.c:10: warning: assignment makes integer from pointer without a cast

So, the next time someone tells me that arrays in Java are similar to using pointers and malloc, I have a program that shows otherwise.

Cheers!
Karthick S.

Sunday, August 04, 2013

Sha-bang

Scripts are text files which need to be executed as if they are executables. One way to do this is to execute the script by giving the name of the interpreter and the name of the script like this:
$ inter path/to/file
Here, inter should be replaced by the name of the interpreter. These would be the corresponding executables for the languages like ruby, perl, bash etc. But what if you don't want to expose which language the script is in?
The first thing that you need to do is provide executable permissions on the script. Once this is done, the script becomes eligible to be executed. In other words, it can be executed as follows:

$ path/to/file

A simplistic explanation of what happens is as follows [1]:
  1. The current process is forked and a copy of itself is created.
  2. In the copy, the right interpreter is exec'ed to execute the script.

But how will the OS [1] know the correct interpreter to use?

Enter Sha-bang!

Sha-bang is the term for the symbol #!. This has to be given as the first two characters in the script and should be followed by the path to the interpreter in the same line. Some examples are as follows:

#!/bin/bash

#!/usr/bin/ruby

The OS [1] takes whatever is given after #! till the end of the line and does the exec using that.

It is important to remember that the rest of the line is taken and used in full. This means that you can pass an argument to the interpreter. For example, -d can be passed to perl like this:

#!/usr/bin/perl -d

This would interpret the script as a Perl script and execute it in debug mode.
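
As a rough illustration of the rule above, here is a small Python sketch that pulls the interpreter and its argument out of a script's first line. It is a simplified model that assumes the Linux convention, where everything after the interpreter path is handed over as one single argument; other systems may split the line differently. The function name parse_shebang is made up for this example.

```python
def parse_shebang(first_line):
    """Return (interpreter, argument) for a sha-bang line, or None.

    Simplified model: the text after the interpreter path is kept as a
    single argument string, which is an assumption based on Linux.
    """
    if not first_line.startswith("#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)  # split off the interpreter path only
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument

print(parse_shebang("#!/usr/bin/perl -d\n"))
print(parse_shebang("#!/bin/bash\n"))
print(parse_shebang("echo hello\n"))
```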

There is one catch in this method. What if the interpreter is installed in a different location from what is specified in the sha-bang?

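The env command is used for this purpose. Here is the usage:

#!/usr/bin/env perl

env searches the directories listed in the environment variable $PATH for the interpreter and executes the script with the one it finds. Note that adding an option such as -d on this line is not portable: Linux hands everything after /usr/bin/env to env as one single argument, so env would look for a program literally named "perl -d".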

[1] Please note that I am not trying to be technically correct. I am trying to explain things simply. For a proper understanding of this, I would suggest a book like Richard Stevens' Unix Network Programming.

Saturday, August 03, 2013

Ruby - Find Phone Numbers corresponding to Words

Remember the 1-800 numbers, a part of which spells out a name, like "1-800 walmart"?

I wanted to write a program to convert a name into the corresponding number. I decided to try this first with Ruby. So, here is version 1:

#!/usr/bin/env ruby

print "Enter the name: "
name = gets.chomp
downname=name.downcase

number = downname.gsub(/[abc]/,"2").gsub(/[def]/,"3")
                 .gsub(/[ghi]/,"4").gsub(/[jkl]/,"5")
                 .gsub(/[mno]/,"6").gsub(/[pqrs]/,"7")
                 .gsub(/[tuv]/,"8").gsub(/[wxyz]/,"9")
                 .gsub(/ /,"0") 

puts "The Number corresponding #{name} is #{number}"

The output for this is as follows:
$ ./name_2_number.rb 
Enter the name: Karthick
The Number corresponding Karthick is 52784425
$ ./name_2_number.rb 
Enter the name: 1800-walmart
The Number corresponding 1800-walmart is 1800-9256278

To explain the program:
  1. Line number 1 is the sha-bang. I will put in a separate post to explain that one.
  2. Line number 3 prints the message "Enter the name: ".
  3. Line number 4 accepts the input using gets and removes the \n at the end of the accepted input by using chomp. Though not essential in this case, it is generally a good practice to do a chomp of the inputs obtained from the user.
  4. Line number 5 converts the name into lower case characters. For this purpose, I am using the downcase method of String in ruby. Changing the text into lower case helps in simplifying the regular expression (regex, for short) in the next line.
  5. Line number 7 is the one that contains the core logic. In this line, I use the gsub method in the String class to replace occurrences of each of the letters with the corresponding numbers. Note that the first argument of gsub is a regex, while the second argument is a string. The sequence of gsub calls replaces all occurrences of letters and spaces with the corresponding numbers.
  6. Line number 13 prints the message containing the original name given and the corresponding number. I have used puts over here because I want a newline to be appended to the end of the message. This is the difference between a print and puts in ruby. Also, I had created the variable downname so that I can use name in this display.
As you can see, this program works fine but there are some basic issues with this script:

  1. Line number 7 is not efficient. Multiple calls to gsub are the culprit.
  2. Line number 7 is long and unwieldy.
  3. The maintainer of this code must understand the regex. Based on what I have seen, a surprising number of software engineers are not good with regex.

So, here is version 2:
#!/usr/bin/env ruby

print "Enter the name: "
name = gets.chomp
downname=name.downcase

repl = {'a' => '2', 'b' => '2', 'c' => '2',
        'd' => '3', 'e' => '3', 'f' => '3',
        'g' => '4', 'h' => '4', 'i' => '4',
        'j' => '5', 'k' => '5', 'l' => '5',
        'm' => '6', 'n' => '6', 'o' => '6',
        'p' => '7', 'q' => '7', 'r' => '7', 's' => '7',
        't' => '8', 'u' => '8', 'v' => '8',
        'w' => '9', 'x' => '9', 'y' => '9', 'z' => '9',
        ' ' => '0'}

number = downname.gsub(/[a-z ]/) { |m| repl[m] }

puts "The Number corresponding #{name} is #{number}"

Two parts of the original script have changed. Let me explain these alone:
  1. Lines 7 to 15 declare a hash that maps each letter to its corresponding number. Note that this includes a blank space as one of the characters and it is mapped to 0.
  2. Line 17 calls gsub once with a block, which looks up each matched character in the hash declared above.
This version works fine in ruby 1.8 and 1.9. However, ruby 1.9 has a shortcut for the gsub call in version 2. Here is the modified script (version 3):
#!/usr/bin/env ruby

# Works with ruby version 1.9 or later

print "Enter the name: "
name = gets.chomp
downname=name.downcase

repl = {'a' => '2', 'b' => '2', 'c' => '2',
        'd' => '3', 'e' => '3', 'f' => '3',
        'g' => '4', 'h' => '4', 'i' => '4',
        'j' => '5', 'k' => '5', 'l' => '5',
        'm' => '6', 'n' => '6', 'o' => '6',
        'p' => '7', 'q' => '7', 'r' => '7', 's' => '7',
        't' => '8', 'u' => '8', 'v' => '8',
        'w' => '9', 'x' => '9', 'y' => '9', 'z' => '9',
        ' ' => '0'}

number = downname.gsub(/[a-z ]/, repl)

puts "The Number corresponding #{name} is #{number}"
This form of gsub does the replacement by taking the regular expression as the first argument and the hash as the second argument: each match is replaced by the value stored under that key in the hash.
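
For comparison, the same table-driven idea can be sketched in Python. This is not a translation of the ruby script, only an illustration of a single lookup table replacing the chain of substitutions; the names KEYPAD, TABLE and name_to_number are made up for this example.

```python
# Digits of a phone keypad and the letters printed on each key.
# The space-to-0 mapping follows the ruby script above.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    "0": " ",
}

# One translation table: each letter (or space) maps to its digit.
TABLE = str.maketrans(
    {letter: digit for digit, letters in KEYPAD.items() for letter in letters}
)

def name_to_number(name):
    """Replace letters and spaces with keypad digits; leave the rest as is."""
    return name.lower().translate(TABLE)

print(name_to_number("Karthick"))      # 52784425
print(name_to_number("1800-walmart"))  # 1800-9256278
```

Characters that are not in the table, such as digits and the hyphen, pass through unchanged, which matches the output of the ruby versions.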

Cheers!
Karthick S.

Sunday, September 30, 2012

Two ways to find the max of 5 numbers


It started as an experiment to find out which of the two ways I could think of to find the max of 5 numbers is better. I wanted to make this language agnostic, in other words, no library functions or language specific features.

I used Java for this. Here is the program:

//LargestIn5.java
//Accept 5 numbers and find the largest of the 5 numbers

import java.util.Scanner;

public class LargestIn5
{
    public static void main(String[] args)
    {
        Scanner input = new Scanner(System.in);

        int a, b, c, d, e; // The five numbers
        int nCount = 0;    // Comparisons made by method 2

        System.out.print("Enter the first number: ");
        a = input.nextInt();

        System.out.print("Enter the second number: ");
        b = input.nextInt();

        System.out.print("Enter the third number: ");
        c = input.nextInt();

        System.out.print("Enter the fourth number: ");
        d = input.nextInt();

        System.out.print("Enter the fifth number: ");
        e = input.nextInt();

        // Method 1: test each number against all of the others
        if ((a > b) && (a > c) && (a > d) && (a > e))
        {
            System.out.println(a + " is the largest");
            System.out.println("nCount = 4");
        }
        else if ((b > a) && (b > c) && (b > d) && (b > e))
        {
            System.out.println(b + " is the largest");
            System.out.println("nCount = 4");
        }
        else if ((c > a) && (c > b) && (c > d) && (c > e))
        {
            System.out.println(c + " is the largest");
            System.out.println("nCount = 4");
        }
        else if ((d > a) && (d > b) && (d > c) && (d > e))
        {
            System.out.println(d + " is the largest");
            System.out.println("nCount = 4");
        }
        else if ((e > a) && (e > b) && (e > c) && (e > d))
        {
            System.out.println(e + " is the largest");
            System.out.println("nCount = 4");
        }
        else
        {
            System.out.println("There is something wrong...");
            System.out.println("nCount = 0");
        }

        // Method 2: nested comparisons that carry the winner forward
        if (a > b)
        {
            nCount++;
            if (a > c)
            { // a > b & c
                nCount++;
                if (a > d)
                { // a > b & c & d
                    nCount++;
                    if (a > e)
                    { // a > b & c & d & e
                        nCount++;
                        System.out.println(a + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > a > b & c & d
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
                else
                { // d > a > b & c
                    nCount++;
                    if (d > e)
                    { // d > e & a > b & c
                        nCount++;
                        System.out.println(d + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > d > a > b & c
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
            }
            else
            { // c > a > b
                nCount++;
                if (c > d)
                { // c > d & a > b
                    nCount++;
                    if (c > e)
                    { // c > d & e & a > b
                        nCount++;
                        System.out.println(c + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > c > d & a > b
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
                else
                { // d > c > a > b
                    nCount++;
                    if (d > e)
                    { // d > e & c > a > b
                        nCount++;
                        System.out.println(d + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > d > c > a > b
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
            }
        }
        else
        { // b > a
            nCount++;
            if (b > c)
            { // b > a & c
                nCount++;
                if (b > d)
                { // b > a & c & d
                    nCount++;
                    if (b > e)
                    { // b > a & c & d & e
                        nCount++;
                        System.out.println(b + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > b > a & c & d
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
                else
                { // d > b > a & c, so there is no need to compare d with a again
                    nCount++;
                    if (d > e)
                    { // d > e & b > a & c
                        nCount++;
                        System.out.println(d + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > d > b > a & c
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
            }
            else
            { // c > b > a
                nCount++;
                if (c > d)
                { // c > d & b > a
                    nCount++;
                    if (c > e)
                    { // c > d & e & b > a
                        nCount++;
                        System.out.println(c + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > c > d & b > a
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
                else
                { // d > c > b > a
                    nCount++;
                    if (d > e)
                    { // d > e & c > b > a
                        nCount++;
                        System.out.println(d + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                    else
                    { // e > d > c > b > a
                        nCount++;
                        System.out.println(e + " is the largest");
                        System.out.println("nCount = " + nCount);
                    }
                }
            }
        }
    }
}


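The variable nCount was to find out how many conditions are being executed. The first method (the if/else-if chain) prints a fixed count of 4, which covers only the branch that matched and not the comparisons made in the branches tried before it. Method 2 (the nested ifs) counts each comparison as it is made. Method 2 seems more complex. But is it better in terms of the number of conditions executed to find the answer?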

Output from a few runs:

$ java LargestIn5
Enter the first number: 1
Enter the second number: 2
Enter the third number: 3
Enter the fourth number: 4
Enter the fifth number: 5
5 is the largest
nCount = 4 
5 is the largest
nCount = 4 
$ java LargestIn5
Enter the first number: 2
Enter the second number: 3
Enter the third number: 4
Enter the fourth number: 5
Enter the fifth number: 1
5 is the largest
nCount = 4 
5 is the largest
nCount = 4 
$ java LargestIn5
Enter the first number: 3
Enter the second number: 4
Enter the third number: 5
Enter the fourth number: 1
Enter the fifth number: 2
5 is the largest
nCount = 4 
5 is the largest
nCount = 4 
$ java LargestIn5
Enter the first number: 4
Enter the second number: 5
Enter the third number: 1
Enter the fourth number: 2
Enter the fifth number: 3
5 is the largest
nCount = 4 
5 is the largest
nCount = 4 
$ java LargestIn5
Enter the first number: 5
Enter the second number: 1
Enter the third number: 2
Enter the fourth number: 3
Enter the fifth number: 4
5 is the largest
nCount = 4 
5 is the largest
nCount = 4 


Sometimes, the more complex looking code is not necessarily the more optimal one.
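
As a cross-check on the counts, here is a minimal Python sketch that tallies every comparison actually evaluated by each approach, including the ones in branches that were tried and rejected. It is a loop-based restatement written for this illustration, not the Java program itself, and the function names are made up. It assumes && stops at the first false test, as it does in Java.

```python
def max_by_chained_tests(values):
    """First method: test each candidate against all of the others."""
    comparisons = 0
    for i, candidate in enumerate(values):
        is_largest = True
        for j, other in enumerate(values):
            if i == j:
                continue
            comparisons += 1
            if not candidate > other:
                is_largest = False
                break  # mirrors && stopping at the first false test
        if is_largest:
            return candidate, comparisons
    return None, comparisons  # ties: no strictly largest value

def max_by_carrying_winner(values):
    """Second method: compare the current winner with the next value."""
    comparisons = 0
    best = values[0]
    for other in values[1:]:
        comparisons += 1
        if not best > other:
            best = other
    return best, comparisons

print(max_by_chained_tests([1, 2, 3, 4, 5]))    # (5, 14)
print(max_by_chained_tests([5, 1, 2, 3, 4]))    # (5, 4)
print(max_by_carrying_winner([1, 2, 3, 4, 5]))  # (5, 4)
```

In this sketch the count for the first method depends on where the largest value sits in the input, while the second always makes four comparisons for five numbers.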

Saturday, March 31, 2012

Handling Return Values

Recently, I was discussing Defensive Programming with a relative newcomer to the industry. I mentioned that the concept does not depend on the language and is more logical in nature. In other words, it can be practiced even when a language does not have specific facilities for it.

To give an example, I used the following:

Consider a function fun1() which returns one of the following values:

#define SUCCESS 0
#define FAILURE 1
#define NO_DATA_FOUND 2

int fun1()
{
    //search for data in DB
    //If data not found return NO_DATA_FOUND
    //else if operation failed, then return FAILURE
    //Return SUCCESS
}

Now, one of the common (wrong) ways for a caller fun2() to handle the return values from the call to fun1() is:

void fun2()
{
    //Call fun1
    //if return value is SUCCESS, continue process
    //else, throw exception
}

Based on code review feedback or testing, they might realize that NO_DATA_FOUND might be a valid scenario. In other words, fun2() might need to do a create_if_not_found operation.

To accommodate this, they might change the caller to:

void fun3()
{
    //Call fun1
    //if return value is NO_DATA_FOUND, then create row
    //else if return value is SUCCESS, then continue process
    //else, throw exception
}

This would be fine in the current scenario. But what if a new return value is added to fun1(), say WARNING_POTENTIAL_STALE_DATA:

#define SUCCESS 0
#define FAILURE 1
#define NO_DATA_FOUND 2
#define WARNING_POTENTIAL_STALE_DATA 3

int fun1()
{
    //search for data in DB
    //If data not found return NO_DATA_FOUND
    //else if operation failed but previously fetched data available, then return WARNING_POTENTIAL_STALE_DATA
    //else if operation failed and no previously fetched data available, then return FAILURE
    //Return SUCCESS
}

Now, fun3() would not provide the complete functionality, since it does not know about the new return value and treats it exactly like FAILURE. However, no one would notice this, since the catch-all else branch in fun3() hides the fact that an unexpected value came back.

A better way to write the caller is:

void fun4()
{
    //Call fun1
    //if return value is NO_DATA_FOUND, then create row
    //else if return value is SUCCESS, then continue process
    //else if return value is FAILURE, then throw DATA_FETCH_FAILED_EXCEPTION
    //else, throw UNKNOWN_RETURN_VALUE_EXCEPTION
}

Now, fun4() will throw UNKNOWN_RETURN_VALUE_EXCEPTION when WARNING_POTENTIAL_STALE_DATA is returned, so the missing handling gets noticed.
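
The same idea can be sketched in runnable form. The following Python example is only an illustration of the fun4() pattern: the Status enum mirrors the #define values above, while the exception classes, the handle_status function and the strings it returns are hypothetical names invented for this sketch.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    FAILURE = 1
    NO_DATA_FOUND = 2
    WARNING_POTENTIAL_STALE_DATA = 3

class DataFetchFailed(Exception):
    """The lookup reported FAILURE."""

class UnknownReturnValue(Exception):
    """The lookup returned a value this caller does not know about."""

def handle_status(status):
    """Handle each known value explicitly; anything else is an error."""
    if status is Status.NO_DATA_FOUND:
        return "create row"
    elif status is Status.SUCCESS:
        return "continue process"
    elif status is Status.FAILURE:
        raise DataFetchFailed("data fetch failed")
    else:
        # A value added later lands here instead of being
        # silently treated as a plain failure.
        raise UnknownReturnValue(f"unhandled return value: {status.name}")

print(handle_status(Status.SUCCESS))
try:
    handle_status(Status.WARNING_POTENTIAL_STALE_DATA)
except UnknownReturnValue as err:
    print(err)
```

The point of the last branch is that it names the problem: a new return value shows up as its own kind of error rather than hiding behind the handling for FAILURE.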

Cheers!
Karthick S.

Sunday, December 25, 2011

Installing rvm in mac os x lion using bash

Our office gives us a MacBook Pro as our work laptop. I wanted to install ruby, and for this purpose I wanted to install rvm. Based on instructions I found on the rvm site, I gave the command to download the installer and do the installation:

$ bash < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
-sh: syntax error near unexpected token `<'

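The -sh prefix in the error suggests that the shell was running as sh, where the <( ... ) process substitution syntax is not available. Since this can be split into two commands, I did the same: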

$ curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer > rvm-installer
$ bash rvm-installer

When I did this, it threw out a lot of output. Hidden in this output was a line:

Installation of RVM to /Users/[user_name]/.rvm/ is complete.

Once I saw this, I did not bother looking further to see if anything had failed or if there were any warnings. It turns out that it did not work. There was an error in the output:

is_a_function: command not found

When I went ahead and added this path in .bash_profile, the error surfaced again. Google did not help either. Finally, with the help of an answer for my question at StackOverflow, I cracked it:
$ curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable
Now to figure out how to install ruby. :)

Cheers!
Karthick S.

Saturday, October 22, 2011

Special options while printing in Python 3.x

One more post about printing values in Python 3.x and how it differs with Python 2.x.

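To suppress the end of line in Python 2.x, all you need to do is add a comma to the end of the print list.

print "Hello",
print "World"

The statements above print "Hello World" on the screen. The trailing comma suppresses the newline, and the next print writes a single space before its own output.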

But with Python 3.x, it is different:

print ("Hello ", end='')
print ("World")

This prints "Hello World" on the screen.

Another way to print the same would be:

print ("Hello", end=" ")
print ("World")

In other words, whatever is sent as a value for end is appended to the end of whatever is printed. You can even specify multi-character strings as the value for end.

In addition to this, there is another argument sep, which specifies the separator between the items to be printed.

print ("Hello", "World", sep = " ")

This would help in creating output as a CSV or a TSV file, for example:

print ("FirstName", "LastName", sep=",")
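
Here is a minimal sketch of sep and end working together. It writes two comma-separated rows into an in-memory buffer through the file argument of print, so the result can be inspected as one string. The second row is made-up sample data.

```python
import io

buffer = io.StringIO()

# sep goes between the items, end goes after the last item.
print("FirstName", "LastName", sep=",", end="\n", file=buffer)
print("Ada", "Lovelace", sep=",", end="\n", file=buffer)

csv_text = buffer.getvalue()
print(csv_text, end="")  # the text already ends with a newline
```

Note that this does no quoting, so it is only suitable for values that do not themselves contain the separator.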

Cheers!
Karthick S.

Tuesday, October 18, 2011

Differences in print between Python 2.x and Python 3.x

Recently, I started learning Python and installed the latest version available online - Python 3.2. I did this in spite of the fact that the book I was using, Practical Programming: An Introduction to Computer Science using Python, was written for Python 2.x.

After completing a few exercises using the Python command line, I decided to save my scripts and execute them from the command line. That is when things started to fail. Python would not allow me to print the output. It kept showing the variable to be printed and said SyntaxError: invalid syntax. To illustrate in the command line, it was something like:

>>> i=5
>>> print i
  File "<stdin>", line 1
print i
^
SyntaxError: invalid syntax

Google could not help me with this. Finally, I went with intuition and solved it. I needed to enclose the arguments to print in parentheses. But the book did not mention it.

This is because the enclosing parenthesis was not a requirement in Python 2.x, but is required in Python 3.x.

It kind of pissed me off (and I am sure I am not the first Python newbie to feel this way), that the language designers did not give a reasonably clear error message for this one.

Cheers!
Karthick S.

Wednesday, December 08, 2010

Moving from rspec 2.0.0.beta.18 to rspec 2.0.0.beta.22

This post is primarily for anyone who is using rspec 2.0.0.beta.18 as specified in the Ruby on Rails Tutorial and hits the following problem:

Failure/Error: Unable to find C to read failed line
undefined method `get' for # do
include RSpec::Rails::RequestExampleGroup

Whenever they run the tests:
$ rspec spec/requests/layout_links_spec.rb
To fix this, I installed rspec 2.0.0.beta.22 by using the command:
$ gem install rspec -v 2.0.0.beta.22 --pre
Then I changed the version of rspec specified in my Gemfile as:
$ git diff Gemfile
WARNING: terminal is not fully functional
diff --git a/Gemfile b/Gemfile
index 28af226..865b9de 100644
--- a/Gemfile
+++ b/Gemfile
@@ -30,10 +30,11 @@ gem 'sqlite3-ruby', :require => 'sqlite3'
# end

group :development do
- gem 'rspec-rails', '2.0.0.beta.18'
+ gem 'rspec-rails', '2.0.0.beta.22'
end

group :test do
- gem 'rspec', '2.0.0.beta.18'
+ gem 'rspec', '2.0.0.beta.22'
end

warning: LF will be replaced by CRLF in Gemfile.
The file will have its original line endings in your working directory.
Then I installed this version of rspec using the command:
$ bundle install

Now I got a new error:
1) LayoutLinks should have a Home page at '/'
Failure/Error: response.should have_selector('title', :content => "Home")
undefined method `has_selector?' for #
# ./spec/requests/layout_links_spec.rb:6:in `block (2 levels) in '
A Google search suggested the solution: add "webrat" to the Gemfile, do a bundle install and that should do it.

In my case, I had to uncomment the entry in the Gemfile.
$ git diff Gemfile
WARNING: terminal is not fully functional
diff --git a/Gemfile b/Gemfile
index 28af226..ea7d729 100644
--- a/Gemfile
+++ b/Gemfile
@@ -25,15 +25,16 @@ gem 'sqlite3-ruby', :require => 'sqlite3'
# Bundle gems for the local environment. Make sure to
# put test-only gems in this group so their generators
# and rake tasks are available in development mode:
-# group :development, :test do
-# gem 'webrat'
-# end
+group :development, :test do
+ gem 'webrat'
+end

group :development do
- gem 'rspec-rails', '2.0.0.beta.18'
+ gem 'rspec-rails', '2.0.0.beta.22'
+# gem 'annotate-models', '1.0.4'
end

group :test do
- gem 'rspec', '2.0.0.beta.18'
+ gem 'rspec', '2.0.0.beta.22'
end

warning: LF will be replaced by CRLF in Gemfile.
The file will have its original line endings in your working directory.
That fixed it!

Cheers!
Karthick S.