June 4, 2006

Why Binary Searches Can Break?

Filed under: commons

Binary searches and mergesorts can break, in fact, most of them will break because they follow the same method of calculating the mid-point for the search. It is usually calculated as average of low and high values. Here is what Josh Bloch says (the example uses Java):

The bug is in this line:

6: int mid =(low + high) / 2;

In Programming Pearls Bentley says that the analogous line “sets m to the average of l and u, truncated down to the nearest integer.” On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

Very interesting, whatever the size of the integer, there will always be a condition when the sum of low and high will overflow it. Josh also gives some suggestions:

So what’s the best way to fix the bug? Here’s one way:

6: int mid = low + ((high - low) / 2);

Probably faster, and arguably as clear is:

6: int mid = (low + high) >>> 1;

In C and C++ (where you don’t have the >>> operator), you can do this:

6: mid = ((unsigned) (low + high)) >> 1;

A classic example where the code design can do the right or the wrong. It is important to realise that the code works in all the cases. While all is sometimes impossible to consider and we can use a subset, there is no guarantee that the subset will remain the same in future. That is why, in my opinion, as part of the maintenance of the application, the code should be verified against the new computing environment. Today, the software is re-released for change in business requirements or business environment, but it is also important to consider the changing computing environment.

Tags:

May 28, 2006

Libraries, Toolkits And Frameworks

Filed under: commons

We come across a mesh of nomenclature regarding libraries everyday - Application Programming Interface (API), libraries, toolkits, frameworks, and probably some more. So, what is the difference?

Some of these terms do overlap. API is the interface provided by the library, toolkit or the framework. This interface, usually, works like a contract between the creator and the user. Library is a generic term for a group of functionality provided. These libraries together can form a toolkit or a framework. Gang of Four (GoF) explain the difference between toolkits and frameworks in Design Patterns - Elements of Reusable Object-Oriented Software:

A toolkit is a set of related and reusable classes designed to provide useful, general purpose functionality. Toolkits don’t impose a particular design on your application; they just provide the functionality that can help your application do its job. They let you as an implementer avoid recoding common functionality. Toolkits emphasize code reuse. They are the object-oriented equivalent of subroutine libraries.

A framework is a set of cooperating classes that makeup a reusable designed for a specific class of software. The framework dictates the architecture of your application. It will define the overall structure, its partitioning into classes and objects, the key responsibilities, thereof, how the classes and objects, the key responsibilities thereof, how the classes and objects collaborate, and the thread of control. A framework predefines these design parameters so that you, the application designer/implementer, can concentrate on th specifics of your application. The framework captures the design decisions that are common to its application domain. Frameworks thus emphasized design reuse over code reuse, though a framework will usually include concrete subclasses you can put to work immediately.

You commit to a functionality when you use a toolkit, whereas you commit to the design when you use a framework. That is why it is of prime importance to choose the right option for the job. A wrong choice can ruin the entire effort.

This is also important for the creators. Since a framework provides design reuse, it is imperative that it is flexible. Otherwise, it might limit the programmer in more ways than enable him/her.

Tags: , ,

May 26, 2006

API Reference Heaven

Filed under: commons

Found a very useful site for API reference - gotAPI (via Curt’s Comments). It is simply great to get all the searchable documentation for scripts and programming languages at one place.

The unique thing about this is that the documentation is obtained directly from the source and is not duplicated. It covers a multitude of languages, right from markup languages to scripts to high-level language frameworks, and there is more to come. Give it to the syntax seekers!

Tags: , , , ,

May 8, 2006

Endianness

Filed under: C, commons

Just like we have to read from left to right or right to left depending upon what language we are reading, programs should know how are integers stored before reading a binary. This is Endianness. Just like the languages it just a matter of preference. The two schemes are big endian and little endian.

In the big endian version, the most significant byte (MSB) is stored in the lowest memory address, whereas in the little endian version the least significant bit (LSB) is stored in the lowest memory address. Here is an example:

The 32-bit integer 2D441B36 has to be stored at memory address 400 - 2D is stored in 400, 44 in 401 (1 byte offset) and so on in the big endian scheme. In case of the little endian scheme, the bytes are stored in the order 36 1B 44 and 2D.

Unfortunately, both are being actively implemented and programmers need to know the endianness of a system. Here is a program in C to differentiate between big endian and little endian systems:

int isBigEndian()
{
    /* assign 00000001 */
    int sample = 1;
	
    /* convert to equivalent character array */
    char *sample_array = (char *) &sample;
	
    /* if little endian, the lowest memory address will contain
       the least significant byte, i.e., 01 */
    if (sample_array[0] == 1)
    {
        return 0;
    }
    else
    {
        return 1;
    }
}

Endianness becomes a critical issue for portable code when it is targetted for multiple architectures.

In addition to the computer architectures, endianness becomes imperative to consider in network protocols like TCP. If the byte ordering is not considered, it leads to the popular NUXI problem. Here is a function to swap between the two schemes by using bit-wise operations:

int swapByteOrder(int sample)
{
    int sample_reverse = ((sample & 0xff000000) >> 24) | /* MSB */
                         ((sample & 0x00ff0000) >> 8) |
                         ((sample & 0x0000ff00) << 8) |
                         ((sample & 0x000000ff) << 24);  /* LSB */
	
    return sample_reverse;
}

More reading

Tags: , , , , , .

« Newer Posts
Creative Commons License Copyright © Abhijit Nadgouda. Hosted on Blogsome, powered by Wordpress