December 13, 2005

Something to laugh about

If you are in need of some laughter, I highly recommend reading the MSF for Agile Software Development Process Guidance. It is comical!
Some of my favourites:


Posted by Damian at 11:05 AM | Comments (1)

December 07, 2005

Cool Threads

The first servers based on Sun's Cool Threads Technology have been released. For those interested, some ex-colleagues (and extremely clever and great guys) have written up some useful information:


Sun is giving companies the opportunity to try one of these servers for 60 days. I wish they would extend that to "people who just want to play".

Posted by Damian at 04:49 PM | Comments (0)

November 30, 2005

Niagara

The new chip from Sun, Niagara, sounds quite good.
8 cores, 4 threads per core. Wouldn't mind a couple of them for my Mac!

Posted by Damian at 04:26 PM | Comments (3)

November 16, 2005

'Tis a great day

Australia made the 2006 World Cup and England lost to Pakistan in the cricket! Marvelous!

Posted by Damian at 02:44 PM | Comments (1)

February 27, 2005

Grid Computing

For the past couple of months I have been working at an investment bank, helping them to evaluate and choose a Grid computing solution for some of their strategic applications. It has been quite an interesting couple of months; we have seen grid products of all shapes and sizes.

So, what is a Grid? That is an excellent question! Unfortunately there is no one-size-fits-all definition of what a Grid is; it has a slightly different meaning depending on who you talk to. In its most basic form, a Grid is just a batch scheduler with a bunch of compute resources. Not very exciting, I know!

Types of Grids

As the previous paragraph suggested, there are a few different types of grids. Here I define those that I know about. Note that some of these approaches may be used together.

CPU scavengers

Most people are familiar with the SETI@home project, which works by stealing CPU cycles from computers attached to the internet. Well, it is not exactly stealing! You download and install a screensaver on your PC which, when activated (i.e., when your PC is idle), connects to a server and downloads some data to analyze. So whilst your computer is idle it will steal CPU cycles from it, but as soon as you deactivate the screensaver the analysis stops. This is a very specific example, but the same approach can be used in a more general form. For example, you install a very small "agent" program on corporate desktops that, when the CPU is idle, connects to a scheduler and asks for some work to do. The work performed by the agent could potentially be any arbitrary program. Of course there are environmental specifics, such as operating systems, to take into consideration.
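
The general agent idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the Scheduler class, run_agent, and the is_idle callback are hypothetical names invented here, the "scheduler" lives in the same process rather than across a network, and a real agent would have to detect idleness from the operating system.

```python
from collections import deque


class Scheduler:
    """Hypothetical central scheduler holding a queue of (function, args) tasks."""

    def __init__(self, tasks):
        self._queue = deque(tasks)
        self.results = []

    def request_work(self):
        # Hand out the next task, or None when nothing is waiting.
        return self._queue.popleft() if self._queue else None

    def submit_result(self, result):
        self.results.append(result)


def run_agent(scheduler, is_idle):
    """Pull and run tasks for as long as the host reports itself idle.

    is_idle is an assumed callback; it is checked before every task, so
    the agent stops asking for work as soon as the machine is in use.
    """
    completed = 0
    while is_idle():
        task = scheduler.request_work()
        if task is None:
            break  # the scheduler has no more work
        func, args = task
        scheduler.submit_result(func(*args))
        completed += 1
    return completed


# Usage: the machine is idle for two checks, then the user comes back.
scheduler = Scheduler([(pow, (2, 3)), (pow, (3, 2)), (pow, (2, 10))])
idle_states = iter([True, True, False])
completed = run_agent(scheduler, lambda: next(idle_states))
```

The third task stays queued on the scheduler, ready for whichever agent next finds itself idle.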

Batch Scheduling

Quite a lot of the grid products on the market are little more than batch schedulers. This type of Grid usually consists of a central scheduling/management server and a bunch of agents on dedicated compute resources. The agents can typically run any executable program.

The batch scheduler grids I have looked at so far have a significant overhead associated with getting your work scheduled to run on the grid. This usually means that you need to have fairly long running tasks, and quite a few of them, before you actually see any benefit from using the grid. Of course this overhead leads to underutilization of the resources on the grid, which is a little alarming as most corporates want a grid to make better use of these resources.
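
A rough back-of-the-envelope model shows why the overhead matters. The sketch below assumes a fixed scheduling overhead paid for every task and identical tasks spread evenly over the nodes; the function names and all the numbers are made up for illustration and are not measurements from any product.

```python
def node_utilization(task_seconds, overhead_seconds):
    """Fraction of a node's time spent on useful work rather than overhead."""
    return task_seconds / (task_seconds + overhead_seconds)


def speedup(n_tasks, task_seconds, overhead_seconds, nodes):
    """Serial run time divided by grid run time under the simple model above."""
    serial = n_tasks * task_seconds
    waves = -(-n_tasks // nodes)  # ceiling division: rounds of tasks per node
    parallel = waves * (task_seconds + overhead_seconds)
    return serial / parallel


# Illustrative numbers only: 5 seconds of overhead per task, 10 nodes.
few_short = speedup(n_tasks=5, task_seconds=1, overhead_seconds=5, nodes=10)
many_short = speedup(n_tasks=100, task_seconds=1, overhead_seconds=5, nodes=10)
many_long = speedup(n_tasks=100, task_seconds=600, overhead_seconds=5, nodes=10)
```

Under these assumptions a handful of one-second tasks actually runs slower on the grid than serially, a hundred of them gains less than a factor of two on ten nodes, and only the ten-minute tasks get close to the ideal factor of ten.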

Service Oriented

Ah yes, SOA had to pop up somewhere! This is the way the grid standards are heading.

Surprisingly, at least to me, most of the grid products I have come across do not consider this approach. I have come across two products that have some sort of Service model. One of them is quite good whereas the other one, well, it is the other one!

There is still a scheduling component in these grids, since scheduling is core to any grid solution, but now you are invoking a method on a Java/.NET/C++ object instance rather than asking for an executable to be run. Of course there are nice ways and not so nice ways to do this. To me the nice way means I can invoke arbitrary methods that take rich objects as parameters. The not so nice way means that everything has to be a string or byte array.
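
To make the contrast concrete, here is a small Python sketch of the two styles. Everything in it is hypothetical: Trade, PricingService, and invoke are names invented for the example, and invoke merely stands in for whatever mechanism a grid would use to dispatch a call to a remote service instance.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """Hypothetical rich parameter object."""
    notional: float
    rate: float
    years: int


class PricingService:
    """Hypothetical service whose methods the grid can invoke."""

    def future_value(self, trade: Trade) -> float:
        return trade.notional * (1 + trade.rate) ** trade.years


def invoke(service, method_name, *args):
    # Stand-in for grid dispatch: look the method up by name and call it.
    return getattr(service, method_name)(*args)


def future_value_from_string(payload: str) -> str:
    """The not so nice way: the task packs and unpacks strings itself."""
    notional, rate, years = payload.split(",")
    trade = Trade(float(notional), float(rate), int(years))
    return str(PricingService().future_value(trade))


# The nice way: arbitrary method, rich object in, typed result out.
nice = invoke(PricingService(), "future_value", Trade(100.0, 0.5, 2))

# The not so nice way: every caller must agree on a string layout.
not_so_nice = future_value_from_string("100,0.5,2")
```

In the string version the parsing code, the field order, and the number formats all become part of an unwritten contract between caller and task, which is exactly the burden a rich object model removes.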

Data Grid

You'll notice that I haven't mentioned data once until now, but surely data access/transfer will quickly become an issue on any grid. The most common way of getting data about on grids seems to be shuffling files about. Shuffling files may be fine for slow changing data, but what about per request/task data? Also, I know that I really don't want to have a file system that is being hit from potentially thousands of compute nodes; bottleneck anyone?

One of the alternatives we have looked at is to turn the problem on its head and think about a grid from a data movement point of view. After all, a scheduled task can really just be a few bits of data, i.e., invoke methodA on class B with parameters c, d, e. This led us to look at JavaSpaces as an approach for building a Grid.

JavaSpaces is so simple, yet so powerful. It has four very basic methods: read, write, take, and notify. With these four methods and a simple Master/Worker pattern, you can build a very basic grid in a couple of hours. Guess what? Unsurprisingly, JavaSpaces is pretty darn good at moving data about. It enables you to schedule tasks as well as move data about, all with the same API. The main problem we have had with JavaSpaces is on the resource allocation and management/admin side. The products, both commercial and OSS, currently offer management only at the Space Entry level; we need more than this for managing potentially hundreds of thousands of running jobs/tasks.
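
The Master/Worker pattern over a space can be sketched as follows. To be clear, this is not the JavaSpaces API: it is a toy in-memory stand-in written in Python, with a hypothetical Space class that mimics only write, read, and take (no notify, leases, or transactions), entries modelled as plain dictionaries, and threads standing in for compute nodes.

```python
import threading


class Space:
    """Toy in-memory space; a stand-in, not the real JavaSpaces API."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()  # wake any blocked read/take calls

    def _find(self, template):
        # A template matches when every field it names has an equal value;
        # fields left out of the template act as wildcards.
        for entry in self._entries:
            if all(entry.get(k) == v for k, v in template.items()):
                return entry
        return None

    def read(self, template, timeout=None):
        """Return a matching entry and leave it in the space."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            return self._find(template) if ok else None

    def take(self, template, timeout=None):
        """Return a matching entry and remove it from the space."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            if not ok:
                return None
            entry = self._find(template)
            self._entries.remove(entry)
            return entry


def worker(space):
    # Take tasks until a stop entry arrives; write one result per task.
    while True:
        task = space.take({"kind": "task"})
        if task["op"] == "stop":
            return
        space.write({"kind": "result", "id": task["id"], "value": task["x"] ** 2})


def run_master(n_tasks=5, n_workers=2):
    space = Space()
    threads = [threading.Thread(target=worker, args=(space,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_tasks):
        space.write({"kind": "task", "op": "square", "id": i, "x": i})
    # Collect each result by id; take blocks until that result exists.
    results = {i: space.take({"kind": "result", "id": i})["value"] for i in range(n_tasks)}
    for _ in threads:
        space.write({"kind": "task", "op": "stop"})  # one stop entry per worker
    for t in threads:
        t.join()
    return results


results = run_master()
```

Note that the task entries and the result entries travel through the same three operations, which is the point made above about scheduling and data movement sharing one API.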

My ideal grid

To me a grid is a completely dynamic compute utility that runs in a heterogeneous environment: heterogeneous in terms of hardware, operating system, and programming language. The grid should at least support a SOA approach and, if needed for legacy reasons, batch scheduling. It should also be clever about how data gets into and out of the grid. Sorry, but shuffling files about just doesn't cut it; I want a transactional cache. There should not be a central scheduler/manager, but rather a set of dynamically allocated resources for this, i.e., the grid uses the grid to run itself. If there has to be a central scheduler, please let it have more than one backup, and please don't tell me I need big iron to run it.

There is a bunch of other stuff I haven't even touched on, e.g., dynamic allocation and SLAs. Next time!

Summary

Grids come in many different shapes and sizes; there is no one-size-fits-all solution. In fact, most corporates probably want/need a mixture of the grid types mentioned above. Though I have been mostly underwhelmed by the products in the marketplace, this is nonetheless quite an interesting area to be working in.

Posted by Damian at 03:16 PM | Comments (0)

February 26, 2005

4th time lucky

Will it last for longer than a week? Doubt it!

Posted by Damian at 06:56 PM