Saturday, June 5, 2010

Trends in programming and Amdal's Law

in my previous blog, I explained trends I saw that will affect programmers. The main takeaway is that we are moving into a world of concurrent programming. If we do concurrent programming with memory shared by several threads of execution, life is going to be miserable for programmers. Fortunately, message passing via actors is a viable path to concurrency that is much easier to program.
In any discussion of concurrency, the iron law is Amdahl's Law. If you have a process in which some fraction, s, of the steps are sequential and the rest, 1-s, can be done in parallel, then the most you can gain from concurrency over N processors is to reduce the time to execute from 1 to s + (1-s)/N. So basically, the time to execute a program is limited by s as much as by N. The hardware determines N and the software determines s. So what determines s? Basically, s is the fraction of the algorithm that you spend queueing stuff. If I go to the grocery store, I can shop in parallel with everyone else. It takes a little longer to shop in a busy store, but only a little longer. But when it is time to check out, I have to queue up to pay.
What is exciting is that by using techniques like 'no shared state' and Actors for communication can greatly reduce s, so we can write software that efficiently tanks advantage of concurrent hardware (i.e., big N). If we look at a standard sort of web applications, the bottlenecks are things like writing to the database. The first thing to do is to replace a single server with a cluster of identical servers, so that a given client can access any one of the servers and get the job done.
But there turns out to be a better plan - sharding. In sharding, you still have a cluster of N servers, but you don't have each server having identical data. If I have a large database, I would shard it by creating N servers, but I would shard the data so that any given server has only a fraction of the data. This is much like partitioning data on a RAID array, and the more sophisticated designs take into account 'k safety". Reed-Solomon erasure codes exhibit this behavior, as do Tornado codes. This means that the data is divided in to n+k partitions so that any set of n partitions can reconstruct the entire data set. This means you can loose k partitions before you loose any data. This idea is central to VoltDB, but they implement it by having k+1 copies of each block. It seems that a more efficient design could be built using erasure codes (See exercise 11.9 in David MacKay's Information Theory, Inference and Learning Algorithms.)


No comments:

Post a Comment