Beginner's Mind

Monday, May 26, 2008

Thoughts about Software Transactional Memory

I've been reading and talking about Software Transactional Memory (STM) a bit. Commentators are conflicted as to whether it can perform well, with some claiming its performance degrades terribly under heavy load.

I was wondering, can we provide the same tools that STM provides at the language level with a smarter underlying implementation? The performance complaint centers around the fact that STM is implemented as an optimistic system, where each transaction attempts to perform its work, and detects conflicts with others and retries. With a high level of contention this can mean a lot of wasted effort; it can also make the character of the system hard to understand, with surprise jumps (down) in performance as load approaches certain levels.

Pessimistic locking systems, on the other hand, have predictable performance. A process has to lock a series of locks, does the work, releases them. There's no uncertainty or (much) wasted effort -- just time wasted acquiring locks. (Of course some CPU cycles are spent actually in the locking code, but this shouldn't be significant). The time spent acquiring the locks isn't really opportunity cost if the locking system is well-designed. We would expect with a good system that even if some threads are blocked on locks, overall the system is efficiently used. At least we assume this insofar as the algorithm that the program embodies actually is parallizable in the first place.

So, I wonder, can any regular language provide an atomic { } block? Let's say I provide this feature in a language. I need to generate locks around the blocks to provide correct semantics; I need to optimize so that it runs efficiently.

(We can't do this kind of reasoning in a SQL database environment because database servers don't know in advance what kind of queries they will be asked to evaluate. However, a compiler does know all about the code it will be evaluating, so we would expect the optimal locking algorithm for programs to be different and better than a database server's approach.)

A simple algorithm can provide correct semantics: every atomic block locks a single global lock. This is trivially correct but not very efficient. Can we do better?

Let's say that our processes are represented as {A, B, C} where the block operates on some shared variables A, B, C. The particular operations or order is unimportant since we're guaranteeing that process has a transactional view of all variables. (Well, we could in theory optimize blocks down for the programmer, by detecting that (say) one atomic block actually could be split up into two. Let's ignore that for now or just assume a separate and correct algorithm for minimizing the data shared in atomic blocks.)

We have some set of processes like {A, B} {B, C} {C, D} {D, A}. We humans know that, for example, both process {A, B} and {C, D} could execute in parallel with no conflict -- but they don't, because a process can only execute its atomic block if it's holding the global lock. We want our STM system to be at least smart enough to run parallel tasks in parallel, so this isn't good enough.

Let's approach from another direction. Rather than having a single global lock, let's give every variable its own lock. Thus the process {D, A} must get the lock for D and A before proceeding.

This is efficient, because we allow {A, B} and {C, D} to run in parallel; they will each lock their respective locks and go on their way.

What about guaranteeing progress? Let's say we run all the processes in parallel. Perhaps {D, A} runs far enough to get the lock for D; next the process {C, D} runs, gets C, blocks on D; next {B, C} runs and blocks on C; and {A, B} runs and blocks on B. Let me describe each process and the locks it currently holds:

  • {A, B}: A (waiting on B)

  • {B, C}: B (waiting on C)

  • {C, D}: C (waiting on D)

  • {D, A}: D (waiting on A)

It's easy to see that we have a full deadlock -- all the processes are waiting on one another. Of course, it is not a particularly likely circumstance, but we as professional software engineers don't build our systems upon wishful thinking -- right?! We need a different way to do things.

Really, the problem as described above is the Dining Philosophers problem, and only a small adjustment is needed to prevent deadlock. The trick is in that process {D, A}. What we need is a locking convention; as long as there is a global total order on the order in which processes acquire locks, we are ensured that there is no deadlock. This formulation of the problem naturally lends itself to one: we can lock all locks in alphabetical order. This will work even if processes require many locks, like {A, B, C}, {B, C, D}, etc.

It sounds to me like this solution would be viable, but probably not efficient. One could imagine locking in order based on the memory locations of the objects (assuming they're constant) or imposing some other ordering scheme on object locks.

This would work if the amount of shared state is small (a couple of objects accessed per atomic), but what if the number is large? Could we use this for a game, if we need to (say) iterate through an array of all objects in the world and update their physics? This system breaks down if blocks need to access large numbers of objects.

From here we could imagine improving our algorithm, aggregating locks together to reduce their number but preserve the same atomic semantics. For example, we might say that we only need to lock an array to access objects contained in the array, rather than locking each individual object. (This would require assuming that the objects cannot be referenced any other way.) Where common locks exist, we could imagine coalescing them into a single lock. For example, if many processes access exactly two variables, we could imagine the algorithm determining that they both will be protected by the same lock. Another alternative is to have a single global lock which a process locks, then may obtain at once any number of other locks at once quickly. There are a lot of possible optimizations we could research to make this locking system faster.

A quick poll of some PhD friends suggests that there is probably not much research in this area. There's a lot to explore! We could even imagine a locking system that dynamically profiles itself and re-organizes the locking structure to maximize throughput. If the environment is low-level enough, the scheduler could even be aware of the performance characteristics and could schedule accordingly (schedulers already do know which threads are blocked on locks, e.g.).

When the programming language I am designing gets a bit further along I will attempt to implement a simple atomic block construct. One could imagine a simple naive global lock implementation for development followed by later optimization as necessary; it might be the case that a naive implementation is sufficient for many simple cases (like basic GUIs). If it is a useful way to program, hopefully we can discover solid implementations of the construct.

Is STM the solution? STM is much easier when one is working in a pure functional programming language with little or no shared state in many places. However, given that procedural languages are used widely in the industry, I hope other researchers are as interested in these problems as I am!

Overall, STM feels a lot to me like Garbage Collection (GC). The goal of it isn't to make programs run more efficiently; the goal is to make programming easier and thereby use programmer time more efficiently. Working with locks is difficult and it's easy to make mistakes; if STM can alleviate this with only a 20% performance decrease, it would be a huge win. I say it's like GC because it's a similar tradeoff: we'll give up memory efficiency and even runtime performance, and even yet suffer possible random freezes as the GC kicks in, but in allowing these things we make the much larger gain of freeing programmers from the concern of memory management. Managing memory and managing locks: both meticulous tasks well suited for a computer.

By adopting GC we've freed ourselves from the tiring troubleshooting of obscure memory issues, and become more productive. I wonder whether freeing ourselves from similar concurrent bugs would bring about a similar but even greater productivity increase. If we gain a powerful Software Transactional Memory model, not only will developers free themselves from troubleshooting concurrency bugs, but we may find developers generally being more willing to write concurrent software.

Think about it! Writing concurrent software is hard -- why bother if it will only provide a minor speedup? Commentators often note that software like Microsoft Word won't speed up on multiple processors. It's not because this software couldn't be sped up by multiple processors, it's that the gain in performance is not worth the extra development cost (which is likely to be small for most desktop software). But if the added development cost is low, suddenly concurrent implementations become more viable.

That's where I see things heading -- mainstream programming languages make concurrency easy, not just in the Big Server platform (Erlang) but in the everyday desktop platform. Once there is first-class language support, concurrent software will become a lot more available.

However, developers will still need to develop concurrent algorithms, which would take extra effort. And there's still the question of whether compilers that optimized as I've described above could ever be implemented. Donald Knuth, in a recent interview with Andrew Binstock, expressed doubt about whether the highly concurrent hardware will be well-used:

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.

How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years. ...

The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn’t be any better off, considering the kind of work I do—even though I’m using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream.


Cheers to all the researchers out there working hard to make this pipe dream reality.

[Edit: I have recently found an interesting paper relevant to the topic: Advanced Contention Management for Dynamic Software Transactional Memory. I'll write a follow-up if it's particularly relevant.]

Sunday, May 25, 2008

Trying out Twitter

Rather than set up some kind of "tumblelog" I thought I'd give Twitter another shot.

If I remember correctly, Twitter came out a couple of years ago, when I was working for Scott Ruthfield at Amazon.com (or at least launched a public beta). I wasn't enthused at the time, but I also hung out on AIM and chatted all the time. Now that I'm more, I suppose, professional, perhaps I will get some functionality out of a site that allows frequent chat-like communication without the distractional burden of being on IM.

I haven't frequented IM in a while; somewhere around the middle or late years of college I realized that it's more of a net negative than positive for getting things done. That's not always my focus, of course (if it were I would have a pretty bleak life!), but having IM open is an invitation for distraction and just a way to make excuses to myself to procrastinate what I would otherwise be doing.

Mingling work and chat seems like a less efficient way to work and a less fun way to chat. Twitter is like IM but cheating; I can send you IMs but you can't directly send them back (I would have to check some page to see them ;-)

Anyway, check out my Twitter page!

Saturday, May 24, 2008

Google Calculator

[Note: This post uses ASCIIMathML.js, a script that makes it easy to typeset mathematical formulas within HTML using MathML. If your browser doesn't support MathML, you will probably just see the text representation of formulas.]

Some friends and I were discussing the feasibility of a mass driver for travel to the Moon (for fun). A mass driver, or catapult, can theoretically launch objects into space by propelling them at such a high speed they escape the atmosphere; no rocket necessary, just a high launch velocity. I thought I would describe how Google Calculator helped me reason about the problem easily (especially all the unit conversions).

Google Calculator (as I am calling it) is a feature built into the regular search on Google.com. If you include a mathematical expression as your search term, Google will evaluate it for you. Here's a good reference. It can do basic arithmetic, but its most helpful feature is its unit conversions. The calculator will convert basic units as necessary to evaluate the arithmetic, but can also be instructed to convert the answer into some specific units. For example: "60 miles per hour to kph": 60 miles per hour = 96.56064 kph.

To catapult into orbit, the launch velocity would have to be greater than escape velocity, of course, or the Earth's gravity would pull you back down. Earth's escape velocity is about 12 km/s, or roughly 26 800 miles per hour. That's pretty fast!

A mass driver would need a track of some kind: imagine a sort of vertical roller coaster that releases you when it comes to the end of its track. It would have to accelerate you pretty strongly! How strongly? Let's say we want to get to a bit faster of a speed than `12 " km/s"`, so we can settle on `13 " km/s"`.

Well, let's say we have a `5 " km"` track. How much acceleration is necessary to get us to 13 km/s? We need some physics to help us solve this one, as classic formula from AP Physics:

`v^2 = v0^2 + 2a(x-x0)`

In our case, `v0 = 0`, `x-x0 = 5 " km"`, and `v = 13 " km/s"`. So we have `v^2 = 2*a*(5 " km")`. Solving this, `(13 " km/s")^2/(2*5" km") = a`.

We can solve this with Google! Enter "(13 km/s)^2/(2*5 km)=". Google returns: ((13 (km / s))^2) / (2 * 5 km) = 16 900 m / s^2.

We can also ask it to express the result in terms of "g": (13 km/s)^2/(2*5 km) in g= produces the answer ((13 (km / s))^2) / (2 * 5 km) = 1 723.3204 g. Ouch! We couldn't survive that kind of acceleration.

How much energy would it take to accelerate us? The physics formula `E = 1/2mv^2` describes the energy of objects with velocity. Let's say the object we're launching is about a ton, the size of a small car. After all, it doesn't need an engine since we're catapulting it into space! Google "1/2*(1 ton)*(13 km/s)^2=" gives us 76 657 110 530 joules.

That's a lot of energy -- about 21,000 kilowatt hours! To help visualize the result, let's convert it to some more tangible quantity. How much water could this energy vaporize? That is, if we have some quantity of water, how much water could this energy bring from 0 degrees C to 100 degrees C?

Wikipedia tells us that water has the highest known specific heat of any compound after ammonia, and a very high heat of vaporization. The specific heat is 4.186 joule/(gram degree C). Thus, to change a gram of water from 0 C to 100 C we need 418.6 joules per gram.

Earlier we found the energy is 76,657,110,530 joules. With that and our 418 joules per gram number, we find that the energy can heat up about 183000 kilograms. We know via references that water's density is 1000 kilograms per cubic meter, so finally we can solve for the volume of water that can be raised from 0 C to 100 C by this amount of energy! About 180 cubic meters or 48,000 gallons.

Unit conversions are an almost immeasurable help when doing tricky physics calculations. Google Calculator's abilities surpass even the respectable TI-89 calculator (which, if you need a calculator, is an excellent purchase).

Saturday, May 17, 2008

Shoshin

I'm christening my blog Beginner's Mind, a term from Zen Buddhism. Beginner's Mind, as described by Wikipedia, is "an attitude of openness, eagerness, and lack of preconceptions when studying a subject, even when studying at an advanced level".

Also known as Shoshin or Zen Mind, the concept encourages all practitioners of any art to consider themselves students, no matter their skill level. This reflects the wisdom that a person can always learn more; an expert should not become smug in his knowledge, less he sabotage his own efforts to continue his mastery.

Software development is an exciting field because it is always changing. Tools invented only a decade ago, such as the programming languages Ruby and Python, revolutionize sectors of the field today. Development feeds back upon itself exponentially: when a programmer releases a tool, it becomes easier and faster for the rest to make more tools, and so they do. We are just now approaching what may be the golden age of programming, where techniques, tools, and computers are widespread and the industry is mature enough to be taken seriously.

Beginner's Mind is essential in the software industry precisely because of this high rate of change. Not only are the tools changing, but also since software development is such a young field, best practices for the same tools are being discovered and discussed continually. A good idea ten years ago might be a terrible anachronism today.

As software developers aspiring to master our craft, we must always knead and fold over our knowledge, ruminate over it, masticate it; occasionally, we must defenestrate it.

Writing today about software development is frequently shortsighted and partisan; we have made a religion of our tools. It is common to see camps of people strongly in favor of some particular tool write tirades in defense of it, or attack another tool. A Buddhist master would say that this is attachment, a mental block one erects to protect his view of the world and avoid cognitive dissonance.

The goal of my writing in this blog will be to treat software development (and other topics worth discussing) with Beginner's Mind as I navigate its nooks and crannies in my daily life as a software developer. More broadly, I aim to take a neutral look at the industry, its tools, and its people.

(While trying to come up with a clever conclusion for the end of this post, I came up with "I hope to offer crumbs of insight from the cake of enlightenment". While it is an evocative metaphor, and I definitely don't plan to write this blog with a deadly serious tone, I'll try to spare you, gentle reader, from the wrath of wearisome wit.)