I wonder...

After reading some posts in my haskell category of blogs I read I found myself wondering the following:


A space leak is different than a memory leak, so it’s not a bug in the garbage collector.

Essentially, a space leak is when your algorithm uses asymptotically (or, perhaps just significantly) more space than you think it should, usually due to the sharing involved in lazy evaluation preventing the garbage collector from claiming intermediate values that might otherwise be discarded.

The classic example is probably the powerset function:

pset []     = [[]]
pset (x:xs) = pset xs ++ map (x:) (pset xs)

pset'leak [] = [[]]
pset'leak (x:xs) = pxs ++ map (x:) pxs
 where pxs = pset'leak xs

So, if you do something like:

length (pset [1..n])

It will run in constant space. However:

length (pset'leak [1..n])

Runs in O(2^n) space (I think).

The reason for this is that in pset'leak, pxs is shared between the first and second halves of the list, whereas in pset, they are distinct lists. So, in pset, once length gets past an element in the first half of the list, the garbage collector can reclaim that portion of the list. However, in pset’leak, it cannot claim bits of pxs until we’ve used them in the second half of the list, because under lazy evaluation, shared values are supposed to be memoized on their first use until all references to them go out of scope. This means that we have a memoized list of size 2^(n-1) or so sitting around in memory at some point.

Of course, it’s not always that easy to see why space leaks occur.

On a side note, the above consequences of lazy evaluation mean that it is possible to write something more like a traditional memory leak in GHC. Namely, if you have a top-level value that potentially takes a lot of memory to store, and you force it, it will not be collected, because it never goes out of scope. So, if you have an infinite list of primes at the top level, and you ask for the trillionth prime, the list of the first trillion primes will sit around in memory for the rest of the life of the program. That’s much more like a traditional memory leak (but it still isn’t something “broken” in the garbage collector, per se).


Thanks, dolio! Just the kind of explanation I was looking for. Sorry for being a bit lazy and posting this rather than go looking myself ;-)

Leave a comment