I’ve finally found some time to solve a few more exercises on exercism.io. One exercise, called Robot Simulation, I solved in a straightforward way, and promptly got the feedback that my choice of data type could lead to space leaks. Since it’s a dimension of Haskell programming that I rarely (read “never”) have had to think about, I thought I’d take the chance to play around a bit.

### A simplified solution

I don’t want to give away the solution to the exercise, so I’m using some code of a similar shape.

```haskell
import Data.List

newtype P = P (Int, Int)
  deriving (Eq, Show)

step :: P -> Char -> P
step (P c) 'A' = P (c `add` (0, 1))
step _ _ = undefined

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

main :: IO ()
main = print p
```

There’s nothing complicated about the code, but the memory usage is rather ridiculous:

```
   1,132,746,080 bytes allocated in the heap
   1,149,823,448 bytes copied during GC
     346,747,064 bytes maximum residency (11 sample(s))
       3,817,608 bytes maximum slop
             600 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2148 colls,     0 par    0.58s    0.58s     0.0003s    0.0006s
  Gen  1        11 colls,     0 par    0.56s    0.56s     0.0510s    0.2278s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    1.15s  (  1.14s elapsed)
  EXIT    time    0.02s  (  0.02s elapsed)
  Total   time    1.42s  (  1.41s elapsed)

  %GC     time      80.9%  (81.0% elapsed)

  Alloc rate    4,593,597,274 bytes per MUT second

  Productivity  19.0% of total user, 19.1% of total elapsed
```

Surely it ought to be possible to add 5 million numbers in less than 600MB, right?

The reason, as far as I understand, is that tuples are lazy, which means a lot of thunks are built up when doing the additions.
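To illustrate with a toy snippet (hypothetical, not the exercise code): `foldl'` only forces each intermediate accumulator to weak head normal form, and for a tuple WHNF is just the `(,)` constructor, so the sums inside pile up as unevaluated thunks until something finally demands the components:

```haskell
import Data.List (foldl')

-- foldl' forces the accumulator to WHNF at every step, but for a lazy
-- tuple WHNF is just the (,) constructor: the sums inside remain
-- thunks like (((0 + 1) + 2) + 3) ... until they are finally demanded.
sumFirst :: [Int] -> (Int, Int)
sumFirst = foldl' (\(a, b) x -> (a + x, b)) (0, 0)

main :: IO ()
main = print (sumFirst [1 .. 100])  -- only this print forces the thunk chain
```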

### Using a stricter data type

Perhaps the easiest way to add some strictness is to put a few exclamation marks in the data type. Modifying the data type brings a few other changes though, so here’s first a version without any strictness annotations, to have something to compare with later on¹:

```haskell
{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: Int, pY :: Int }
  deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX + 1 }
step _ _ = undefined

main :: IO ()
main = print p
```

This solution uses about half as much memory:

```
     766,417,952 bytes allocated in the heap
     639,064,600 bytes copied during GC
     155,866,064 bytes maximum residency (11 sample(s))
       1,914,016 bytes maximum slop
             345 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1460 colls,     0 par    0.32s    0.32s     0.0002s    0.0006s
  Gen  1        11 colls,     0 par    0.33s    0.33s     0.0303s    0.1071s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.15s  (  0.15s elapsed)
  GC      time    0.65s  (  0.65s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.81s  (  0.81s elapsed)

  %GC     time      80.2%  (80.2% elapsed)

  Alloc rate    5,212,863,895 bytes per MUT second

  Productivity  19.8% of total user, 19.8% of total elapsed
```

Halving the memory usage is rather nice, but adding some strictness ought to make it better still. The code with added exclamation marks:

```haskell
{-# LANGUAGE RecordWildCards #-}
import Data.List

data P = P { pX :: !Int, pY :: !Int }
  deriving (Eq, Show)

p :: P
p = foldl' step (P 0 0) $ replicate 5000000 'A'

step :: P -> Char -> P
step q@(P {..}) 'A' = q { pX = pX + 1 }
step _ _ = undefined

main :: IO ()
main = print p
```

The runtime behaviour of this code is remarkably more frugal with memory:

```
     480,054,816 bytes allocated in the heap
          57,968 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       918 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.09s  (  0.09s elapsed)

  %GC     time       2.4%  (2.4% elapsed)

  Alloc rate    5,630,441,644 bytes per MUT second

  Productivity  97.3% of total user, 98.2% of total elapsed
```

Indeed, 1MB is a bit more like it!
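As a quick sanity check (a standalone toy snippet, separate from the solution), the strict fields mean that merely constructing a `P` forces both components, which we can observe by smuggling in an `undefined`:

```haskell
import Control.Exception (SomeException, evaluate, try)

data P = P { pX :: !Int, pY :: !Int }
  deriving (Eq, Show)

-- With strict fields, constructing a P forces both components, so an
-- undefined component blows up at construction time, not at use time.
forcedAtConstruction :: IO Bool
forcedAtConstruction = do
  r <- try (evaluate (P { pX = 1, pY = undefined })) :: IO (Either SomeException P)
  return (either (const True) (const False) r)

main :: IO ()
main = forcedAtConstruction >>= print
```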

### Strictness with the original type

However impressive the results of using a stricter data type, it might still be preferable to keep the original, somewhat simpler, type. It is possible to do just that by placing some explicit strictness in well-chosen places. Since my suspicion is that it’s the addition in `step` that causes the space leak, we’ll rewrite it:

```haskell
import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
  deriving (Eq, Show)

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = c `add` (0, 1) in np `deepseq` P np
step _ _ = undefined

main :: IO ()
main = print p
```

Note that it’s not enough to use `seq` here; a bit more force is needed, which is what `deepseq` offers. Sure enough, it delivers:

```
     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.29s  (  0.29s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.30s  (  0.30s elapsed)

  %GC     time       1.4%  (1.4% elapsed)

  Alloc rate    3,124,303,717 bytes per MUT second

  Productivity  98.5% of total user, 98.9% of total elapsed
```
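The difference between `seq` and `deepseq` can be seen directly in a small experiment (a standalone snippet, not part of the solution): `seq` stops at the outermost constructor, while `deepseq` recurses all the way down:

```haskell
import Control.DeepSeq (deepseq)
import Control.Exception (SomeException, evaluate, try)

-- seq only evaluates to WHNF: it exposes the (,) constructor but
-- leaves the components untouched, so the undefined survives...
survivesSeq :: Bool
survivesSeq = (1 + 1 :: Int, undefined :: Int) `seq` True

-- ...while deepseq recurses into the tuple and hits the undefined.
survivesDeepseq :: IO Bool
survivesDeepseq = do
  r <- try (evaluate ((1 + 1 :: Int, undefined :: Int) `deepseq` True))
         :: IO (Either SomeException Bool)
  return (either (const False) id r)

main :: IO ()
main = do
  print survivesSeq         -- True: components untouched
  survivesDeepseq >>= print -- False: deepseq forced the undefined
```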

A further slight change is possible: implement `NFData` for `P` itself. That way it’s not necessary to pull out and force evaluation of the innards of `P`. The default implementation of `rnf` didn’t cut it though; since it uses `seq`, it just isn’t forceful enough:

```haskell
import Data.List

import Control.DeepSeq

newtype P = P (Int, Int)
  deriving (Eq, Show)

instance NFData P where
  rnf (P q) = deepseq q ()

add :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

p :: P
p = foldl' step (P (0, 0)) $ replicate 5000000 'A'

step :: P -> Char -> P
step (P c) 'A' = let np = P (c `add` (0, 1)) in force np
step _ _ = undefined

main :: IO ()
main = print p
```

The result is close to the previous:

```
     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.33s  (  0.33s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.33s  (  0.33s elapsed)

  %GC     time       1.2%  (1.2% elapsed)

  Alloc rate    2,817,175,186 bytes per MUT second

  Productivity  98.7% of total user, 99.0% of total elapsed
```

It looks like it’s slightly slower though, which might be caused by the double call to `deepseq` in this version.
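For completeness, a hypothetical variant (not benchmarked in this post) that skips `deepseq` altogether: with `BangPatterns` the freshly computed components can be forced before wrapping them back up in the tuple, so no thunk chain can build up in the accumulator:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

newtype P = P (Int, Int)
  deriving (Eq, Show)

-- The bang patterns force x' and y' to values before the new tuple is
-- constructed, keeping the accumulator fully evaluated at every step.
step :: P -> Char -> P
step (P (x, y)) 'A' = let !x' = x + 0
                          !y' = y + 1
                      in P (x', y')
step _ _ = undefined

main :: IO ()
main = print (foldl' step (P (0, 0)) (replicate 5000000 'A'))
```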

### Conclusion

From this little example it’s clear that a little strictness in well-chosen places can go a long way in reducing memory footprint. The various solutions also show that the results can vary quite a bit depending on the strategy for introducing strictness. In this particular case I think the stricter data type (see Using a stricter data type above) is the best choice: it’s both faster and uses less memory than any of the others.

1. I’m using `RecordWildCards` to simplify the code a bit, which might possibly influence evaluation as well. I’m simply not sure.