Adding some strictness

I’ve finally found some time to solve a few more exercises on exercism.io. One exercise, called Robot Simulator, I solved in a straightforward way, and I promptly got the feedback that my choice of data type could lead to space leaks. Since it’s a dimension of Haskell programming that I rarely (read “never”) have had to think about, I thought I’d take the chance to play with it a bit.

A simplified solution

I don’t want to give away the solution to the exercise so I’m using some code of a similar shape.
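Something along these lines: sum 5 million numbers into a pair wrapped in a data type (the type P, the function step, and the details are stand-ins of my own, chosen only to have a shape similar to my actual solution):

    module Main where

    import Data.List (foldl')

    -- A pair wrapped in a data type; the tuple's components are lazy.
    data P = P (Int, Int)
        deriving Show

    -- Add a number to both components of the pair.
    step :: P -> Int -> P
    step (P (a, b)) n = P (a + n, b + n)

    main :: IO ()
    main = print $ foldl' step (P (0, 0)) [1 .. 5000000]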

There’s nothing complicated about the code, but the memory usage is rather ridiculous:

   1,132,746,080 bytes allocated in the heap
   1,149,823,448 bytes copied during GC
     346,747,064 bytes maximum residency (11 sample(s))
       3,817,608 bytes maximum slop
             600 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2148 colls,     0 par    0.58s    0.58s     0.0003s    0.0006s
  Gen  1        11 colls,     0 par    0.56s    0.56s     0.0510s    0.2278s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.25s  (  0.25s elapsed)
  GC      time    1.15s  (  1.14s elapsed)
  EXIT    time    0.02s  (  0.02s elapsed)
  Total   time    1.42s  (  1.41s elapsed)

  %GC     time      80.9%  (81.0% elapsed)

  Alloc rate    4,593,597,274 bytes per MUT second

  Productivity  19.0% of total user, 19.1% of total elapsed

Surely it ought to be possible to add 5 million numbers in less than 600MB, right?

The reason, AFAIU, is that tuples are lazy: foldl' only evaluates the accumulator to WHNF on each step, so the additions inside the pair pile up as chains of thunks.

Using a stricter data type

Maybe the easiest way to add some strictness is to put a few exclamation marks in the data type. Modifying the data type brings a few other changes though, so to have something to compare with later on, here’s first a version without any strictness annotations¹:
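The same sketch as above, with the pair turned into a record (again, the field names are made up):

    {-# LANGUAGE RecordWildCards #-}
    module Main where

    import Data.List (foldl')

    -- The pair as a record; the fields are still lazy.
    data P = P { pX :: Int, pY :: Int }
        deriving Show

    step :: P -> Int -> P
    step P{..} n = P { pX = pX + n, pY = pY + n }

    main :: IO ()
    main = print $ foldl' step (P 0 0) [1 .. 5000000]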

This solution uses about half as much memory:

     766,417,952 bytes allocated in the heap
     639,064,600 bytes copied during GC
     155,866,064 bytes maximum residency (11 sample(s))
       1,914,016 bytes maximum slop
             345 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1460 colls,     0 par    0.32s    0.32s     0.0002s    0.0006s
  Gen  1        11 colls,     0 par    0.33s    0.33s     0.0303s    0.1071s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.15s  (  0.15s elapsed)
  GC      time    0.65s  (  0.65s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time    0.81s  (  0.81s elapsed)

  %GC     time      80.2%  (80.2% elapsed)

  Alloc rate    5,212,863,895 bytes per MUT second

  Productivity  19.8% of total user, 19.8% of total elapsed

Halving the memory usage is rather nice, but adding some strictness ought to make it better still. The code with added exclamation marks:
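Only the data declaration changes; the rest of the sketch stays as above:

    -- Strictness annotations on the fields: whenever a P is evaluated
    -- to WHNF (which foldl' does at every step) both sums are forced
    -- as well, so no chains of thunks can build up.
    data P = P { pX :: !Int, pY :: !Int }
        deriving Show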

The runtime behaviour of this code is remarkably more frugal with memory:

     480,054,816 bytes allocated in the heap
          57,968 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       918 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.09s  (  0.09s elapsed)

  %GC     time       2.4%  (2.4% elapsed)

  Alloc rate    5,630,441,644 bytes per MUT second

  Productivity  97.3% of total user, 98.2% of total elapsed

Indeed, 1MB is a bit more like it!

Strictness with the original type

However impressive the results when using a stricter data type, it might still be preferable to keep the original, somewhat simpler, type. It is possible to do just that by placing some explicit strictness in well-chosen places. Since my suspicion is that it’s the addition in step that causes the space leak, we’ll rewrite it:
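Going back to the tuple-based sketch, step can force the new pair completely before handing it back (deepseq lives in Control.DeepSeq, from the deepseq package):

    import Control.DeepSeq (deepseq)

    step :: P -> Int -> P
    step (P (a, b)) n =
        -- The pair itself is already in WHNF, so seq-ing it would do
        -- nothing; deepseq uses the NFData instance for (Int, Int)
        -- to force the two sums as well.
        let t = (a + n, b + n)
        in t `deepseq` P t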

Note that it’s not enough to use seq; a bit more force is needed, which is what deepseq offers. And sure enough, it delivers:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.29s  (  0.29s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.30s  (  0.30s elapsed)

  %GC     time       1.4%  (1.4% elapsed)

  Alloc rate    3,124,303,717 bytes per MUT second

  Productivity  98.5% of total user, 98.9% of total elapsed

A further slight change is possible: implement NFData for P itself. That way it’s not necessary to pull out and force evaluation of the innards of P. Using the default implementation of rnf did not cut it though; since it uses seq, it just isn’t forceful enough:
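A hand-written instance, together with a step that forces the whole P, looks something like this (still on my stand-in type):

    import Control.DeepSeq (NFData (..), deepseq)

    -- The default method, rnf a = a `seq` (), stops at the P
    -- constructor, so rnf has to be spelled out to reach the tuple.
    instance NFData P where
        rnf (P t) = t `deepseq` ()

    step :: P -> Int -> P
    step (P (a, b)) n =
        let p = P (a + n, b + n)
        in p `deepseq` p  -- a second deepseq, via rnf above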

The result is close to the previous:

     920,052,632 bytes allocated in the heap
         198,736 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1774 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.33s  (  0.33s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.33s  (  0.33s elapsed)

  %GC     time       1.2%  (1.2% elapsed)

  Alloc rate    2,817,175,186 bytes per MUT second

  Productivity  98.7% of total user, 99.0% of total elapsed

It looks like it’s slightly slower though, which might be caused by the double call to deepseq in this version.

Conclusion

From this little example it’s clear that a little strictness in well-chosen places can go a long way towards reducing the memory footprint. The various solutions also show that the results can vary quite a bit depending on the strategy for introducing strictness. In this particular case I think the stricter data type (see “Using a stricter data type” above) is the best choice: it’s both faster and uses less memory than any of the others.


  1. I’m using RecordWildCards to simplify the code a bit, which might possibly influence evaluation as well; I’m simply not sure.
