Playing with prefixes

I just realised that it’s been over a month since I posted something. All I can do is blame life ;-) With a daughter of 8 months and a recent switch of positions I just haven’t found time to play around really. Hopefully that’ll change now that the situation at work is “calming down” a bit.

Anyway, here’s a bit of code I played with today. A friend described a problem inside a tool related to reporting of free and used memory. Apparently, depending on the location in the code, memory could be stored in integers in bytes or in kilobytes (kB) or in kibibytes (Ki) or in mebibytes (Mi) etc. In every place the prefix would be ‘recorded’ in the variable name, resulting in a situation where a change of prefix used in one place would force a change of variable names in callers/callees. A bit of a mess. I thought I’d try to explore what Haskell’s type system could produce ;-)

First approach

My first thought was to create a data type for the prefixes:

data Size = Bytes Int
    | KB Int Int
    | KiBi Int Int
    deriving (Show, Eq)

Both for KB and KiBi I record the number of full KBs (Kis) and the remaining bytes. Then I thought that it’d be easiest to do all the calculations in bytes, which required a conversion function:

toBytes (KB i r) = Bytes (10^3 * i + r)
toBytes (KiBi i r) = Bytes $ 2^10 * i + r
toBytes b = b

After that it’s possible to make Size an instance of Num (incomplete, but the rest of the functions in Num are straight forward):

instance Num Size where
    (Bytes b1) + (Bytes b2) = Bytes $ b1 + b2
    s1  + s2 = (toBytes s1) + (toBytes s2)
    (Bytes b1) - (Bytes b2) = Bytes $ b1 - b2
    s1  - s2 = (toBytes s1) - (toBytes s2)

Unfortunately the result will always be in Bytes so a few other conversion functions are necessary:

toKB (Bytes i) = uncurry KB $ divMod i (10^3)
toKB s = toKB $ toBytes s

toKiBi (Bytes i) = uncurry KiBi $ divMod i (2^10)
toKiBi s = toKiBi $ toBytes s

Now it’s easy to create a type for bookeeping of memory and making it an instance of Num as well:

data MemUsage = Free Size | Used Size
    deriving (Eq, Show)

instance Num MemUsage where
    (Free i) + (Free j) = Free $ i + j
    (Used i) + (Used j) = Used $ i + j
    (Free i) + (Used j) = Free $ i - j
    (Used i) + (Free j) = Used $ i - j

    (Free i) - (Free j) = Free $ i - j
    (Used i) - (Used j) = Used $ i - j
    (Free i) - (Used j) = Free $ i + j
    (Used i) - (Free j) = Used $ i + j

On some level this ‘algebra’ for memory usage makes sense, though refinements are certainly possible e.g. . I suspect there are some interesting choices to be made regarding the remaining functions in Num, e.g. should negate turn free memory into used memory?

I find this solution somewhat lacking in the extensibility department—as the commonly available amount of memory grows (the code above is stuck in the 80’s it seems) changes have to be made to Size. The same goes for changes to support non-standard prefixes, probably not a common problem but It would be nice to have a solution that is “independently extensible”.

Second approach

Here my idea was to create one type per prefix, use a separate type for the calculations and use a type class for the conversions. Quite straightforward when put like that, though it took me a while to come up with it ;-)

Here is the internal representation used during calculations:

data IB = IB Int
    deriving (Eq, Show)

Simple enough. Then the class for conversions to and from IB:

class CIB a where
    toIB :: a -> IB
    fromIB :: IB -> a

No surprises there either. Now I want to add and subtract, but due to the definition of Num (it requires all arguments to be of the same type, e.g. the type of addition is (+) :: a -> a -> a) I’ll need two new functions for that (at least I don’t know a way around this):

(.+.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .+. j = fromIB $ (toIB i) `addIB` (toIB j)
        addIB (IB i1) (IB i2) = IB (i1 + i2)

(.-.) :: (CIB a, CIB b, CIB c) => a -> b -> c
i .-. j = fromIB $ (toIB i) `subIB` (toIB j)
        subIB (IB i1) (IB i2) = IB (i1 - i2)

One interesting observation is that due to the type of .+. and .-. it is possible, and often required, to specify the type of the result, somewhat similar to HSH. Now the stage is set for the prefix types to be defined, starting with plain bytes:

data Bytes = Bytes Int
    deriving (Eq, Show)

instance CIB Bytes where
    toIB (Bytes i) = IB i
    fromIB (IB i) = Bytes i

Similarly for kilobytes and kibibytes:

data KBytes = KBytes Int Int
    deriving (Eq, Show)

instance CIB KBytes where
    toIB (KBytes i j) = IB $ i * 10^3 + j
    fromIB (IB i) = uncurry KBytes $ divMod i (10^3)

data KiBytes = KiBytes Int Int
    deriving (Eq, Show)

instance CIB KiBytes where
    toIB (KiBytes i j) = IB $ i * 2^10 + j
    fromIB (IB i) = uncurry KiBytes $ divMod i (2^10)

The data type for memory usage wrap the memory usage:

data MemUsage a = Free a | Used a
    deriving (Eq, Show)

Again I can’t make MemUsage a an instance of Num, instead there are two new operators:

(~+~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~+~ (Free j) = Free $ i .+. j
(Used i) ~+~ (Used j) = Used $ i .+. j
(Free i) ~+~ (Used j) = Free $ i .-. j
(Used i) ~+~ (Free j) = Used $ i .-. j

(~-~) :: (CIB a, CIB b, CIB c) => MemUsage a -> MemUsage b -> MemUsage c
(Free i) ~-~ (Free j) = Free $ i .-. j
(Used i) ~-~ (Used j) = Used $ i .-. j
(Free i) ~-~ (Used j) = Free $ i .+. j
(Used i) ~-~ (Free j) = Used $ i .+. j

End thoughts

There is one thing I like about the first approach and one thing I don’t like. That both the types are instances of Num is something I like since I am interested in adding and subtracting memory, though arguably there are functions in Num that don’t make much sense for memory (e.g. multiplication and negation). The thing I don’t like is the need for explicit conversion functions (toKB and friends).

There are also one thing that I like about the second approach and one I don’t like. I would have liked to use the standard functions in Num, introducing new (type-specific) operators is something that I’m never particularly keen on. Though I suppose that implementing separate operators does emphasize that memory isn’t quite like regular integers. I really do like using Haskell’s own syntax to do the conversion, it’s more readable than having to call conversion functions.

That’s it! Comments and suggestions for improvements are always welcome of course.

Edward Kmett

Your (.+.) is going to lead to a lot of explicit type signatures because what is the type of the intermediate result generated in ((a .+. b) .+. c) ?

Two possible approaches spring to mind.

One thought is to take the second approach, but supply each prefix an instance of Num and rely on fromIntegral whenever you mix values of different units. This has the benefit of simplicity and recovering Num sugar.

Another thought is to define a meet semi-lattice of measures such that if one measure subsumes the other. i.e. I can define everything with a K that I can with an M (assuming you use Integer rather than Int). then that measure is lower than the other in the lattice. Then take the meet of the two values you supply to (.+./.-.) in the lattice.

Since KiB and K can only be added by lowering to your unprefixed representation IB. Then you get V shaped meet semi-lattice:

  GiB -> MiB -> KiB -> IB <- K <- M <- G

and you can add a chain dangling down from your IB for smaller units, Deci, Centi, Milli, Micro, Nano, etc to obtain a Y shaped figure.

And you can define the semi-lattice at the type level with functional dependencies so that you get forward type inference to figure out the type of the lvalue of an assignment at least.

For large type-level (semi)lattices like this, I tend to just let template haskell define all of the instances for me, because you’ll wind up with a number of instances equal to the square of the number of prefixes you define unless you exploit the structure of the lattice.

Leave a comment