Posts tagged "haskell":

08 May 2022

A little Haskell: epoch timestamp

A need of getting the current UNIX time is something that comes up every now and then. Just this week I needed it in order to add a k8s liveness probe1.

While it's often rather straight forward to get the Unix time as an integer in other languages2, in Haskell there's a bit of type tetris involved.

  1. getPOSIXTime gives me a POSIXTime, which is an alias for NominalDiffTime.
  2. NominalDiffTime implements RealFrac and can thus be converted to anything implementing Integral (I wanted it as Int64).
  3. NominalDiffTime also implements Num, so if the timestamp needs better precision than seconds it's easy to do (I needed milliseconds).

The combination of the above is something like

truncate <$> getPOSIXTime

In my case the full function of writing the timestamp to a file looks like this

writeTimestampFile :: MonadIO m => Path Abs File -> m ()
writeTimestampFile afn = liftIO $ do
    truncate @_ @Int64 . (* 1000) <$> getPOSIXTime >>= writeFile (fromAbsFile afn) . show



Over the last few days I've looked into k8s probes. Since we're using Istio TCP probes are of very limited use, and as the service in question doesn't offer an HTTP API I decided to use a liveness command that checks that the contents of a file is a sufficiently recent epoch timestamp.


Rust's Chrono package has Utc.timestamp(t). Python has time.time(). Golang has Time.Unix.

Tags: haskell k8s

Simple nix flake for Haskell development

Recently I've moved over to using flakes in my Haskell development projects. It took me a little while to arrive at a pattern a flake for Haskell development that I like. I'm hoping sharing it might help others when doing the same change

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs";
    flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      with nixpkgs.legacyPackages.${system};
        t = lib.trivial;
        hl = haskell.lib;

        name = "project-name";

        project = devTools: # [1]
          let addBuildTools = (t.flip hl.addBuildTools) devTools;
          in haskellPackages.developPackage {
            root = lib.sourceFilesBySuffices ./. [ ".cabal" ".hs" ];
            name = name;
            returnShellEnv = !(devTools == [ ]); # [2]

            modifier = (t.flip t.pipe) [

      in {
        packages.pkg = project [ ]; # [3]

        defaultPackage = self.packages.${system}.pkg;

        devShell = project (with haskellPackages; [ # [4]

The main issue I ran into is getting a development shell out of haskellPackages.developPackage, it requires returnShellEnv to be true. Something that isn't too easy to find out. This means that the only solution I've found to getting a development shell is to have separate expressions for building and getting a shell. In the above flake the build expression, [3], passes an empty list of development tools, the argument devTools at [1], while the development shell expression, [4], passes in a list of tools needed for development only. The decision of whether the expression is for building or for a development shell, [2], then looks at the list of development tools passed in.

Fallback of actions

In a tool I'm writing I want to load a file that may reside on the local disk, but if it isn't there I want to fetch it from the web. Basically it's very similar to having a cache and dealing with a miss, except in my case I don't populate the cache.

Let me first define the functions to play with

loadFromDisk :: String -> IO (Either String Int)
loadFromDisk k@"bad key" = do
    putStrLn $ "local: " <> k
    pure $ Left $ "no such local key: " <> k
loadFromDisk k = do
    putStrLn $ "local: " <> k
    pure $ Right $ length k

loadFromWeb :: String -> IO (Either String Int)
loadFromWeb k@"bad key" = do
    putStrLn $ "web: " <> k
    pure $ Left $ "no such remote key: " <> k
loadFromWeb k = do
    putStrLn $ "web: " <> k
    pure $ Right $ length k

Discarded solution: using the Alternative of IO directly

It's fairly easy to get the desired behaviour but Alternative of IO is based on exceptions which doesn't strike me as a good idea unless one is using IO directly. That is fine in a smallish application, but in my case it makes sense to use tagless style (or ReaderT pattern) so I'll skip exploring this option completely.

First attempt: lifting into the Alternative of Either e

There's an instance of Alternative for Either e in version 0.5 of transformers. It's deprecated and it's gone in newer versions of the library as one really should use Except or ExceptT instead. Even if I don't think it's where I want to end up, it's not an altogether bad place to start.

Now let's define a function using liftA2 (<|>) to make it easy to see what the behaviour is

fallBack ::
    Applicative m =>
    m (Either String res) ->
    m (Either String res) ->
    m (Either String res)
fallBack = liftA2 (<|>)
λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such remote key: bad key"

The first example shows that it falls back to loading form the web, and the second one shows that it's only the last failure that survives. The latter part, that only the last failure survives, isn't ideal but I think I can live with that. If I were interested in collecting all failures I would reach for Validation from validation-selective (there's one in validation that should work too).

So far so good, but the next example shows a behaviour I don't want

λ> loadFromDisk "good key" `fallBack` loadFromWeb "good key"
local: good key
web: good key
Right 8

or to make it even more explicit

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:451:36 in interactive:Ghci4

There's no short-circuiting!1

The behaviour I want is of course that if the first action is successful, then the second action shouldn't take place at all.

It looks like either <|> is strict in its second argument, or maybe it's liftA2 that forces it. I've not bothered digging into the details, it's enough to observe it to realise that this approach isn't good enough.

Second attempt: cutting it short, manually

Fixing the lack of short-circuiting the evaluation after the first success isn't too difficult to do manually. Something like this does it

fallBack ::
    Monad m =>
    m (Either String a) ->
    m (Either String a) ->
    m (Either String a)
fallBack first other = do
    first >>= \case
        r@(Right _) -> pure r
        r@(Left _) -> (r <|>) <$> other

It does indeed show the behaviour I want

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such remote key: bad key"

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
Right 8

Excellent! And to switch over to use Validation one just have to switch constructors, Right becomes Success and Left becomes Failure. Though collecting the failures by concatenating strings isn't the best idea of course. Switching to some other Monoid (that's the constraint on the failure type) isn't too difficult.

fallBack ::
    (Monad m, Monoid e) =>
    m (Validation e a) ->
    m (Validation e a) ->
    m (Validation e a)
fallBack first other = do
    first >>= \case
        r@(Success _) -> pure r
        r@(Failure _) -> (r <|>) <$> other

Third attempt: pulling failures out to MonadPlus

After writing the fallBack function I still wanted to explore other solutions. There's almost always something more out there in the Haskell eco system, right? So I asked in the #haskell-beginners channel on the Functional Programming Slack. The way I asked the question resulted in answers that iterates over a list of actions and cutting at the first success.

The first suggestion had me a little confused at first, but once I re-organised the helper function a little it made more sense to me.

mFromRight :: MonadPlus m => m (Either err res) -> m res
mFromRight = (either (const mzero) return =<<)

To use it put the actions in a list, map the helper above, and finally run asum on it all2. I think it makes it a little clearer what happens if it's rewritten like this.

firstRightM :: MonadPlus m => [m (Either err res)] -> m res
firstRightM = asum . fmap go
    go m = m >>= either (const mzero) return
λ> firstRightM [loadFromDisk "bad key", loadFromWeb "good key"]
local: bad key
web: good key

λ> firstRightM [loadFromDisk "good key", undefined]
local: good key

So far so good, but I left out the case where both fail, because that's sort of the fly in the ointment here

λ> firstRightM [loadFromDisk "bad key", loadFromWeb "bad key"]
local: bad key
web: bad key
*** Exception: user error (mzero)

It's not nice to be back to deal with exceptions, but it's possible to recover, e.g. by appending <|> pure 0.

λ> firstRightM [loadFromDisk "bad key", loadFromWeb "bad key"] <|> pure 0
local: bad key
web: bad key

However that removes the ability to deal with the situation where all actions fail. Not nice! Add to that the difficulty of coming up with a good MonadPlus instance for an application monad; one basically have to resort to the same thing as for IO, i.e. to throw an exception. Also not nice!

Fourth attempt: wrapping in ExceptT to get its Alternative behaviour

This was another suggestion from the Slack channel, and it is the one I like the most. Again it was suggested as a way to stop at the first successful action in a list of actions.

firstRightM ::
    (Foldable t, Functor t, Monad m, Monoid err) =>
    t (m (Either err res)) ->
    m (Either err res)
firstRightM = runExceptT . asum . fmap ExceptT

Which can be used similarly to the previous one. It's also easy to write a variant of fallBack for it.

fallBack ::
    (Monad m, Monoid err) =>
    m (Either err res) ->
    m (Either err res) ->
    m (Either err res)
fallBack first other = runExceptT $ ExceptT first <|> ExceptT other
λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such local key: bad keyno such remote key: bad key"

Yay! This solution has the short-circuiting behaviour I want, as well as collecting all errors on failure.


I'm still a little disappointed that liftA2 (<|>) isn't short-circuiting as I still think it's the easiest of the approaches. However, it's a problem that one has to rely on a deprecated instance of Alternative for Either String, but switching to use Validation would be only a minor change.

Manually writing the fallBack function, as I did in the second attempt, results in very explicit code which is nice as it often reduces the cognitive load for the reader. It's a contender, but using the deprecated Alternative instance is problematic and introducing Validition, an arguably not very common type, takes away a little of the appeal.

In the end I prefer the fourth attempt. It behaves exactly like I want and even though ExpectT lives in transformers I feel that it (I pull it in via mtl) is in such wide use that most Haskell programmers will be familiar with it.

One final thing to add is that the documentation of Validation is an excellent inspiration when it comes to the behaviour of its instances. I wish that the documentation of other packages, in particular commonly used ones like base, transformers, and mtl, would be more like it.

Comments, feedback, and questions

[2021-11-28 Sun] Dustin Sallings

Dustin sent me a comment via email a while ago, it's now March 2022 so it's taken me embarrassingly long to publish it here.

I removed a bit from the beginning of the email as it doesn't relate to this post.

… a thing I've written code for before that I was reasonably pleased with. I have a suite of software for managing my GoPro media which involves doing some metadata extraction from images and video. There will be multiple transcodings of each medium with each that contains the metadata having it completely intact (i.e., low quality encodings do not lose metadata fidelity). I also run this on multiple machines and store a cache of Metadata in S3.

Sometimes, I've already processed the metadata on another machine. Often, I can get it from the lowest quality. Sometimes, there's no metadata at all. The core of my extraction looks like this:

ms <- asum [
  Just . BL.toStrict <$> getMetaBlob mid,
  fv "mp4_low" (fn "low"),
  fv "high_res_proxy_mp4" (fn "high"),
  fv "source" (fn "src"),
  pure Nothing]

The first version grabs the processed blob from S3. The next three fetch (and process) increasingly larger variants of the uploaded media. The last one just gives up and says there's no metadata available (and memoizes that in the local DB and S3).

Some of these objects are in the tens of gigs, and I had a really bad internet connection when I first wrote this software, so I needed it to work.



I'm not sure if it's a good term to use in this case as Wikipedia says it's for Boolean operators. I hope it's not too far a stretch to use it in this context too.


In the version of base I'm using there is no asum, so I simply copied the implementation from a later version:

asum :: (Foldable t, Alternative f) => t (f a) -> f a
asum = foldr (<|>) empty

Using lens to set a value based on another

I started writing a small tool for work that consumes YAML files and combines the data into a single YAML file. To be specific it consumes YAML files containing snippets of service specification for Docker Compose and it produces a YAML file for use with docker-compose. Besides being useful to me, I thought it'd also be a good way to get some experience with lens.

The first transformation I wanted to write was one that puts in the correct image name. So, only slightly simplified, it is transforming

    x-image: panda
    x-image: goat
    image: incorrent
    x-image: tapir


    image: panda:latest
    x-image: panda
    image: goat:latest
    x-image: goat
    image: tapir:latest
    x-image: tapir

That is, it creates a new key/value pair in each object based on the value of x-image in the same object.

First approach

The first approach I came up with was to traverse the sub-objects and apply a function that adds the image key.

setImage :: Value -> Value
setImage y = y & members %~ setImg
    setImg o =
            & _Object . at "image"
            ?~ String (o ^. key "x-image" . _String <> ":latest")

It did make me wonder if this kind of problem, setting a value based on another value, isn't so common that there's a nicer solution to it. Perhaps coded up in a combinator that isn't mentioned in Optics By Example (or mabye I've forgot it was mentioned). That lead me to ask around a bit, which leads to approach two.

Second approach

Arguably there isn't much difference, it's still traversing the sub-objects and applying a function. The function makes use of view being run in a monad and ASetter being defined with Identity (a monad).

setImage' :: Value -> Value
setImage' y =
        & members . _Object
        %~ (set (at "image") . (_Just . _String %~ (<> ":latest")) =<< view (at "x-image"))

I haven't made up my mind on whether I like this better than the first. It's disappointingly similar to the first one.

Third approach

Then I it might be nice to split the fetching of x-image values from the addition of image key/value pairs. By extracting with an index it's possible to keep track of what sub-object each x-image value comes from. Then two steps can be combined using foldl.

setImage'' :: Value -> Value
setImage'' y = foldl setOne y vals
    vals = y ^@.. members <. key "x-image" . _String
    setOne y' (objKey, value) =
            & key objKey . _Object . at "image"
            ?~ String (value <> ":latest")

I'm not convinced though. I guess I'm still holding out for a brilliant combinator that fits my problem perfectly.

Please point me to "the perfect solution" if you have one, or if you just have some general tips on optics that would make my code clearer, or shorter, or more elegant, or maybe just more lens-y.

The timeout manager exception

The other day I bumped the dependencies of a Haskell project at work and noticed a new exception being thrown:

Thread killed by timeout manager

After a couple of false starts (it wasn't the connection pool, nor was it servant) I realised that a better approach would be to look at the list of packages that were updated as part of the dependency bumping.1 Most of them I thought would be very unlikely sources of it, but two in the list stood out:

Package Pre Post
unliftio 0.2.14 0.2.18
warp 3.3.15 3.3.16

warp since the exception seemed to be thrown shortly after handling an HTTP request, and unliftio since the exception was caught by the handler for uncaught exceptions and its description contains "thread". Also, when looking at the code changes in warp on GitHub2 I found that some of the changes introduced was increased use of unliftio for async stuff. The changes contain mentions of TimeoutThread and System.TimeManager. That sounded promising, and it lead me to the TimeoutThread exception in time-manager.

With that knowledge I could quickly adjust the handler for uncaught exceptions to not log TimeoutThread as fatal:

lastExceptionHandler :: LoggerSet -> SomeException -> IO ()
lastExceptionHandler logger e
  | Just TimeoutThread <- fromException e = return ()
  | otherwise = do
      logFatalIoS logger $ pack $ "uncaught exception: " <> displayException e
      flushLogStr logger

I have to say it was a bit more work to arrive at this than I'd have liked. I reckon there are easier ways to track down the information I needed. So I'd love to hear what tricks and tips others have.



As a bonus it gave me a good reason to reach for comm, a command that I rarely use but for some reason always enjoy.


GitHub's compare feature isn't very easy to discover, but a URL like this…warp-3.3.16 (note the 3 dots!) does the trick.

A first look at HMock

The other day I found Chris Smith's HMock: First Rate Mocks in Haskell (link to hackage) and thought it could be nice see if it can clear up some of the tests I have in a few of the Haskell projects at work. All the projects follow the pattern of defining custom monads for effects (something like final tagless) with instances implemented on a stack of monads from MTL. It's a pretty standard thing in Haskell I'd say, especially since the monad stack very often ends up being ReaderT MyConfig IO.

I decided to try it first on a single such custom monad, one for making HTTP requests:

class Monad m => MonadHttpClient m where
  mHttpGet :: String -> m (Status, ByteString)
  mHttpPost :: (Typeable a, Postable a) => String -> a -> m (Status, ByteString)

Yes, the underlying implementation uses wreq, but I'm not too bothered by that shining through. Also, initially I didn't have that Typeable a constraint on mHttpPost, it got added after a short exchange about KnownSymbol with Chris.

To dip a toe in the water I thought I'd simply write tests for the two effects themselves. First of all there's an impressive list of extensions needed, and then the monad needs to be made mockable:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}

makeMockable ''MonadHttpClient

After that, writing a test with HMock for mHttpGet was fairly straight forward, I could simply follow the examples in the package's documentation. I'm using tasty for organising the tests though:

httpGetTest :: TestTree
httpGetTest = testCase "Get" $ do
  (s, b) <- runMockT $ do
    expect $ MHttpGet "url" |-> (status200, "result")
    mHttpGet "url"
  status200 @=? s
  "result" @=? b

The effect for sending a POST request was slightly trickier, as can be seen in the issue linked above, but with some help I came up with the following:

httpPostTest :: TestTree
httpPostTest = testCase "Post" $ do
  (s, b) <- runMockT $ do
    expect $ MHttpPost_ (eq "url") (typed @ByteString anything) |-> (status201, "result")
    mHttpPost "url" ("hello" :: ByteString)
  status201 @=? s
  "result" @=? b

Next step

My hope is that using HMock will remove the need for creating a bunch of test implementations for the various custom monads for effects1 in the projects, thereby reducing the amount of test code overall. I also suspect that it will make the tests clearer and easier to read, as the behaviour of the mocks are closer to the tests using the mocks.



Basically they could be looked at as hand-written mocks.

Working with Hedis

I'm now writing the second Haskell service using Redis to store data. There are a few packages on Hackage related to Redis but I only found 2 client libraries, redis-io and hedis. I must say I like the API of redis-io better, but it breaks a rule I hold very dear:

Libraries should never log, that's the responsibility of the application.

So, hedis it is. I tried using the API as is, but found it really cumbersome so looked around and after some inspiration from hedis-simple I came up with the following functions.

First a wrapper around a Redis function that put everything into ExceptionT with a function that transforms a reply into an Exception.

lpush :: Exception e => (Reply -> e) -> ByteString -> [ByteString] -> ExceptionT Redis Integer
lpush mapper key element = ExceptionT $ replyToExc <$> R.lpush key element
    replyToExc = first (toException . mapper)

I found wrapping up functions like this is simple, but repetitive.

Finally I need a way to run the whole thing and unwrap it all back to IO:

runRedis :: Connection -> ExceptionT Redis a -> IO (Either SomeException a)
runRedis conn = R.runRedis conn . runExceptionT

First contribution to nixpkgs.haskellPackages

Nothing much to be proud of, but yesterday I found out that servant-docs was marked broken in nixpkgs even though it builds just fine and this morning I decided to do something about it.

So, with the help of a post on the NixOS discourse I put together my first PR.

Barbie and KenJSON

After higher-kinded data (HKD) and barbies were mentioned in episode 35 of Haskell Weekly I've been wondering if it could be used in combination with aeson to do validation when implementing web services.

TLDR; I think it'd work, but I have a feeling I'd have to spend some more time on it to get an API with nice ergonomics.

Defining a type to play with

I opted to use barbies-th to save on the typing a bit. Defining a simple type holding a name and an age can then look like this

   data Person = Person {name :: Text, age :: Int}

deriving instance Show (Person Covered Identity)
deriving instance Show (Person Covered Maybe)
deriving instance Show (Person Covered (Either Text))

The two functions from the Barbies module documentation, addDefaults and check, can then be written like this

addDefaults :: Person Covered Maybe -> Person Covered Identity -> Person Covered Identity
addDefaults = bzipWith trans
    trans m d = maybe d pure m

check :: Person Covered (Either Text) -> Either [Text] (Person Covered Identity)
check pe = case btraverse (either (const Nothing) (Just . Identity)) pe of
  Just pin -> Right pin
  Nothing -> Left $ bfoldMap (either (: []) (const [])) pe

I found it straight forward to define some instances and play with those functions a bit.

Adding in JSON

The bit that wasn't immediately obvious to me was how to use aeson to parse into a type like Person Covered (Either Text).

First off I needed some data to test things out with.

bs0, bs1 :: BSL.ByteString
bs0 = "{\"name\": \"the name\", \"age\": 17}"
bs1 = "{\"name\": \"the name\", \"age\": true}"

To keep things simple I took baby steps, first I tried parsing into Person Covered Identity. It turns out that the FromJSON instance from that doesn't need much thought at all. (It's a bit of a pain to have to specify types in GHCi all the time, so I'm throwing in a specialised decoding function for each type too.)

instance FromJSON (Person Covered Identity) where
  parseJSON = withObject "Person" $
    \o -> Person <$> o .: "name"
      <*> o .: "age"

decodePI :: BSL.ByteString -> Maybe (Person Covered Identity)
decodePI = decode

Trying it out on the test data gives the expected results

λ> let i0 = decodePI bs0
λ> i0
Just (Person {name = Identity "the name", age = Identity 17})
λ> let i1 = decodePI bs1
λ> i1

So far so good! Moving onto Person Covered Maybe. I spent some time trying to use the combinators in Data.Aeson for dealing with parser failures, but in the end I had to resort to using <|> from Alternative.

instance FromJSON (Person Covered Maybe) where
  parseJSON = withObject "Person" $
    \o -> Person <$> (o .: "name" <|> pure Nothing)
      <*> (o .: "age" <|> pure Nothing)

decodePM :: BSL.ByteString -> Maybe (Person Covered Maybe)
decodePM = decode

Trying that out I saw exactly the behaviour I expected, i.e. that parsing won't fail. (Well, at least not as long as it's a valid JSON object to being with.)

λ> let m0 = decodePM bs0
λ> m0
Just (Person {name = Just "the name", age = Just 17})
λ> let m1 = decodePM bs1
λ> m1
Just (Person {name = Just "the name", age = Nothing})

With that done I found that the instance for Person Covered (Either Text) followed quite naturally. I had to spend a little time on getting the types right to parse the fields properly. Somewhat disappointingly I didn't get type errors when the behaviour of the code turned out to be wrong. I'm gussing aeson's Parser was a little too willing to give me parser failures. Anyway, I ended up with this instance

instance FromJSON (Person Covered (Either Text)) where
  parseJSON = withObject "Person" $
    \o -> Person <$> ((Right <$> o .: "name") <|> pure (Left "A name is most needed"))
      <*> ((Right <$> o .: "age") <|> pure (Left "An integer age is needed"))

decodePE :: BSL.ByteString -> Maybe (Person Covered (Either Text))
decodePE = decode

That does exhibit the behaviour I want

λ> let e0 = decodePE bs0
λ> e0
Just (Person {name = Right "the name", age = Right 17})
λ> let e1 = decodePE bs1
λ> e1
Just (Person {name = Right "the name", age = Left "An integer age is needed"})

In closing

I think everyone will agree that the FromJSON instances are increasingly messy. I think that can be fixed by putting some thought into what a more pleasing API should look like.

I'd also like to mix in validation beyond what aeson offers out-of-the-box, which really only is "is the field present?" and "does the value have the correct type?". For instance, Once we know there is a field called age, and that it's an Int, then we might want to make sure it's non-negitive, or that the person is at least 18. I'm guessing that wouldn't be too difficult.

Finally, I'd love to see examples of using HKDs for parsing/validation in the wild. It's probably easiest to reach me at

Custom monad with servant and throwing errors

In the past I've always used scotty when writing web services. This was mostly due to laziness, I found working out how to use scotty a lot easier than servant, so basically I was being lazy. Fairly quickly I bumped into some limitations in scotty, but at first the workarounds didn't add too much complexity and were acceptable. A few weeks ago they started weighing on me though and I decided to look into servant and since I really liked what I found I've started moving all projects to use servant.

In several of the projects I've used tagless final style and defined a type based on ReaderT holding configuration over IO, that is something like

newtype AppM a = AppM {unAppM : ReaderT Config IO a}
    ( Functor,
      MonadReader Config

runAppM :: AppM a -> Config -> IO a
runAppM app = runReaderT (unAppM app)

I found that servant is very well suited to this style through hoistServer and there are several examples on how to use it with a ReaderT-based type like above. The first one I found is in the servant cookbook. However, as I realised a bit later, using a simple type like this doesn't make it easy to trigger responses with status other than 200 OK. When I looked at the definition of the type for writing handlers that ships with servant, Handler, I decided to try to use the following type in my service

newtype AppM a = AppM {unAppM : ReaderT Config (ExceptT ServerError IO) a}
    ( Functor,
      MonadReader Config

runAppM :: AppM a -> Config -> IO (Either ServerError a)
runAppM app = runExceptT . runReaderT (unAppM app)

The natural transformation required by hoistServer can then be written like

nt :: AppM a -> Handler a
nt x =
  liftIO (runAppM x cfg) >>= \case
    Right v -> pure v
    Left err -> throwError err

I particularly like how clearly this suggests a way to add custom errors if I want that.

  1. Swap out ServerError for my custom error type in AppM.
  2. Write a function to transform my custom error type into a ServerError, transformCustomError :: CustomError -> ServerError.
  3. use throwError $ transformCustomError err in the Left branch of nt.

A slight complication with MonadUnliftIO

I was using unliftio in my service, and as long as I based my monad stack only on ReaderT that worked fine. I even got the MonadUnliftIO instance for free through automatic deriving. ExceptT isn't a stateless monad though, so using unliftio is out of the question, instead I had to switch to MonadBaseControl and the packages that work with it. Defining and instance of MonadBaseControl looked a bit daunting, but luckily Handler has an instance of it that I used as inspiration.

First off MonadBaseControl requires the type to also be an instance of MonadBase. There's an explicit implementation for Handler, but I found that it can be derived automatically, so I took the lazy route.

The instance of MonadBaseControl for AppM ended up looking like this

instance MonadBaseControl IO AppM where
  type StM AppM a = Either ServerError a

  liftBaseWith f = AppM (liftBaseWith (\g -> f (g . unAppM)))
  restoreM = AppM . restoreM

I can't claim to really understand what's going on in that definition, but I have Alexis King's article on Demystifying MonadBaseControl on my list of things to read.

Flycheck and HLS

I've been using LSP for most programming languages for a while now. HLS is really very good now, but I've found that it doesn't warn on quite all things I'd like it to so I find myself having to swap between the 'lsp and 'haskell-ghc checkers. However, since flycheck supports chaining checkers I thought there must be a way to have both checkers active at the same time.

The naive approach didn't work due to load order of things in Spacemacs so I had to experiment a bit to find something that works.

The first issue was to make sure that HLS is available at all. I use shell.nix together with direnv extensively and I had noticed that lsp-mode tried to load HLS before direnv had put it in the $PATH. I think the 'lsp-beforeinitialize-hook is the hook to use for this:

(add-hook 'lsp-before-initialize-hook #'direnv-update-environment))

I made a several attempt to chain the checkers but kept on getting errors due to the 'lsp checker not being defined yet. Another problem I ran into was that the checkers were chained too late, resulting in having to manually run flycheck-buffer on the first file I opened. (Deferred loading is a brilliant thing, but make some things really difficult to debug.) After quite a bit of experimenting and reading the description of various hooks I did find something that works:

(with-eval-after-load 'lsp-mode
  (defun magthe:lsp-next-checker ()
    (flycheck-add-next-checker 'lsp '(warning . haskell-ghc)))
  (add-hook 'lsp-lsp-haskell-after-open-hook

Of course I have no idea if this is the easiest or most elegant solution but it does work for my testcases:

  1. Open a file in a project, SPC p l - choose project - choose a Haskell file.
  2. Open a project, SPC p l followed by C-d, and then open a Haskell file.

Suggestions for improvements are more than welcome, of course.

Haskell, Nix and using packages from GitHub

The other day I bumped into what turned out to be a bug in Amazonka where sockets weren't closed in a timely fashion and thus the process ran out of file descriptors. Some more digging and an issue later I found that a fix most likely already in place (mine was possibly a duplicate of an older issue). Now I only had to verify if that was the case by using the most recent, and unreleased code on the develop branch of Amazonka.

My first thought was to attempt to instruct Cabal to build the bits of Amazonka I need by putting a few source-repository-package stanzas in my config. That quickly started to look like a bit of a rabbit hole, so I decided to use Nix instead. After finding the perfect SO post and looking up yet again how to do overrides for Haskell I ran cabal2nix for the three packages I need:

cabal2nix --no-haddock --no-check --subpath amazonka \
  git:// > amazonka.nix
cabal2nix --no-haddock --no-check --subpath core \
  git:// > amazonka-core.nix
cabal2nix --no-haddock --no-check --subpath amazonka-sqs \
  git:// > amazonka-sqs.nix

The relevant part of the old Nix expression looked like this:

thePkg = haskellPackages.developPackage {
  root = lib.cleanSource ./.;
  name = name;

  modifier = (t.flip t.pipe)

After adding the overrides it looked like this

hp = haskellPackages.override {
  overrides = self: super: {
    amazonka-core = self.callPackage ./amazonka-core.nix {};
    amazonka = self.callPackage ./amazonka.nix {};
    amazonka-sqs = self.callPackage ./amazonka-sqs.nix {};

thePkg = hp.developPackage {
  root = lib.cleanSource ./.;
  name = name;

  modifier = (t.flip t.pipe)

After a somewhat longer-than-usual build I could verify that I had indeed bumped into the same issue and my issue was a duplicate.

Combining Amazonka and Conduit

Combining amazonka and conduit turned out to be easier than I had expected.

Here's an SNS sink I put together today

snsSink :: (MonadAWS m, MonadIO m) => T.Text -> C.ConduitT Value C.Void m ()
snsSink topic = do
  C.await >>= \case
    Nothing -> pure ()
    Just msg -> do
      _ <- C.lift $ publishSNS topic (TL.toStrict $ TL.decodeUtf8 $ encode msg)
      snsSink topic

Putting it to use can be done with something like

foo = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    <source producing Value> .| snsSink topicArn

    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

X-Ray and WAI

For a while we've been planning on introducing AWS X-Ray into our system at work. There's official support for a few languages, but not too surprisingly Haskell isn't on that list. I found freckle/aws-xray-client on GitHub, which is so unofficial that it isn't even published on Hackage. While it looks very good, I suspect it does more than I need and since it lacks licensing information I decided to instead implement a version tailored to our needs.

As a first step I implemented a WAI middleware that wraps an HTTP request and reports the time it took to produce a response. Between the X-Ray Developer Guide and the code in Freckle's git repo it turned out to be fairly simple.

First off, this is the first step towards X-Ray nirvana, so all I'm aiming for is minimal support. That means all I want is to send minimal X-Ray segments, with the small addition that I want to support parent_id from the start.

The first step then is to parse the HTTP header containing the X-Ray information – X-Amzn-Trace-Id. For now I'm only interested in two parts, Root and Parent, so for simplicity's sake I use a tuple to keep them in. The idea is to take the header's value, split on ; to get the parts, then split each part in two, a key and a value, and put them into an association list ([(Text, Text)]) for easy lookup using, well lookup.

parseXRayTraceIdHdr :: Text -> Maybe (Text, Maybe Text)
parseXRayTraceIdHdr hdr = do
  bits <- traverse parseHeaderComponent $ T.split (== ';') hdr
  traceId <- lookup "Root" bits
  let parent = lookup "Parent" bits
  pure (traceId, parent)

parseHeaderComponent :: Text -> Maybe (Text, Text)
parseHeaderComponent cmp = case T.split (== '=') cmp of
                            [name, value] -> Just (name, value)
                            _ -> Nothing

The start and end times for processing a request are also required. The docs say that using at least millisecond resolution is a good idea, so I decided to do exactly that. NominalDiffTime, which is what getPOSIXTime produces, supports a resolution of picoseconds (though I doubt my system's clock does) which requires a bit of (type-based) converting.

mkTimeInMilli :: IO Milli
mkTimeInMilli = ndfToMilli <$> getPOSIXTime
    ndfToMilli = fromRational . toRational

The last support function needed is one that creates the segment. Just building the JSON object, using aeson's object, is enough at this point.

mkSegment :: Text -> Text -> Milli -> Milli -> (Text, Maybe Text) -> Value
mkSegment name id startTime endTime (root, parent) =
  object $ [ "name" .= name
           , "id" .= id
           , "trace_id" .= root
           , "start_time" .= startTime
           , "end_time" .= endTime
           ] <> p
    p = maybe [] (\ v -> ["parent_id" .= v]) parent

Armed with all this, I can now put together a WAI middleware that

  1. records the start time of the call
  2. processes the request
  3. sends off the response and keeps the result of it
  4. records the end time
  5. parses the tracing header
  6. builds the segment prepended with the X-Ray daemon header
  7. sends the segment to the X-Ray daemon
traceId :: Text -> Middleware
traceId xrayName app req sendResponse = do
  startTime <- mkTimeInMilli
  app req $ \ res -> do
    rr <- sendResponse res
    endTime <- mkTimeInMilli
    theId <- T.pack . (\ v -> showHex v "") <$> randomIO @Word64
    let traceParts = (decodeUtf8 <$> requestHeaderTraceId req) >>= parseXRayTraceIdHdr
        segment = mkSegment xrayName theId startTime endTime <$> traceParts
    case segment of
      Nothing -> pure ()
      Just segment' -> sendXRayPayload $ toStrict $ prepareXRayPayload segment'
    pure rr

    prepareXRayPayload segment =
      let header = object ["format" .= ("json" :: String), "version" .= (1 :: Int)]
      in encode header <> "\n" <> encode segment

    sendXRayPayload payload = do
      addrInfos <- S.getAddrInfo Nothing (Just "") (Just "2000")
      case addrInfos of
        [] -> pure () -- silently skip
        (xrayAddr:_) -> do
          sock <- S.socket (S.addrFamily xrayAddr) S.Datagram S.defaultProtocol
          S.connect sock (S.addrAddress xrayAddr)
          sendAll sock payload
          S.close sock

The next step will be to instrument the actual processing. The service I'm instrumenting is asynchronous, so all the work happens after the response has been sent. My plan for this is to use subsegments to record it. That means I'll have to

I'm saving that work for a rainy day though, or rather, for a day when I'm so upset at Clojure that I don't want to see another parenthesis.

Edit (2020-04-10): Corrected the segment field name for the parent ID, it should be parent_id.

My ghcide build for Nix

I was slightly disappointed to find out that not all packages on Hackage that are marked as present in Nix(pkgs) actually are available. Quite a few of them are marked broken and hence not installable. One of these packages is ghcide.

There are of course expressions available for getting a working ghcide executable installed, like ghcide-nix. However, since I have rather simple needs for my Haskell projects I thought I'd play with my own approach to it.

What I care about is:

  1. availability of the development tools I use, at the moment it's mainly ghcide but I'm planning on making use of ormolu in the near future
  2. pre-built packages
  3. ease of use

So, I put together ghcide-for-nix. It's basically just a constumized Nixpkgs where the packages needed to un-break ghcide are present.

Usage is a simple import away:

import (builtins.fetchGit {
  name = "ghcide-for-nix";
  url =;
  rev = "927a8caa62cece60d9d66dbdfc62b7738d61d75f";

and it'll give you a superset of Nixpkgs. Pre-built packages are available on Cachix.

It's not sophisticated, but it's rather easy to use and suffices for my purposes.

Haskell, ghcide, and Spacemacs

The other day I read Chris Penner's post on Haskell IDE Support and thought I'd make an attempt to use it with Spacemacs.

After running stack build hie-bios ghcide haskell-lsp --copy-compiler-tool I had a look at the instructions on using haskell-ide-engine with Spacemacs. After a bit of trial and error I came up with these changes to my ~/.spacemacs:

(defun dotspacemacs/layers ()
    (haskell :variables
             haskell-completion-backend 'lsp
(defun dotspacemacs/user-config ()
  (setq lsp-haskell-process-args-hie '("exec" "ghcide" "--" "--lsp")
        lsp-haskell-process-path-hie "stack"
        lsp-haskell-process-wrapper-function (lambda (argv) (cons (car argv) (cddr argv)))
  (add-hook 'haskell-mode-hook

The slightly weird looking lsp-haskell-process-wrapper-function is removing the pesky --lsp inserted by this line.

That seems to work. Though I have to say I'm not ready to switch from intero just yet. Two things in particular didn't work with =ghcide=/LSP:

  1. Switching from one the Main.hs in one executable to the Main.hs of another executable in the same project didn't work as expected – I had hints and types in the first, but nothing in the second.
  2. Jump to the definition of a function defined in the package didn't work – I'm not willing to use GNU GLOBAL or some other source tagging system.

Hedgehog on a REST API, part 3

In my previous post on using Hedgehog on a REST API, Hedgehog on a REST API, part 2 I ran the test a few times and adjusted the model to deal with the incorrect assumptions I had initially made. In particular, I had to adjust how I modelled the User ID. Because of the simplicity of the API that wasn't too difficult. However, that kind of completely predictable ID isn't found in all APIs. In fact, it's not uncommon to have completely random IDs in API (often they are UUIDs).

So, I set out to try to deal with that. I'm still using the simple API from the previous posts, but this time I'm pretending that I can't build the ID into the model myself, or, put another way, I'm capturing the ID from the responses.

The model state

When capturing the ID it's no longer possible to use a simple Map Int Text for the state, because I don't actually have the ID until I have an HTTP response. However, the ID is playing an important role in the constructing of a sequence of actions. The trick is to use Var Int v instead of an ordinary Int. As I understand it, and I believe that's a good enough understanding to make use of Hedgehog possible, is that this way the ID is an opaque blob in the construction phase, and it's turned into a concrete value during execution. When in the opaque state it implements enough type classes to be useful for my purposes.

newtype State (v :: * -> *)= State (M.Map (Var Int v) Text)
  deriving (Eq, Show)

The API calls: add user

When taking a closer look at the Callback type not all the callbacks will get the state in the same form, opaque or concrete, and one of them, Update actually receives the state in both states depending on the phase of execution. This has the most impact on the add user action. To deal with it there's a need to rearrange the code a bit, to be specific, commandExecute can no longer return a tuple of both the ID and the status of the HTTP response because the update function can't reach into the tuple, which it needs to update the state.

That means the commandExecute function will have to do tests too. It is nice to keep all tests in the callbacks, but by sticking a MonadTest m constraint on the commandExecute it turns into a nice solution anyway.

addUser :: (MonadGen n, MonadIO m, MonadTest m) => Command n m State
addUser = Command gen exec [ Update u
    gen _ = Just $ AddUser <$> Gen.text (Range.linear 0 42) Gen.alpha

    exec (AddUser n) = do
      (s, ui) <- liftIO $ do
        mgr <- newManager defaultManagerSettings
        addReq <- parseRequest "POST http://localhost:3000/users"
        let addReq' = addReq { requestBody = RequestBodyLBS (encode $ User 0 n)}
        addResp <- httpLbs addReq' mgr
        let user = decode (responseBody addResp) :: Maybe User
        return (responseStatus addResp, user)
      status201 === s
      assert $ isJust ui
      (userName <$> ui) === Just n
      return $ userId $ fromJust ui

    u (State m) (AddUser n) o = State (M.insert o n m)

I found that once I'd come around to folding the Ensure callback into the commandExecute function the rest fell out from the types.

The API calls: delete user

The other actions, deleting a user and getting a user, required only minor changes and the changes were rather similar in both cases.

Not the type for the action needs to take a Var Int v instead of just a plain Int.

newtype DeleteUser (v :: * -> *) = DeleteUser (Var Int v)
  deriving (Eq, Show)

Which in turn affect the implementation of HTraversable

instance HTraversable DeleteUser where
  htraverse f (DeleteUser vi) = DeleteUser <$> htraverse f vi

Then the changes to the Command mostly comprise use of concrete in places where the real ID is needed.

deleteUser :: (MonadGen n, MonadIO m) => Command n m State
deleteUser = Command gen exec [ Update u
                              , Require r
                              , Ensure e
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ DeleteUser <$> Gen.element ks

    exec (DeleteUser vi) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      delReq <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show (concrete vi)
      delResp <- httpNoBody delReq mgr
      return $ responseStatus delResp

    u (State m) (DeleteUser i) _ = State $ M.delete i m

    r (State m) (DeleteUser i) = i `elem` M.keys m

    e _ _ (DeleteUser _) r = r === status200


This post concludes my playing around with state machines in Hedgehog for this time. I certainly hope I find the time to put it to use on some larger API soon. In particular I'd love to put it to use at work; I think it'd be an excellent addition to the integration tests we currently have.

Architecture of a service

Early this summer it was finally time to put this one service I've been working on into our sandbox environment. It's been running without hickups so last week I turned it on for production as well. In this post I thought I'd document the how and why of the service in the hope that someone will find it useful.

The service functions as an interface to external SMS-sending services, offering a single place to change if we find that we are unhappy with the service we're using.1 This service replaces an older one, written in Ruby and no one really dares touch it. Hopefully the Haskell version will prove to be a joy to work with over time.

Overview of the architecture

The service is split into two parts, one web server using scotty, and streaming data processing using conduit. Persistent storage is provided by a PostgreSQL database. The general idea is that events are picked up from the database, acted upon, which in turn results in other events which written to the database. Those are then picked up and round and round we go. The web service accepts requests, turns them into events and writes the to the database.

Hopefully this crude diagram clarifies it somewhat.

Figure 1: Diagram of the service architecture

There are a few things that might need some explanation

  • In the past we've wanted to have the option to use multiple external SMS services at the same time. One is randomly chosen as the request comes in. There's also a possibility to configure the frequency for each external service.

    Picker implements the random picking and I've written about that earlier in Choosing a conduit randomly.

    Success and fail are dummy senders. They don't actually send anything, and the former succeeds at it while the latter fails. I found them useful for manual testing.

  • Successfully sending off a request to an external SMS service, getting status 200 back, doesn't actually mean that the SMS has been sent, or even that it ever will be. Due to the nature of SMS messaging there are no guarantees of timeliness at all. Since we are interested in finding out whether an SMS actually is sent a delayed action is scheduled, which will fetch the status of a sent SMS after a certain time (currently 2 minutes). If an SMS hasn't been sent after that time it might as well never be – it's too slow for our end-users.

    This is what report-fetcher and fetcher-func do.

  • The queue sink and queue src are actually sourceTQueue and sinkTQueue. Splitting the stream like that makes it trivial to push in events by using writeTQueue.
  • I use sequenceConduits in order to send a single event to multiple =Conduit=s and then combine all their results back into a single stream. The ease with which this can be done in conduit is one of the main reasons why I choose to use it.2

Effects and tests

I started out writing everything based on a type like ReaderT <my cfg type> IO and using liftIO for effects that needed lifting. This worked nicely while I was setting up the basic structure of the service, but as soon as I hooked in the database I really wanted to do some testing also of the effectful code.

After reading Introduction to Tagless Final and The ReaderT Design Patter, playing a bit with both approaches, and writing Tagless final and Scotty and The ReaderT design pattern or tagless final?, I finally chose to go down the route of tagless final. There's no strong reason for that decision, maybe it was just because I read about it first and found it very easy to move in that direction in small steps.

There's a split between property tests and unit tests:

  • Data types, their monad instances (like JSON (de-)serialisation), pure functions and a few effects are tested using properties. I'm using QuickCheck for that. I've since looked a little closer at hedgehog and if I were to do a major overhaul of the property tests I might be tempted to rewrite them using that library instead.
  • Most of the =Conduit=s are tested using HUnit.


The service will be run in a container and we try to follow the 12-factor app rules, where the third one says that configuration should be stored in the environment. All previous Haskell projects I've worked on have been command line tools were configuration is done (mostly) using command line argument. For that I usually use optparse-applicative, but it's not applicable in this setting.

After a bit of searching on hackage I settled on etc. It turned out to be nice an easy to work with. The configuration is written in JSON and only specifies environment variables. It's then embedded in the executable using file-embed. The only thing I miss is a ToJSON instance for Config – we've found it quite useful to log the active configuration when starting a service and that log entry would become a bit nicer if the message was JSON rather than the (somewhat difficult to read) string that Config's Show instance produces.


There are two requirements we have when it comes to logging

  1. All log entries tied to a request should have a correlation ID.
  2. Log requests and responses

I've written about correlation ID before, Using a configuration in Scotty.

Logging requests and responses is an area where I'm not very happy with scotty. It feels natural to solve it using middleware (i.e. using middleware) but the representation, especially of responses, is a bit complicated so for the time being I've skipped logging the body of both. I'd be most interested to hear of libraries that could make that easier.

Data storage and picking up new events

The data stream processing depends heavily on being able to pick up when new events are written to the database. Especially when there are more than one instance running (we usually have at least two instance running in the production environment). To get that working I've used postgresql-simple's support for LISTEN and NOTIFY via the function getNotification.

When I wrote about this earlier, Conduit and PostgreSQL I got some really good feedback that made my solution more robust.

Delayed actions

Some things in Haskell feel almost like cheating. The light-weight threading makes me confident that a forkIO followed by a threadDelay (or in my case, the ones from unliftio) will suffice.



It has happened in the past that we've changed SMS service after finding that they weren't living up to our expectations.


A while ago I was experimenting with other streaming libraries, but I gave up on getting re-combination to work – Zipping streams

Hedgehog on a REST API, part 2

This is a short follow-up to Hedgehog on a REST API where I actually run the tests in that post.

Fixing an issue with the model

The first issue I run into is

━━━ Main ━━━
  ✗ sequential failed after 18 tests and 1 shrink.

        ┏━━ tst/test-01.hs ━━━
     89 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     90 ┃ getUser = Command gen exec [ Require r
     91 ┃                            , Ensure e
     92 ┃                            ]
     93 ┃   where
     94 ┃     gen (State m) = case M.keys m of
     95 ┃       [] -> Nothing
     96 ┃       ks -> Just $ GetUser <$> Gen.element ks
     97 ┃
     98 ┃     exec (GetUser i) = liftIO $ do
     99 ┃       mgr <- newManager defaultManagerSettings
    100 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    101 ┃       getResp <- httpLbs getReq mgr
    102 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    103 ┃       return (status200 == responseStatus getResp, us)
    104 ┃
    105 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    106 ┃
    107 ┃     e _ _ (GetUser _) (r, us) = do
    108 ┃       r === True
    109 ┃       assert $ isJust us
    110 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1

        ┏━━ tst/test-01.hs ━━━
    118 ┃ prop_seq :: Property
    119 ┃ prop_seq = property $ do
    120 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = GetUser 1
    121 ┃   resetWS
    122 ┃   executeSequential initialState actions

    This failure can be reproduced by running:
    > recheck (Size 17) (Seed 2158538972777046104 (-1442908127347265675)) sequential

  ✗ 1 failed.

It's easy to verify this using httpie:

$ http -p b POST :3000/users userId:=0 "userName="
    "userId": 0,
    "userName": ""
$ http -p b GET :3000/users/1

It's clear that my assumption that User ID starts at 1 is wrong. Luckily fixing that isn't too difficult. Instead of defining the update function for addUser as

u (State m) (AddUser n) _o = State $ M.insert k n m
    k = succ $ foldl max 0 (M.keys m)

I define it as

u (State m) (AddUser n) _o = State $ M.insert k n m
    k = case M.keys m of
      [] -> 0
      ks -> succ $ foldl max 0 ks

The complete code at this point can be found here.

Fixing another issue with the model

With that fix in place another issue with the model shows up

━━━ Main ━━━
  ✗ sequential failed after 74 tests and 2 shrinks.

        ┏━━ tst/test-01.hs ━━━
     91 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     92 ┃ getUser = Command gen exec [ Require r
     93 ┃                            , Ensure e
     94 ┃                            ]
     95 ┃   where
     96 ┃     gen (State m) = case M.keys m of
     97 ┃       [] -> Nothing
     98 ┃       ks -> Just $ GetUser <$> Gen.element ks
     99 ┃
    100 ┃     exec (GetUser i) = liftIO $ do
    101 ┃       mgr <- newManager defaultManagerSettings
    102 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    103 ┃       getResp <- httpLbs getReq mgr
    104 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    105 ┃       return (status200 == responseStatus getResp, us)
    106 ┃
    107 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    108 ┃
    109 ┃     e _ _ (GetUser _) (r, us) = do
    110 ┃       r === True
    111 ┃       assert $ isJust us
    112 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1

        ┏━━ tst/test-01.hs ━━━
    120 ┃ prop_seq :: Property
    121 ┃ prop_seq = property $ do
    122 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = DeleteUser 0
        ┃   │ Var 2 = AddUser ""
        ┃   │ Var 3 = GetUser 0
    123 ┃   resetWS
    124 ┃   executeSequential initialState actions

    This failure can be reproduced by running:
    > recheck (Size 73) (Seed 3813043122711576923 (-444438259649958339)) sequential

  ✗ 1 failed.

Again, verifying this using httpie shows what the issue is

$ http -p b POST :3000/users userId:=0 "userName="
    "userId": 0,
    "userName": ""
$ http -p b DELETE :3000/users/0
$ http -p b POST :3000/users userId:=0 "userName="
    "userId": 1,
    "userName": ""
$ http -p b GET :3000/users/0

In other words, the model assumes that the 0 User ID get's re-used.

To fix this I need a bigger change. The central bit is that the state is changed to keep track of the index more explicitly. That is, it changes from

newtype State (v :: * -> *)= State (M.Map Int Text)
  deriving (Eq, Show)


data State (v :: * -> *)= State Int (M.Map Int Text)
  deriving (Eq, Show)

That change does, quite obviously, require a bunch of other changes in the other functions dealing with the state. The complete file can be viewed here.

All is well, or is it?

After this the tests pass, so all is good in the world, right?

In the test I defined the property over rather short sequences of commands. What happens if I increase the (maximum) length of the sequences a bit? Instead using Range.linear 1 10 I'll use Range.linear 1 1000. Well, besides taking slightly longer to run I get another sequence of commands that triggers an issue:

━━━ Main ━━━
  ✗ sequential failed after 13 tests and 29 shrinks.

        ┏━━ tst/test-01.hs ━━━
     87 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     88 ┃ getUser = Command gen exec [ Require r
     89 ┃                            , Ensure e
     90 ┃                            ]
     91 ┃   where
     92 ┃     gen (State _ m) = case M.keys m of
     93 ┃       [] -> Nothing
     94 ┃       ks -> Just $ GetUser <$> Gen.element ks
     95 ┃
     96 ┃     exec (GetUser i) = liftIO $ do
     97 ┃       mgr <- newManager defaultManagerSettings
     98 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
     99 ┃       getResp <- httpLbs getReq mgr
    100 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    101 ┃       return (status200 == responseStatus getResp, us)
    102 ┃
    103 ┃     r (State _ m) (GetUser i) = i `elem` M.keys m
    104 ┃
    105 ┃     e _ _ (GetUser _) (r, us) = do
    106 ┃       r === True
    107 ┃       assert $ isJust us
    108 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1

        ┏━━ tst/test-01.hs ━━━
    116 ┃ prop_seq :: Property
    117 ┃ prop_seq = property $ do
    118 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 1000) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 2 = AddUser ""
        ┃   │ Var 5 = AddUser ""
        ┃   │ Var 7 = AddUser ""
        ┃   │ Var 9 = AddUser ""
        ┃   │ Var 11 = AddUser ""
        ┃   │ Var 20 = AddUser ""
        ┃   │ Var 28 = AddUser ""
        ┃   │ Var 30 = AddUser ""
        ┃   │ Var 32 = AddUser ""
        ┃   │ Var 33 = AddUser ""
        ┃   │ Var 34 = AddUser ""
        ┃   │ Var 37 = AddUser ""
        ┃   │ Var 38 = AddUser ""
        ┃   │ Var 41 = AddUser ""
        ┃   │ Var 45 = AddUser ""
        ┃   │ Var 47 = GetUser 15
    119 ┃   resetWS
    120 ┃   executeSequential initialState actions

    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 2976784816810995551 (-47094630645854485)) sequential

  ✗ 1 failed.

That is, after inserting 16 users, we don't see any user when trying to get that 16th user (User ID 15). That's a proper bug in the server.

As a matter of fact, this is the bug I put into the server and was hoping to find. In particular, I wanted hedgehog to find the minimal sequence leading to this bug.1 Which it clearly has!



If you recall from the previous post, I was interested in the integrated shrinking offered by hedgehog.

Hedgehog on a REST API

Last year I wrote a little bit about my attempt to use QuickCheck to test a REST API. Back then I got as far as generating test programs, running them, and validating an in-test model against the observed behaviour of the web service under test. One thing that I didn't implement was shrinking. I had some ideas, and got some better ideas in a comment on that post, but I've not taken the time to actually sit down and work it out. Then, during this spring, a couple of blog posts from Oskar Wickström (intro, part 1, part 2) made me aware of another library for doing property-based testing, hedgehog. It differs quite a bit from QuickCheck, most notably the way it uses to generate random data, and, this is the bit that made me sit up and pay attention, it has integrated shrinking.

My first plan was to use the same approach as I used with QuickCheck, but after finding out that there's explicit support for state machine tests everything turned out to be a bit easier than I had expected.

Well, it still wasn't exactly easy to work out the details, but the registry example in the hedgehog source repo together with a (slightly dated) example I managed to work it out (I think).


The API is the same as in the post on using QuickCheck, with one little difference, I've been lazy when implementing GET /users/:id and return a list of users (that makes it easy to represent a missing :id).

Method Route Example in Example out
POST /users {"userId": 0, "userName": "Yogi Berra"} {"userId": 42, "userName": "Yogi Berra"}
DELETE /users/:id    
GET /users   [0,3,7]
GET /users/:id   [{"userId": 42, "userName": "Yogi Berra"}]
GET /users/:id   [] (when there's no user with :id)
POST /reset    

The model state

Just like last time I'm using as simple a model as I think I can get away with, based on the API above:

newtype State (v :: * -> *)= State (M.Map Int Text)
  deriving (Eq, Show)

initialState :: State v
initialState = State M.empty

That extra v is something that hedgehog requires. Why? I don't really know, and luckily I don't have to care to make it all work. One thing though, the language pragma KindSignatures is necessary to use that kind of syntax.

Representing API calls

Representing an API call requires three things

  1. a type
  2. an implementation of HTraversable for the type
  3. a function producing a Command for the type

I represent the three API calls with these three types

newtype AddUser (v :: * -> *) = AddUser Text
  deriving (Eq, Show)

newtype DeleteUser (v :: * -> *) = DeleteUser Int
  deriving (Eq, Show)

newtype GetUser (v :: * -> *) = GetUser Int
  deriving (Eq, Show)

Again that v pops up, but as with the model state, there's no need to pay any attention to it.

For the implementation of HTraversable I was greatly helped by the registry example. Their implementations are fairly straight forward, which is a good thing since the need for them is internal to hedgehog.

instance HTraversable AddUser where
  htraverse _ (AddUser n) = AddUser <$> pure n

instance HTraversable DeleteUser where
  htraverse _ (DeleteUser i) = DeleteUser <$> pure i

instance HTraversable GetUser where
  htraverse _ (GetUser i) = GetUser <$> pure i

Once these two things are out of the way we get to the meat of the implementation of the API calls, a function creating a Command instance for each type of API call. The exact type for all three functions will be

(MonadGen n, MonadIO m) => Command n m State

which doesn't say a whole lot, I think. After reading the documentation I found it a little clearer, but the two examples, state machine testing and registry, was what cleared things up for me.1 In an attempt at being overly explicit I wrote these functions in the same style. This is what it ended up looking like for the AddUser type:

addUser :: (MonadGen n, MonadIO m) => Command n m State
addUser = Command gen exec [ Update u
                           , Ensure e
    gen _ = Just $ AddUser <$> Gen.text (Range.linear 0 42) Gen.alpha

    exec (AddUser n) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      addReq <- parseRequest "POST http://localhost:3000/users"
      let addReq' = addReq { requestBody = RequestBodyLBS (encode $ User 0 n)}
      addResp <- httpLbs addReq' mgr
      let user = decode (responseBody addResp) :: Maybe User
      return (responseStatus addResp, user)

    u (State m) (AddUser n) _o = State $ M.insert k n m
        k = succ $ foldl max 0 (M.keys m)

    e _ _ (AddUser n) (r, ui) = do
      r === status201
      assert $ isJust ui
      (userName <$> ui) === Just n

Piece by piece:

  1. gen is the generator of data. It takes one argument, the current state, but for AddUser I have no use for it. The user name is generated using a generator for Text, and rather arbitrarily I limit the names to 42 characters.
  2. exec is the action that calls the web service. Here I'm using http-client to make the call and aeson to parse the response into a User. It produces output.
  3. u is a function for updating the model state. It's given the current state, the command and the output. All I need to to do for AddUser is to pick a userId and associate it with the generated name.
  4. e is a function for checking post-conditions, in other words checking properties that must hold after exec has run and the state has been updated. It's given four arguments, the previous state, the updated state, the command and the output. The tests here are on the HTTP response code and the returned user name. I think that will do for the time being.

The function for DeleteUser follows the same pattern

deleteUser :: (MonadGen n, MonadIO m) => Command n m State
deleteUser = Command gen exec [ Update u
                              , Require r
                              , Ensure e
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ DeleteUser <$> Gen.element ks

    exec (DeleteUser i) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      delReq <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show i
      delResp <- httpNoBody delReq mgr
      return $ responseStatus delResp

    u (State m) (DeleteUser i) _ = State $ M.delete i m

    r (State m) (DeleteUser i) = i `elem` M.keys m

    e _ _ (DeleteUser _) r = r === status200

I think only two pieces need further explanation:

  1. gen only returns a DeleteUser with an index actually present in the model state. If there are no users in the model then Nothing is returned. As far as I understand that means that generated programs will only make calls to delete existing users.2
  2. r is a pre-condition that programs only delete users that exist. At first I had skipped this pre-condition, thinking that it'd be enough to have gen only create delete calls for existing users. However, after reading the documentation of Command and Callback a bit more closely I realised that I might need a pre-condition to make sure that this holds true also while shrinking.

The final function, for GetUser requires no further explanation so I only present it here

getUser :: (MonadGen n, MonadIO m) => Command n m State
getUser = Command gen exec [ Require r
                           , Ensure e
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ GetUser <$> Gen.element ks

    exec (GetUser i) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
      getResp <- httpLbs getReq mgr
      let us = decode $ responseBody getResp :: Maybe [User]
      return (status200 == responseStatus getResp, us)

    r (State m) (GetUser i) = i `elem` M.keys m

    e _ _ (GetUser _) (r, us) = do
      r === True
      assert $ isJust us
      (length <$> us) === Just 1

The property and test

It looks like there are two obvious top-level properties

  1. the web service works as expected when all calls are made one at a time (sequential), and
  2. the web service works as expected when all calls are made in parallel.

Hedgehog provides two pairs of functions for this

  1. a sequential generator with executeSequential, and
  2. a parallel generator with executeParallel.

I started with the former only

prop_seq :: Property
prop_seq = property $ do
  actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
  executeSequential initialState actions

This first creates a generator of programs of at most length 103, then turning that into a Sequential which can be passed to executeSequential to turn into a Property.

The function resetWS clears out the web service to make sure that the tests start with a clean slate each time. Its definition is

resetWS :: MonadIO m => m ()
resetWS = liftIO $ do
  mgr <- newManager defaultManagerSettings
  resetReq <- parseRequest "POST http://localhost:3000/reset"
  void $ httpNoBody resetReq mgr

The final bit is the main function, which I wrote like this

main :: IO ()
main = do
  res <- checkSequential $ Group "Main" [("sequential", prop_seq)]
  unless res exitFailure

That is, first run the property sequentially (checkSequential) and if that fails exit with failure.

Running the test

When running the test fails and gives me a program that breaks the property, and exactly what fails:

━━━ Main ━━━
  ✗ sequential failed after 13 tests and 1 shrink.

        ┏━━ tst/test-01.hs ━━━
     89 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     90 ┃ getUser = Command gen exec [ Require r
     91 ┃                            , Ensure e
     92 ┃                            ]
     93 ┃   where
     94 ┃     gen (State m) = case M.keys m of
     95 ┃       [] -> Nothing
     96 ┃       ks -> Just $ GetUser <$> Gen.element ks
     97 ┃
     98 ┃     exec (GetUser i) = liftIO $ do
     99 ┃       mgr <- newManager defaultManagerSettings
    100 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    101 ┃       getResp <- httpLbs getReq mgr
    102 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    103 ┃       return (status200 == responseStatus getResp, us)
    104 ┃
    105 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    106 ┃
    107 ┃     e _ _ (GetUser _) (r, us) = do
    108 ┃       r === True
    109 ┃       assert $ isJust us
    110 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1

        ┏━━ tst/test-01.hs ━━━
    118 ┃ prop_seq :: Property
    119 ┃ prop_seq = property $ do
    120 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = GetUser 1
    121 ┃   resetWS
    122 ┃   executeSequential initialState actions

    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 6041776208714975061 (-2279196309322888437)) sequential

  ✗ 1 failed.

My goodness, that is pretty output!

Anyway, I'd say that the failing program has been shrunk to be minimal so I'd say that all in all this is a big step up from what I had earlier. Sure, using the hedgehog state machine API is slightly involved, but once worked out I find it fairly straight-forward and it most likely is written by people much more knowledgable than me and better than anything I could produce. Having to use generators explicitly (the hedgehog way) is neither easier nor more complicated than defining a few type class instances (the QuickCheck way). Finally, the integrated shrinking is rather brilliant and not having to implement that myself is definitely a big benefit.

Now I only have to fix the errors in the web service that the test reveal. This post is already rather long, so I'll keep that for a future post.



There is still one thing that's unclear to me though, and that's how to get to the output in an update function.


Put another way, programs will never test how the web service behaves when asking for non-existing users. I think that, if I want to test that, I'll opt for using a separate API call type for it.


At least that's my understanding of the impact of Range.linear 1 10.

Comonadic builders, minor addition

When reading about Comonadic builders the other day I reacted to this comment:

The comonad package has the Traced newtype wrapper around the function (->). The Comonad instance for this newtype gives us the desired behaviour. However, dealing with the newtype wrapping and unwrapping makes our code noisy and truly harder to understand, so let's use the Comonad instance for the arrow (->) itself

So, just for fun I thought I work out the "noisy and truly harder" bits.

To begin with I needed two language extensions and two imports

{-# LANGUAGE OverloadedStrings#-}
{-# LANGUAGE RecordWildCards #-}

import Control.Comonad.Traced
import Data.Text

After that I could copy quite a bit of stuff directly from the other post

After this everything had only minor changes. First off the ProjectBuilder type had to be changed to

type ProjectBuilder = Traced Settings Project

With that done the types of all the functions can actually be left as they are, but of course the definitions have to modified. However, it turned out that the necessary modifications were rather smaller than I had expected. First out buildProject which I decided to call buildProjectW to make it possible to keep the original code and the new code in the same file without causing name clashes:

buildProjectW :: Text -> ProjectBuilder
buildProjectW = traced . buildProject
    buildProject projectName Settings{..} = Project
      { projectHasLibrary = getAny settingsHasLibrary
      , projectGitHub     = getAny settingsGitHub
      , projectTravis     = getAny settingsTravis
      , ..

The only difference is the addition of traced . to wrap it up in the newtype, the rest is copied straight from the original article.

The two simple project combinator functions, which I call hasLibraryBW and gitHubBW, needed a bit of tweaking. In the original version combinators take a builder which is an ordinary function, so it can just be called. Now however, the function is wrapped in a newtype so a bit of unwrapping is necessary:

hasLibraryBW :: ProjectBuilder -> Project
hasLibraryBW builder = runTraced builder $ mempty { settingsHasLibrary = Any True }

gitHubBW :: ProjectBuilder -> Project
gitHubBW builder = runTraced builder $ mempty { settingsGitHub = Any True }

Once again it's rather small differences from the code in the article.

As for the final combinator, which I call travisBW, actually needed no changes at all. I only rewrote it using a when clause, because I prefer that style over let:

travisBW :: ProjectBuilder -> Project
travisBW builder = project { projectTravis = projectGitHub project }
    project = extract builder

Finally, to show that this implementation hasn't really changed the behaviour

λ extract $ buildProjectW "travis" =>> travisBW
Project { projectName = "travis"
        , projectHasLibrary = False
        , projectGitHub = False
        , projectTravis = False

λ extract $ buildProjectW "github-travis" =>> gitHubBW =>> travisBW
Project { projectName = "github-travis"
        , projectHasLibrary = False
        , projectGitHub = True
        , projectTravis = True

λ extract $ buildProjectW "travis-github" =>> travisBW =>> gitHubBW
Project { projectName = "travis-github"
        , projectHasLibrary = False
        , projectGitHub = True
        , projectTravis = True

Conduit and PostgreSQL

For a while now I've been playing around with an event-drive software design (EDA) using conduit for processing of events. For this post the processing can basically be viewed as the following diagram

+-----------+   +------------+   +---------+
|           |   |            |   |         |
| PG source |-->| Processing |-->| PG sink |
|           |   |            |   |         |
+-----------+   +------------+   +---------+
     ^                                |
     |            +------+            |
     |            |      |            |
     |            |  PG  |            |
     +------------|  DB  |<-----------+
                  |      |

I started out looking for Conduit components for PostgreSQL on Hackage but failed to find something fitting so I started looking into writing them myself using postgresql-simple.

The sink wasn't much of a problem, use await to get an event (a tuple) and write it to the database. My almost complete ignorance of using databases resulted in a first version of the source was rather naive and used busy-waiting. Then I stumbled on PostgreSQL's support for notifications through the LISTEN and NOTIFY commands. I rather like the result and it seems to work well.1

It looks like this

import           Control.Monad.IO.Class (MonadIO, liftIO)
import           Data.Aeson (Value)
import qualified Data.Conduit as C
import qualified Data.Conduit.Combinators as CC
import           Data.Text (Text)
import           Data.Time.Clock (UTCTime)
import           Data.UUID (UUID)
import           Database.PostgreSQL.Simple (Connection, Only(..), execute, execute_, query)
import           Database.PostgreSQL.Simple.Notification (getNotification)

fst8 :: (a, b, c, d, e, f, g, h) -> a
fst8 (a, _, _, _, _, _, _, _) = a

dbSource :: MonadIO m => Connection -> Int -> C.ConduitT () (Int, UTCTime, Int, Int, Bool, UUID, Text, Value) m ()
dbSource conn ver = do
  res <- liftIO $ query conn "SELECT * from events where id > (?) ORDER BY id" (Only ver)
  case res of
    [] -> do
      liftIO $ execute_ conn "LISTEN MyEvent"
      liftIO $ getNotification conn
      dbSource conn ver
    _ -> do
      let ver' = maximum $ map fst8 res
      CC.yieldMany res
      dbSource conn ver'

dbSink :: MonadIO m => Connection -> C.ConduitT (Int, Int, Bool, UUID, Text, Value) C.Void m ()
dbSink conn = do
  evt <- C.await
  case evt of
    Nothing -> return ()
    Just event -> do
      liftIO $ execute conn "INSERT INTO events \
                            \(srv_id, stream_id, cmd, cmd_id, correlation_id, event_data) \
                            \VALUES (?, ?, ?, ?, ?, ?)" event
      liftIO $ execute_ conn "NOTIFY MyEvent"
      dbSink conn



If I've missed something crucial I would of course love to hear about it.

Choosing a conduit randomly

Lately I've been playing around conduit. One thing I wanted to try out was to set up processing where one processing step was chosen on random from a number of components, based on weights. In short I guess I wanted a function with a type something like this

foo :: [(Int, ConduitT i o m r)] -> ConduitT i o m r

I have to admit I don't even know where to start writing such a function1 but after a little bit of thinking I realised I could get the same effect by controlling how chunks of data is routed. That is, instead of choosing a component randomly, I can choose a route randomly. It would look something like when choosing from three components

                        +---------+   +----------+   +-------------+
                        | Filter  |   | Drop tag |   | Component A |
                    +-->| Value-0 |-->|          |-->|             |--+
                    |   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
| Choose random  |  |   | Filter  |   | Drop tag |   | Component B |  |
| value based on +----->| Value-1 |-->|          |-->|             |----->
| weights        |  |   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
                    |   | Filter  |   | Drop tag |   | Component C |  |
                    +-->| Value-2 |-->|          |-->|             |--+
                        +---------+   +----------+   +-------------+

That is

  1. For each chunk that comes in, choose a value randomly based on weights and tag the chunk with the choosen value, then
  2. split the processing into one route for each component,
  3. in each route filter out chunks tagged with a single value, and
  4. remove the tag, then
  5. pass the chunk to the component, and finally
  6. bring the routes back together again.

Out of these steps all but the very first one are already available in conduit:

What's left is the beginning. I started with a function to pick a value on random based on weights2

pickByWeight :: [(Int, b)] -> IO b
pickByWeight xs = randomRIO (1, tot) >>= \ n -> return (pick n xs)
    tot = sum $ map fst xs

    pick n ((k, x):xs)
      | n <= k = x
      | otherwise = pick (n - k) xs
    pick _ _ = error "pick error"

Using that I then made a component that tags chunks

picker ws = do
  evt <- await
  case evt of
    Nothing -> return ()
    Just e -> do
      p <- liftIO $ pickByWeight ws
      yield (p, e)
      picker ws

I was rather happy with this…

@snoyberg just have to let you know, conduit is a joy to use. Thanks for sharing it.

– Magnus Therning (@magthe) February 6, 2019



Except maybe by using Template Haskell to generate the code I did come up with.


I used Quickcheck's frequency as inspiration for writing it.

Using stack to get around upstream bugs

Recently I bumped into a bug in amazonka.1 I can't really sit around waiting for Amazon to fix it, and then for amazonka to use the fixed documentation to generate the code and make another release.

Luckily stack contains features that make it fairly simple to work around this bug until it's properly fixed. Here's how.

  1. Put the upstream code in a git repository of your own. In my case I simply forked the amazonka repository on github (my fork is here).
  2. Fix the bug and commit the change. My change to amazonka-codepipeline was simply to remove the missing fields – it was easier than trying to make them optional (i.e. wrapping them in =Maybe=s).
  3. Tell slack to use the code from your modified git repository. In my case I added the following to my slack.yaml:
  - github: magthe/amazonka
    commit: 1543b65e3a8b692aa9038ada68aaed9967752983
      - amazonka-codepipeline

That's it!



The guilty party is Amazon, not amazonka, though I was a little surprised that there doesn't seem to be any established way to modify the Amazon API documentation before it's used to autogenerate the Haskell code.

The ReaderT design pattern or tagless final?

The other week I read V. Kevroletin's Introduction to Tagless Final and realised that a couple of my projects, both at work and at home, would benefit from a refactoring to that approach. All in all I was happy with the changes I made, even though I haven't made use of all the way. In particular there I could further improve the tests in a few places by adding more typeclasses. For now it's good enough and I've clearly gotten some value out of it.

I found mr. Kevroletin's article to be a good introduction so I've been passing it on when people on the Functional programming slack bring up questions about how to organize their code as applications grow. In particular if they mention that they're using monad transformers. I did exactly that just the other day @solomon wrote

so i've created a rats nest of IO where almost all the functions in my program are in ReaderT Env IO () and I'm not sure how to purify everything and move the IO to the edge of the program

I proposed tagless final and passed the URL on, and then I got a pointer to the article The ReaderT Design Patter which I hadn't seen before.

The two approches are similar, at least to me, and I can't really judge if one's better than the other. Just to get a feel for it I thought I'd try to rewrite the example in the ReaderT article in a tagless final style.

A slightly changed example of ReaderT design pattern

I decided to make a few changes to the example in the article:

  • I removed the modify function, instead the code uses the typeclass function modifyBalance directly.
  • I separated the instances needed for the tests spatially in the code just to make it easier to see what's "production" code and what's test code.
  • I combined the main functions from the various examples to that both an example (main0) and the test (main1) are run.
  • I switched from Control.Concurrent.Async.Lifted.Safe (from monad-control) to UnliftIO.Async (from unliftio)

After that the code looks like this

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}

import           Control.Concurrent.STM
import           Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import           Say
import           Test.Hspec
import           UnliftIO.Async

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)

class HasLog a where
  getLog :: a -> (String -> IO ())

instance HasLog Env where
  getLog = envLog

class HasBalance a where
  getBalance :: a -> TVar Int

instance HasBalance Env where
  getBalance = envBalance

class Monad m => MonadBalance m where
  modifyBalance :: (Int -> Int) -> m ()

instance (HasBalance env, MonadIO m) => MonadBalance (ReaderT env m) where
  modifyBalance f = do
    env <- ask
    liftIO $ atomically $ modifyTVar' (getBalance env) f

logSomething :: (MonadReader env m, HasLog env, MonadIO m) => String -> m ()
logSomething msg = do
  env <- ask
  liftIO $ getLog env msg

main0 :: IO ()
main0 = do
  ref <- newTVarIO 4
  let env = Env { envLog = sayString , envBalance = ref }
      (modifyBalance (+ 1))
      (logSomething "Increasing account balance"))
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

instance HasLog (String -> IO ()) where
  getLog = id

instance HasBalance (TVar Int) where
  getBalance = id

instance Monad m => MonadBalance (State.StateT Int m) where
  modifyBalance = State.modify

main1 :: IO ()
main1 = hspec $ do
  describe "modify" $ do
    it "works, IO" $ do
      var <- newTVarIO (1 :: Int)
      runReaderT (modifyBalance (+ 2)) var
      res <- readTVarIO var
      res `shouldBe` 3
    it "works, pure" $ do
      let res = State.execState (modifyBalance (+ 2)) (1 :: Int)
      res `shouldBe` 3
  describe "logSomething" $
    it "works" $ do
      var <- newTVarIO ""
      let logFunc msg = atomically $ modifyTVar var (++ msg)
          msg1 = "Hello "
          msg2 = "World\n"
      runReaderT (logSomething msg1 >> logSomething msg2) logFunc
      res <- readTVarIO var
      res `shouldBe` (msg1 ++ msg2)

main :: IO ()
main = main0 >> main1

I think the distinguising features are

  • The application environmant, Env will contain configuraiton values (not in this example), state, envBalance, and functions we might want to vary, envLog
  • There is no explicit type representing the execution context
  • Typeclasses are used to abstract over application environment, HasLog and HasBalance
  • Typeclasses are used to abstract over operations, MonadBalance
  • Typeclasses are implemented for both the application environment, HasLog and HasBalance, and the execution context, MonadBalance

In the end this makes for code with very loose couplings; there's not really any single concrete type that implements all the constraints to work in the "real" main function (main0). I could of course introduce a type synonym for it

type App = ReaderT Env IO

but it brings no value – it wouldn't be used explicitly anywhere.

A tagless final version

In order to compare the ReaderT design pattern to tagless final (as I understand it) I made an attempt to translate the code above. The code below is the result.1

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import           Control.Concurrent.STM
import qualified Control.Monad.Identity as Id
import           Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import           Say
import           Test.Hspec
import           UnliftIO (MonadUnliftIO)
import           UnliftIO.Async

newtype Env = Env {envBalance :: TVar Int}

newtype AppM a = AppM {unAppM :: ReaderT Env IO a}
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader Env, MonadUnliftIO)

runAppM :: Env -> AppM a -> IO a
runAppM env app = runReaderT (unAppM app) env

class Monad m => ModifyM m where
  mModify :: (Int -> Int) -> m ()

class Monad m => LogSomethingM m where
  mLogSomething :: String -> m()

instance ModifyM AppM where
  mModify f = do
    ref <- asks envBalance
    liftIO $ atomically $ modifyTVar' ref f

instance LogSomethingM AppM where
  mLogSomething = liftIO . sayString

main0 :: IO ()
main0 = do
  ref <- newTVarIO 4
  let env = Env ref
  runAppM env
      (mModify (+ 1))
      (mLogSomething "Increasing account balance"))
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

newtype ModifyAppM a = ModifyAppM {unModifyAppM :: State.StateT Int Id.Identity a}
  deriving (Functor, Applicative, Monad, State.MonadState Int)

runModifyAppM :: Int -> ModifyAppM a -> (a, Int)
runModifyAppM s app = Id.runIdentity $ State.runStateT (unModifyAppM app) s

instance ModifyM ModifyAppM where
  mModify = State.modify'

newtype LogAppM a = LogAppM {unLogAppM :: ReaderT (TVar String) IO a}
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader (TVar String))

runLogAppM :: TVar String -> LogAppM a -> IO a
runLogAppM env app = runReaderT (unLogAppM app) env

instance LogSomethingM LogAppM where
  mLogSomething msg = do
    var <- ask
    liftIO $ atomically $ modifyTVar var (++ msg)

main1 :: IO ()
main1 = hspec $ do
  describe "mModify" $ do
    it "works, IO" $ do
      var <- newTVarIO 1
      runAppM (Env var) (mModify (+ 2))
      res <- readTVarIO var
      res `shouldBe` 3
    it "works, pure" $ do
      let (_, res) = runModifyAppM 1 (mModify (+ 2))
      res `shouldBe` 3
  describe "mLogSomething" $
    it "works" $ do
      var <- newTVarIO ""
      runLogAppM var (mLogSomething "Hello" >> mLogSomething "World!")
      res <- readTVarIO var
      res `shouldBe` "HelloWorld!"

main :: IO ()
main = main0 >> main1

The steps for the "real" part of the program were

  1. Introduce an execution type, AppM, with a convenience function for running it, runAppM
  2. Remove the log function from the environment type, envLog in Env
  3. Remove all the HasX classes
  4. Create a new operations typeclass for logging, LogSomethingM
  5. Rename the operations typeclass for modifying the balance to match the naming found in the tagless article a bit better, ModifyM
  6. Implement instances of both operations typeclasses for AppM

For testing the steps were

  1. Define an execution type for each test, ModifyAppM and LogAppM, with some convenience functions for running them, runModifyAppM and runLogAppM
  2. Write instances for the operations typeclasses, one for each

So I think the distinguising features are

  • There's both an environment type, Env, and an execution type AppM that wraps it
  • The environment holds only configuration values (none in this example), and state (envBalance)
  • Typeclasses are used to abstract over operations, LogSomethingM and ModifyM
  • Typeclasses are only implemented for the execution type

This version has slightly more coupling, the execution type specifies the environment to use, and the operations are tied directly to the execution type. However, this coupling doesn't really make a big difference – looking at the pure modify test the amount of code don't differ by much.

A short note (mostly to myself)

I did write it using monad-control first, and then I needed an instance for MonadBaseControl IO. Deriving it automatically requires UndecidableInstances and I didn't really dare turn that on, so I ended up writing the instance. After some help on haskell-cafe it ended up looking like this

instance MonadBaseControl IO AppM where
  type StM AppM a = a
  liftBaseWith f = AppM (liftBaseWith $ \ run -> f (run . unAppM))
  restoreM = return


My theoretical knowledge isn't anywhere near good enough to say anything objectively about the difference in expressiveness of the two design patterns. That means that my conclusion comes down to taste, do you like the readerT patter or tagless final better?

I like the slightly looser coupling I get with the ReaderT pattern. Loose coupling is (almost) always a desirable goal. However, I can see that tying the typeclass instances directly to a concrete execution type results in the intent being communicated a little more clearly. Clearly communicating intent in code is also a desirable goal. In particular I suspect it'll result in more actionable error messages when making changes to the code – the error will tell me that my execution type lacks an instance of a specific typeclass, instead of it telling me that a particular transformer stack does. On the other hand, in the ReaderT pattern that stack is very shallow.

One possibility would be that one pattern is better suited for libraries and the other for applications. I don't think that's the case though as in both cases the library code would be written in a style that results in typeclass constraints on the caller and providing instances for those typeclasses is roughly an equal amount of work for both styles.



Please do point out any mistakes I've made in this, in particular if they stem from me misunderstanding tagless final completely.

A missing piece in my Emacs/Spacemacs setup for Haskell development

With the help of a work mate I've finally found this gem that's been missing from my Spacemacs setup

(with-eval-after-load 'intero
  (flycheck-add-next-checker 'intero '(warning . haskell-hlint))
  (flycheck-add-next-checker 'intero '(warning . haskell-stack-ghc)))

Tagless final and Scotty

For a little while I've been playing around with event sourcing in Haskell using Conduit and Scotty. I've come far enough that the basic functionality I'm after is there together with all those little bits that make it a piece of software that's fit for deployment in production (configuration, logging, etc.). There's just one thing that's been nagging me, testability.

The app is built of two main parts, a web server (Scotty) and a pipeline of stream processing components (Conduit). The part using Scotty is utilising a simple monad stack, ReaderT Config IO, and the Conduit part is using Conduit In Out IO. This means that in both parts the outer edge, the part dealing with the outside world, is running in IO directly. Something that isn't really aiding in testing.

I started out thinking that I'd rewrite what I have using a free monad with a bunch of interpreters. Then I remembered that I have "check out tagless final". This post is a record of the small experiments I did to see how to use it with Scotty to achieve (and actually improve) on the code I have in my production-ready code.

1 - Use tagless final with Scotty

As a first simple little experiment I wrote a tiny little web server that would print a string to stdout when receiving the request to GET /route0.

The printing to stdout is the operation I want to make abstract.

class Monad m => MonadPrinter m where
  mPutStrLn :: Text -> m ()

I then created an application type that is an instance of that class.

newtype AppM a = AppM { unAppM :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO)

instance MonadPrinter AppM where
  mPutStrLn t = liftIO $ putStrLn (unpack t)

Then I added a bit of Scotty boilerplate. It's not strictly necessary, but does make the code a bit nicer to read.

type FooM = ScottyT Text AppM
type FooActionM = ActionT Text AppM

foo :: MonadIO m => Port -> ScottyT Text AppM () -> m ()
foo port = scottyT port unAppM

With that in place the web server itself is just a matter of tying it all together.

main :: IO ()
main = do
  foo 3000 $ do
    get "/route0" $ do
      lift $ mPutStrLn "getting /route0"
      json $ object ["route0" .= ("ok" :: String)]
    notFound $ json $ object ["error" .= ("not found" :: String)]

That was simple enough.

2 - Add configuration

In order to try out how to deal with configuration I added a class for doing some simple logging

class Monad m => MonadLogger m where
  mLog :: Text -> m ()

The straight forward way to deal with configuration is to create a monad stack with ReaderT and since it's logging I want to do the configuration consists of a single LoggerSet (from fast-logger).

newtype AppM a = AppM { unAppM :: ReaderT LoggerSet IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader LoggerSet)

That means the class instance can be implemented like this

instance MonadLogger AppM where
  mLog msg = do
    ls <- ask
    liftIO $ pushLogStrLn ls $ toLogStr msg

Of course foo has to be changed too, and it becomes a little easier with a wrapper for runReaderT and unAppM.

foo :: MonadIO m => LoggerSet -> Port -> ScottyT Text AppM () -> m ()
foo ls port = scottyT port (`runAppM` ls)

runAppM :: AppM a -> LoggerSet -> IO a
runAppM app ls = runReaderT (unAppM app) ls

With that in place the printing to stdout can be replaced by a writing to the log.

main :: IO ()
main = do
  ls <- newStdoutLoggerSet defaultBufSize
  foo ls 3000 $ do
    get "/route0" $ do
      lift $ mLog "log: getting /route0"
      json $ object ["route0" .= ("ok" :: String)]
    notFound $ json $ object ["error" .= ("not found" :: String)]

Not really a big change, I'd say. Extending the configuration is clearly straight forward too.

3 - Per-request configuration

At work we use correlation IDs1 and I think that the most convenient way to deal with it is to put the correlation ID into the configuration after extracting it. That is, I want to modify the configuration on each request. Luckily it turns out to be possible to do that, despite using ReaderT for holding the configuration.

I can't be bothered with a full implementation of correlation ID for this little experiment, but as long as I can get a new AppM by running a function on the configuration it's just a matter of extracting the correct header from the request. For this experiment it'll do to just modify an integer in the configuration.

I start with defining a type for the configuration and changing AppM.

type Config = (LoggerSet, Int)

newtype AppM a = AppM { unAppM :: ReaderT Config IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader Config)

The logger instance has to be changed accordingly of course.

instance MonadLogger AppM where
  mLog msg = do
    (ls, i) <- ask
    liftIO $ pushLogStrLn ls $ toLogStr msg <> toLogStr (":" :: String) <> toLogStr (show i)

The get function that comes with scotty isn't going to cut it, since it has no way of modifying the configuration, so I'll need a new one.

mGet :: ScottyError e => RoutePattern -> ActionT e AppM () -> ScottyT e AppM ()
mGet p a = get p $ do
  withCfg (\ (ls, i) -> (ls, succ i)) a

The tricky bit is in the withCfg function. It's indeed not very easy to read, I think

withCfg = mapActionT . withAppM
    mapActionT f (ActionT a) = ActionT $ (mapExceptT . mapReaderT . mapStateT) f a
    withAppM f a = AppM $ withReaderT f (unAppM a)

Basically it reaches into the guts of scotty's ActionT type (the details are exposed in Web.Scotty.Internal.Types, thanks for not hiding it completely), and modifies the ReaderT Config I've supplied.

The new server has two routes, the original one and a new one at GET /route1.

main :: IO ()
main = do
  putStrLn "Starting"
  ls <- newStdoutLoggerSet defaultBufSize
  foo (ls, 0) 3000 $ do
    get "/route0" $ do
      lift $ mLog "log: getting /route0"
      json $ object ["route0" .= ("ok" :: String)]
    mGet "/route1" $ do
      lift $ mLog "log: getting /route1"
      json $ object ["route1" .= ("bar" :: String)]
    notFound $ json $ object ["error" .= ("not found" :: String)]

It's now easy to verify that the original route, GET /route0, logs a string containing the integer '0', while the new route, GET /route1, logs a string containing the integer '1'.



If you don't know what it is you'll find multiple sources by searching for "http correlation-id". A consistent approach to track correlation IDs through microservices is as good a place to start as any.

Zipping streams

Writing the following is easy after glancing through the documentation for conduit:

foo = let src = mapM_ C.yield [0..9 :: Int]
          p0 = (\ i -> ("p0", succ i))
          p1 = CC.filter odd .| (\ i -> ("p1", i))
          p = C.getZipConduit $ C.ZipConduit p0 <* C.ZipConduit p1
          sink = CC.mapM_ print
      in C.runConduit $ src .| p .| sink

Neither pipes nor streaming make it as easy to figure out. I must be missing something! What functions should I be looking at?

Using a configuration in Scotty

At work we're only now getting around to put correlation IDs into use. We write most our code in Clojure but since I'd really like to use more Haskell at work I thought I'd dive into Scotty and see how to deal with logging and then especially how to get correlation IDs into the logs.

The types

For configuration it decided to use the reader monad inside ActionT from Scotty. Enter Chell:

type ChellM c = ScottyT Text (ReaderT c IO)
type ChellActionM c = ActionT Text (ReaderT c IO)

In order to run it I wrote a function corresponding to scotty:

chell :: c -> Port -> ChellM () -> IO ()
chell cfg port a = scottyOptsT opts (flip runReaderT cfg) a
    opts = def { verbose = 0
               , settings = (settings def) { settingsPort = port }

Correlation ID

To deal with the correlation ID each incoming request should be checked for the HTTP header X-Correlation-Id and if present it should be used during logging. If no such header is present then a new correlation ID should be created. Since it's per request it feels natural to create a WAI middleware for this.

The easiest way I could come up with was to push the correlation ID into the request's headers before it's passed on:

requestHeaderCorrelationId :: Request -> Maybe ByteString
requestHeaderCorrelationId = lookup "X-Correlation-Id" . requestHeaders

correlationId ::  Middleware
correlationId app req sendResponse = do
  u <- (randomIO :: IO UUID)
  let corrId = maybe (toASCIIBytes u) id (requestHeaderCorrelationId req)
      newHeaders = ("X-Correlation-Id", corrId) : (requestHeaders req)
  app (req { requestHeaders = newHeaders }) $ \ res -> sendResponse res

It also turns out to be useful to have both a default correlation ID and a function for pulling it out of the headers:

defaultCorrelationString :: ByteString
defaultCorrelationString = "no-correlation-id"

getCorrelationId :: Request -> ByteString
getCorrelationId r = maybe defaultCorrelationString id (requestHeaderCorrelationId r)

Getting the correlation ID into the configuration

Since the correlation ID should be picked out of the request on handling of every request it's useful to have it the configuration when running the ChellActionM actions. However, since the correlation ID isn't available when running the reader (the call to runReaderT in chell) something else is called for. When looking around I found local (and later I was pointed to the more general withReaderT) but it doesn't have a suitable type. After some help on Twitter I arrived at withConfig which allows me to run an action in a modified configuration:

withConfig :: (c -> c') -> ChellActionM c' () -> ChellActionM c ()
withConfig = mapActionT . withReaderT
    mapActionT f (ActionT a) = ActionT $ (mapExceptT . mapReaderT . mapStateT) f a

Making it handy to use

Armed with this I can put together some functions to replace Scotty's get, post, etc. With a configuration type like this:

data Config = Cfg LoggerSet ByteString

The modified get looks like this (Scotty's original is S.get)

get :: RoutePattern -> ChellActionM Config () -> ChellM Config ()
get p a = S.get p $ do
  r <- request
  let corrId = getCorrelationId r
  withConfig (\ (Cfg l _) -> Cfg l corrId) a

With this in place I can use the simpler ReaderT Config IO for inner functions that need to log.

QuickCheck on a REST API

Since I'm working with web stuff nowadays I thought I'd play a little with translating my old post on using QuickCheck to test C APIs to the web.

The goal and how to reach it

I want to use QuickCheck to test a REST API, just like in the case of the C API the idea is to

  1. generate a sequence of API calls (a program), then
  2. run the sequence against a model, as well as
  3. run the sequence against the web service, and finally
  4. compare the resulting model against reality.


I'll use a small web service I'm working on, and then concentrate on only a small part of the API to begin with.

The parts of the API I'll use for the programs at this stage are

Method Route Example in Example out
POST /users {"userId": 0, "userName": "Yogi Berra"} {"userId": 42, "userName": "Yogi Berra"}
DELETE /users/:id    

The following API calls will also be used, but not in the programs

Method Route Example in Example out
GET /users   [0,3,7]
GET /users/:id   {"userId": 42, "userName": "Yogi Berra"}
POST /reset    

Representing API calls

Given the information about the API above it seems the following is enough to represent the two calls of interest together with a constructor representing the end of a program

data ApiCall = AddUser Text
             | DeleteUser Int
             | EndProgram
             deriving (Show)

and a program is just a sequence of calls, so list of ApiCall will do. However, since I want to generate sequences of calls, i.e. implement Arbitrary, I'll wrap it in a newtype

newtype Program = Prog [ApiCall]

Running against a model (simulation)

First of all I need to decide what model to use. Based on the part of the API I'm using I'll use an ordinary dictionary of Int and Text

type Model = M.Map Int Text

Simulating execution of a program is simulating each call against a model that's updated with each step. I expect the final model to correspond to the state of the real service after the program is run for real. The simulation begins with an empty dictionary.

simulateProgram :: Program -> Model
simulateProgram (Prog cs) = foldl simulateCall M.empty cs

The simulation of the API calls must then be a function taking a model and a call, returning an updated model

simulateCall :: Model -> ApiCall -> Model
simulateCall m (AddUser t) = M.insert k t m
    k = succ $ foldl max 0 (M.keys m)
simulateCall m (DeleteUser k) = M.delete k m
simulateCall m EndProgram = m

Here I have to make a few assumptions. First, I assume the indeces for the users start on 1. Second, that the next index used always is the successor of highest currently used index. We'll see how well this holds up to reality later on.

Running against the web service

Running the program against the actual web service follows the same pattern, but here I'm dealing with the real world, so it's a little more messy, i.e. IO is involved. First the running of a single call

runCall :: Manager -> ApiCall -> IO ()
runCall mgr (AddUser t) = do
  ireq <- parseRequest "POST http://localhost:3000/users"
  let req = ireq { requestBody = RequestBodyLBS (encode $ User 0 t)}
  resp <- httpLbs req mgr
  guard (status201 == responseStatus resp)

runCall mgr (DeleteUser k) = do
  req <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show k
  resp <- httpNoBody req mgr
  guard (status200 == responseStatus resp)

runCall _ EndProgram = return ()

The running of a program is slightly more involved. Of course I have to set up the Manager needed for the HTTP calls, but I also need to

  1. ensure that the web service is in a well-known state before starting, and
  2. extract the state of the web service after running the program, so I can compare it to the model
runProgram :: Program -> IO Model
runProgram (Prog cs) = do
  mgr <- newManager defaultManagerSettings
  resetReq <- parseRequest "POST http://localhost:3000/reset"
  httpNoBody resetReq mgr
  mapM_ (runCall mgr) cs
  model <- extractModel mgr
  return model

The call to POST /reset resets the web service. I would have liked to simply restart the service completely, but I failed in automating it. I think I'll have to take a closer look at the implementation of scotty to find a way.

Extracting the web service state and packaging it in a Model is a matter of calling GET /users and then repeatedly calling GET /users/:id with each id gotten from the first call

extractModel :: Manager -> IO Model
extractModel mgr = do
  req <- parseRequest "http://localhost:3000/users"
  resp <- httpLbs req mgr
  let (Just ids) = decode (responseBody resp) :: Maybe [Int]
  users <- forM ids $ \ id -> do
    req <- parseRequest $ "http://localhost:3000/users/" ++ show id
    resp <- httpLbs req mgr
    let (Just (user:_)) = decode (responseBody resp) :: Maybe [User]
    return user
  return $ foldl (\ map (User id name) -> M.insert id name map) M.empty users

Generating programs

My approach to generating a program is based on the idea that given a certain state there is only a limited number of possible calls that make sense. Given a model m it makes sense to make one of the following calls:

  • add a new user
  • delete an existing user
  • end the program

Based on this writing genProgram is rather straight forward

genProgram :: Gen Program
genProgram = Prog <$> go M.empty
    possibleAddUser _ = [AddUser <$> arbitrary]
    possibleDeleteUser m = map (return . DeleteUser) (M.keys m)
    possibleEndProgram _ = [return EndProgram]

    go m = do
      let possibles = possibleDeleteUser m ++ possibleAddUser m ++ possibleEndProgram m
      s <- oneof possibles
      let m' = simulateCall m s
      case s of
        EndProgram -> return []
        _ -> (s:) <$> go m'

Armed with that the Arbitrary instance for Program can be implemented as1

instance Arbitrary Program where
  arbitrary = genProgram
  shrink p = []

The property of an API

The steps in the first section can be used as a recipe for writing the property

prop_progCorrectness :: Program -> Property
prop_progCorrectness program = monadicIO $ do
  let simulatedModel = simulateProgram program
  runModel <- run $ runProgram program
  assert $ simulatedModel == runModel

What next?

There are some improvements that I'd like to make:

  • Make the generation of Program better in the sense that the programs become longer. I think this is important as I start tackling larger APIs.
  • Write an implementation of shrink for Program. With longer programs it's of course more important to actually implement shrink.

I'd love to hear if others are using QuickCheck to test REST APIs in some way, if anyone has suggestions for improvements, and of course ideas for how to implement shrink in a nice way.



Yes, I completely skip the issue of shrinking programs at this point. This is OK at this point though, because the generated =Programs=s do end up to be very short indeed.

Using QuickCheck to test C APIs

Last year at ICFP I attended the tutorial on QuickCheck with John Hughes. We got to use the Erlang implementation of QuickCheck to test a C API. Ever since I've been planning to do the same thing using Haskell. I've put it off for the better part of a year now, but then Francesco Mazzoli wrote about inline-c (Call C functions from Haskell without bindings and I found the motivation to actually start writing some code.

The general idea

Many C APIs are rather stateful beasts so to test it I

  1. generate a sequence of API calls (a program of sorts),
  2. run the sequence against a model,
  3. run the sequence against the real implementation, and
  4. compare the model against the real state each step of the way.


To begin with I hacked up a simple implementation of a stack in C. The "specification" is

 * Create a stack.
void *create();

 * Push a value onto an existing stack.
void push (void *, int);

 * Pop a value off an existing stack.
int pop(void *);

Using inline-c to create bindings for it is amazingly simple:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module CApi

import qualified Language.C.Inline as C
import Foreign.Ptr

C.include "stack.h"

create :: IO (Ptr ())
create = [C.exp| void * { create() } |]

push :: Ptr () -> C.CInt -> IO ()
push s i = [C.exp| void { push($(void *s), $(int i)) } |]

pop :: Ptr () -> IO C.CInt
pop s = [C.exp| int { pop($(void *s)) } |]

In the code below I import this module qualified.

Representing a program

To represent a sequence of calls I first used a custom type, but later realised that there really was no reason at all to not use a wrapped list:

newtype Program a = P [a]
    deriving (Eq, Foldable, Functor, Show, Traversable)

Then each of the C API functions can be represented with

data Statement = Create | Push Int | Pop
    deriving (Eq, Show)

Arbitrary for Statement

My implementation of Arbitrary for Statement is very simple:

instance Arbitrary Statement where
    arbitrary = oneof [return Create, return Pop, liftM Push arbitrary]
    shrink (Push i) = Push <$> shrink i
    shrink _ = []

That is, arbitrary just returns one of the constructors of Statement, and shrinking only returns anything for the one constructor that takes an argument, Push.

Prerequisites of Arbitrary for Program Statement

I want to ensure that all Program Statement are valid, which means I need to define the model for running the program and functions for checking the precondition of a statement as well as for updating the model (i.e. for running the Statement).

Based on the C API above it seems necessary to track creation, the contents of the stack, and even if it isn't explicitly mentioned it's probably a good idea to track the popped value. Using record (Record is imported as R, and Record.Lens as RL) I defined it like this:

type ModelContext = [R.r| { created :: Bool, pop :: Maybe Int, stack :: [Int] } |]

Based on the rather informal specification I coded the pre-conditions for the three statements as

preCond :: ModelContext -> Statement -> Bool
preCond ctx Create = not $ RL.view [R.l| created |] ctx
preCond ctx (Push _) = RL.view [R.l| created |] ctx
preCond ctx Pop = RL.view [R.l| created |] ctx

That is

  • Create requires that the stack hasn't been created already.
  • Push i requires that the stack has been created.
  • Pop also requires that the stack has been created.

Furthermore the "specification" suggests the following definition of a function for running a statement:

modelRunStatement :: ModelContext -> Statement -> ModelContext
modelRunStatement ctx Create = RL.set [R.l| created |] True ctx
modelRunStatement ctx (Push i) = RL.over [R.l| stack |] (i :) ctx
modelRunStatement ctx Pop = [R.r| { created = c, pop = headMay s, stack = tail s } |]
        c = RL.view [R.l| created |] ctx
        s = RL.view [R.l| stack |] ctx

(This definition assumes that the model satisfies the pre-conditions, as can be seen in the use of tail.)

Arbitrary for Program Statement

With this in place I can define Arbitrary for Program Statement as follows.

instance Arbitrary (Program Statement) where
    arbitrary = liftM P $ ar baseModelCtx
            ar m = do
                push <- liftM Push arbitrary
                let possible = filter (preCond m) [Create, Pop, push]
                if null possible
                    then return []
                    else do
                        s <- oneof (map return possible)
                        let m' = modelRunStatement m s
                        frequency [(499, liftM2 (:) (return s) (ar m')), (1, return [])]

The idea is to, in each step, choose a valid statement given the provided model and cons it with the result of a recursive call with an updated model. The constant 499 is just an arbitrary one I chose after running arbitrary a few times to see how long the generated programs were.

For shrinking I take advantage of the already existing implementation for lists:

shrink (P p) = filter allowed $ map P (shrink p)
        allowed = and . snd . mapAccumL go baseModelCtx
                go ctx s = (modelRunStatement ctx s, preCond ctx s)

Some thoughts so far

I would love making an implementation of Arbitrary s, where s is something that implements a type class that contains preCond, modelRunStatement and anything else needed. I made an attempt using something like

class S a where
    type Ctx a :: *

    baseCtx :: Ctx a
    preCond :: Ctx a -> a -> Bool

However, when trying to use baseCtx in an implementation of arbitrary I ran into the issue of injectivity. I'm still not entirely sure what that means, or if there is something I can do to work around it. Hopefully someone reading this can offer a solution.

Running the C code

When running the sequence of Statement against the C code I catch the results in

type RealContext = [r| { o :: Ptr (), pop :: Maybe Int } |]

Actually running a statement and capturing the output in a RealContext is easily done using inline-c and record:

realRunStatement :: RealContext -> Statement -> IO RealContext
realRunStatement ctx Create = CApi.create >>= \ ptr -> return $ RL.set [R.l| o |] ptr ctx
realRunStatement ctx (Push i) = CApi.push o (toEnum i) >> return ctx
        o = RL.view [R.l| o |] ctx
realRunStatement ctx Pop = CApi.pop o >>= \ v -> return $ RL.set [R.l| pop |] (Just (fromEnum v)) ctx
        o = RL.view [R.l| o |] ctx

Comparing states

Comparing a ModelContext and a RealContext is easily done:

compCtx :: ModelContext -> RealContext -> Bool
compCtx mc rc = mcC == rcC && mcP == rcP
        mcC = RL.view [R.l| created |] mc
        rcC = RL.view [R.l| o |] rc /= nullPtr
        mcP = RL.view [R.l| pop|] mc
        rcP = RL.view [R.l| pop|] rc

Verifying a Program Statement

With all that in place I can finally write a function for checking the validity of a program:

validProgram :: Program Statement -> IO Bool
validProgram p = and <$> snd <$> mapAccumM go (baseModelCtx, baseRealContext) p
        runSingleStatement mc rc s = realRunStatement rc s >>= \ rc' -> return (modelRunStatement mc s, rc')

        go (mc, rc) s = do
            ctxs@(mc', rc') <- runSingleStatement mc rc s
            return (ctxs, compCtx mc' rc')

(This uses mapAccumM from an earlier post of mine.)

The property, finally!

To wrap this all up I then define the property

prop_program :: Program Statement -> Property
prop_program p = monadicIO $ run (validProgram p) >>= assert

and a main function

main :: IO ()
main = quickCheck prop_program

Edit 2015-07-17: Adjusted the description of the pre-conditions to match the code.

mapAccum in monad

I recently had two functions of very similar shape, only difference was that one was pure and the other need some I/O. The former was easily written using mapAccumL. I failed to find a function like mapAccumL that runs in a monad, so I wrote up the following:

mapAccumM :: (Monad m, Traversable t) => (a -> b -> m (a, c)) -> a -> t b -> m (a, t c)
mapAccumM f a l = swap <$> runStateT (mapM go l) a
        go i = do
            s <- get
            (s', r) <- lift $ f s i
            put s'
            return r

Bring on the comments/suggestions/improvements/etc!