Choosing a conduit randomly

Lately I’ve been playing around conduit. One thing I wanted to try out was to set up processing where one processing step was chosen on random from a number of components, based on weights. In short I guess I wanted a function with a type something like this

I have to admit I don’t even know where to start writing such a function1 but after a little bit of thinking I realised I could get the same effect by controlling how chunks of data is routed. That is, instead of choosing a component randomly, I can choose a route randomly. It would look something like when choosing from three components

                        +---------+   +----------+   +-------------+
                        | Filter  |   | Drop tag |   | Component A |
                    +-->| Value-0 |-->|          |-->|             |--+
                    |   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
| Choose random  |  |   | Filter  |   | Drop tag |   | Component B |  |
| value based on +----->| Value-1 |-->|          |-->|             |----->
| weights        |  |   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
                    |   | Filter  |   | Drop tag |   | Component C |  |
                    +-->| Value-2 |-->|          |-->|             |--+
                        +---------+   +----------+   +-------------+


That is

  1. For each chunk that comes in, choose a value randomly based on weights and tag the chunk with the choosen value, then
  2. split the processing into one route for each component,
  3. in each route filter out chunks tagged with a single value, and
  4. remove the tag, then
  5. pass the chunk to the component, and finally
  6. bring the routes back together again.

Out of these steps all but the very first one are already available in conduit:

What’s left is the beginning. I started with a function to pick a value on random based on weights2

Using that I then made a component that tags chunks

I was rather happy with this…


  1. Except maybe by using Template Haskell to generate the code I did come up with.

  2. I used Quickcheck’s frequency as inspiration for writing it.

Using stack to get around upstream bugs

Recently I bumped into a bug in amazonka.1 I can’t really sit around waiting for Amazon to fix it, and then for amazonka to use the fixed documentation to generate the code and make another release.

Luckily slack contains features that make it fairly simple to work around this bug until it’s properly fixed. Here’s how.

  1. Put the upstream code in a git repository of your own. In my case I simply forked the amazonka repository on github (my fork is here).
  2. Fix the bug and commit the change. My change to amazonka-codepipeline was simply to remove the missing fields – it was easier than trying to make them optional (i.e. wrapping them in Maybes).
  3. Tell slack to use the code from your modified git repository. In my case I added the following to my slack.yaml:

     extra-deps:
       - github: magthe/amazonka
         commit: 1543b65e3a8b692aa9038ada68aaed9967752983
         subdirs:
           - amazonka-codepipeline

That’s it!


  1. The guilty party is Amazon, not amazonka, though I was a little surprised that there doesn’t seem to be any established way to modify the Amazon API documentation before it’s used to autogenerate the Haskell code.

The ReaderT design pattern or tagless final?

The other week I read V. Kevroletin’s Introduction to Tagless Final and realised that a couple of my projects, both at work and at home, would benefit from a refactoring to that approach. All in all I was happy with the changes I made, even though I haven’t made use of all the way. In particular there I could further improve the tests in a few places by adding more typeclasses. For now it’s good enough and I’ve clearly gotten some value out of it.

I found mr. Kevroletin’s article to be a good introduction so I’ve been passing it on when people on the Functional programming slack bring up questions about how to organize their code as applications grow. In particular if they mention that they’re using monad transformers. I did exactly that just the other day _@solomon_ wrote

so i’ve created a rats nest of IO where almost all the functions in my program are in ReaderT Env IO () and I’m not sure how to purify everything and move the IO to the edge of the program

I proposed tagless final and passed the URL on, and then I got a pointer to the article The ReaderT Design Patter which I hadn’t seen before.

The two approches are similar, at least to me, and I can’t really judge if one’s better than the other. Just to get a feel for it I thought I’d try to rewrite the example in the ReaderT article in a tagless final style.

A slightly changed example of ReaderT design pattern

I decided to make a few changes to the example in the article:

  • I removed the modify function, instead the code uses the typeclass function modifyBalance directly.
  • I separated the instances needed for the tests spatially in the code just to make it easier to see what’s “production” code and what’s test code.
  • I combined the main functions from the various examples to that both an example (main0) and the test (main1) are run.
  • I switched from Control.Concurrent.Async.Lifted.Safe (from monad-control) to UnliftIO.Async (from unliftio)

After that the code looks like this

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}

import           Control.Concurrent.STM
import           Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import           Say
import           Test.Hspec
import           UnliftIO.Async

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)
  }

class HasLog a where
  getLog :: a -> (String -> IO ())

instance HasLog Env where
  getLog = envLog

class HasBalance a where
  getBalance :: a -> TVar Int

instance HasBalance Env where
  getBalance = envBalance

class Monad m => MonadBalance m where
  modifyBalance :: (Int -> Int) -> m ()

instance (HasBalance env, MonadIO m) => MonadBalance (ReaderT env m) where
  modifyBalance f = do
    env <- ask
    liftIO $ atomically $ modifyTVar' (getBalance env) f

logSomething :: (MonadReader env m, HasLog env, MonadIO m) => String -> m ()
logSomething msg = do
  env <- ask
  liftIO $ getLog env msg

main0 :: IO ()
main0 = do
  ref <- newTVarIO 4
  let env = Env { envLog = sayString , envBalance = ref }
  runReaderT
    (concurrently_
      (modifyBalance (+ 1))
      (logSomething "Increasing account balance"))
    env
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

instance HasLog (String -> IO ()) where
  getLog = id

instance HasBalance (TVar Int) where
  getBalance = id

instance Monad m => MonadBalance (State.StateT Int m) where
  modifyBalance = State.modify

main1 :: IO ()
main1 = hspec $ do
  describe "modify" $ do
    it "works, IO" $ do
      var <- newTVarIO (1 :: Int)
      runReaderT (modifyBalance (+ 2)) var
      res <- readTVarIO var
      res `shouldBe` 3
    it "works, pure" $ do
      let res = State.execState (modifyBalance (+ 2)) (1 :: Int)
      res `shouldBe` 3
  describe "logSomething" $
    it "works" $ do
      var <- newTVarIO ""
      let logFunc msg = atomically $ modifyTVar var (++ msg)
          msg1 = "Hello "
          msg2 = "World\n"
      runReaderT (logSomething msg1 >> logSomething msg2) logFunc
      res <- readTVarIO var
      res `shouldBe` (msg1 ++ msg2)

main :: IO ()
main = main0 >> main1

I think the distinguising features are

  • The application environmant, Env will contain configuraiton values (not in this example), state, envBalance, and functions we might want to vary, envLog
  • There is no explicit type representing the execution context
  • Typeclasses are used to abstract over application environment, HasLog and HasBalance
  • Typeclasses are used to abstract over operations, MonadBalance
  • Typeclasses are implemented for both the application environment, HasLog and HasBalance, and the execution context, MonadBalance

In the end this makes for code with very loose couplings; there’s not really any single concrete type that implements all the constraints to work in the “real” main function (main0). I could of course introduce a type synonym for it

but it brings no value – it wouldn’t be used explicitly anywhere.

A tagless final version

In order to compare the ReaderT design pattern to tagless final (as I understand it) I made an attempt to translate the code above. The code below is the result.1

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import           Control.Concurrent.STM
import qualified Control.Monad.Identity as Id
import           Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import           Say
import           Test.Hspec
import           UnliftIO (MonadUnliftIO)
import           UnliftIO.Async

newtype Env = Env {envBalance :: TVar Int}

newtype AppM a = AppM {unAppM :: ReaderT Env IO a}
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader Env, MonadUnliftIO)

runAppM :: Env -> AppM a -> IO a
runAppM env app = runReaderT (unAppM app) env

class Monad m => ModifyM m where
  mModify :: (Int -> Int) -> m ()

class Monad m => LogSomethingM m where
  mLogSomething :: String -> m()

instance ModifyM AppM where
  mModify f = do
    ref <- asks envBalance
    liftIO $ atomically $ modifyTVar' ref f

instance LogSomethingM AppM where
  mLogSomething = liftIO . sayString

main0 :: IO ()
main0 = do
  ref <- newTVarIO 4
  let env = Env ref
  runAppM env
    (concurrently_
      (mModify (+ 1))
      (mLogSomething "Increasing account balance"))
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

newtype ModifyAppM a = ModifyAppM {unModifyAppM :: State.StateT Int Id.Identity a}
  deriving (Functor, Applicative, Monad, State.MonadState Int)

runModifyAppM :: Int -> ModifyAppM a -> (a, Int)
runModifyAppM s app = Id.runIdentity $ State.runStateT (unModifyAppM app) s

instance ModifyM ModifyAppM where
  mModify = State.modify'

newtype LogAppM a = LogAppM {unLogAppM :: ReaderT (TVar String) IO a}
  deriving (Functor, Applicative, Monad, MonadIO, MonadReader (TVar String))

runLogAppM :: TVar String -> LogAppM a -> IO a
runLogAppM env app = runReaderT (unLogAppM app) env

instance LogSomethingM LogAppM where
  mLogSomething msg = do
    var <- ask
    liftIO $ atomically $ modifyTVar var (++ msg)

main1 :: IO ()
main1 = hspec $ do
  describe "mModify" $ do
    it "works, IO" $ do
      var <- newTVarIO 1
      runAppM (Env var) (mModify (+ 2))
      res <- readTVarIO var
      res `shouldBe` 3
    it "works, pure" $ do
      let (_, res) = runModifyAppM 1 (mModify (+ 2))
      res `shouldBe` 3
  describe "mLogSomething" $
    it "works" $ do
      var <- newTVarIO ""
      runLogAppM var (mLogSomething "Hello" >> mLogSomething "World!")
      res <- readTVarIO var
      res `shouldBe` "HelloWorld!"

main :: IO ()
main = main0 >> main1

The steps for the “real” part of the program were

  1. Introduce an execution type, AppM, with a convenience function for running it, runAppM
  2. Remove the log function from the environment type, envLog in Env
  3. Remove all the HasX classes
  4. Create a new operations typeclass for logging, LogSomethingM
  5. Rename the operations typeclass for modifying the balance to match the naming found in the tagless article a bit better, ModifyM
  6. Implement instances of both operations typeclasses for AppM

For testing the steps were

  1. Define an execution type for each test, ModifyAppM and LogAppM, with some convenience functions for running them, runModifyAppM and runLogAppM
  2. Write instances for the operations typeclasses, one for each

So I think the distinguising features are

  • There’s both an environment type, Env, and an execution type AppM that wraps it
  • The environment holds only configuration values (none in this example), and state (envBalance)
  • Typeclasses are used to abstract over operations, LogSomethingM and ModifyM
  • Typeclasses are only implemented for the execution type

This version has slightly more coupling, the execution type specifies the environment to use, and the operations are tied directly to the execution type. However, this coupling doesn’t really make a big difference – looking at the pure modify test the amount of code don’t differ by much.

A short note (mostly to myself)

I did write it using monad-control first, and then I needed an instance for MonadBaseControl IO. Deriving it automatically requires UndecidableInstances and I didn’t really dare turn that on, so I ended up writing the instance. After some help on haskell-cafe it ended up looking like this

Conclusion

My theoretical knowledge isn’t anywhere near good enough to say anything objectively about the difference in expressiveness of the two design patterns. That means that my conclusion comes down to taste, do you like the readerT patter or tagless final better?

I like the slightly looser coupling I get with the ReaderT pattern. Loose coupling is (almost) always a desirable goal. However, I can see that tying the typeclass instances directly to a concrete execution type results in the intent being communicated a little more clearly. Clearly communicating intent in code is also a desirable goal. In particular I suspect it’ll result in more actionable error messages when making changes to the code – the error will tell me that my execution type lacks an instance of a specific typeclass, instead of it telling me that a particular transformer stack does. On the other hand, in the ReaderT pattern that stack is very shallow.

One possibility would be that one pattern is better suited for libraries and the other for applications. I don’t think that’s the case though as in both cases the library code would be written in a style that results in typeclass constraints on the caller and providing instances for those typeclasses is roughly an equal amount of work for both styles.


  1. Please do point out any mistakes I’ve made in this, in particular if they stem from me misunderstanding tagless final completely.

A missing piece in my Emacs/Spacemacs setup for Haskell development

With the help of a work mate I’ve finally found this gem that’s been missing from my Spacemacs setup

(with-eval-after-load 'intero
  (flycheck-add-next-checker 'intero '(warning . haskell-hlint))
  (flycheck-add-next-checker 'intero '(warning . haskell-stack-ghc)))

Tagless final and Scotty

For a little while I’ve been playing around with event sourcing in Haskell using Conduit and Scotty. I’ve come far enough that the basic functionality I’m after is there together with all those little bits that make it a piece of software that’s fit for deployment in production (configuration, logging, etc.). There’s just one thing that’s been nagging me, testability.

The app is built of two main parts, a web server (Scotty) and a pipeline of stream processing components (Conduit). The part using Scotty is utilising a simple monad stack, ReaderT Config IO, and the Conduit part is using Conduit In Out IO. This means that in both parts the outer edge, the part dealing with the outside world, is running in IO directly. Something that isn’t really aiding in testing.

I started out thinking that I’d rewrite what I have using a free monad with a bunch of interpreters. Then I remembered that I have “check out tagless final”. This post is a record of the small experiments I did to see how to use it with Scotty to achieve (and actually improve) on the code I have in my production-ready code.

1 - Use tagless final with Scotty

As a first simple little experiment I wrote a tiny little web server that would print a string to stdout when receiving the request to GET /route0.

The printing to stdout is the operation I want to make abstract.

I then created an application type that is an instance of that class.

Then I added a bit of Scotty boilerplate. It’s not strictly necessary, but does make the code a bit nicer to read.

With that in place the web server itself is just a matter of tying it all together.

That was simple enough.

2 - Add configuration

In order to try out how to deal with configuration I added a class for doing some simple logging

The straight forward way to deal with configuration is to create a monad stack with ReaderT and since it’s logging I want to do the configuration consists of a single LoggerSet (from fast-logger).

That means the class instance can be implemented like this

Of course foo has to be changed too, and it becomes a little easier with a wrapper for runReaderT and unAppM.

With that in place the printing to stdout can be replaced by a writing to the log.

Not really a big change, I’d say. Extending the configuration is clearly straight forward too.

3 - Per-request configuration

At work we use correlation IDs1 and I think that the most convenient way to deal with it is to put the correlation ID into the configuration after extracting it. That is, I want to modify the configuration on each request. Luckily it turns out to be possible to do that, despite using ReaderT for holding the configuration.

I can’t be bothered with a full implementation of correlation ID for this little experiment, but as long as I can get a new AppM by running a function on the configuration it’s just a matter of extracting the correct header from the request. For this experiment it’ll do to just modify an integer in the configuration.

I start with defining a type for the configuration and changing AppM.

The logger instance has to be changed accordingly of course.

The get function that comes with scotty isn’t going to cut it, since it has no way of modifying the configuration, so I’ll need a new one.

The tricky bit is in the withCfg function. It’s indeed not very easy to read, I think

Basically it reaches into the guts of scotty’s ActionT type (the details are exposed in Web.Scotty.Internal.Types, thanks for not hiding it completely), and modifies the ReaderT Config I’ve supplied.

The new server has two routes, the original one and a new one at GET /route1.

It’s now easy to verify that the original route, GET /route0, logs a string containing the integer ‘0’, while the new route, GET /route1, logs a string containing the integer ‘1’.


  1. If you don’t know what it is you’ll find multiple sources by searching for “http correlation-id”. A consistent approach to track correlation IDs through microservices is as good a place to start as any.

Systemd for auto deploy from AWS

Over the last week I’ve completely rebuilt the the only on-premise system we have at work. One of the bits I’m rather happy with is the replacement of M/Monit with just plain systemd. We didn’t use any of the advanced features of M/Monit, we only wanted ordinary process monitoring and restart on a single system. It really was a bit of overkill.

So, instead of

  • using M/Monit to monitor processes
  • a monit configuration for the app (both start and stop instructions)
  • a script to start the app and capture its PID in a file
  • a script to resync the app against the S3 bucket where Travis puts the build artifacts, and if a new version has arrived, remove the PID file thereby triggering M/Monit to restart the app
  • a crontab entry to run the sync/restart script every few minutes

we now have

  • a (simplified) script to start the app1
  • a service unit (app.service) for the app
  • a timer unit (app-sync.timer) to trigger the resync of the app against the S3 bucket
  • a oneshot service unit (app-sync.service), triggered by the timer, to perform the sync of the app with the latest build, i.e. call aws s3 sync
  • a path unit (app-restart.path) to monitor one of the build artifacts, i.e. to pick up that a new version has arrived
  • a oneshot service unit (app-restart.service), triggered by the path unit, calling systemctl restart app.service

There’s one more piece to the setup, but

  • the start script is simplified since it no longer needs to push things to the background and capture the PID
  • the sync/restart script is completely gone (arguably the more complicated of the two scripts in the M/Monit setup)
  • responsibility is cleanly separated leading to a solution that’s easier to understand (as long as you know a bit about systemd of course)

so I think it’s a net improvement.


  1. All it does is set a bunch of environment variables and then start the app, so I’m planning on moving the environment variables into a separate file and put the start command the service unit file instead.

Components feel so not FP

At work we’re writing most of the new stuff using Clojure. That’s been going on since before I started working there, and from the beginning there’s been exploration of style and libraries, yes, even of frameworks (ugh!). Now there’s discussion of standardising. The discussion is still in its infancy, but it prompted me to start thinking about what I’m come across in the code base we have now, what I like and what I dislike. The first thing that came to mind was how our different services are set up. I mean set up internally. Like, what’s actually in the main function (or in -main, since we’re talking Clojure). In many services we use Stuart Sierra’s component, in a growing number of services we use integrant, and in one we use mount.

The current discussion is going in the direction of integrant, and I don’t like it!

AFAICS integrant suffers from all the same things as component (the author of mount has put it into words better than I ever could in his text on mount‘s differences from component’) plus one more thing to boot: systems are “configured into being”. It’s touted as “data-driven architecture”, I tend to see it as architecture defined in a language separate from my functions. The integrant README says that one of its strengths over component is that

In Integrant, systems are created from a configuration data structure, typically loaded from an edn resource. The architecture of the application is defined through data, rather than code.

Somehow that statement makes me think of OO and DI frameworks. I suspect the above paragraph is just as true with the two replacements s/Integrant/Spring/ and s/edn/XML/. I’m not convinced this is a strength at all! My experience with DI frameworks in OO is limited (I’ve never used external configuration), but the enduring impression is that it’s unwieldy. In particular it was very far between cause and effect of errors. So far this is true for integrant as well.

Also, it makes me think of “functional in the small, OO in the large”1, which is a comment coming out of the f-sharp world. Maybe there’s a connection here. Maybe “OO in the large” is something that resonates with OO-turned-FP developers. Maybe that means it’s only a quesiton of time (and exposure to FP) before they embrace “functional all the way”? Or, maybe I’m simply missing something crucial.

In any case I’m going to have to take a closer look at mount in the near future. I’ll also have to take a look at its brother yurt, and at its distant cousin mount-lite.


  1. https://www.johndcook.com/blog/2009/03/23/functional-in-the-small-oo-in-the-large/

Is this a good way to do JSON validation?

At work, where we use Clojure, we’ve been improving our error messages in the public API to

  1. return as many errors as possible in a response, and
  2. be in humanly readable English.

If one adopts spec as we have one gets the former for free, but the output of spec can hardly be called humanly readable. For the latter part we chose to use phrase.

Given that I’d like to see Haskell used more at work (currently there are 2 minor services written in Haskell and around a score in Clojure) I thought I’d take a look at JSON validation in Haskell. I ended up beig less than impressed. We have at least one great library for parsing JSON, aeson, but there are probably a few more that I haven’t noticed. It’s of course possible to mix in validation with the parsing, but since parsers, and this is true for aeson’s parser too, tend to be monads and that means that item 1 above, finding as many errors as possible, isn’t on the table.

A quick look at Hackage gave that

  • there is a package called aeson-better-errors that looked promising but didn’t fi my needs (I explain at the end why it isn’t passing muster)
  • the support for JSON Schema is very lacking in Haskell, hjsonschema is deprecated and aeson-schema only supports version 3 of the draft (the current version is 7) and the authors claim that that hjsonschema is more moderna and more actively maintained

So, a bit disappointed I started playing with the problem myself and found that, just as is stated in the description of the validation library, I want something that’s isomorphic to Either but accumulates on the error side. That is, something like

I decided it was all right to limit validation to proper JSON expressions, i.e. a validator could have the type Value -> JSONValidationResult. I want to combine validators so I decided to wrap it in a newtype and write a SemiGroup instance for it as well:

The function to actually run the validation is rather straight forward

After writing a few validators I realised a few patterns emerged and the following functions simplified things a bit:

With this in place I started writing validators for the basic JSON types:

The number type in JSON is a float (well, in aeson it’s a Scientific), so to check for an integer a bit more than the above is needed

as well as functions that check for the presence of a specific key

With this in place I can now create a validator for a person with a name and an age:

and run it on a Value:

and all failures are picked up

Reflections

  1. I quickly realised I wanted slightly more complex validation of course, so all the validators for basic JSON types above have a version taking a custom validator of type a -> JSONValidationResult (where a is the Haskell type contained in the particulare Value).
  2. I started out thinking that I want an Applicative for my validations, but slowly I relaxed that to SemiGroup. I’m still not sure about this decision, because I can see a real use of or which I don’t really have now. Maybe that means I should switch back towards Applicative, just so I can implement an Alternative instance for validators.
  3. Well, I simply don’t know if this is even a good way to implement validators. I’d love to hear suggestions both for improvements and for completely different ways of tackling the problems.
  4. I would love to find out that there already is a library that does all this in a much better way. Please point me in its direction!

Appendix: A look at aeson-better-errors

The issue with aeson-better-errors is easiest to illustrate using the same example as in its announcement:

and with this loaded in GHCi (and make sure to either pass -XOverloadedStrings on the command line, or :set -XOverloadedStrings in GHCi itself)

*> parse asPerson "{\"name\": \"Alice\", \"age\": 32}"
Right (Person "Alice" 32)
*> parse asPerson "{\"name\": \"Alice\"}"
Left (BadSchema [] (KeyMissing "age"))
*> parse asPerson "{\"nam\": \"Alice\"}"
Left (BadSchema [] (KeyMissing "name"))

Clearly aeson-better-errors isn’t fulfilling the bit about reporting as many errors as possible. Something that I would have realised right away if I had bothered reading its API reference on Hackage a bit more carefully, the parser type ParseT is an instance of Monad!

Zipping streams

Writing the following is easy after glancing through the documentation for conduit:

Neither pipes nor streaming make it as easy to figure out. I must be missing something! What functions should I be looking at?

Picking up new rows in a DB

I’ve been playing around a little with postgresql-simple in combination with streaming for picking up inserted rows in a DB table. What I came up with was

It seems almost too simple so I’m suspecting I missed something.

  • Am I missing something?
  • Should I be using streaming, or is pipes or conduit better (in case they are, in what way)?