My ghcide build for Nix

I was slightly disappointed to find out that not all packages on Hackage that are marked as present in Nix(pkgs) actually are available. Quite a few of them are marked broken and hence not installable. One of these packages is ghcide.

There are of course expressions available for getting a working ghcide executable installed, like ghcide-nix. However, since I have rather simple needs for my Haskell projects I thought I’d play with my own approach to it.

What I care about is:

  1. availability of the development tools I use, at the moment it’s mainly ghcide but I’m planning on making use of ormolu in the near future
  2. pre-built packages
  3. ease of use

So, I put together ghcide-for-nix. It’s basically just a customized Nixpkgs where the packages needed to un-break ghcide are present.

Usage is a simple import away:

import (builtins.fetchGit {
  name = "ghcide-for-nix";
  url = "https://github.com/magthe/ghcide-for-nix";
  rev = "927a8caa62cece60d9d66dbdfc62b7738d61d75f";
}) 

and it’ll give you a superset of Nixpkgs. Pre-built packages are available on Cachix.

It’s not sophisticated, but it’s rather easy to use and suffices for my purposes.

Nix setup for Spacemacs

When using ghcide and LSP, as I wrote about in my post on Haskell, ghcide, and Spacemacs, I found myself recompiling a little too often. This pushed me to finally start looking at Nix. After a bit of a fight I managed to get ghcide from Nix, which brought me to the issue of setting up Spacemacs. Inspired by a gist from Samuel Evans-Powell and a guide to setting up an environment for Reflex by Thales Macedo Garitezi I ended up with the following setup:

It seems to work, but please let me know if you have suggestions for improvements.

Populating Projectile's cache

As I track the develop branch of Spacemacs I occasionally clean out my cache of projects known to Projectile. Every time it takes a while before I’m back at a stage where I very rarely have to visit something that isn’t already in the cache.

However, today I found the function projectile-add-known-project, which prompted me to write the following function that’ll help me quickly rebuild the cache the next time I need to reset Spacemacs to a known state.

Ditaa in Org mode

Just found out that Emacs ships with Babel support for ditaa (yes, I’m late to the party).

Sweet! That is yet another argument for converting all our README.mds into README.orgs at work.

Dr. Evil 

The changes I made to my Spacemacs config are

Haskell, ghcide, and Spacemacs

The other day I read Chris Penner’s post on Haskell IDE Support and thought I’d make an attempt to use it with Spacemacs.

After running stack build hie-bios ghcide haskell-lsp --copy-compiler-tool I had a look at the instructions on using haskell-ide-engine with Spacemacs. After a bit of trial and error I came up with these changes to my ~/.spacemacs:

The slightly weird looking lsp-haskell-process-wrapper-function is removing the pesky --lsp inserted by this line.

That seems to work. Though I have to say I’m not ready to switch from intero just yet. Two things in particular didn’t work with ghcide/LSP:

  1. Switching from the Main.hs of one executable to the Main.hs of another executable in the same project didn’t work as expected – I had hints and types in the first, but nothing in the second.
  2. Jump to the definition of a function defined in the package didn’t work – I’m not willing to use GNU GLOBAL or some other source tagging system.

Nested tmux

I’ve finally gotten around to sorting out running nested tmux instances. I found the base for the configuration in the article Tmux in practice: local and nested remote tmux sessions, which links a few other related resources.

What I ended up with was this:

# Toggle tmux keybindings on/off, for use with inner tmux
# https://is.gd/slxE45
bind -T root F12  \
  set prefix None \;\
  set key-table off \;\
  set status-left "#[fg=black,bg=blue,bold] OFF " \;\
  refresh-client -S

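# F12 in the "off" key-table turns the outer tmux's keybindings back on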
bind -T off F12 \
  set -u prefix \;\
  set -u key-table \;\
  set -u status-left \;\
  refresh-client -S

It’s slightly simpler than what’s in the article above, but it works and it fits rather nicely with the nord theme.

Hedgehog on a REST API, part 3

In my previous post on using Hedgehog on a REST API, Hedgehog on a REST API, part 2, I ran the test a few times and adjusted the model to deal with the incorrect assumptions I had initially made. In particular, I had to adjust how I modelled the User ID. Because of the simplicity of the API that wasn’t too difficult. However, that kind of completely predictable ID isn’t found in all APIs. In fact, it’s not uncommon for APIs to have completely random IDs (often they are UUIDs).

So, I set out to try to deal with that. I’m still using the simple API from the previous posts, but this time I’m pretending that I can’t build the ID into the model myself, or, put another way, I’m capturing the ID from the responses.

The model state

When capturing the ID it’s no longer possible to use a simple Map Int Text for the state, because I don’t actually have the ID until I have an HTTP response. However, the ID plays an important role in the construction of a sequence of actions. The trick is to use Var Int v instead of an ordinary Int. As I understand it – and I believe that’s a good enough understanding to make use of Hedgehog – the ID is an opaque blob during the construction phase, and it’s turned into a concrete value during execution. While in the opaque state it implements enough type classes to be useful for my purposes.
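
Concretely this means the model state is keyed on the opaque variable instead of on a plain Int. Assuming the same map of user names as in the previous posts, it ends up looking something like this:

newtype State (v :: * -> *) = State (M.Map (Var Int v) Text)

Both Symbolic and Concrete provide the instances needed for Var Int v to work as a Map key.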

The API calls: add user

Taking a closer look at the Callback type shows that not all callbacks get the state in the same form, opaque or concrete, and one of them, Update, actually receives the state in both forms depending on the phase of execution. This has the most impact on the add user action. To deal with it the code needs to be rearranged a bit; specifically, commandExecute can no longer return a tuple of both the ID and the status of the HTTP response, because the update function can’t reach into the tuple, and it needs the ID to update the state.

That means the commandExecute function has to do some testing too. It would be nicer to keep all tests in the callbacks, but by putting a MonadTest m constraint on commandExecute it still turns into a rather nice solution.

I found that once I’d come around to folding the Ensure callback into the commandExecute function the rest fell out from the types.
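
To make that concrete, here is a rough sketch of what addUser ends up looking like. AddUser is presumably the input type from the earlier posts (a wrapper around the user’s name), and the endpoint, request body, and response handling below are my guesses rather than the actual code:

addUser :: (MonadGen n, MonadIO m, MonadTest m) => Command n m State
addUser = Command gen exec [ Update u ]
  where
    gen _ = Just $ AddUser <$> Gen.text (Range.linear 0 42) Gen.alpha

    -- The checks that used to live in an Ensure callback now happen here, so
    -- that only the captured ID needs to be returned.
    exec (AddUser n) = do
      (s, muid) <- liftIO $ do
        mgr <- newManager defaultManagerSettings
        addReq <- parseRequest "POST http://localhost:3000/users"
        let addReq' = addReq { requestBody = RequestBodyLBS (encode n) }
        addResp <- httpLbs addReq' mgr
        return (responseStatus addResp, decode (responseBody addResp) :: Maybe Int)
      assert $ statusIsSuccessful s
      maybe failure return muid

    -- Update receives the returned ID as a Var Int v and can put it straight
    -- into the model state.
    u (State m) (AddUser n) i = State $ M.insert i n m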

The API calls: delete user

The other actions, deleting a user and getting a user, required only minor changes and the changes were rather similar in both cases.

Now the type for the action needs to take a Var Int v instead of just a plain Int.
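
Reconstructed from the deleteUser code below (plus whatever Eq and Show instances Hedgehog’s Command machinery wants), it becomes something like

newtype DeleteUser (v :: * -> *) = DeleteUser (Var Int v)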

This in turn affects the implementation of HTraversable.
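
Since Var comes with an HTraversable instance of its own, the instance can simply delegate to it, roughly:

instance HTraversable DeleteUser where
  htraverse f (DeleteUser v) = DeleteUser <$> htraverse f v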

Then the changes to the Command mostly comprise the use of concrete in the places where the real ID is needed.

deleteUser :: (MonadGen n, MonadIO m) => Command n m State
deleteUser = Command gen exec [ Update u
                              , Require r
                              , Ensure e
                              ]
  where
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ DeleteUser <$> Gen.element ks

    exec (DeleteUser vi) = liftIO $ do
      mgr <- newManager defaultManagerSettings
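      -- concrete turns the opaque Var Int Concrete into the real ID captured
      -- during execution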
      delReq <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show (concrete vi)
      delResp <- httpNoBody delReq mgr
      return $ responseStatus delResp

    u (State m) (DeleteUser i) _ = State $ M.delete i m

    r (State m) (DeleteUser i) = i `elem` M.keys m

    e _ _ (DeleteUser _) r = r === status200

Conclusion

This post concludes my playing around with state machines in Hedgehog for this time. I certainly hope I find the time to put it to use on some larger API soon. In particular I’d love to put it to use at work; I think it’d be an excellent addition to the integration tests we currently have.

Architecture of a service

Early this summer it was finally time to put this one service I’ve been working on into our sandbox environment. It’s been running without hiccups, so last week I turned it on for production as well. In this post I thought I’d document the how and why of the service in the hope that someone will find it useful.

The service functions as an interface to external SMS-sending services, offering a single place to change if we find that we are unhappy with the service we’re using.1 It replaces an older service, written in Ruby, that no one really dares touch. Hopefully the Haskell version will prove to be a joy to work with over time.

Overview of the architecture

The service is split into two parts: a web server using scotty, and streaming data processing using conduit. Persistent storage is provided by a PostgreSQL database. The general idea is that events are picked up from the database and acted upon, which in turn results in other events that are written to the database. Those are then picked up, and round and round we go. The web service accepts requests, turns them into events, and writes them to the database.

Hopefully this crude diagram clarifies it somewhat.

Diagram of the service architecture

There are a few things that might need some explanation:

  • In the past we’ve wanted to have the option to use multiple external SMS services at the same time. One is randomly chosen as the request comes in. There’s also a possibility to configure the frequency for each external service.

    Picker implements the random picking and I’ve written about that earlier in Choosing a conduit randomly.

    Success and fail are dummy senders. They don’t actually send anything, and the former succeeds at it while the latter fails. I found them useful for manual testing.

  • Successfully sending off a request to an external SMS service, getting status 200 back, doesn’t actually mean that the SMS has been sent, or even that it ever will be. Due to the nature of SMS messaging there are no guarantees of timeliness at all. Since we are interested in finding out whether an SMS actually is sent a delayed action is scheduled, which will fetch the status of a sent SMS after a certain time (currently 2 minutes). If an SMS hasn’t been sent after that time it might as well never be – it’s too slow for our end-users.

    This is what report-fetcher and fetcher-func do.

  • The queue src and queue sink are actually sourceTQueue and sinkTQueue. Splitting the stream like that makes it trivial to push in events by using writeTQueue.

  • I use sequenceConduits in order to send a single event to multiple Conduits and then combine all their results back into a single stream. The ease with which this can be done in conduit is one of the main reasons why I chose to use it.2 There’s a small sketch of the idea right after this list.
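
As a small made-up illustration of the idea (not the service’s actual code), the following fans every incoming value out to two conduits and merges their outputs back into a single stream:

import Control.Monad (void)
import Data.Conduit (ConduitT, sequenceConduits)

-- Both conduits see every input value; their outputs end up in a single
-- output stream, and the list of their results is discarded.
fanOut :: Monad m => ConduitT i o m () -> ConduitT i o m () -> ConduitT i o m ()
fanOut left right = void $ sequenceConduits [left, right]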

Effects and tests

I started out writing everything based on a type like ReaderT <my cfg type> IO and using liftIO for effects that needed lifting. This worked nicely while I was setting up the basic structure of the service, but as soon as I hooked in the database I really wanted to do some testing of the effectful code as well.

After reading Introduction to Tagless Final and The ReaderT Design Pattern, playing a bit with both approaches, and writing Tagless final and Scotty and The ReaderT design pattern or tagless final?, I finally chose to go down the route of tagless final. There’s no strong reason for that decision; maybe it was just that I read about it first and found it very easy to move in that direction in small steps.
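
As a made-up illustration of the style (these are not the service’s actual classes): the effects the code needs are captured in a type class, production code gets an IO-backed instance, and tests can use a pure instance instead.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.State (State, modify)
import Data.Text (Text)

-- The effect, in tagless final style: sending an SMS to a recipient.
class Monad m => SmsSend m where
  sendSms :: Text -> Text -> m Bool

-- A pure instance for tests that just records the messages "sent".
newtype TestM a = TestM (State [(Text, Text)] a)
  deriving (Functor, Applicative, Monad)

instance SmsSend TestM where
  sendSms to body = TestM $ modify ((to, body) :) >> pure True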

There’s a split between property tests and unit tests:

  • Data types, their type class instances (like JSON (de-)serialisation), pure functions and a few effects are tested using properties. I’m using QuickCheck for that. I’ve since looked a little closer at hedgehog and if I were to do a major overhaul of the property tests I might be tempted to rewrite them using that library instead.

  • Most of the Conduits are tested using HUnit.

Configuration

The service will be run in a container and we try to follow the 12-factor app rules, where the third one says that configuration should be stored in the environment. All previous Haskell projects I’ve worked on have been command line tools where configuration is done (mostly) using command line arguments. For that I usually use optparse-applicative, but it’s not applicable in this setting.

After a bit of searching on Hackage I settled on etc. It turned out to be nice and easy to work with. The configuration is written in JSON and only specifies environment variables. It’s then embedded in the executable using file-embed. The only thing I miss is a ToJSON instance for Config – we’ve found it quite useful to log the active configuration when starting a service, and that log entry would be a bit nicer if the message was JSON rather than the (somewhat difficult to read) string that Config’s Show instance produces.

Logging

There are two requirements we have when it comes to logging:

  1. All log entries tied to a request should have a correlation ID.
  2. Requests and responses should be logged.

I’ve written about correlation IDs before, in Using a configuration in Scotty.

Logging requests and responses is an area where I’m not very happy with scotty. It feels natural to solve it using WAI middleware, but the representation, especially of responses, is a bit complicated, so for the time being I’ve skipped logging the bodies of both. I’d be most interested to hear of libraries that could make that easier.

Data storage and picking up new events

The data stream processing depends heavily on being able to pick up when new events are written to the database, especially when there is more than one instance running (we usually have at least two instances running in the production environment). To get that working I’ve used postgresql-simple’s support for LISTEN and NOTIFY via the function getNotification.

When I wrote about this earlier, in Conduit and PostgreSQL, I got some really good feedback that made my solution more robust.
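
For reference, a rough sketch of the listening part, assuming a notification channel named "events" (the real service’s channel name and payload handling differ):

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forever)
import Database.PostgreSQL.Simple (Connection, execute_)
import Database.PostgreSQL.Simple.Notification (Notification, getNotification)

-- Block on getNotification and hand every notification to the given action,
-- which would typically push an event into the processing stream.
waitForEvents :: Connection -> (Notification -> IO ()) -> IO ()
waitForEvents conn handler = do
  _ <- execute_ conn "LISTEN events"
  forever $ getNotification conn >>= handler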

Delayed actions

Some things in Haskell feel almost like cheating. The light-weight threading makes me confident that a forkIO followed by a threadDelay (or in my case, the ones from unliftio) will suffice.
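
A minimal sketch of that idea, using the unliftio variants mentioned above (the real delayed action fetches the delivery report):

import Control.Monad (void)
import UnliftIO.Concurrent (forkIO, threadDelay)

-- Run the given action after (roughly) the given number of seconds, without
-- blocking the caller.
scheduleIn :: Int -> IO () -> IO ()
scheduleIn seconds action = void . forkIO $ do
  threadDelay (seconds * 1000000)  -- threadDelay takes microseconds
  action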


  1. It has happened in the past that we’ve changed SMS service after finding that it wasn’t living up to our expectations.

  2. A while ago I was experimenting with other streaming libraries, but I gave up on getting re-combination to work – Zipping streams

Elasticsearch, types and indices

The other day I added some more logging to a service at work, but not all logs appeared in Kibana. Some messages got lost between CloudWatch Logs and Elasticsearch. After turning up the logging in the Lambda that shuffles log messages between them, I was in for a bit of learning about Elasticsearch.

Running the following in a Kibana console will show what the issue was

Executing them in order results in the following error on the second command

The reason for this is that a schema for the data is built up dynamically as documents are pushed in.1 It is possible to turn off dynamic schema building for an index using a mapping. For the documents above it’d look something like this

Now it’s possible to push both documents, however searching is not possible, because, as the documentation for dynamic says:

fields will not be indexed so will not be searchable but will still appear in the _source field of returned hits

If there’s one thing that determines the value of logs, it’s that they’re searchable.

As far as I understand, one solution to all of this would have been mapping types, but those are being removed (see removal of mapping types), so that isn’t a solution. I’m not sure if Elasticsearch offers any good solution to it nowadays. There is, however, a workaround: more indices.

Using two indices instead of one does work, so modifying the first commands to use separate indices solves the problem.

When creating an index pattern for idx-* there’s a warning about many analysis functions not working due to the type conflict. However, searching does work and that’s all I really care about in this case.

When shuffling the logs from CloudWatch Logs to Elasticsearch we already use multiple indices. They’re constructed based on service name, deploy environment (staging, production) and date (a new index each day). To deal with these type conflicts I added a log type that’s taken out of the log message itself. It’s not an elegant solution – it puts the solution into the services themselves – but it’s acceptable.


  1. Something that makes me wonder what the definition of schema-free is. I sure didn’t expect there to ever be a type constraint preventing pushing a document into something that’s called schema-free (see the Wikipedia article). (The initiated say it’s Lucene, not Elasticsearch, but to me that doesn’t make any difference at all.)

Hedgehog on a REST API, part 2

This is a short follow-up to Hedgehog on a REST API where I actually run the tests in that post.

Fixing an issue with the model

The first issue I ran into is

━━━ Main ━━━
  ✗ sequential failed after 18 tests and 1 shrink.
  
        ┏━━ tst/test-01.hs ━━━
     89 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     90 ┃ getUser = Command gen exec [ Require r
     91 ┃                            , Ensure e
     92 ┃                            ]
     93 ┃   where
     94 ┃     gen (State m) = case M.keys m of
     95 ┃       [] -> Nothing
     96 ┃       ks -> Just $ GetUser <$> Gen.element ks
     97 ┃ 
     98 ┃     exec (GetUser i) = liftIO $ do
     99 ┃       mgr <- newManager defaultManagerSettings
    100 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    101 ┃       getResp <- httpLbs getReq mgr
    102 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    103 ┃       return (status200 == responseStatus getResp, us)
    104 ┃ 
    105 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    106 ┃ 
    107 ┃     e _ _ (GetUser _) (r, us) = do
    108 ┃       r === True
    109 ┃       assert $ isJust us
    110 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1
    
        ┏━━ tst/test-01.hs ━━━
    118 ┃ prop_seq :: Property
    119 ┃ prop_seq = property $ do
    120 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = GetUser 1
    121 ┃   resetWS
    122 ┃   executeSequential initialState actions
    
    This failure can be reproduced by running:
    > recheck (Size 17) (Seed 2158538972777046104 (-1442908127347265675)) sequential
  
  ✗ 1 failed.

It’s easy to verify this using httpie:

It’s clear that my assumption that User IDs start at 1 is wrong. Luckily fixing that isn’t too difficult. Instead of defining the update function for addUser as

I define it as

The complete code at this point can be found here.

Fixing another issue with the model

With that fix in place another issue with the model shows up

━━━ Main ━━━
  ✗ sequential failed after 74 tests and 2 shrinks.
  
        ┏━━ tst/test-01.hs ━━━
     91 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     92 ┃ getUser = Command gen exec [ Require r
     93 ┃                            , Ensure e
     94 ┃                            ]
     95 ┃   where
     96 ┃     gen (State m) = case M.keys m of
     97 ┃       [] -> Nothing
     98 ┃       ks -> Just $ GetUser <$> Gen.element ks
     99 ┃ 
    100 ┃     exec (GetUser i) = liftIO $ do
    101 ┃       mgr <- newManager defaultManagerSettings
    102 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    103 ┃       getResp <- httpLbs getReq mgr
    104 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    105 ┃       return (status200 == responseStatus getResp, us)
    106 ┃ 
    107 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    108 ┃ 
    109 ┃     e _ _ (GetUser _) (r, us) = do
    110 ┃       r === True
    111 ┃       assert $ isJust us
    112 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1
    
        ┏━━ tst/test-01.hs ━━━
    120 ┃ prop_seq :: Property
    121 ┃ prop_seq = property $ do
    122 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = DeleteUser 0
        ┃   │ Var 2 = AddUser ""
        ┃   │ Var 3 = GetUser 0
    123 ┃   resetWS
    124 ┃   executeSequential initialState actions
    
    This failure can be reproduced by running:
    > recheck (Size 73) (Seed 3813043122711576923 (-444438259649958339)) sequential
  
  ✗ 1 failed.

Again, verifying this using httpie shows what the issue is

In other words, the model assumes that User ID 0 gets re-used.

To fix this I need a bigger change. The central bit is that the state is changed to keep track of the index more explicitly. That is, it changes from

to

That change does, quite obviously, require a bunch of other changes in the other functions dealing with the state. The complete file can be viewed here.
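
Judging from the getUser code in the output below (the linked file has the real definitions), the shape of the change is roughly from

newtype State (v :: * -> *) = State (M.Map Int Text)

to a state that also carries the next ID to hand out, something like

data State (v :: * -> *) = State Int (M.Map Int Text)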

All is well, or is it?

After this the tests pass, so all is good in the world, right?

In the test I defined the property over rather short sequences of commands. What happens if I increase the (maximum) length of the sequences a bit? Instead of using Range.linear 1 10 I’ll use Range.linear 1 1000. Well, besides taking slightly longer to run, I get another sequence of commands that triggers an issue:

━━━ Main ━━━
  ✗ sequential failed after 13 tests and 29 shrinks.
  
        ┏━━ tst/test-01.hs ━━━
     87 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     88 ┃ getUser = Command gen exec [ Require r
     89 ┃                            , Ensure e
     90 ┃                            ]
     91 ┃   where
     92 ┃     gen (State _ m) = case M.keys m of
     93 ┃       [] -> Nothing
     94 ┃       ks -> Just $ GetUser <$> Gen.element ks
     95 ┃ 
     96 ┃     exec (GetUser i) = liftIO $ do
     97 ┃       mgr <- newManager defaultManagerSettings
     98 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
     99 ┃       getResp <- httpLbs getReq mgr
    100 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    101 ┃       return (status200 == responseStatus getResp, us)
    102 ┃ 
    103 ┃     r (State _ m) (GetUser i) = i `elem` M.keys m
    104 ┃ 
    105 ┃     e _ _ (GetUser _) (r, us) = do
    106 ┃       r === True
    107 ┃       assert $ isJust us
    108 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1
    
        ┏━━ tst/test-01.hs ━━━
    116 ┃ prop_seq :: Property
    117 ┃ prop_seq = property $ do
    118 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 1000) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 2 = AddUser ""
        ┃   │ Var 5 = AddUser ""
        ┃   │ Var 7 = AddUser ""
        ┃   │ Var 9 = AddUser ""
        ┃   │ Var 11 = AddUser ""
        ┃   │ Var 20 = AddUser ""
        ┃   │ Var 28 = AddUser ""
        ┃   │ Var 30 = AddUser ""
        ┃   │ Var 32 = AddUser ""
        ┃   │ Var 33 = AddUser ""
        ┃   │ Var 34 = AddUser ""
        ┃   │ Var 37 = AddUser ""
        ┃   │ Var 38 = AddUser ""
        ┃   │ Var 41 = AddUser ""
        ┃   │ Var 45 = AddUser ""
        ┃   │ Var 47 = GetUser 15
    119 ┃   resetWS
    120 ┃   executeSequential initialState actions
    
    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 2976784816810995551 (-47094630645854485)) sequential
  
  ✗ 1 failed.

That is, after inserting 16 users, we don’t see any user when trying to get that 16th user (User ID 15). That’s a proper bug in the server.

As a matter of fact, this is the bug I put into the server and was hoping to find. In particular, I wanted hedgehog to find the minimal sequence leading to this bug.1 Which it clearly has!


  1. If you recall from the previous post, I was interested in the integrated shrinking offered by hedgehog.