28 Jun 2025

Reading Redis responses

When I began experimenting with writing a new Redis client package I decided to use lazy bytestrings, because:

  1. aeson seems to prefer it – the main encoding and decoding functions use lazy bytestrings, though there are strict variants too.
  2. the Builder type in bytestring produces lazy bytestrings.

At the time I was happy to see that attoparsec seemed to support strict and lazy bytestrings equally well.

To get on with things I also wrote the simplest function I could come up with for sending and receiving data over the network – I used send and recv from Network.Socket.ByteString.Lazy in network. The function was really simple

import Network.Socket.ByteString.Lazy qualified as SB

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \sock -> do
    _ <- SB.send sock $ toWireCmd cmd
    resp <- SB.recv sock 4096
    case decode resp of
        Left err -> pure $ Left $ RespError "decode" (TL.pack err)
        Right r -> pure $ k <$> fromWireResp cmd r

with decode defined like this

decode :: ByteString -> Either String Resp
decode = parseOnly resp

I knew I'd have to revisit this function; it was naïve to believe that a call to recv would always result in a single complete response. It was, however, good enough to get going. When I got to improving sendCmd I was a little surprised to find that I'd also have to switch to using strict bytestrings in the parser.

Interlude on the Redis serialisation protocol (RESP3)

The Redis protocol has some defining attributes

  • It's somewhat of a binary protocol. If you stick to keys and values that fall within the set of ASCII strings, then the protocol is human-readable and you can rather easily use netcat or telnet as a client. However, you aren't limited to storing only readable strings.
  • It's somewhat of a request-response protocol. A notable exception is the publish-subscribe subset, but it's rather small and I reckon most Redis users don't use it.
  • It's somewhat of a type-length-value style protocol. Some of the data types include their length in bytes, e.g. bulk strings and verbatim strings. Other types include the number of elements, e.g. arrays and maps. A large number of them have no length at all, e.g. simple strings, integers, and doubles.
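To make those three flavours concrete, here are a few example encodings, written out with explicit \r\n terminators (an illustration of the RESP3 wire format, not code from the client):

```haskell
-- No length, terminated by \r\n only:
--   "+OK\r\n"                    simple string
--   ":42\r\n"                    integer
--   ",3.14\r\n"                  double
-- Length in bytes (type-length-value):
--   "$5\r\nhello\r\n"            bulk string of 5 bytes
-- Length as a number of elements:
--   "*2\r\n:1\r\n:2\r\n"         array of two integers
--   "%1\r\n+key\r\n+val\r\n"     map with one entry
```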

I suspect there are good reasons for this; I gather a lot of it has to do with speed. It does, however, cause one issue when writing a client: it's not possible to read a whole response without parsing it.
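An aggregate reply shows why: an array only declares how many elements it has, and each element may itself be an aggregate, so the only way to find the end of a response is to parse all of it. For example (wire bytes shown as comments, indented for readability):

```haskell
-- *2\r\n            array of two elements ...
--   $3\r\nfoo\r\n     ... a bulk string ...
--   *1\r\n            ... and a nested array
--     :1\r\n
--
-- Nothing up front states the total size in bytes; it's only known
-- once the whole structure has been parsed.
```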

Rewriting sendCmd

With that extra information about the RESP3 protocol the naïve implementation above falls short in a few ways

  • The read buffer may contain more than one full message and, given the definition of decode above, any remaining bytes are simply dropped.1
  • The read buffer may contain less than one full message, and then decode will return an error.2
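The first point is easy to demonstrate: parseOnly runs the parser once over the given input and simply discards whatever it doesn't consume. A sketch using a toy parser (not the real resp):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly, string, takeTill)
import Data.ByteString (ByteString)

-- Toy stand-in for resp: parses a single RESP simple string.
simpleString :: Parser ByteString
simpleString = char '+' *> takeTill (== '\r') <* string "\r\n"

-- Two complete responses in the buffer; the second one is lost:
-- parseOnly simpleString "+OK\r\n+PONG\r\n" == Right "OK"
```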

Surely this must be solvable, because in my mind running the parser results in one of three things:

  1. Parsing is done and the result is returned, together with any input that wasn't consumed.
  2. The parsing is not done due to lack of input; this is typically encoded as a continuation.
  3. The parsing failed so the error is returned, together with input that wasn't consumed.

So, I started looking in the documentation for the module Data.Attoparsec.ByteString.Lazy in attoparsec. I was a little surprised to find that the Result type lacked a way to feed more input to a parser – it only has two constructors, Done and Fail:

data Result r
    = Fail ByteString [String] String
    | Done ByteString r

I'm guessing the idea is that the function producing the lazy bytestring in the first place should be able to produce more chunks of data on demand. That's likely what the lazy variant of recv does, but at the same time it also requires choosing a maximum length, and that doesn't sit well with RESP3. The lazy recv isn't quite lazy in the way I needed it to be.

When looking at the parser for strict bytestrings I calmed down. This parser follows what I've learned about parsers (it's not defined exactly like this; it's parameterised in its input but for the sake of simplicity I show it with ByteString as input):

data Result r
    = Fail ByteString [String] String
    | Partial (ByteString -> Result r)
    | Done ByteString r

Then to my delight I found that there's already a function for handling exactly my problem

parseWith :: Monad m => m ByteString -> Parser a -> ByteString -> m (Result a)
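Under the hood parseWith is essentially a loop that keeps calling the supplied action for more input for as long as the parser returns Partial. A rough sketch (not attoparsec's actual source, though the real definition is very close):

```haskell
import Data.Attoparsec.ByteString (IResult (..), Parser, Result, parse)
import Data.ByteString (ByteString)

-- Run the parser on the initial input, then keep refilling and feeding
-- the continuation until it settles on Done or Fail. The refill action
-- must eventually return an empty string, which tells the parser that
-- the input has ended.
parseWithSketch :: Monad m => m ByteString -> Parser a -> ByteString -> m (Result a)
parseWithSketch refill p initial = step (parse p initial)
  where
    step (Partial k) = refill >>= step . k
    step r = pure r
```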

I only needed to rewrite the existing parser to work with strict bytestrings and work out how to write a function using recv (for strict bytestrings) that fulfils the requirements for the first argument to parseWith. The first part wasn't very difficult, due to the similarity between attoparsec's APIs for lazy and strict bytestrings. The second had only one complication: it turns out recv blocks, which of course doesn't work well with parseWith. I wrapped it in timeout, based on the idea that timing out means there's no more data and the parser should be given an empty string so that it finishes. I also decided to pass the parser as an argument, so I could use the same function for receiving responses to individual commands as well as to pipelines. The full receiving function is

import Data.ByteString qualified as BS
import Data.Text qualified as T
import Network.Socket.ByteString qualified as SB
import Network.Socket.ByteString.Lazy qualified as SBL

recvParse :: S.Socket -> BS.ByteString -> Parser r -> IO (Either Text (BS.ByteString, r))
recvParse sock initial parser = do
    parseWith receive parser initial >>= \case
        Fail _ [] err -> pure $ Left (T.pack err)
        Fail _ ctxs err -> pure $ Left $ T.intercalate " > " (T.pack <$> ctxs) <> ": " <> T.pack err
        -- parseWith never returns Partial, it keeps calling receive
        Partial _ -> pure $ Left "impossible error"
        Done rem result -> pure $ Right (rem, result)
  where
    receive =
        timeout 100_000 (SB.recv sock 4096) >>= \case
            Nothing -> pure BS.empty
            Just bs -> pure bs

Then I only needed to rewrite sendCmd, and I wanted to do it in such a way that any remaining input data could be used by the next call to sendCmd.3 I settled on modifying the Conn type to hold an IORef ByteString together with the socket, and then the function ended up looking like this

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \(sock, remRef) -> do
    _ <- SBL.send sock $ toWireCmd cmd
    rem <- readIORef remRef
    recvParse sock rem resp >>= \case
        Left err -> pure $ Left $ RespError "recv/parse" err
        Right (newRem, r) -> do
            writeIORef remRef newRem
            pure $ k <$> fromWireResp cmd r
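For completeness, the modified Conn could look something like this – a sketch only, since the actual definition isn't shown in this post, and Pool is assumed to come from the resource-pool package (which provides the withResource used above):

```haskell
import Data.ByteString qualified as BS
import Data.IORef (IORef)
import Data.Pool (Pool)
import Network.Socket qualified as S

-- Each pooled resource pairs a socket with any bytes left over from
-- parsing the previous response on that socket.
newtype Conn = Conn (Pool (S.Socket, IORef BS.ByteString))
```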

What's next?

I've started looking into pub/sub, and basically all of the work described in this post is a prerequisite for that. It's not very difficult at the protocol level, but I think it's difficult to come up with a design that allows maximal flexibility. I'm not even sure it's worth the complexity.

Footnotes:

1

This isn't that much of a problem when sticking to the request-response commands, I think. It most certainly becomes a problem with pub/sub though.

2

I'm sure that whatever size of buffer I choose to use there'll be someone out there who's storing values that are larger. Then there's pipelining that makes it even more of an issue.

3

To be honest I'm not totally convinced there'll ever be any remaining input. Unless a single Conn is used by several threads – which would lead to much pain with the current implementation – or pub/sub is used – which isn't supported yet.

Tags: haskell redis