Reading Redis responses
When I began experimenting with writing a new Redis client package I decided to use lazy bytestrings, because:
- aeson seems to prefer it – the main encoding and decoding functions use lazy bytestrings, though there are strict variants too.
- the Builder type in bytestring produces lazy bytestrings.
At the time I was happy to see that attoparsec seemed to support strict and lazy bytestrings equally well.
To get on with things I also wrote the simplest function I could come up with for sending and receiving data over the network – I used send and recv from Network.Socket.ByteString.Lazy in network. The function was really simple:
import Network.Socket.ByteString.Lazy qualified as SB

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \sock -> do
    _ <- SB.send sock $ toWireCmd cmd
    resp <- SB.recv sock 4096
    case decode resp of
        Left err -> pure $ Left $ RespError "decode" (TL.pack err)
        Right r -> pure $ k <$> fromWireResp cmd r
with decode defined like this:

decode :: ByteString -> Either String Resp
decode = parseOnly resp
I knew I'd have to revisit this function; it was naïve to believe that a call to recv would always result in a single complete response. It was, however, good enough to get going. When I got to improving sendCmd I was a little surprised to find that I'd also have to switch to using strict bytestrings in the parser.
Interlude on the Redis serialisation protocol (RESP3)
The Redis protocol has some defining attributes:
- It's somewhat of a binary protocol. If you stick to keys and values that fall within the set of ASCII strings, then the protocol is humanly readable and you can rather easily use netcat or telnet as a client. However, you aren't limited to storing only readable strings.
- It's somewhat of a request-response protocol. A notable exception is the publish-subscribe subset, but it's rather small and I reckon most Redis users don't use it.
- It's somewhat of a type-length-value style protocol. Some of the data types include their length in bytes, e.g. bulk strings and verbatim strings. Other types include the number of elements, e.g. arrays and maps. A large number of them have no length at all, e.g. simple strings, integers, and doubles.
I suspect there are good reasons for this – I gather a lot of it has to do with speed. It does however cause one issue when writing a client: it's not possible to read a whole response without parsing it.
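To make that concrete, here's a sketch of a parser for a small RESP3 subset – the Reply type and all names are mine, for illustration only, not the client's actual resp parser. A bulk string's length is in bytes, so the payload can be consumed directly, while an array only states its element count, so the number of bytes a reply spans is unknown until the whole thing has been parsed:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 qualified as A
import Data.ByteString (ByteString)

-- A stripped-down response type, for illustration only.
data Reply
    = Simple ByteString -- "+OK\r\n"
    | Number Integer -- ":1000\r\n"
    | Bulk ByteString -- "$5\r\nhello\r\n"
    | Array [Reply] -- "*2\r\n:1\r\n:2\r\n"
    deriving (Show)

reply :: A.Parser Reply
reply =
    A.anyChar >>= \case
        '+' -> Simple <$> A.takeTill (== '\r') <* crlf
        ':' -> Number <$> A.signed A.decimal <* crlf
        '$' -> do
            -- Length in bytes: the payload can be taken in one go.
            n <- A.decimal <* crlf
            Bulk <$> A.take n <* crlf
        '*' -> do
            -- Element count only: the byte length of the array is known
            -- only after all its elements have been parsed.
            n <- A.decimal <* crlf
            Array <$> A.count n reply
        c -> fail ("unsupported type tag: " <> [c])
  where
    crlf = A.string "\r\n"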
Rewriting sendCmd
With that extra information about the RESP3 protocol the naïve implementation above falls short in a few ways:
- The read buffer may contain more than one full message, and given the definition of decode above any remaining bytes are simply dropped.1
- The read buffer may contain less than one full message, and then decode will return an error.2
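Both failure modes are easy to reproduce with parseOnly alone, without involving Redis at all. A small sketch, using a trivial stand-in parser of my own rather than the client's resp:

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Parser, parseOnly, string)
import Data.ByteString (ByteString)

-- A stand-in for a parser of one complete message.
msg :: Parser ByteString
msg = string "+OK\r\n"

-- More than one message in the buffer: the rest is silently dropped.
twoMsgs :: Either String ByteString
twoMsgs = parseOnly msg "+OK\r\n+PONG\r\n" -- Right "+OK\r\n"

-- Less than one message in the buffer: the parse fails.
halfMsg :: Either String ByteString
halfMsg = parseOnly msg "+O" -- Left "not enough input"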
Surely this must be solvable, because in my mind running the parser results in one of three things:
- Parsing is done and the result is returned, together with any input that wasn't consumed.
- The parsing is not done due to lack of input; this is typically encoded as a continuation.
- The parsing failed so the error is returned, together with input that wasn't consumed.
So, I started looking in the documentation for the module Data.Attoparsec.ByteString.Lazy in attoparsec. I was a little surprised to find that the Result type lacked a way to feed more input to a parser – it only has two constructors, Done and Fail:
data Result r
    = Fail ByteString [String] String
    | Done ByteString r
I'm guessing the idea is that the function producing the lazy bytestring in the first place should be able to produce more chunks of data on demand. That's likely what the lazy variant of recv does, but at the same time it also requires choosing a maximum length, and that doesn't sit well with RESP3. The lazy recv isn't quite lazy in the way I needed it to be.
When looking at the parser for strict bytestrings I calmed down. This parser follows what I've learned about parsers (it's not defined exactly like this – it's parameterised in its input, but for the sake of simplicity I show it with ByteString as input):
data Result r
    = Fail ByteString [String] String
    | Partial (ByteString -> Result r)
    | Done ByteString r
Then to my delight I found that there's already a function for handling exactly my problem:

parseWith :: Monad m => m ByteString -> Parser a -> ByteString -> m (Result a)
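To see the mechanics in isolation, here's a toy run of parseWith (my own example, not from the client). The first argument is called whenever the parser returns Partial, and returning an empty chunk signals end of input:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Result, parseWith, string)
import Data.ByteString (ByteString)
import Data.IORef (newIORef, readIORef, writeIORef)

demo :: IO (Result ByteString)
demo = do
    -- Pretend the network delivers the response in three chunks.
    chunksRef <- newIORef ["+O", "K\r", "\n"]
    let refill =
            readIORef chunksRef >>= \case
                [] -> pure "" -- an empty chunk ends the input
                c : cs -> writeIORef chunksRef cs >> pure c
    parseWith refill (string "+OK\r\n") ""

Running demo calls refill three times and ends with Done "" "+OK\r\n".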
I only needed to rewrite the existing parser to work with strict bytestrings and work out how to write a function using recv (for strict bytestrings) that fulfils the requirements to be used as the first argument to parseWith. The first part wasn't very difficult due to the similarity between attoparsec's APIs for lazy and strict bytestrings. The second had only one complication: it turns out recv is blocking, and of course that doesn't work well with parseWith. I wrapped it in timeout based on the idea that timing out means there's no more data and the parser should be given an empty string so it finishes. I also decided to pass the parser as an argument, together with any bytes left over from a previous read, so I could use the same function for receiving responses to individual commands as well as to pipelines. The full receiving function is:
import Data.Attoparsec.ByteString (IResult (..), Parser, parseWith)
import Data.ByteString qualified as BS
import Data.Text (Text)
import Data.Text qualified as T
import Network.Socket qualified as S
import Network.Socket.ByteString qualified as SB
import System.Timeout (timeout)

-- The second argument holds any leftover bytes from a previous read; they
-- are given to the parser before anything is read off the socket.
recvParse :: S.Socket -> BS.ByteString -> Parser r -> IO (Either Text (BS.ByteString, r))
recvParse sock initial parser =
    parseWith receive parser initial >>= \case
        Fail _ [] err -> pure $ Left (T.pack err)
        Fail _ ctxs err -> pure $ Left $ T.intercalate " > " (T.pack <$> ctxs) <> ": " <> T.pack err
        Partial _ -> pure $ Left "impossible error"
        Done rem result -> pure $ Right (rem, result)
  where
    receive =
        timeout 100_000 (SB.recv sock 4096) >>= \case -- 100 ms
            Nothing -> pure BS.empty
            Just bs -> pure bs
Then I only needed to rewrite sendCmd, and I wanted to do it in such a way that any remaining input data could be used by the next call to sendCmd.3 I settled for modifying the Conn type to hold an IORef ByteString together with the socket, and then the function ended up looking like this:
sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \(sock, remRef) -> do
    _ <- SBL.send sock $ toWireCmd cmd
    rem <- readIORef remRef
    recvParse sock rem resp >>= \case
        Left err -> pure $ Left $ RespError "recv/parse" err
        Right (newRem, r) -> do
            writeIORef remRef newRem
            pure $ k <$> fromWireResp cmd r
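Since recvParse takes the parser as an argument, a pipelined variant only needs to ask for several replies in one parse. A sketch under the same assumptions as above (sendPipeline and its types are hypothetical; count is attoparsec's):

import Data.Attoparsec.ByteString (count)

sendPipeline :: Conn -> [Command ()] -> IO (Either RespError [Resp])
sendPipeline (Conn p) cmds = withResource p $ \(sock, remRef) -> do
    -- Send all commands in one go, then parse exactly one reply each.
    _ <- SBL.send sock $ foldMap (\(Command _ cmd) -> toWireCmd cmd) cmds
    rem <- readIORef remRef
    recvParse sock rem (count (length cmds) resp) >>= \case
        Left err -> pure $ Left $ RespError "recv/parse" err
        Right (newRem, rs) -> do
            writeIORef remRef newRem
            pure $ Right rs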
What's next?
I've started looking into pub/sub, and basically all of the work described in this post is a prerequisite for that. It's not very difficult on the protocol level, but I think it's difficult to come up with a design that allows maximal flexibility. I'm not even sure it's worth the complexity.
Footnotes:
1. This isn't that much of a problem when sticking to the request-response commands, I think. It most certainly becomes a problem with pub/sub though.
2. I'm sure that whatever size of buffer I choose to use there'll be someone out there who's storing values that are larger. Then there's pipelining, which makes it even more of an issue.
3. To be honest I'm not totally convinced there'll ever be any remaining input. Unless a single Conn is used by several threads – which would lead to much pain with the current implementation – or pub/sub is used – which isn't supported yet.