Adventures in parsing, part 3

I got a great many comments, at least by my standards, on my two earlier posts on parsing in Haskell, especially on the latest one. Conal posted a comment on the first pointing me towards liftM and its siblings, without telling me that it would only be the first step towards “applicative style”. So, here I go again…

First off, importing Control.Applicative. Apparently <|> is defined both in Control.Applicative and in Parsec. I use the <|> from Parsec, so hiding it when importing Control.Applicative seemed like a good idea:
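A minimal sketch of what such an import might look like, assuming a current Parsec 3 reached through its Text.ParserCombinators.Parsec compatibility module. The parser signP is a hypothetical example, included only to show that <|> resolves to Parsec's version; many and optional are hidden as well because current versions of both libraries export them.

```haskell
-- Hide the names that Parsec also exports, so the unqualified
-- (<|>), many and optional all refer to Parsec's versions.
import Control.Applicative hiding ((<|>), many, optional)
import Text.ParserCombinators.Parsec

-- Hypothetical example parser: accepts a '+' or a '-'.
-- With the hiding clause above, (<|>) here is unambiguous.
signP :: Parser Char
signP = char '+' <|> char '-'
```

On a recent base, <$>, <*> and <* are already in the Prelude, so the Control.Applicative import may not be needed at all.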

Second, Cale pointed out that I need to make GenParser an instance of Control.Applicative.Applicative. He was nice enough to point out how to do that, leaving the syntax as the only thing I had to struggle with:
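Current versions of Parsec already ship an Applicative instance, so declaring one for GenParser today would clash with it. The general recipe for getting Applicative from an existing Monad can still be sketched on a toy parser type. Everything here (P, runP, item) is hypothetical and stands in for GenParser.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical toy parser standing in for GenParser:
-- consume some input, return a result and the remaining input.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap = liftM              -- derived from the Monad instance

instance Applicative P where
  pure x = P (\s -> Just (x, s))
  (<*>)  = ap               -- derived from the Monad instance

instance Monad P where
  P p >>= f = P $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> runP (f a) s'

-- Parse any single character.
item :: P Char
item = P $ \s -> case s of
  []     -> Nothing
  (c:cs) -> Just (c, cs)
```

With the instances in place, runP ((,) <$> item <*> item) pairs up the first two characters of its input.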

I decided to take baby-steps and I started with parseAddress. Here’s what it used to look like:

On Twan’s suggestion I rewrote it using where rather than let ... in, and since this was my first function I decided to go via the ap function. At the same time I broke out hexStr2Int, since it’s used in so many places:
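A rough sketch of what an ap-based version with a where clause could look like. The definitions of thenChar and hexStr2Int are assumptions (only their names appear in this post), as is the input shape of two hex numbers separated by a dash.

```haskell
import Control.Monad (ap, liftM)
import Numeric (readHex)
import Text.ParserCombinators.Parsec

-- Assumed definition: convert a string of hex digits to a number.
hexStr2Int :: String -> Integer
hexStr2Int s = case readHex s of
  ((n, _):_) -> n
  _          -> error ("not a hex number: " ++ s)

-- Assumed definition: run p, then require the character c,
-- and return p's result.
thenChar :: Char -> Parser a -> Parser a
thenChar c p = p >>= \r -> char c >> return r

-- Lift the pair constructor into the parser and apply it with ap.
parseAddress :: Parser (Integer, Integer)
parseAddress = return (,) `ap` start `ap` end
  where
    start = liftM hexStr2Int (thenChar '-' (many1 hexDigit))
    end   = liftM hexStr2Int (many1 hexDigit)
```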

Then on to applying some functions from Applicative:

By now the use of thenChar looks a little silly, so I changed that part into many1 hexDigit <* char '-' instead. Finally I removed the where part altogether and used <*> to string it all together:
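A sketch of how the end result of these steps might read, with the where clause gone and everything strung together by <$>, <* and <*>. The body of hexStr2Int is an assumption, and the exact placement of parentheses is a guess.

```haskell
import Numeric (readHex)
import Text.ParserCombinators.Parsec

-- Assumed definition: convert a string of hex digits to a number.
hexStr2Int :: String -> Integer
hexStr2Int s = case readHex s of
  ((n, _):_) -> n
  _          -> error ("not a hex number: " ++ s)

-- many1 hexDigit <* char '-' keeps the digits and drops the dash,
-- so the parser reads left to right in the same order as the input.
parseAddress :: Parser (Integer, Integer)
parseAddress = (,) <$> (hexStr2Int <$> (many1 hexDigit <* char '-'))
                   <*> (hexStr2Int <$> many1 hexDigit)
```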

From here on I skipped the intermediate steps and went straight for the last form. Here’s what I ended up with:

I have to say I’m fairly pleased with this version of the parser. It reads about as easily as the first version and there’s none of the “reversing” that thenChar introduced.

Conal Elliott

A thing of beauty! I’m glad you stuck with it, Magnus.

Some much smaller points:

  • The pattern (== c) <$> anyChar (nicely written, btw) arises three times, so it might merit a name.
  • Similarly for hexStr2Int <$> many1 hexDigit, especially when you rewrite f <$> (a <* b) to (f <$> a) <* b.
  • The pattern (a <* char ' ') <*> b comes up a lot. How about naming it also, with a nice infix op, say a <#> b?
  • The cA definition could use pattern matching instead (e.g., cA 'p' = Private and cA 's' = Shared).
  • Some of your parens are unnecessary (3rd line of parseDevice and last of parseRegion), since application binds more tightly than infix ops.
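A sketch of how a few of these suggestions could look in code. The names flag and Access, and the example parser flagAndAccess, are hypothetical; only <#>, cA, Private and Shared come from the list above.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical type name for the result of cA.
data Access = Private | Shared deriving (Eq, Show)

-- A name for the recurring (== c) <$> anyChar pattern.
flag :: Char -> Parser Bool
flag c = (== c) <$> anyChar

-- Apply a parsed function to a parsed argument, with exactly
-- one space between the two. Same fixity as <*>.
infixl 4 <#>
(<#>) :: Parser (a -> b) -> Parser a -> Parser b
a <#> b = a <* char ' ' <*> b

-- cA by pattern matching. Still partial: any other character
-- is a pattern match failure at run time.
cA :: Char -> Access
cA 'p' = Private
cA 's' = Shared

-- Hypothetical use: a flag, a space, then an access character.
flagAndAccess :: Parser (Bool, Access)
flagAndAccess = (,) <$> flag 'r' <#> (cA <$> anyChar)
```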

Conal Elliott

hm. i wonder why the boxes around list items in my previous reply.

Twan van Laarhoven

First of all, note that you don’t need parentheses around parseSomething <* char ' '.

You can also simplify things a bit more by combining hexStr2Int <$> many1 hexDigit into a function, then you could say:
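One possible shape for this, as a sketch only: the helper name hexInt and the body of hexStr2Int are assumptions.

```haskell
import Numeric (readHex)
import Text.ParserCombinators.Parsec

-- Assumed definition: convert a string of hex digits to a number.
hexStr2Int :: String -> Integer
hexStr2Int s = case readHex s of
  ((n, _):_) -> n
  _          -> error ("not a hex number: " ++ s)

-- Hypothetical helper combining hexStr2Int <$> many1 hexDigit.
hexInt :: Parser Integer
hexInt = hexStr2Int <$> many1 hexDigit

-- No parentheses needed: <$>, <* and <*> are all infixl 4,
-- so the expression groups from the left.
parseAddress :: Parser (Integer, Integer)
parseAddress = (,) <$> hexInt <* char '-' <*> hexInt
```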

Also, in cA, should there be a case for characters other than ‘p’ or ‘s’? Otherwise the program could fail with a pattern match error.
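One way to avoid the pattern match error, sketched under assumptions: let the parser itself reject anything but the two expected characters, so that bad input becomes an ordinary parse error. The names Access and parseAccess are hypothetical.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical type name for the two access modes.
data Access = Private | Shared deriving (Eq, Show)

-- x <$ p runs p and replaces its result with x. Any character
-- other than 'p' or 's' makes the parser fail with a message
-- instead of crashing inside a partial function.
parseAccess :: Parser Access
parseAccess = ((Private <$ char 'p') <|> (Shared <$ char 's'))
              <?> "access flag 'p' or 's'"
```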

Magnus

Conal and Twan, thanks for your suggestions. I’ll put them into practice and post the “final” result as soon as I find some time.
