Adventures in parsing, part 3

I got a great many comments, at least by my standards, on my two earlier posts on parsing in Haskell, especially on the latest one. Conal posted a comment on the first pointing me towards liftM and its siblings, without telling me that it would only be the first step towards “applicative style”. So, here I go again…

First off, importing Control.Applicative. Apparently <|> is defined in both Control.Applicative and Parsec. I use the one from Parsec, so hiding it when importing Control.Applicative seemed like a good idea.
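A minimal sketch of such an import list. It assumes a current parsec release, where many and optional clash with Control.Applicative in the same way <|> does; the parser hexOrDash is a hypothetical example and not part of the original code.

```haskell
-- Hide the names that both modules export, so the Parsec versions are the
-- only ones in scope. Hiding `many` and `optional` too is an assumption
-- based on current library versions.
import Control.Applicative hiding ((<|>), many, optional)
import Text.ParserCombinators.Parsec

-- Hypothetical example: a run of hex digits, or a single dash.
-- The (<|>) used here is unambiguously Parsec's.
hexOrDash :: Parser String
hexOrDash = many1 hexDigit <|> string "-"
```

An alternative with the same effect would be to import only the needed operators from Control.Applicative by name.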

Second, Cale pointed out that I need to make GenParser an instance of Control.Applicative.Applicative. He was nice enough to show how to do that, leaving syntax as the only thing I had to struggle with.
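The usual way to get an Applicative instance for a type that is already a monad is to set pure to return and <*> to ap. The sketch below shows that idea on a hypothetical toy parser type P, standing in for GenParser, so that it compiles on its own. As far as I know, current parsec releases already ship an Applicative instance, so declaring one for GenParser today would be rejected as a duplicate.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical minimal parser type, standing in for Parsec's GenParser.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Monad P where
  P p >>= f = P $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> runP (f a) s'

-- The Applicative operations are borrowed from the Monad instance.
instance Applicative P where
  pure a = P $ \s -> Just (a, s)
  (<*>)  = ap

instance Functor P where
  fmap = liftM

-- Consume one character that satisfies the predicate.
satisfyP :: (Char -> Bool) -> P Char
satisfyP ok = P $ \s -> case s of
  (c:cs) | ok c -> Just (c, cs)
  _             -> Nothing
```

With this in place, runP ((,) <$> satisfyP (== 'a') <*> satisfyP (== 'b')) "abc" yields Just (('a','b'),"c").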

I decided to take baby-steps and I started with parseAddress. Here’s what it used to look like:

On Twan’s suggestion I rewrote it using where rather than let ... in, and since this was my first function I decided to go via the ap function. At the same time I broke out hexStr2Int, since it’s used in so many places.
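A sketch of what an ap-based version with a where clause could look like. The Address type, the body of hexStr2Int and the definition of thenChar are assumptions reconstructed from how the text describes them, not the post's actual code.

```haskell
import Control.Monad (ap, liftM)
import Text.ParserCombinators.Parsec

-- Assumed data type: start and end of an address range.
data Address = Address Integer Integer deriving (Show, Eq)

-- Read a string of hex digits as a number (assumed definition).
hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- Assumed helper: run a parser, then consume the character c,
-- keeping the parser's result.
thenChar :: Char -> Parser a -> Parser a
thenChar c p = p >>= \r -> char c >> return r

-- Lift the constructor with return, then feed it arguments with ap.
parseAddress :: Parser Address
parseAddress = return Address `ap` start `ap` end
  where
    start = liftM hexStr2Int (thenChar '-' (many1 hexDigit))
    end   = liftM hexStr2Int (many1 hexDigit)
```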

Then on to applying some functions from Applicative:

By now the use of thenChar looked a little silly, so I changed that part into many1 hexDigit <* char '-' instead. Finally I removed the where part altogether and used <*> to string it all together.
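A sketch of the fully applicative form this paragraph describes, under the same assumptions about Address and hexStr2Int as in the earlier posts (both are reconstructed here for illustration only).

```haskell
import Control.Applicative hiding ((<|>), many, optional)
import Text.ParserCombinators.Parsec

-- Assumed data type: start and end of an address range.
data Address = Address Integer Integer deriving (Show, Eq)

-- Read a string of hex digits as a number (assumed definition).
hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- (<*) keeps the result on its left and discards the '-',
-- (<*>) applies the partially built Address to the second number.
parseAddress :: Parser Address
parseAddress =
  Address <$> (hexStr2Int <$> (many1 hexDigit <* char '-'))
          <*> (hexStr2Int <$> many1 hexDigit)
```

The parsers now appear in the same left-to-right order as the input they consume, which is what removes the need for thenChar.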

From here on I skipped the intermediate steps and went straight for the last form. Here’s what I ended up with:

I have to say I’m fairly pleased with this version of the parser. It reads about as easily as the first version and there’s none of the “reversing” that thenChar introduced.


Conal Elliott

A thing of beauty! I’m glad you stuck with it, Magnus.

Some much smaller points:

  • The pattern (== c) <$> anyChar (nicely written, btw) arises three times, so it might merit a name.
  • Similarly for hexStr2Int <$> many1 hexDigit, especially when you rewrite f <$> (a <* b) to (f <$> a) <* b.
  • The pattern (a <* char ' ') <*> b comes up a lot. How about naming it also, with a nice infix op, say a <#> b?
  • The cA definition could use pattern matching instead (e.g., cA 'p' = Private and cA 's' = Shared).
  • Some of your parens are unnecessary (3rd line of parseDevice and last of parseRegion), since application binds more tightly than infix ops.
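A sketch of the first and third suggestions above. The name charIs, the Entry type and the parsers built from them are hypothetical; only the operator name <#> comes from the comment.

```haskell
import Text.ParserCombinators.Parsec

infixl 4 <#>

-- Like (<*>), but requires a single space between the two parsers.
(<#>) :: Parser (a -> b) -> Parser a -> Parser b
a <#> b = a <* char ' ' <*> b

-- A name for the recurring pattern: consume any one character and
-- report whether it was c.
charIs :: Char -> Parser Bool
charIs c = (== c) <$> anyChar

-- Hypothetical record for illustration: a hex number and three flags.
data Entry = Entry Integer (Bool, Bool, Bool) deriving (Show, Eq)

hexInt :: Parser Integer
hexInt = read . ("0x" ++) <$> many1 hexDigit

flags :: Parser (Bool, Bool, Bool)
flags = (,,) <$> charIs 'r' <*> charIs 'w' <*> charIs 'x'

-- Reads e.g. "1f r-x": both fields, separated by one space.
parseEntry :: Parser Entry
parseEntry = Entry <$> hexInt <#> flags
```

Giving <#> the same fixity as <$> and <*> (infixl 4) is what lets the three operators chain without parentheses.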

Conal Elliott

hm. i wonder why the boxes around list items in my previous reply.

Twan van Laarhoven

First of all, note that you don’t need parentheses around parseSomething <* char ' '.

You can also simplify things a bit more by combining hexStr2Int <$> many1 hexDigit into a function, then you could say:
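A sketch of what that simplification might look like. The helper name hexNumber is hypothetical, and Address and hexStr2Int are assumed to be defined as in the earlier posts.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed data type: start and end of an address range.
data Address = Address Integer Integer deriving (Show, Eq)

-- Read a string of hex digits as a number (assumed definition).
hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- Hypothetical name for the combined pattern.
hexNumber :: Parser Integer
hexNumber = hexStr2Int <$> many1 hexDigit

-- No parentheses needed: (<$>), (<*) and (<*>) all associate to the
-- left at the same precedence.
parseAddress :: Parser Address
parseAddress = Address <$> hexNumber <* char '-' <*> hexNumber
```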

Also, in cA, should there be a case for characters other than ‘p’ or ‘s’? Otherwise the program could fail with a pattern match error.
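One way to avoid the pattern match failure is to let the parser itself reject unexpected characters, so bad input becomes an ordinary parse error. A minimal sketch, assuming an Access type with the two constructors mentioned in the comments; the name parseAccess is hypothetical.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed data type, based on the constructors named in the comments.
data Access = Private | Shared deriving (Show, Eq)

-- Accept only 'p' or 's'; any other character is a parse error
-- instead of a runtime pattern match failure.
parseAccess :: Parser Access
parseAccess = (Private <$ char 'p') <|> (Shared <$ char 's') <?> "'p' or 's'"
```

This replaces the cA helper rather than extending it; keeping cA and giving it a Maybe result would be another option.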

Magnus

Conal and Twan, thanks for your suggestions. I’ll put them into practice and post the “final” result as soon as I find some time.
