Adventures in parsing, part 3
I got a great many comments, at least by my standards, on my earlier two posts
on parsing in Haskell, especially on the latest one. Conal posted a comment on
the first pointing me towards liftM and its siblings, without telling me that
it would only be the first step towards "applicative style". So, here I go
again…
First off, importing Control.Applicative. Apparently <|> is defined in both
Applicative and Parsec. I do use <|> from Parsec, so preventing its import
from Applicative seemed like a good idea:
import Control.Applicative hiding ( (<|>) )
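A minimal sketch of why the hiding matters, assuming a current parsec 3 (module Text.Parsec) and a hypothetical signP parser that is not part of the post. Both modules export their own <|>, so hiding one of them keeps the unqualified operator unambiguous:

```haskell
-- Hide Applicative's (<|>) so the unqualified operator is Parsec's.
import Control.Applicative hiding ((<|>))
import Text.Parsec (char, parse, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical example parser: reads a sign character as -1 or +1.
signP :: Parser Int
signP = ((-1) <$ char '-') <|> (1 <$ char '+')

-- Run the parser and discard the error message on failure.
runSign :: String -> Maybe Int
runSign = either (const Nothing) Just . parse signP ""
```

Without the hiding clause, any use of <|> in this module would be reported as ambiguous.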
Second, Cale pointed out that I need to make GenParser an instance of
Control.Applicative.Applicative. He was nice enough to point out how to do
that, leaving syntax as the only thing I had to struggle with:
instance Applicative (GenParser c st) where
    pure = return
    (<*>) = ap
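Recent parsec releases already ship an Applicative instance for their parser type, so the declaration above cannot be repeated there. As a sketch of the same trick, here is a hypothetical toy parser type P (not GenParser) that gets its Applicative instance from its Monad instance via ap:

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical minimal parser type, standing in for GenParser.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
    fmap = liftM

instance Applicative P where
    pure x = P $ \s -> Just (x, s)
    (<*>) = ap          -- borrow sequencing from the Monad instance

instance Monad P where
    P p >>= f = P $ \s -> case p s of
        Nothing        -> Nothing
        Just (a, rest) -> runP (f a) rest

-- Consume a single character, failing on empty input.
item :: P Char
item = P $ \s -> case s of
    []     -> Nothing
    (c:cs) -> Just (c, cs)

-- Applicative style now works for P.
pair :: P (Char, Char)
pair = (,) <$> item <*> item
```

In current GHC the roles are reversed compared to the post: pure is the primitive and return defaults to it, but (<*>) = ap is still the usual shortcut.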
I decided to take baby steps and started with parseAddress. Here's what it
used to look like:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
On Twan's suggestion I rewrote it using where rather than let ... in, and
since this was my first function I decided to go via the ap function (at the
same time I broke out hexStr2Int, since it's used in so many places):
parseAddress = do
    start <- return hexStr2Int `ap` (thenChar '-' $ many1 hexDigit)
    end <- return hexStr2Int `ap` (many1 hexDigit)
    return $ Address start end
Then on to applying some functions from Applicative:
parseAddress = Address start end
    where
        start = hexStr2Int <$> (thenChar '-' $ many1 hexDigit)
        end = hexStr2Int <$> (many1 hexDigit)
By now the use of thenChar looks a little silly, so I changed that part into
many1 hexDigit <* char '-' instead. Finally I removed the where part
altogether and used <*> to string it all together:
parseAddress = Address <$>
    (hexStr2Int <$> many1 hexDigit <* char '-') <*>
    (hexStr2Int <$> (many1 hexDigit))
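For anyone who wants to try the final form in isolation, here is a self-contained sketch. It assumes parsec 3 (which already provides the Applicative instance) and an Address type holding two Int fields; the post does not show the real definition, so that shape is a guess, and parseMaybe is a hypothetical helper:

```haskell
import Text.Parsec (char, hexDigit, many1, parse)
import Text.Parsec.String (Parser)

-- Assumed shape of the Address type; the post does not show its definition.
data Address = Address Int Int
    deriving (Eq, Show)

-- Read a string of hex digits by prefixing it with "0x".
hexStr2Int :: String -> Int
hexStr2Int = read . ("0x" ++)

-- <$> and <* are both infixl 4, so the first group reads as
-- (hexStr2Int <$> many1 hexDigit) <* char '-'.
parseAddress :: Parser Address
parseAddress = Address
    <$> (hexStr2Int <$> many1 hexDigit <* char '-')
    <*> (hexStr2Int <$> many1 hexDigit)

-- Hypothetical helper: run a parser and drop the error message on failure.
parseMaybe :: Parser a -> String -> Maybe a
parseMaybe p = either (const Nothing) Just . parse p ""
```

With these definitions, parseMaybe parseAddress "08048000-08056000" yields an Address with both ends decoded, while input lacking the '-' separator is rejected.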
From here on I skipped the intermediate steps and went straight for the last form. Here's what I ended up with:
parsePerms = Perms <$>
    ( (== 'r') <$> anyChar) <*>
    ( (== 'w') <$> anyChar) <*>
    ( (== 'x') <$> anyChar) <*>
    (cA <$> anyChar)
    where
        cA a = case a of
            'p' -> Private
            's' -> Shared

parseDevice = Device <$>
    (hexStr2Int <$> many1 hexDigit <* char ':') <*>
    (hexStr2Int <$> (many1 hexDigit))

parseRegion = MemRegion <$>
    (parseAddress <* char ' ') <*>
    (parsePerms <* char ' ') <*>
    (hexStr2Int <$> (many1 hexDigit <* char ' ')) <*>
    (parseDevice <* char ' ') <*>
    (Prelude.read <$> (many1 digit <* char ' ')) <*>
    (parsePath <|> string "")
    where
        parsePath = (many1 $ char ' ') *> (many1 anyChar)
I have to say I'm fairly pleased with this version of the parser. It reads about
as easily as the first version, and there's none of the "reversing" that
thenChar introduced.
Comment by Conal Elliott:
A thing of beauty! I'm glad you stuck with it, Magnus.
Some much smaller points:
- The pattern (== c) <$> anyChar (nicely written, btw) arises three times, so it might merit a name.
- Similarly for hexStr2Int <$> many1 hexDigit, especially when you rewrite f <$> (a <* b) to (f <$> a) <* b.
- The pattern (a <* char ' ') <*> b comes up a lot. How about naming it also, with a nice infix op, say a <#> b?
- The cA definition could use pattern matching instead (e.g., cA 'p' = Private and cA 's' = Shared).
- Some of your parens are unnecessary (3rd line of parseDevice and last of parseRegion), since application binds more tightly than infix ops.
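A sketch of how the first three suggestions might look, assuming parsec 3. The names flag, parseHex and demo, and the infixl 4 fixity chosen for <#>, are illustrative guesses rather than anything from the post:

```haskell
import Text.Parsec (anyChar, char, hexDigit, many1, parse)
import Text.Parsec.String (Parser)

-- Same fixity as <$> and <*>, so chains associate to the left.
infixl 4 <#>

-- Run the left parser, skip one space, then apply its result to the right one.
(<#>) :: Parser (a -> b) -> Parser a -> Parser b
a <#> b = (a <* char ' ') <*> b

-- Hypothetical name for the repeated (== c) <$> anyChar pattern.
flag :: Char -> Parser Bool
flag c = (== c) <$> anyChar

-- Hypothetical name for the repeated hex-number pattern.
parseHex :: Parser Int
parseHex = read . ("0x" ++) <$> many1 hexDigit

-- Hypothetical demo: two space-separated hex numbers followed by a flag.
demo :: Parser (Int, Int, Bool)
demo = (,,) <$> parseHex <#> parseHex <#> flag 'r'
```

Because <#> shares its fixity with <$>, the demo parser needs no parentheses at all.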
Comment by Twan van Laarhoven:
First of all, note that you don't need parentheses around
parseSomething <* char ' '.
You can also simplify things a bit more by combining
hexStr2Int <$> many1 hexDigit into a function, then you could say:
parseHex = hexStr2Int <$> many1 hexDigit
parseAddress = Address <$> parseHex <* char '-' <*> parseHex
parseDevice = Device <$> parseHex <* char ':' <*> parseHex
Also, in cA, should there be a case for characters other than 'p' or 's'?
Otherwise the program could fail with a pattern match error.
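One possible way to address this, sketched under the assumption of parsec 3 and a hypothetical Access type carrying the two constructors that cA returns: let the parser itself accept only 'p' or 's', so any other character becomes an ordinary parse error instead of a runtime pattern match failure.

```haskell
import Text.Parsec (char, parse, (<|>), (<?>))
import Text.Parsec.String (Parser)

-- Assumed type; only the constructor names appear in the post.
data Access = Private | Shared
    deriving (Eq, Show)

-- Accept exactly 'p' or 's'; anything else is reported as a parse error.
parseAccess :: Parser Access
parseAccess = ((Private <$ char 'p') <|> (Shared <$ char 's')) <?> "'p' or 's'"

-- Hypothetical helper: run the parser and drop the error message on failure.
runAccess :: String -> Maybe Access
runAccess = either (const Nothing) Just . parse parseAccess ""
```

This replaces cA <$> anyChar rather than patching cA, which keeps the failure inside Parsec's own error reporting.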
Response to Conal and Twan:
Conal and Twan, thanks for your suggestions. I'll put them into practice and post the "final" result as soon as I find some time.