More adventures in parsing
I received an interesting comment from Conal Elliott on my previous post on parsing I have to admit I wasn't sure I understood him at first, I'm still not sure I do, but I think I have an idea of what he means :-)
Basically my code is very sequential in that I use the do
construct everywhere
in the parsing code. Personally I thought that makes the parser very easy to
read since the code very much mimics the structure of the maps
file. I do
realise the code isn't very "functional" though so I thought I'd take Conal's
comments to heart and see what the result would be.
Let's start with observation that every entity in a line is separated by a space. However some things are separated by other characters. So the first thing I did was write a higher-order function that first reads something, then reads a character and returns the first thing that was read:
thenChar c f = f >>= (\ r -> char c >> return r)
Since space is used as a separator so often I added a short-cut for that:
thenSpace = thenChar ' '
Then I put that to use on parseAddress
:
parseAddress = let hexStr2Int = Prelude.read . ("0x" ++) in do start <- thenChar '-' $ many1 hexDigit end <- many1 hexDigit return $ Address (hexStr2Int start) (hexStr2Int end)
Modifying the other parsing functions using =thenChar~ and thenSpace
is
straight forward.
I'm not entirely sure I understand what Conal meant with the part about liftM
in his comment. I suspect his referring to the fact that I first read characters
and then convert them in the "constructors". By using liftM
I can move the
conversion "up in the code". Here's parseAddress
after I've moved the calls to
hexStr2Int
:
parseAddress = let hexStr2Int = Prelude.read . ("0x" ++) in do start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit end <- liftM hexStr2Int $ many1 hexDigit return $ Address start end
After modifying the other parsing functions in a similar way I ended up with this:
parsePerms = let cA a = case a of 'p' -> Private 's' -> Shared in do r <- liftM (== 'r') anyChar w <- liftM (== 'w') anyChar x <- liftM (== 'x') anyChar a <- liftM cA anyChar return $ Perms r w x a parseDevice = let hexStr2Int = Prelude.read . ("0x" ++) in do maj <- liftM hexStr2Int $ thenChar ':' $ many1 hexDigit min <- liftM hexStr2Int $ many1 hexDigit return $ Device maj min parseRegion = let hexStr2Int = Prelude.read . ("0x" ++) parsePath = (many1 $ char ' ') >> (many1 $ anyChar) in do addr <- thenSpace parseAddress perm <- thenSpace parsePerms offset <- liftM hexStr2Int $ thenSpace $ many1 hexDigit dev <- thenSpace parseDevice inode <- liftM Prelude.read $ thenSpace $ many1 digit path <- parsePath <|> string "" return $ MemRegion addr perm offset dev inode path
Is this code more "functional"? Is it easier to read? You'll have to be the judge of that…
Conal, if I got the intention of your comment completely wrong then feel free to tell me I'm an idiot ;-)
Comment by Holger:
Your story and Conal's comment inspired me to play around with liftM and I came up with this version of parseAddress:
parseAddress = let hexStr2Int = Prelude.read . ("0x" ++) in liftM3 (\ x _ y -> Address (hexStr2Int x) (hexStr2Int y)) (many1 hexDigit) (char '-') (many1 hexDigit)
You could probably rewrite all the functions in a similar way, but honestly I find your original code in do-notation much easier to read.
Response to Holger:
Yeah, the same line of thought made it a little difficult to fall asleep yesterday (yes, I know, nerdiness taken to new levels). My thoughts was something like this:
parseAddress = let hexStr2Int = ... in liftM2 Address (liftM hexStr2Int $ thenChar '-' $ many1 hexDigit) (liftM hexStr2Int $ many1 hexDigit)
I agree with you on readability. I also wonder if laziness could bite back or if `liftM2` guarantees an order of evaluation.
Comment by Holger:
I just looked at the source of liftM2/3 and it seems that it basically just resolves to an expression in do-notation. So it's just a shortcut and therefore should yield the same program.
Comment by Cale Gibbard:
liftM2
guarantees an ordering on the monadic computation, because it's defined like:
liftM2 f x y = do { u <- x; v <- y; return (f u v) }
Though, that's a little different from guaranteeing an order of evaluation – depending on the monad, the order of evaluation will vary. In any case, it shouldn't be much different from what you originally had.
Another thing you might like to play around with, at least in your head, is the fact that:
liftM2 f x y = return f `ap` x `ap` y liftM3 f x y z = return f `ap` x `ap` y `ap` z
and so on, which leads up to the style embodied by the Control.Applicative
library, though if you really want to play around with that, you'll need to
write a quick instance of Applicative for GenParser, which should just consist
of making pure = return
and (<*>) = ap
.
Comment by Twan van Laarhoven:
The best way to make parsing code readable is to use Data.Applicative. Also,
most people prefer where to let..in
if possible. This gives something like:
parseHexStr = Prelude.read . ("0x" ++) many1 hexDigit parsePath = many1 (char ' ') *> many1 anyChar parseAddress = Address hexStr *> char '-' hexStr parseRegion = MemRegion parseAddress *> char ' ' parsePerms *> char ' ' parseHexStr *> char ' ' parseDevice *> char ' ' (Prelude.read many1 digit) *> char ' ' (parsePath return "")
Basicly (liftM# f x y z)
can be written as f x <*> y <*> z
.
Comment by Nick:
I find the thenSpace
a bit difficult to read.I think something like this is
more natural, as it maintains the left-to-right relationship of the parsed data
and the following space:
aChar c r = char c >> return r aSpace = aChar ' ' ... do start <- many1 hexDigit >> aChar ....
note that I haven't tested this and my haskell-fu is not very strong, so I could be way off here.
Comment by Conal Elliot:
Yes, that's exactly the direction i had in mind. once you switch from "do" style to "liftM#" style, it's a simple step to replace the monad operators to applicative functor operators.
Response to Nick:
Nick, yes your way of writing it is easier to read. You'll need to change to
using >>=
though:
do start <- many1 hexDigit >>= aChar '-'
Then you can mix in `liftM` as well:
do start <- liftM hexStr2Int $ many1 hexDigit >>= aChar '-'
to do the conversion. However, I think my initial version is even clearer:
do start <- liftM hexStr2Int $ many1 hexDigitc char '-'
Response to Conal Elliot:
Conal, ok, you're one sneaky little b… ;-) I'll have to look at the applicative operators to see what I think of them.