More adventures in parsing
- Magnus Therning
I received an interesting comment from Conal Elliott on my previous post on parsing. I have to admit I wasn’t sure I understood him at first, and I’m still not sure I do, but I think I have an idea of what he means :-)
Basically my code is very sequential in that I use the do construct everywhere in the parsing code. Personally I thought that made the parser very easy to read, since the code closely mimics the structure of the maps file. I do realise the code isn’t very “functional” though, so I thought I’d take Conal’s comments to heart and see what the result would be.
Let’s start with the observation that every entity in a line is separated by a space. However, some things are separated by other characters. So the first thing I did was write a higher-order function that first reads something, then reads a character, and returns the first thing that was read:
thenChar c f = f >>= (\ r -> char c >> return r)
Since space is used as a separator so often I added a short-cut for that:
thenSpace = thenChar ' '
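For comparison, the same "read something, then drop a character" combinator can be written with the Applicative operator <*, which runs two parsers in sequence and keeps only the first result. This is a minimal sketch, assuming Parsec's Text.Parsec modules; the type signatures and the hexThenDash example parser are additions for illustration and are not part of the original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run f, then consume the character c, and keep f's result.
-- Behaves like: f >>= (\ r -> char c >> return r)
thenChar :: Char -> Parser a -> Parser a
thenChar c f = f <* char c

thenSpace :: Parser a -> Parser a
thenSpace = thenChar ' '

-- Hypothetical example parser: hex digits followed by a dash.
hexThenDash :: Parser String
hexThenDash = thenChar '-' (many1 hexDigit)
```

Running parse hexThenDash "" "08048000-0804c000" should consume the dash and return only the digits before it.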
Then I put that to use on parseAddress:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- thenChar '-' $ many1 hexDigit
        end <- many1 hexDigit
        return $ Address (hexStr2Int start) (hexStr2Int end)
Modifying the other parsing functions to use thenChar and thenSpace is straightforward.
I’m not entirely sure I understand what Conal meant with the part about liftM in his comment. I suspect he’s referring to the fact that I first read characters and then convert them in the “constructors”. By using liftM I can move the conversion “up in the code”. Here’s parseAddress after I’ve moved the calls to hexStr2Int:
parseAddress = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        start <- liftM hexStr2Int $ thenChar '-' $ many1 hexDigit
        end <- liftM hexStr2Int $ many1 hexDigit
        return $ Address start end
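As a side note, liftM applied to a parser does the same job as fmap, which also has the operator form <$>. The sketch below shows the two spellings side by side, assuming Parsec's Text.Parsec modules and an Integer result type; the names hexWithLiftM and hexWithFmap are made up for this illustration.

```haskell
import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same conversion as in the post, with an assumed Integer result type.
hexStr2Int :: String -> Integer
hexStr2Int = read . ("0x" ++)

-- The liftM spelling used in this post.
hexWithLiftM :: Parser Integer
hexWithLiftM = liftM hexStr2Int $ many1 hexDigit

-- The same parser written with fmap's operator form.
hexWithFmap :: Parser Integer
hexWithFmap = hexStr2Int <$> many1 hexDigit
```

Both parsers should give the same result on the same input. The rest of the code in this post sticks with liftM.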
After modifying the other parsing functions in a similar way I ended up with this:
parsePerms = let
        cA a = case a of
            'p' -> Private
            's' -> Shared
    in do
        r <- liftM (== 'r') anyChar
        w <- liftM (== 'w') anyChar
        x <- liftM (== 'x') anyChar
        a <- liftM cA anyChar
        return $ Perms r w x a

parseDevice = let
        hexStr2Int = Prelude.read . ("0x" ++)
    in do
        maj <- liftM hexStr2Int $ thenChar ':' $ many1 hexDigit
        min <- liftM hexStr2Int $ many1 hexDigit
        return $ Device maj min

parseRegion = let
        hexStr2Int = Prelude.read . ("0x" ++)
        parsePath = (many1 $ char ' ') >> (many1 $ anyChar)
    in do
        addr <- thenSpace parseAddress
        perm <- thenSpace parsePerms
        offset <- liftM hexStr2Int $ thenSpace $ many1 hexDigit
        dev <- thenSpace parseDevice
        inode <- liftM Prelude.read $ thenSpace $ many1 digit
        path <- parsePath <|> string ""
        return $ MemRegion addr perm offset dev inode path
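Going one step further, the do blocks can be dropped entirely by combining the parsers with <$>, <*> and <*. Whether this is what Conal had in mind is only a guess. The following is a sketch under stated assumptions: the data types are stand-ins for the ones defined in the previous post (the name Access in particular is invented here), and the helpers hexNumber, flag and access are hypothetical. Note that this parsePerms is stricter than the version above: it assumes an unset permission is written as '-' and reports a parse error on any other character, instead of accepting anything via anyChar.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed stand-ins for the data types from the previous post.
data Address = Address Integer Integer deriving (Eq, Show)
data Access = Shared | Private deriving (Eq, Show)
data Perms = Perms Bool Bool Bool Access deriving (Eq, Show)

-- One or more hex digits, converted the same way as hexStr2Int.
hexNumber :: Parser Integer
hexNumber = read . ("0x" ++) <$> many1 hexDigit

-- start '-' end, with the dash dropped by <*.
parseAddress :: Parser Address
parseAddress = Address <$> (hexNumber <* char '-') <*> hexNumber

-- A flag is either its own letter (True) or '-' (False).
flag :: Char -> Parser Bool
flag c = (True <$ char c) <|> (False <$ char '-')

-- The last column is 'p' for private or 's' for shared.
access :: Parser Access
access = (Private <$ char 'p') <|> (Shared <$ char 's')

parsePerms :: Parser Perms
parsePerms = Perms <$> flag 'r' <*> flag 'w' <*> flag 'x' <*> access
```

The constructor now sits at the front of each definition and the parsers for its fields follow in order, so no intermediate variable names are needed.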
Is this code more “functional”? Is it easier to read? You’ll have to be the judge of that…
Conal, if I got the intention of your comment completely wrong then feel free to tell me I’m an idiot ;-)