Adventures in parsing, part 4
I received a few comments on part 3 of this little mini-series and I just wanted
to address them. While doing this I still want the main functions of the parser,
parseXxx, to read like the maps file itself. That means I want to avoid
"reversing order" like thenChar and thenSpace did in part 2. I also don't
want to hide things, e.g. I don't want to introduce a function that turns
(a <* char ' ') <*> b into a <#> b.
So, first up is to do something about hexStr2Int <$> many1 hexDigit, which
appears all over the place. I made it appear in even more places by moving
around a few parentheses; the following two functions are the same:
foo = a <$> (b <* c)
bar = (a <$> b) <* c
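A quick way to convince oneself of this equivalence is to run both shapes on the same input. The sketch below is not the parser from this series: it uses ReadP from base as a stand-in for Parsec so that it runs without extra packages, and hexDigit, parseAll, and the choice of length, many1 hexDigit and char '-' for a, b and c are illustrative assumptions.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, many1, readP_to_S, satisfy)

-- Stand-in for Parsec's hexDigit (assumption: ReadP replaces Parsec here).
hexDigit :: ReadP Char
hexDigit = satisfy isHexDigit

-- Hypothetical helper: keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

-- a = length, b = many1 hexDigit, c = char '-'
foo, bar :: ReadP Int
foo = length <$> (many1 hexDigit <* char '-')
bar = (length <$> many1 hexDigit) <* char '-'

main :: IO ()
main = do
  print (parseAll foo "beef-")
  print (parseAll bar "beef-")
  -- The two parsers agree on this input.
  print (parseAll foo "beef-" == parseAll bar "beef-")
```

Both parsers apply the function to the result of b and discard the result of c; only the point at which the function is applied differs.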
Then I scrapped hexStr2Int completely and instead introduced hexStr:
hexStr = Prelude.read . ("0x" ++) <$> many1 hexDigit
This means that parseAddress can be rewritten to:
parseAddress = Address <$> hexStr <* char '-' <*> hexStr
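The hexStr trick relies on read accepting a "0x" prefix for numeric types. The following sketch shows it end to end under a few assumptions: ReadP from base stands in for Parsec, the Address type with two Integer fields is a guess at the definition from the earlier parts, and parseAll is a hypothetical helper.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, many1, readP_to_S, satisfy)

-- Assumed shape of the Address type; the real definition is in an earlier part.
data Address = Address Integer Integer deriving (Eq, Show)

-- Stand-in for Parsec's hexDigit.
hexDigit :: ReadP Char
hexDigit = satisfy isHexDigit

-- Prepend "0x" so that read interprets the digits as hexadecimal.
hexStr :: ReadP Integer
hexStr = read . ("0x" ++) <$> many1 hexDigit

parseAddress :: ReadP Address
parseAddress = Address <$> hexStr <* char '-' <*> hexStr

-- Hypothetical helper: keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

main :: IO ()
main =
  -- Compare against hexadecimal literals to check the conversion.
  print (parseAll parseAddress "08048000-0804c000" == [Address 0x08048000 0x0804c000])
```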
Rather than, as Conal suggested, introduce an infix operation that addresses the
pattern (a <* char ' ') <*> b, I decided to do something about a <* char c. I
feel Conal's suggestion, while shortening the code more than my solution, goes
against my wish to not hide things. This is the definition of <##>:
(<##>) l r = l <* char r
After this I rewrote parseAddress into:
parseAddress = Address <$> hexStr <##> '-' <*> hexStr
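To check that the <##> version behaves like the one written with <* char '-', the sketch below runs both on the same input. It makes several assumptions: ReadP from base stands in for Parsec, the Address type is a guess, parseAll is a hypothetical helper, and the explicit infixl 4 declaration is an addition that is not in the post. Without a declaration the operator gets the default fixity, infixl 9, which also type checks for this expression because hexStr <##> '-' is then grouped first.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, many1, readP_to_S, satisfy)

-- Assumed shape of the Address type.
data Address = Address Integer Integer deriving (Eq, Show)

-- Assumption: same fixity as <$>, <* and <*>, so everything groups left to right.
infixl 4 <##>

-- Run the left parser, then require and discard the given character.
(<##>) :: ReadP a -> Char -> ReadP a
(<##>) l r = l <* char r

hexDigit :: ReadP Char
hexDigit = satisfy isHexDigit

hexStr :: ReadP Integer
hexStr = read . ("0x" ++) <$> many1 hexDigit

-- The version written out with <* char '-' and the version using <##>.
parseAddressPlain, parseAddress :: ReadP Address
parseAddressPlain = Address <$> hexStr <* char '-' <*> hexStr
parseAddress      = Address <$> hexStr <##> '-' <*> hexStr

-- Hypothetical helper: keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

main :: IO ()
main = do
  let input = "08048000-0804c000"
  print (parseAll parseAddress input == parseAll parseAddressPlain input)
  print (parseAll parseAddress input == [Address 0x08048000 0x0804c000])
```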
The pattern (== c) <$> anyChar appears three times in parsePerms, so it got a
name and was moved down into the where clause. I also modified cA to use pattern
matching. I haven't spent much time considering error handling in the parser, so
I didn't introduce a catch-all pattern that matches everything else.
parsePerms = Perms <$> pP 'r' <*> pP 'w' <*> pP 'x' <*> (cA <$> anyChar)
    where
        pP c = (== c) <$> anyChar
        cA 'p' = Private
        cA 's' = Shared
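For what it's worth, a catch-all case could look roughly like the sketch below. It is only one possible approach, not something from this series: it uses ReadP from base instead of Parsec, the Perms and Access types are guesses at the definitions from the earlier parts, and cA is changed to return a parser so that an unexpected character makes the parse fail (via pfail) rather than crash with a pattern match error. That in turn requires >>= instead of <$> for the last field.

```haskell
import Text.ParserCombinators.ReadP (ReadP, eof, get, pfail, readP_to_S)

-- Assumed shapes of the types; the real definitions are in an earlier part.
data Access = Private | Shared deriving (Eq, Show)
data Perms = Perms Bool Bool Bool Access deriving (Eq, Show)

-- ReadP's get plays the role of Parsec's anyChar.
anyChar :: ReadP Char
anyChar = get

parsePerms :: ReadP Perms
parsePerms = Perms <$> pP 'r' <*> pP 'w' <*> pP 'x' <*> (anyChar >>= cA)
  where
    pP c = (== c) <$> anyChar
    cA 'p' = pure Private
    cA 's' = pure Shared
    cA _   = pfail  -- catch-all: reject anything else as a parse failure

-- Hypothetical helper: keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

main :: IO ()
main = do
  print (parseAll parsePerms "r-xp")  -- a valid permissions field
  print (parseAll parsePerms "rw-?")  -- unknown last character: no parse
```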
The last change I made was to remove a bunch of parentheses. I'm always a little hesitant about removing parentheses and relying on precedence rules, and I find I'm even more hesitant doing it when programming in Haskell, probably because Haskell has a lot of infix operators that I'm not used to.
The rest of the parser now looks like this:
parseDevice = Device <$> hexStr <##> ':' <*> hexStr

parseRegion = MemRegion <$> parseAddress <##> ' ' <*> parsePerms <##> ' ' <*> hexStr <##> ' ' <*> parseDevice <##> ' ' <*> (Prelude.read <$> many1 digit) <##> ' ' <*> (parsePath <|> string "")
    where
        parsePath = (many1 $ char ' ') *> (many1 anyChar)
I think these changes address most of the comments Conal and Twan made on the previous part. Where they don't I hope I've explained why I decided not to take their advice.
Comment by Jedaï:
That's really pretty! Code you can read, but concise; Haskell is really good at that, though I need to look at how Applicative works its magic. :)
Good work!
Comment by Conal Elliott:
Magnus wrote
I also don’t want to hide things, e.g. I don’t want to introduce a function that turns
(a <* char ' ') <*> b into a <#> b.
I'm puzzled about this comment. Aren't all of your definitions (as well as much of Parsec and other Haskell libraries) "hiding things"?
What appeals to me about a <#> b = (a <* char ' ') <*> b (and similarly for,
say, a <:> b) is that it captures the combination of a character separator
and <*>-style application. As your example illustrates (and hadn't previously
occurred to me), this combination is very common.
Response to Conal:
Conal, you are right and I was unclear in what I meant. Basically I like the
idea of reading the parseXxx functions and seeing the structure of the original
maps file. At the moment I think that
parseAddress = Address <$> hexStr <##> '-' <*> hexStr
better reflects the structure of the maps file than hiding away the separator
inside an operator. I also find it doesn't require me to carry a lot of "mental
baggage" when reading the code (I suspect this is the thing that's been
bothering me about the love of introducing operators that seems so prevalent
among Haskell developers; thanks for helping me put a finger on it). However,
your persistence might be paying off ;-) I'm warming to the idea. I just have to
come up with a scheme for naming operators that allows easy reading of the code.
Comment by Conal Elliott:
Oh! I'm finally getting what you've meant about "hiding things" vs "reflect[ing]
the structure of the maps file". I think you want the separator characters to
show up in the parser, and between the sub-parsers that they separate.
Maybe what's missing in my <#> suggestion is that the choice of the space
character as a separator is far from obvious, and I guess that's what you're
saying about "mental baggage" and naming the operators for easy reading.
I suppose you could use sepSpace and sepColon as operator names.
parseAddress = Address <$> hexStr `sepColon` hexStr
parseRegion = MemRegion <$> parseAddress `sepSpace` parsePerms `sepSpace` ...
Still, an actual space/colon character would probably be clearer. For colon, you
could use <:>, but what for space?
Response to Conal:
Conal, that's exactly what I mean, just much more clearly expressed than I could ever hope to do.
I too was thinking of the problem with space in an operator…
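For reference, the named and symbolic separator operators discussed in this thread could be sketched as below. This is only an illustration under stated assumptions: ReadP from base stands in for Parsec, the Device type is a guess, parseAll is a hypothetical helper, and the definitions of sepColon and <:> are one reading of the suggestion rather than anyone's actual code. One detail that is easy to miss is fixity: a function used in backticks defaults to infixl 9, so without the infixl 4 declaration Device <$> hexStr `sepColon` hexStr would group as Device <$> (hexStr `sepColon` hexStr) and fail to type check.

```haskell
import Data.Char (isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, many1, readP_to_S, satisfy)

-- Assumed shape of the Device type.
data Device = Device Integer Integer deriving (Eq, Show)

-- Same fixity as <$> and <*>, so the expressions below group left to right.
infixl 4 `sepColon`, <:>

-- Applicative application with a ':' expected between the two parsers.
sepColon :: ReadP (a -> b) -> ReadP a -> ReadP b
sepColon l r = (l <* char ':') <*> r

-- Symbolic spelling of the same combinator.
(<:>) :: ReadP (a -> b) -> ReadP a -> ReadP b
(<:>) = sepColon

hexDigit :: ReadP Char
hexDigit = satisfy isHexDigit

hexStr :: ReadP Integer
hexStr = read . ("0x" ++) <$> many1 hexDigit

parseDeviceNamed, parseDeviceSymbolic :: ReadP Device
parseDeviceNamed    = Device <$> hexStr `sepColon` hexStr
parseDeviceSymbolic = Device <$> hexStr <:> hexStr

-- Hypothetical helper: keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

main :: IO ()
main = do
  print (parseAll parseDeviceNamed "03:01")
  print (parseAll parseDeviceSymbolic "03:01")
```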