Trying to work out iteratees
- Magnus Therning
A few days ago I decided to explore the idea of using iteratee to do IO in Haskell. I read most of what Oleg has written on input processing using left-fold enumerators. Only a little wiser I took a look at the Iteratee IO package on Hackage. Unfortunately it still hadn’t quite sunk in. To be honest I couldn’t quite make heads or tails of it. Often just lining up the types properly will just work, even if I don’t understand why ((This reminds me of a math professor I had at university who said something like: “When it feels like the pen understands more than your head, persevere! It means you are close to getting it.”)) and soon after I usually gain some sort of understanding. This strategy didn’t seem to work with this particular package though :(
Somewhat based on Don’s answer to my question on Stackoverflow.com I thought I’d try to work through an implementation of my own. I just really hope I’ve got it right :)
I must admit I used Iteratee
for inspiration at times. However, I couldn’t copy it straight off since I decided to first implement iteratee without involving monads. Sure, it’s rather silly to do “input processing” in Haskell without involving monads, but I find that including monads often clouds the problem at hand. So, I left them out to begin with, and add them again once I feel I know what I’m doing. So, here goes nothing…
The basic idea is to process a stream of data, presented in chunks. Each chunk is handed to an iteratee by an enumerator. For each chunk the iteratee signals to the enumerator whether it’s done or requires more data. These are the types I cooked up for this.
data Stream c e
= Eof
| Chunk (c e)
data Stepper c e a
= Done a (Stream c e)
| NeedAnotherChunk (Iteratee c e a)
data Iteratee c e a = Iteratee
runIteratee :: (Stream c e) -> (Stepper c e a) }
{
type Enumerator c e a = Iteratee c e a -> Iteratee c e a
I found it rather useful to implement Show
for the two first, but I’ll leave that out of this post since it’s a simple thing to do.
I should probably point out that the container type for the stream (that’s the ‘c’ in Stream c e
) is rather pointless in what I’ve done; it’s always going to be []
in this post. Keeping it in does provide some more similarity with the Iteratee
on Hackage.
At this point I jumped to turning a list into an enumerator. The way I implemented it the list is split up in chunk of 3 items and present each chunk in order to the passed in iteratee.
= loop grouped iter
enumList l iter where
= groupList 3 l
grouped
= let
groupList n l = splitAt n l
(p, r) in case p of
-> []
[] -> xs:groupList n r xs
The loop
function is the main part. Ending when the chunks are all used up is the easy part:
= let
loop [] i = runIteratee i Eof
s in case s of
Done v str -> Iteratee $ \ _ -> s
NeedAnotherChunk i -> i
It is arguably an error if the iteratee returns NeedAnotherChunk
when passed an Eof
stream, but for now I’ll leave it the way it is. Doing the recursive step is very similar:
:xs) i = let
loop (x= runIteratee i (Chunk x)
s in case s of
Done v str -> Iteratee $ \ _ -> s
NeedAnotherChunk i' -> loop xs i'
Here it is worth noticing that the iteratee is expected to return any part of the chunk that wasn’t processed.
Next I coded up my first iteratee, a map over the stream:
= let
iFMap f Eof = Done acc Eof
doAcc acc Chunk i) = NeedAnotherChunk $ Iteratee $ doAcc (acc `mappend` (fmap f i))
doAcc acc (in Iteratee $ doAcc mempty
Now I can run the iterator over a list enumerator:
> runIteratee (enumList [1..9] (iFMap (*2))) Eof
Stepper Done <<[2,4,6,8,10,12,14,16,18]>> <<Stream: Eof>>
I found that a bit verbose, especially for interactive experimentation so the following simplifies it a bit
= case (runIteratee iter) Eof of
run iter Done a _ -> a
NeedAnotherChunk _ -> error "run: Iterator didn't finish on Eof"
As Oleg pointed out in his writings it turns out that Iteratee c e
is a monad.
instance Monad (Iteratee c e) where
return x = Iteratee $ \ s -> Done x s
The implementation of return
is obvious, there really isn’t any other option than to encode is a continuation that returns Done
irrespective of what is offered, and passes along the rest of the stream no matter what’s passed in. Bind (>>=
) is a bit more complicated:
>>= f = Iteratee $ \ s ->
i let
= runIteratee i s
c in case c of
Done v str -> runIteratee (f v) str
NeedAnotherChunk i' -> NeedAnotherChunk $ i' >>= f
My understanding is that the left iteratee should be stepped along until it returns Done
, at that point the result is passed to the right-side function, which results in an iteratee. The rest of the stream is then passed on to the new iteratee.
I implemented two other iteratees, iDrop :: Int -> Iteratee [] e ()
and iTakeWhile :: (e -> Bool) -> Iteratee [] e [e]
with the obvious implementations. This then allows me to write a little test like this:
= do
iTest 2
iDrop <- iTakeWhile (< 5)
t <- return 'c'
a <- iFMap (+ 3)
m return (t, a, m)
Running it gives the expected result:
> run (enumList [1..9] iTest)
3,4],'c',[8,9,10,11,12]) ([
That’s pretty much it. Not that much to it. At least not as long as I’ve actually understood iteratees.