Monads May 2006 H. Richards
========================================
This file is available at ~ham/public/MonadsTalk.lhs
> module MonadsTalk where -- "module" and "where" are reserved words
> import IO (hugsIsEOF) -- a module we'll need later
> import ParseLib -- another module we'll need later
========================================
A few Haskell preliminaries
In a literate Haskell script, code has '>' in col 1.
Program layout is syntactically significant.
Haskell is typed statically.
Type notation:
x :: y means "the type of x is y"
f :: x -> y f is a function from x to y
Capitalization of names is significant:
names of values and variables (including type variables) begin with lowercase
names of specific types begin with Uppercase
Haskell is "lazy": all computation is driven by demand for results
Haskell is referentially transparent, i.e.,
for any given expression E
¥ the order of evaluation of E's subexpressions does not affect E's value.
¥ Church-Rosser theorem: all terminating evaluations of E end in the
same normal form.
¥ replacing any subexpression of E by an equivalent subexpression
does not alter E's value.
¥ evaluation of E has no-side effects.
¥ application of any given function to E always yields the same result.
==========================================
Monadic I/O
-----------
Input is a problem for pure functional languages.
The Standard-ML approach to input:
inputInt :: Int (using Haskell's type notation)
Each evaluation of inputInt reads the next Int from the input stream.
Seems straightforward, but consider
w = inputInt - inputInt -- order-dependent
x = y - y where y = inputInt -- how many items are read?
Thanks to the input stream's changing state, referential transparency is done for.
-----------------------------------------
Haskell's monadic solution: Keep the input stream's state hidden.
IO b -- (polymorphic abstract) type of I/O commands
that deliver values of type b.
IO is an example of a monad-- a concept from category theory which has
proven highly useful in practice (for mare than just IO).
some primitives:
getChar :: IO Char -- when executed, inputs and delivers a character
getLine :: IO String -- when executed, inputs and delivers a String
putChar :: Char -> IO () -- a function which, applied to x :: Char,
creates an IO action which, when executed,
outputs x and delivers nothing
() :: () ; cf. C's void
putStr :: String -> IO () -- applied to x :: String, creates an IO action
which, when executed, outputs x and delivers nothing
return :: b -> IO b -- applied to a value x :: b,
creates an IO action which delivers x
(without performing any IO)
An IO command is performed by submitting it at the Hugs prompt.
putChar '#'
putStr "ACL2"
Combining IO commands
do command1
command2
command3
defines an IO command which is executed by executing its components in
sequence.
Alternative syntax:
do command1; command2; command3
Example: do putStr "ACL"; putChar '2'
some nonprimitives:
> putStrLn :: String -> IO ()
> putStrLn s = do putStr s
> putChar '\n'
do putStrLn "AC"; putStr "L2"
> putNtimes :: Int -> String -> IO () -- Haskell functions are curried
> putNtimes n s
> = if n < 1
> then return ()
> else do putStrLn s
> putNtimes (n-1) s
Delivered values can be named for subsequent reference:
do x <- command1
y <- command2
command3 x y
Examples
> rev1line :: IO ()
> rev1line = do line <- getLine
> putStr (reverse line)
> rev2lines :: IO ()
> rev2lines = do line1 <- getLine
> line2 <- getLine
> let r1 = reverse line1
> let r2 = reverse line2
> putStrLn r2
> putStrLn r1
Note the difference between '<-' and '=':
x <- y gives the name x to the value delivered by y
x = y gives the name x to the value of y
A bigger example:
> palindromes :: IO ()
> palindromes = do putStrLn "Palindrome detector"
> pal 0 0
> where
> pal :: Int -> Int -> IO ()
> pal n np = do line <- getLine
> if null line
> then do if n>0
> then do putStr "Palindrome frequency: "
> putStr (show (100 * fromInt np / fromInt n))
> putStr "%\n"
> else putStr "Empty input\n"
> return ()
> else do let letters = map toUpper (filter isAlpha line)
> let isPal = letters == reverse letters
> putStrLn (if isPal then "yes" else "no")
> pal (n+1) (np + oneIf isPal)
>
> oneIf :: Integral a => Bool -> a
> oneIf True = 1
> oneIf _ = 0
Summary: Every Haskell program is an IO command.
The purely functional part of the program is about defining the IO commands;
execution of the commands is behind the IO abstract-type firewall.
==========================================
Useful URLs:
http://www.haskell.org/
http://www.nomaware.com/monads/html/
==========================================
To see how monads fit into Haskell, some context is needed.
Hugs
====
A command-line interpreter for Haskell.
Designed for education and prototyping (fast compilation, slow execution).
You can get a copy of the Hugs interpreter for most any platform at
http://www.haskell.org/hugs/.
A guide to the more common Hugs errors is available at
http://www.cs.ukc.ac.uk/people/staff/sjt/craft2e/errors.html.
Hugs commands
-------------
A line beginning ':' is a command; the one to remember is ":?".
All other lines are treated as Haskell expressions to be evaluated.
Expressions
-----------
Numbers, Booleans
3+4 == 9-2
Conditional expressions
if 3 < 7 then "yes" else "no"
functional (lambda-) expressions:
\x -> x*5 -- Application of the \-expression binds x to the argument.
-- Scope of x is limited to the \-expression's body.
Definitions and Scripts
=======================
Names with global scope are defined in "scripts".
[BTW-- unlike Scheme, Hugs does not allow global names to be defined in commands.]
This file is a script; ".lhs" means "literate Haskell Script".
In literate scripts, Haskell-code lines begin with ">"; all others are comments.
A script defines a module.
> times5 = \x -> x*5
> mby5 x = x*5
Non-literate scripts' names end in ".hs".
Prime example: Prelude.hs
/usr/local/lib/hugs/lib/Prelude.hs
Hugs loads Prelude.hs automatically as it starts up.
Syntax note
-----------
Indentation is significant.
Indentation of first nonblank after each "where", "let", "do", or "of" (each of
which introduces a set of definitions) establishes a boundary;
a line beginning
¥ to the right of the boundary is part of the same definition
¥ on the boundary begins a new definition at same level
¥ to the left of the boundary ends the set of definitions
Use of indentation is optional ((http://www.haskell.org/onlinereport/lexemes.html#sect2.7));
one can use braces {} and semicolons instead.
> module HaskellTutorial where { times5 = \x -> x*5; mby5 x = x*5 }
Local definitions
let x = 3 in x+5 -- a "let-expression"
let x = 3*y; y = 5 in x*y+2 -- order of defs in a let doesn't matter
> a = b+c where b = c*2; c = 7 -- "where" can be used only in definitions
(no such thing as a where-expression)
> a = b+c
> where
> b = c*2
> c = 7
Syntax note: Infix operators have conventional precedences and associativities
(defined in Prelude.hs)
Except for infix operators, functions are applied by prefixing their names to their
arguments. Parentheses are used only for grouping, and function application
takes precedence over infix operators, so
f a+x b-y means (f a) + (x b) - y
*not* f (a+x) (b-y)
Sidelight: An infix operator can be converted to a name by enclosing it in parentheses:
(*) x y = x * y
and a name can be converted to an infix operator by enclosing it in back-quotes:
x `mod` y = mod x y
Types in Haskell
================
TypeExp ::= TypeName | TypeVar | ( TypeExp ) | FunType | ...
FunType ::= TypeExp -> FunType | TypeExp -- note that -> is right-associative
TypeName ::= Upper { Letter }
TypeVar ::= Lower { Letter }
Examples:
Int Float Bool Char String Integer Rational a b c
Int -> Char -> Int -- function of 2 parameters returning Int, e.g., \x y -> x + ord y
Int -> (Char -> Int) -- function of 1 parameter returning (function of 1 parameter),
e.g., \x -> \y -> x + ord y
... but these are the **same** function, and hence the two type expressions
denote the same type. (This is what makes currying work.)
(Int -> Char) -> Int -- function of 1 parameter, e.g., \f -> ord (f (ord 'p'))
-- no type variables => this type is monomorphic
a -> b -> Char -- function of 2 parameters, e.g., \x y -> 'z';
-- parameters' types are type variables, so
-- this type is polymorphic.
a -> a -> Int -- function of 2 parameters \x y -> length [x,y]
-- parameters may be any type, but must be the *same* type
Type declarations are (nearly always) optional;
Hindley-Milner => types can be inferred from program text.
"If your program passes the type check, it's probably correct."
Declaring types makes good documentation, improves type-error messages.
Hugs command for determining exp's type:
> :t
Tuples
======
A handy means of packaging heterogeneous collections of fixed size
(i.e, a tuple's size is inherent in its type).
> (1,False,"move") :: (Integer,Bool,String)
> sumDif :: Int -> Int -> (Int,Int)
> sumDif x y = (x+y,x-y)
Tuples are "unpacked" by pattern-matching:
> firstOf2 :: (a,b) -> a
> firstOf2 (x,_) = x
> secondOf2 :: (a,b) -> b
> secondOf2 (_,y) = y
Lists
=====
As in Scheme, the workhorse data structure is the list.
Lists are collections of variable size (i.e., a list's size is not part of its type).
Lists are (unlike Scheme) homogeneous.
Defining lists
--------------
list-construction operator
---------------------------
The list-building primitive (i.e., nonstrict, polymorphically typed cons) is
(:) :: a -> [a] -> [a] -- pronounced "followed-by"
So [1,2,3] = 1 : [2,3]
= 1 : (2 : [3])
= 1 : (2 : (3 : []))
= 1 : 2 : 3 : [] (right-associative, to avoid ()'s)
> fromTo :: Int -> Int -> [Int]
> fromTo x y
> | x <= y = x : fromTo (x+1) y
> | otherwise = []
> threes = 3 : threes
Deconstructing lists
--------------------
(:) is also used in patterns.
When (x:xs) matches a list,
¥ the list is not empty
¥ its first element is named x
¥ the rest of the list (possibly empty) is named xs.
head (x:_) = x
tail (_:xs) = xs
> doubleAll :: [Int] -> [Int]
> doubleAll [] = []
> doubleAll (n:ns) = 2*n : doubleAll ns
A few list-processing functions
----------------------------------------
> length :: [a] -> Int
> length [] = 0
> length (_:xs) = 1 + length xs
Generalizing doubleAll,
> map f [] = []
> map f (x:xs) = f x : map f xs
> doubleAll = map (\x->2*x)
concatenation:
> (++) :: [a] -> [a] -> [a]
> [] ++ ys = ys
> (x:xs) ++ ys = x : (xs ++ ys)
membership test:
> _ `elt` [] = False
> x `elt` (y:ys) = x==y || x `elt` ys
insertion sort:
> inSort [] = []
> inSort (x:xs) = insert x (inSort xs)
> where
> insert x [] = [x]
> insert x (y:ys)
> | x | otherwise = y : insert x ys
Restricted Types, Type Classes, and Overloading
===============================================
Note absence of type declarations for `elt` and `inSort`.
The declaration elt :: a -> [a] -> Bool would be too general.
elt applies (==) to its arguments, and some types do not
support equality tests (e.g., functions).
The types that DO support (==) (and (/=) also) form the "type class" Eq.
The class definition:
> class Eq a where
> (==), (/=) :: a -> a -> Bool -- Eq's "signature"
Meaning: Any type a belonging to class Eq
supports the functions (==) and (/=), both of which
take two arguments of type a and return a result of type Bool.
(==) and (/=) are said to be overloaded for the Eq types.
----------
Instance declarations
---------------------
A specific type belongs to a type class as a consequence of an
"instance declaration", which includes equations defining the
class's signature functions.
> instance Eq Bool where
>
> True == True = True
> False == False = True
> _ == _ = False
>
> x /= y = not (x === y)
Specifying restrictions
-----------------------
Now we can specify elt's type:
> elt :: Eq a => a -> [a] -> Bool
Meaning: elt is defined for all types in the Eq class.
"Eq a =>" is a "context" which restricts the range of type variable a.
elt inherits the overloading of (==).
--------
:t inSort
Default equations
-----------------
Implementation equations can be moved from instance declarations
to class definitions:
> class Eq a where
> (==), (/=) :: a -> a -> Bool -- Eq's "signature"
>
> x==y = not (x/=y) -- "default equation"
> x/=y = not (x==y) -- "default equation"
The defaults can be overridden in any instance declaration (for
instances of Eq, at least one of the two defaults MUST be overridden).
Type subclasses
---------------
The list-sorting functions use functions (<), (>), (<=), (>=),
which --like (==)-- are defined for many types but not all. These types
are members of class Ord.
Any type supporting (<) etc. also supports (==), but not vice versa.
Hence Ord is a subclass of Eq:
> class (Eq a) => Ord a where
> compare :: a -> a -> Ordering
> (<), (<=), (>=), (>) :: a -> a -> Bool
> max, min :: a -> a -> a
>
> compare x y | x==y = EQ
> | x<=y = LT
> | otherwise = GT
>
> x <= y = compare x y /= GT
> x < y = compare x y == LT
> x >= y = compare x y /= LT
> x > y = compare x y == GT
>
> max x y | x >= y = x
> | otherwise = y
> min x y | x <= y = x
> | otherwise = y
------------------------------------------------------
| What's Ordering? Its definition: |
| |
| > data Ordering = LT | EQ | GT |
| |
| For comparison, here's the definition of Bool: |
| |
| > data Bool = False | True |
| |
------------------------------------------------------
Some other standard (i.e., Prelude) classes
===========================================
Show
Read
Num (Eq, Show)
Real (Num, Ord)
Integral (Real, Enum) : Int, Integer
Fractional (Num)
Floating (Fractional)
RealFrac (Real, Fractional) : Rational
RealFloat (RealFrac, Floating) : Float, Double
Enum: Int, Integer, Char, Bool, Float, Double
Bounded: Bool, Char, Int
numerals are overloaded
-----------------------
7 :: Num a => a
7.0 :: Fractional a => a
Defining new types
==================
type synonyms (cf. C's typedef): new names for existing types
-------------
> type String = [Char]
> type IntToChar :: Int -> Char
truly new types: Algebraic Types
--------------------------------
enumeration type:
> data Direction = North | East | South | West
> deriving (Show, Read, Eq, Ord, Bounded, Enum)
> heading :: Direction -> Float
> heading North = 0
> heading East = 90
> heading South = 180
> heading West = 270
product type:
> data Point = Pt Float Float deriving (Eq, Show, Read)
> distance :: Point -> Point -> Float
> distance (Pt x0 y0) (Pt x1 y1) = sqrt ((x0-x1)^2 + (y0-y1)^2)
distance (Pt 1 2) (Pt 4 6)
sum-of-products (discriminated union) type:
> data Temp = Celsius Float | Fahrenheit Float
> deriving (Show, Read)
> toCelsius :: Temp -> Temp
> toCelsius (Fahrenheit t) = Celsius (5 / 9 * (t - 32))
> toCelsius x = x
> degreesC :: Temp -> Float
> degreesC (Celsius t) = t
> degreesC t = degreesC (toCelsius t)
> instance Eq Temp where
> t1 == t2 = degreesC t1 == degreesC t2
> instance Ord Temp where
> t1 <= t2 = degreesC t1 <= degreesC t2
new polymorphic types
> data Pair a b = Pr a b deriving (Eq, Ord)
Pr is a value constructor; Pr :: a -> b -> Pair a b
Pair is a *type* constructor: applied to two type arguments, it yields a type.
> first :: Pair a b -> a
> first (Pr x _) = x
> instance (Show a, Show b) => Show (Pair a b) where
> show (Pr x y) = "<" ++ show x ++ "," ++ show y ++ ">"
new recursive types
> data List a = Nill | Cons a (List a) deriving (Eq,Ord,Show,Read)
mimics the built-in list types
[] :: [a] Nill :: List a
(:) :: a -> [a] -> [a] Cons :: a -> List a -> List a
infix constructors (first character is ':')
> data Expr = Lit Integer | Expr :+: Expr | Expr :-: Expr | Expr :*: Expr
> eval :: Expr -> Integer
> eval (Lit n) = n
> eval (e0 :+: e1) = eval e0 + eval e1
> eval (e0 :-: e1) = eval e0 - eval e1
> eval (e0 :*: e1) = eval e0 * eval e1
eval ((Lit 3 :+: Lit 5) :*: Lit 9)
Trees
-----
> data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Show)
> depth :: Tree a -> Int
> depth Nil = 0
> depth (Node _ t0 t1) = 1 + depth t0 `max` depth t1
===========================================================
Classes can contain not only types but also type constructors.
> class Functor f where
> fmap :: (a -> b) -> (f a -> f b)
Since a and b are types, the type expressions f a and f b are valid only
if f is a type constructor (only type contructors can be applied to types).
So Functor is a type-constructor class. Every type constructor in Functor
provides an implementation of fmap.
> instance Functor [] where -- ([] a) means [a]
> fmap = map
> instance Functor Tree where
> fmap f Nil = Nil
> fmap f (Node x t0 t1) = Node (f x) (fmap f t0) (fmap f t1)
Using fmap, we can define functions applicable to trees, lists, and other container
"types".
> doubleAll5 :: (Functor f, Num n) => f n -> f n -- good for lists, trees, and
> doubleAll5 = fmap (\x -> 2*x) -- any other data structure in Functor
doubleAll5 [1,2,3]
doubleAll5 (Node 1 (Node 2 Nil (Node 3 Nil Nil)) Nil)
A very small container "type":
> data Maybe a = Nothing | Just a -- contains at most one item
> deriving (Eq, Ord, Read, Show)
> instance Functor Maybe where
> fmap f Nothing = Nothing
> fmap f (Just x) = Just (f x)
Maybe types are useful for making partial functions total.
div :: Integral a => a -> a -> a
> totalDiv :: Integral a => a -> a -> Maybe a
> totalDiv n d
> | d /= 0 = Just (n `div` d)
> | otherwise = Nothing
=============================
A much more important type-constructor class: Monad
Easing into monads...
Three equivalent definitions of f as a functional composition of g, h, and j:
> f x = j (h (g x))
> f x = let y = g x
> z = h y
> w = j z
> in w
> f = j . h . g -- nice and simple, no superfluous names
Note: (.) :: (a -> b) -> (c -> a) -> (c -> b)
Each of g, h, and j consumes the *entire* value returned by its predecessor.
Sometimes that's too simple. Consider this C function:
int f1( float w ) {
const char x = g1(w);
const double y = h1(w);
const int z = j1(w,x,y);
return z; }
This isn't a simple composition of g1, h1, and j1.
Also, g1, h1, and j1 may perform "invisible" updates of "state"
(i.e., global variables, IO streams, etc.).
Mimicking this structure in Haskell:
> f1 w s0 = let (s1,x) = g1 w s0
> (s2,y) = h1 w s1
> (s3,z) = j1 w x y s2
> in (s3,z)
in which s0, s1, s2, s3 :: State -- successive states
Hence g1 w :: State -> (State, a)
h1 w :: State -> (State, b)
j1 w x y :: State -> (State, c)
f1 w :: State -> (State, d)
We would like to define structures like this without all the
intermediate-state names.
So we define a new type
> data StateTrans s r = ST (s -> (s,r))
where
¥ s is some "state" type
¥ r is a result type
¥ (StateTrans s r) is a "state transformer" which, applied to
a state, returns a new state paired with a separate result
and a new kind of composition:
> (>>=) :: StateTrans s a -> (a -> StateTrans s b) -> StateTrans s b
>
> f >>= g = \s0 -> let (s1,x) = f s0 in g x s1
"f bind g"
and a function that takes a value into a state "transformer" which
returns a given state (unchanged), paired with the value as result:
> return :: a -> StateTrans s a
> return res = \s0 -> (s0,res)
Now we can define f1 without naming the states (which are thereby hidden):
> f1 w = g1 w >>=
> (\x -> h1 w >>=
> (\y -> j1 w x y >>=
> (\z -> return z)))
A variant of (>>=) is (>>), which discards its first argument's result:
> f >> g = f >>= (\ _ -> g )
"f then g"
These functions are useful in many applications
¥ input and output
¥ exception handling
¥ nondeterminism
¥ parsing
¥ hiding mutable arrays
so we overload them by defining a class of which they're the signature:
> class Monad m where
> return :: a -> m a
> (>>=) :: m a -> (a -> m b) -> m b
> (>>) :: m a -> m b -> m b
> fail :: String -> m a -- allows for custom exception-handling
Minimal complete definition: (>>=), return
> p >> q = p >>= \ _ -> q
>
> fail s = error s -- default exception-handler
Now we can make (StateTrans s) an instance of Monad:
> instance Monad (StateTrans s) where
>
> (ST p) >>= k = ST (\s0 -> let (s1,a) = p s0
> (ST q) = k a
> in q s1)
>
> return a = ST (\s -> (s,a))
A function which extracts a state-transformer function and applies it:
> applyST :: StateTrans s a -> s -> (s,a)
> applyST (ST p) s = p s
Couldn't the need for extraction be avoided by defining
type StateTrans s r = s -> (s,r) ?
No-- type synonyms can't be made class instances.
To create a monad, it is not enough just to declare a Haskell instance of
the Monad class with the correct type signatures.
To be a proper monad, the return and >>= functions must work together
according to three laws:
1. (return x) >>= f == f x
return is a left-identity with respect to >>=
2. m >>= return == m
return is a right-identity with respect to >>=
3. (m >>= f) >>= g == m >>= (\x -> f x >>= g)
a kind of associativity law for >>=
Obeying the three laws ensures that the semantics of the do-notation using
the monad will be consistent.
The do-notation
> f1 w = do x <- g1 w
> y <- h1 w
> z <- j1 w x y
> return z -- "return" means "deliver"
In the construction x <- m
¥ m is an instance of a type belonging to a monadic class
¥ x is a new variable, naming the value "returned" by m
(terminology is confusing-- m is not a function)
¥ x's scope extends to the end of the do-expression.
Note: In general, x is a pattern, and this scope rule applies to each
of its variables.
Definition:
do { pat <- exp; rest } = exp >>= (\pat -> do rest)
do { exp; rest } = exp >> do rest
do { let decList; rest } = let decList in do rest
do exp = exp
===================================
Maybe is a monad
> instance Monad Maybe where
> Just x >>= k = k x
> Nothing >>= k = Nothing
> return = Just
> fail s = Nothing
so, e.g., instead of
let lookup :: Database -> Item -> Maybe Item
lookup = ...
in case lookup db x of
Nothing -> Nothing
Just y -> case lookup db y of
Nothing -> Nothing
Just z -> lookup db z
we can write
let lookup :: Database -> Item -> Maybe Item
lookup = ...
in do y <- lookup db x
z <- lookup db y
lookup db z
Another monad is [], i.e., lists:
> instance Monad [ ] where
> (x:xs) >>= f = f x ++ (xs >>= f)
> [] >>= f = []
> return x = [x]
> fail s = []
> mFilter p xs = xs >>= (\x -> if p x then [x] else [])
mFilter odd [0..9]
> mConcat xs = xs >>= id
mConcat ["123","xyz","456",]
==========================================
MonadPlus: a subclass of Monad, adding two names to the Monad interface
(defined in file (Hugs lib/Monad.hs))
> class Monad m => MonadPlus m where
> mzero :: m a
> mplus :: m a -> m a -> m a
Instances of MonadPlus
> instance MonadPlus Maybe where
> mzero = Nothing
> Nothing `mplus` ys = ys -- False || ys = ys
> xs `mplus` ys = xs -- xs || ys = xs
> instance MonadPlus [ ] where
> mzero = []
> mplus = (++)
==========================================
Parsing with monads
-------------------
A "parse" is a pair: (recognized item, rest of input)
parser :: String -> [(a,String)]
Applied to a string, a parser consumes the first 0 or more
characters and returns a list of 0 or more parses.
File (Hugs lib/hugs/ParseLib.hs) defines
¥ some primitive parsers
¥ some parser-combining functions
Haskell's nonstrictness => backtracking is implicit.
Monadic composition => combining parsers is simple.
Parser is a new type:
> newtype Parser a = P (String -> [(a,String)])
("newtype" Å "data")
To apply a parser to a string, we must extract its function:
> papply :: Parser a -> String -> [(a,String)]
> papply (P p) inp = p inp
Parser is declared to be an instance of Monad and of MonadPlus:
> instance Monad Parser where
> -- return :: a -> Parser a
> return v = P (\inp -> [(v,inp)])
>
> -- >>= :: Parser a -> (a -> Parser b) -> Parser b
> (P p) >>= f = P (\inp -> concat [papply (f v) out | (v,out) <- p inp])
> instance MonadPlus Parser where
> -- mzero :: Parser a
> mzero = P (\inp -> [])
>
> -- mplus :: Parser a -> Parser a -> Parser a
> (P p) `mplus` (P q) = P (\inp -> (p inp ++ q inp))
Primitive parsers
(return v) defines a parser that "recognizes" v without consuming any input
mzero is a parser that always "fails"
the parser that returns input's first character:
> item :: Parser Char
> item = P (\inp -> case inp of
> [] -> []
> (x:xs) -> [(x,xs)])
papply item "abc"
Parser combinators
mplus provides parallel (or) combination
do-expressions provide sequential (and) combination
Some simple parsers, recognizing ...
a character satisfying a given predicate:
> sat :: (Char -> Bool) -> Parser Char
> sat p = do {x <- item; if p x then return x else mzero}
papply (sat (\c -> c>'m')) "abc"
papply (sat (\c -> c>'m')) "xyz"
a given character:
> char :: Char -> Parser Char
> char x = sat (\y -> x == y)
papply (char 'a') "abc"
papply (char 'a') "bac"
any digit character:
> digit :: Parser Char
> digit = sat isDigit
papply digit "01234"
a given String:
> string :: String -> Parser String
> string "" = return ""
> string (x:xs) = do {char x; string xs; return (x:xs)}
papply (string "xyz") "xyzpqr"
papply (string "xyz") "1xyzpqr"
Some parser combinators, recognizing ...
either an item recognized by a given parser p or an
item recognized by a given parser q, i.e., either a p or a q:
> (+++) :: Parser a -> Parser a -> Parser a
> p +++ q = first (p `mplus` q)
papply (char 'x' +++ char 'y') "xyz"
papply (char 'x' +++ char 'y') "yxz"
papply (char 'x' +++ char 'y') "zyx"
a sequence of 1 or more p's:
> many1 :: Parser a -> Parser [a]
> many1 p = do {x <- p; xs <- many p; return (x:xs)}
papply (many1 digit) "x01234.5678"
a sequence of 0 or more p's:
> many :: Parser a -> Parser [a]
> many p = many1 p +++ return []
papply (many digit) "01234.5678"
papply (many digit) "x01234.5678"
A simple expression evaluator:
Expr ::= Num | App
App ::= Ô(Õ Expr Op Expr Ô)Õ
Num ::= [-] Digit { Digit }
Op ::= + | - | * | /
Digit ::= 0 | 1 | É | 9
> calc :: IO ()
> calc = do putStr "Expression evaluator"
> run
> run :: IO ()
> run
> = do putStr "\n? "
> s <- getLine
> if s == quitSignal
> then do { putStr "...bye"; return () }
> else
> do putStr (case papply expr s of
> [(v,"")] -> " = "++ show v
> _ -> " *** error")
> run
> where
> expr :: Parser Int
> expr = num +++ app
>
> app :: Parser Int
> app = do char '('
> e1 <- expr
> op <- char '+' +++ char '-' +++
> char '*' +++ char '/'
> e2 <- expr
> char ')'
> case op of
> '+' -> return (e1 + e2)
> '-' -> return (e1 - e2)
> '*' -> return (e1 * e2)
> '/' -> if e2 /= 0
> then return (e1 `div` e2)
> else mzero
> nat :: Parser Int
> nat = do d <- digit
> s <- many digit
> return (read (d:s))
> num = do char '-'
> n <- nat
> return (-n)
> +++ nat
> quitSignal = "#"
More-powerful parsing tools
a sequence of 0 or more p's separated by sep's:
> sepby :: Parser a -> Parser b -> Parser [a]
> p `sepby` sep = (p `sepby1` sep) +++ return []
a sequence of 1 or more p's separated by sep's:
> sepby1 :: Parser a -> Parser b -> Parser [a]
> p `sepby1` sep = do {x <- p; xs <- many (do {sep; p}); return (x:xs)}
a p, preceded by an open and followed by a close:
> bracket :: Parser a -> Parser b -> Parser c -> Parser b
> bracket open p close = do {open; x <- p; close; return x}
papply (bracket (char '<') (many digit `sepby` (char ',')) (char '>')) "<12,34,56>"
a sequence of p's with meaningful left-associative separators:
> chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
> chainl p op v = (p `chainl1` op) +++ return v
>
> chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
> p `chainl1` op = do {x <- p; rest x}
> where
> rest x = do {f <- op; y <- p; rest (x `f` y)}
> +++ return x
a sequence of p's with meaningful right-associative separators:
> chainr :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
> chainr p op v = (p `chainr1` op) +++ return v
>
> chainr1 :: Parser a -> Parser (a -> a -> a) -> Parser a
> p `chainr1` op = do {x <- p; rest x}
> where
> rest x = do {f <- op; y <- p `chainr1` op; return (x `f` y)}
> +++ return x
an operator symbol in a table, returning the corresponding function
> ops :: [(Parser a, b)] -> Parser b
> ops xs = foldr1 (+++) [do {p; return op} | (p,op) <- xs]
> operators :: [(Parser Char, (Int->Int->Int))]
> operators = [ (char '+', (+))
> , (char '-', (-))
> , (char '*', (*))
> , (char '/', div)
> ]
papply (nat `chainl1` (ops operators)) "1+4*3/2"
papply (nat `chainr1` (ops operators)) "1+4*3/2"
Utility parsers
consumes spaces:
> spaces :: Parser ()
> spaces = do {many1 (sat isSpace); return ()}
consumes Haskell-style one-line comments:
> comment :: Parser ()
> comment = do {string "--"; many (sat (\x -> x /= '\n')); return ()}
consumes spaces and comments:
> junk :: Parser ()
> junk = do {many (spaces +++ comment); return ()}
consumes junk and then applies a given parser:
> parse :: Parser a -> Parser a
> parse p = do {junk; p}
applies a given parser and consumes any subsequent junk:
> token :: Parser a -> Parser a
> token p = do {v <- p; junk; return v}
recognizes a given string and consumes any subsequent junk:
> symbol :: String -> Parser String
> symbol xs = token (string xs)
recognizes an identifier:
> ident :: Parser String
> ident = do {x <- lower; xs <- many alphanum; return (x:xs)}
recognizes an identifier provided it is not in a given list of reserved
names:
> identifier :: [String] -> Parser String
> identifier ks = token (do {x <- ident; if not (elem x ks) then return x
> else mzero})
> nest :: Parser Char -> Parser Char -> Parser Int
> nest left right = do left
> n <- nest left right
> right
> return (n+1)
> +++
> return 0
For more on monads:
http://www.haskell.org/haskellwiki/Books_and_tutorials#Using_Monads
Once you're up to speed on monads, have a look at arrows
http://www.haskell.org/haskellwiki/Books_and_tutorials#Using_Arrows
(of which monads are a special case).