Monads May 2006 H. Richards ======================================== This file is available at ~ham/public/MonadsTalk.lhs > module MonadsTalk where -- "module" and "where" are reserved words > import IO (hugsIsEOF) -- a module we'll need later > import ParseLib -- another module we'll need later ======================================== A few Haskell preliminaries In a literate Haskell script, code has '>' in col 1. Program layout is syntactically significant. Haskell is typed statically. Type notation: x :: y means "the type of x is y" f :: x -> y f is a function from x to y Capitalization of names is significant: names of values and variables (including type variables) begin with lowercase names of specific types begin with Uppercase Haskell is "lazy": all computation is driven by demand for results Haskell is referentially transparent, i.e., for any given expression E ¥ the order of evaluation of E's subexpressions does not affect E's value. ¥ Church-Rosser theorem: all terminating evaluations of E end in the same normal form. ¥ replacing any subexpression of E by an equivalent subexpression does not alter E's value. ¥ evaluation of E has no-side effects. ¥ application of any given function to E always yields the same result. ========================================== Monadic I/O ----------- Input is a problem for pure functional languages. The Standard-ML approach to input: inputInt :: Int (using Haskell's type notation) Each evaluation of inputInt reads the next Int from the input stream. Seems straightforward, but consider w = inputInt - inputInt -- order-dependent x = y - y where y = inputInt -- how many items are read? Thanks to the input stream's changing state, referential transparency is done for. ----------------------------------------- Haskell's monadic solution: Keep the input stream's state hidden. IO b -- (polymorphic abstract) type of I/O commands that deliver values of type b. IO is an example of a monad-- a concept from category theory which has proven highly useful in practice (for mare than just IO). some primitives: getChar :: IO Char -- when executed, inputs and delivers a character getLine :: IO String -- when executed, inputs and delivers a String putChar :: Char -> IO () -- a function which, applied to x :: Char, creates an IO action which, when executed, outputs x and delivers nothing () :: () ; cf. C's void putStr :: String -> IO () -- applied to x :: String, creates an IO action which, when executed, outputs x and delivers nothing return :: b -> IO b -- applied to a value x :: b, creates an IO action which delivers x (without performing any IO) An IO command is performed by submitting it at the Hugs prompt. putChar '#' putStr "ACL2" Combining IO commands do command1 command2 command3 defines an IO command which is executed by executing its components in sequence. Alternative syntax: do command1; command2; command3 Example: do putStr "ACL"; putChar '2' some nonprimitives: > putStrLn :: String -> IO () > putStrLn s = do putStr s > putChar '\n' do putStrLn "AC"; putStr "L2" > putNtimes :: Int -> String -> IO () -- Haskell functions are curried > putNtimes n s > = if n < 1 > then return () > else do putStrLn s > putNtimes (n-1) s Delivered values can be named for subsequent reference: do x <- command1 y <- command2 command3 x y Examples > rev1line :: IO () > rev1line = do line <- getLine > putStr (reverse line) > rev2lines :: IO () > rev2lines = do line1 <- getLine > line2 <- getLine > let r1 = reverse line1 > let r2 = reverse line2 > putStrLn r2 > putStrLn r1 Note the difference between '<-' and '=': x <- y gives the name x to the value delivered by y x = y gives the name x to the value of y A bigger example: > palindromes :: IO () > palindromes = do putStrLn "Palindrome detector" > pal 0 0 > where > pal :: Int -> Int -> IO () > pal n np = do line <- getLine > if null line > then do if n>0 > then do putStr "Palindrome frequency: " > putStr (show (100 * fromInt np / fromInt n)) > putStr "%\n" > else putStr "Empty input\n" > return () > else do let letters = map toUpper (filter isAlpha line) > let isPal = letters == reverse letters > putStrLn (if isPal then "yes" else "no") > pal (n+1) (np + oneIf isPal) > > oneIf :: Integral a => Bool -> a > oneIf True = 1 > oneIf _ = 0 Summary: Every Haskell program is an IO command. The purely functional part of the program is about defining the IO commands; execution of the commands is behind the IO abstract-type firewall. ========================================== Useful URLs: http://www.haskell.org/ http://www.nomaware.com/monads/html/ ========================================== To see how monads fit into Haskell, some context is needed. Hugs ==== A command-line interpreter for Haskell. Designed for education and prototyping (fast compilation, slow execution). You can get a copy of the Hugs interpreter for most any platform at http://www.haskell.org/hugs/. A guide to the more common Hugs errors is available at http://www.cs.ukc.ac.uk/people/staff/sjt/craft2e/errors.html. Hugs commands ------------- A line beginning ':' is a command; the one to remember is ":?". All other lines are treated as Haskell expressions to be evaluated. Expressions ----------- Numbers, Booleans 3+4 == 9-2 Conditional expressions if 3 < 7 then "yes" else "no" functional (lambda-) expressions: \x -> x*5 -- Application of the \-expression binds x to the argument. -- Scope of x is limited to the \-expression's body. Definitions and Scripts ======================= Names with global scope are defined in "scripts". [BTW-- unlike Scheme, Hugs does not allow global names to be defined in commands.] This file is a script; ".lhs" means "literate Haskell Script". In literate scripts, Haskell-code lines begin with ">"; all others are comments. A script defines a module. > times5 = \x -> x*5 > mby5 x = x*5 Non-literate scripts' names end in ".hs". Prime example: Prelude.hs /usr/local/lib/hugs/lib/Prelude.hs Hugs loads Prelude.hs automatically as it starts up. Syntax note ----------- Indentation is significant. Indentation of first nonblank after each "where", "let", "do", or "of" (each of which introduces a set of definitions) establishes a boundary; a line beginning ¥ to the right of the boundary is part of the same definition ¥ on the boundary begins a new definition at same level ¥ to the left of the boundary ends the set of definitions Use of indentation is optional ((http://www.haskell.org/onlinereport/lexemes.html#sect2.7)); one can use braces {} and semicolons instead. > module HaskellTutorial where { times5 = \x -> x*5; mby5 x = x*5 } Local definitions let x = 3 in x+5 -- a "let-expression" let x = 3*y; y = 5 in x*y+2 -- order of defs in a let doesn't matter > a = b+c where b = c*2; c = 7 -- "where" can be used only in definitions (no such thing as a where-expression) > a = b+c > where > b = c*2 > c = 7 Syntax note: Infix operators have conventional precedences and associativities (defined in Prelude.hs) Except for infix operators, functions are applied by prefixing their names to their arguments. Parentheses are used only for grouping, and function application takes precedence over infix operators, so f a+x b-y means (f a) + (x b) - y *not* f (a+x) (b-y) Sidelight: An infix operator can be converted to a name by enclosing it in parentheses: (*) x y = x * y and a name can be converted to an infix operator by enclosing it in back-quotes: x `mod` y = mod x y Types in Haskell ================ TypeExp ::= TypeName | TypeVar | ( TypeExp ) | FunType | ... FunType ::= TypeExp -> FunType | TypeExp -- note that -> is right-associative TypeName ::= Upper { Letter } TypeVar ::= Lower { Letter } Examples: Int Float Bool Char String Integer Rational a b c Int -> Char -> Int -- function of 2 parameters returning Int, e.g., \x y -> x + ord y Int -> (Char -> Int) -- function of 1 parameter returning (function of 1 parameter), e.g., \x -> \y -> x + ord y ... but these are the **same** function, and hence the two type expressions denote the same type. (This is what makes currying work.) (Int -> Char) -> Int -- function of 1 parameter, e.g., \f -> ord (f (ord 'p')) -- no type variables => this type is monomorphic a -> b -> Char -- function of 2 parameters, e.g., \x y -> 'z'; -- parameters' types are type variables, so -- this type is polymorphic. a -> a -> Int -- function of 2 parameters \x y -> length [x,y] -- parameters may be any type, but must be the *same* type Type declarations are (nearly always) optional; Hindley-Milner => types can be inferred from program text. "If your program passes the type check, it's probably correct." Declaring types makes good documentation, improves type-error messages. Hugs command for determining exp's type: > :t Tuples ====== A handy means of packaging heterogeneous collections of fixed size (i.e, a tuple's size is inherent in its type). > (1,False,"move") :: (Integer,Bool,String) > sumDif :: Int -> Int -> (Int,Int) > sumDif x y = (x+y,x-y) Tuples are "unpacked" by pattern-matching: > firstOf2 :: (a,b) -> a > firstOf2 (x,_) = x > secondOf2 :: (a,b) -> b > secondOf2 (_,y) = y Lists ===== As in Scheme, the workhorse data structure is the list. Lists are collections of variable size (i.e., a list's size is not part of its type). Lists are (unlike Scheme) homogeneous. Defining lists -------------- list-construction operator --------------------------- The list-building primitive (i.e., nonstrict, polymorphically typed cons) is (:) :: a -> [a] -> [a] -- pronounced "followed-by" So [1,2,3] = 1 : [2,3] = 1 : (2 : [3]) = 1 : (2 : (3 : [])) = 1 : 2 : 3 : [] (right-associative, to avoid ()'s) > fromTo :: Int -> Int -> [Int] > fromTo x y > | x <= y = x : fromTo (x+1) y > | otherwise = [] > threes = 3 : threes Deconstructing lists -------------------- (:) is also used in patterns. When (x:xs) matches a list, ¥ the list is not empty ¥ its first element is named x ¥ the rest of the list (possibly empty) is named xs. head (x:_) = x tail (_:xs) = xs > doubleAll :: [Int] -> [Int] > doubleAll [] = [] > doubleAll (n:ns) = 2*n : doubleAll ns A few list-processing functions ---------------------------------------- > length :: [a] -> Int > length [] = 0 > length (_:xs) = 1 + length xs Generalizing doubleAll, > map f [] = [] > map f (x:xs) = f x : map f xs > doubleAll = map (\x->2*x) concatenation: > (++) :: [a] -> [a] -> [a] > [] ++ ys = ys > (x:xs) ++ ys = x : (xs ++ ys) membership test: > _ `elt` [] = False > x `elt` (y:ys) = x==y || x `elt` ys insertion sort: > inSort [] = [] > inSort (x:xs) = insert x (inSort xs) > where > insert x [] = [x] > insert x (y:ys) > | x | otherwise = y : insert x ys Restricted Types, Type Classes, and Overloading =============================================== Note absence of type declarations for `elt` and `inSort`. The declaration elt :: a -> [a] -> Bool would be too general. elt applies (==) to its arguments, and some types do not support equality tests (e.g., functions). The types that DO support (==) (and (/=) also) form the "type class" Eq. The class definition: > class Eq a where > (==), (/=) :: a -> a -> Bool -- Eq's "signature" Meaning: Any type a belonging to class Eq supports the functions (==) and (/=), both of which take two arguments of type a and return a result of type Bool. (==) and (/=) are said to be overloaded for the Eq types. ---------- Instance declarations --------------------- A specific type belongs to a type class as a consequence of an "instance declaration", which includes equations defining the class's signature functions. > instance Eq Bool where > > True == True = True > False == False = True > _ == _ = False > > x /= y = not (x === y) Specifying restrictions ----------------------- Now we can specify elt's type: > elt :: Eq a => a -> [a] -> Bool Meaning: elt is defined for all types in the Eq class. "Eq a =>" is a "context" which restricts the range of type variable a. elt inherits the overloading of (==). -------- :t inSort Default equations ----------------- Implementation equations can be moved from instance declarations to class definitions: > class Eq a where > (==), (/=) :: a -> a -> Bool -- Eq's "signature" > > x==y = not (x/=y) -- "default equation" > x/=y = not (x==y) -- "default equation" The defaults can be overridden in any instance declaration (for instances of Eq, at least one of the two defaults MUST be overridden). Type subclasses --------------- The list-sorting functions use functions (<), (>), (<=), (>=), which --like (==)-- are defined for many types but not all. These types are members of class Ord. Any type supporting (<) etc. also supports (==), but not vice versa. Hence Ord is a subclass of Eq: > class (Eq a) => Ord a where > compare :: a -> a -> Ordering > (<), (<=), (>=), (>) :: a -> a -> Bool > max, min :: a -> a -> a > > compare x y | x==y = EQ > | x<=y = LT > | otherwise = GT > > x <= y = compare x y /= GT > x < y = compare x y == LT > x >= y = compare x y /= LT > x > y = compare x y == GT > > max x y | x >= y = x > | otherwise = y > min x y | x <= y = x > | otherwise = y ------------------------------------------------------ | What's Ordering? Its definition: | | | | > data Ordering = LT | EQ | GT | | | | For comparison, here's the definition of Bool: | | | | > data Bool = False | True | | | ------------------------------------------------------ Some other standard (i.e., Prelude) classes =========================================== Show Read Num (Eq, Show) Real (Num, Ord) Integral (Real, Enum) : Int, Integer Fractional (Num) Floating (Fractional) RealFrac (Real, Fractional) : Rational RealFloat (RealFrac, Floating) : Float, Double Enum: Int, Integer, Char, Bool, Float, Double Bounded: Bool, Char, Int numerals are overloaded ----------------------- 7 :: Num a => a 7.0 :: Fractional a => a Defining new types ================== type synonyms (cf. C's typedef): new names for existing types ------------- > type String = [Char] > type IntToChar :: Int -> Char truly new types: Algebraic Types -------------------------------- enumeration type: > data Direction = North | East | South | West > deriving (Show, Read, Eq, Ord, Bounded, Enum) > heading :: Direction -> Float > heading North = 0 > heading East = 90 > heading South = 180 > heading West = 270 product type: > data Point = Pt Float Float deriving (Eq, Show, Read) > distance :: Point -> Point -> Float > distance (Pt x0 y0) (Pt x1 y1) = sqrt ((x0-x1)^2 + (y0-y1)^2) distance (Pt 1 2) (Pt 4 6) sum-of-products (discriminated union) type: > data Temp = Celsius Float | Fahrenheit Float > deriving (Show, Read) > toCelsius :: Temp -> Temp > toCelsius (Fahrenheit t) = Celsius (5 / 9 * (t - 32)) > toCelsius x = x > degreesC :: Temp -> Float > degreesC (Celsius t) = t > degreesC t = degreesC (toCelsius t) > instance Eq Temp where > t1 == t2 = degreesC t1 == degreesC t2 > instance Ord Temp where > t1 <= t2 = degreesC t1 <= degreesC t2 new polymorphic types > data Pair a b = Pr a b deriving (Eq, Ord) Pr is a value constructor; Pr :: a -> b -> Pair a b Pair is a *type* constructor: applied to two type arguments, it yields a type. > first :: Pair a b -> a > first (Pr x _) = x > instance (Show a, Show b) => Show (Pair a b) where > show (Pr x y) = "<" ++ show x ++ "," ++ show y ++ ">" new recursive types > data List a = Nill | Cons a (List a) deriving (Eq,Ord,Show,Read) mimics the built-in list types [] :: [a] Nill :: List a (:) :: a -> [a] -> [a] Cons :: a -> List a -> List a infix constructors (first character is ':') > data Expr = Lit Integer | Expr :+: Expr | Expr :-: Expr | Expr :*: Expr > eval :: Expr -> Integer > eval (Lit n) = n > eval (e0 :+: e1) = eval e0 + eval e1 > eval (e0 :-: e1) = eval e0 - eval e1 > eval (e0 :*: e1) = eval e0 * eval e1 eval ((Lit 3 :+: Lit 5) :*: Lit 9) Trees ----- > data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Show) > depth :: Tree a -> Int > depth Nil = 0 > depth (Node _ t0 t1) = 1 + depth t0 `max` depth t1 =========================================================== Classes can contain not only types but also type constructors. > class Functor f where > fmap :: (a -> b) -> (f a -> f b) Since a and b are types, the type expressions f a and f b are valid only if f is a type constructor (only type contructors can be applied to types). So Functor is a type-constructor class. Every type constructor in Functor provides an implementation of fmap. > instance Functor [] where -- ([] a) means [a] > fmap = map > instance Functor Tree where > fmap f Nil = Nil > fmap f (Node x t0 t1) = Node (f x) (fmap f t0) (fmap f t1) Using fmap, we can define functions applicable to trees, lists, and other container "types". > doubleAll5 :: (Functor f, Num n) => f n -> f n -- good for lists, trees, and > doubleAll5 = fmap (\x -> 2*x) -- any other data structure in Functor doubleAll5 [1,2,3] doubleAll5 (Node 1 (Node 2 Nil (Node 3 Nil Nil)) Nil) A very small container "type": > data Maybe a = Nothing | Just a -- contains at most one item > deriving (Eq, Ord, Read, Show) > instance Functor Maybe where > fmap f Nothing = Nothing > fmap f (Just x) = Just (f x) Maybe types are useful for making partial functions total. div :: Integral a => a -> a -> a > totalDiv :: Integral a => a -> a -> Maybe a > totalDiv n d > | d /= 0 = Just (n `div` d) > | otherwise = Nothing ============================= A much more important type-constructor class: Monad Easing into monads... Three equivalent definitions of f as a functional composition of g, h, and j: > f x = j (h (g x)) > f x = let y = g x > z = h y > w = j z > in w > f = j . h . g -- nice and simple, no superfluous names Note: (.) :: (a -> b) -> (c -> a) -> (c -> b) Each of g, h, and j consumes the *entire* value returned by its predecessor. Sometimes that's too simple. Consider this C function: int f1( float w ) { const char x = g1(w); const double y = h1(w); const int z = j1(w,x,y); return z; } This isn't a simple composition of g1, h1, and j1. Also, g1, h1, and j1 may perform "invisible" updates of "state" (i.e., global variables, IO streams, etc.). Mimicking this structure in Haskell: > f1 w s0 = let (s1,x) = g1 w s0 > (s2,y) = h1 w s1 > (s3,z) = j1 w x y s2 > in (s3,z) in which s0, s1, s2, s3 :: State -- successive states Hence g1 w :: State -> (State, a) h1 w :: State -> (State, b) j1 w x y :: State -> (State, c) f1 w :: State -> (State, d) We would like to define structures like this without all the intermediate-state names. So we define a new type > data StateTrans s r = ST (s -> (s,r)) where ¥ s is some "state" type ¥ r is a result type ¥ (StateTrans s r) is a "state transformer" which, applied to a state, returns a new state paired with a separate result and a new kind of composition: > (>>=) :: StateTrans s a -> (a -> StateTrans s b) -> StateTrans s b > > f >>= g = \s0 -> let (s1,x) = f s0 in g x s1 "f bind g" and a function that takes a value into a state "transformer" which returns a given state (unchanged), paired with the value as result: > return :: a -> StateTrans s a > return res = \s0 -> (s0,res) Now we can define f1 without naming the states (which are thereby hidden): > f1 w = g1 w >>= > (\x -> h1 w >>= > (\y -> j1 w x y >>= > (\z -> return z))) A variant of (>>=) is (>>), which discards its first argument's result: > f >> g = f >>= (\ _ -> g ) "f then g" These functions are useful in many applications ¥ input and output ¥ exception handling ¥ nondeterminism ¥ parsing ¥ hiding mutable arrays so we overload them by defining a class of which they're the signature: > class Monad m where > return :: a -> m a > (>>=) :: m a -> (a -> m b) -> m b > (>>) :: m a -> m b -> m b > fail :: String -> m a -- allows for custom exception-handling Minimal complete definition: (>>=), return > p >> q = p >>= \ _ -> q > > fail s = error s -- default exception-handler Now we can make (StateTrans s) an instance of Monad: > instance Monad (StateTrans s) where > > (ST p) >>= k = ST (\s0 -> let (s1,a) = p s0 > (ST q) = k a > in q s1) > > return a = ST (\s -> (s,a)) A function which extracts a state-transformer function and applies it: > applyST :: StateTrans s a -> s -> (s,a) > applyST (ST p) s = p s Couldn't the need for extraction be avoided by defining type StateTrans s r = s -> (s,r) ? No-- type synonyms can't be made class instances. To create a monad, it is not enough just to declare a Haskell instance of the Monad class with the correct type signatures. To be a proper monad, the return and >>= functions must work together according to three laws: 1. (return x) >>= f == f x return is a left-identity with respect to >>= 2. m >>= return == m return is a right-identity with respect to >>= 3. (m >>= f) >>= g == m >>= (\x -> f x >>= g) a kind of associativity law for >>= Obeying the three laws ensures that the semantics of the do-notation using the monad will be consistent. The do-notation > f1 w = do x <- g1 w > y <- h1 w > z <- j1 w x y > return z -- "return" means "deliver" In the construction x <- m ¥ m is an instance of a type belonging to a monadic class ¥ x is a new variable, naming the value "returned" by m (terminology is confusing-- m is not a function) ¥ x's scope extends to the end of the do-expression. Note: In general, x is a pattern, and this scope rule applies to each of its variables. Definition: do { pat <- exp; rest } = exp >>= (\pat -> do rest) do { exp; rest } = exp >> do rest do { let decList; rest } = let decList in do rest do exp = exp =================================== Maybe is a monad > instance Monad Maybe where > Just x >>= k = k x > Nothing >>= k = Nothing > return = Just > fail s = Nothing so, e.g., instead of let lookup :: Database -> Item -> Maybe Item lookup = ... in case lookup db x of Nothing -> Nothing Just y -> case lookup db y of Nothing -> Nothing Just z -> lookup db z we can write let lookup :: Database -> Item -> Maybe Item lookup = ... in do y <- lookup db x z <- lookup db y lookup db z Another monad is [], i.e., lists: > instance Monad [ ] where > (x:xs) >>= f = f x ++ (xs >>= f) > [] >>= f = [] > return x = [x] > fail s = [] > mFilter p xs = xs >>= (\x -> if p x then [x] else []) mFilter odd [0..9] > mConcat xs = xs >>= id mConcat ["123","xyz","456",] ========================================== MonadPlus: a subclass of Monad, adding two names to the Monad interface (defined in file (Hugs lib/Monad.hs)) > class Monad m => MonadPlus m where > mzero :: m a > mplus :: m a -> m a -> m a Instances of MonadPlus > instance MonadPlus Maybe where > mzero = Nothing > Nothing `mplus` ys = ys -- False || ys = ys > xs `mplus` ys = xs -- xs || ys = xs > instance MonadPlus [ ] where > mzero = [] > mplus = (++) ========================================== Parsing with monads ------------------- A "parse" is a pair: (recognized item, rest of input) parser :: String -> [(a,String)] Applied to a string, a parser consumes the first 0 or more characters and returns a list of 0 or more parses. File (Hugs lib/hugs/ParseLib.hs) defines ¥ some primitive parsers ¥ some parser-combining functions Haskell's nonstrictness => backtracking is implicit. Monadic composition => combining parsers is simple. Parser is a new type: > newtype Parser a = P (String -> [(a,String)]) ("newtype" Å "data") To apply a parser to a string, we must extract its function: > papply :: Parser a -> String -> [(a,String)] > papply (P p) inp = p inp Parser is declared to be an instance of Monad and of MonadPlus: > instance Monad Parser where > -- return :: a -> Parser a > return v = P (\inp -> [(v,inp)]) > > -- >>= :: Parser a -> (a -> Parser b) -> Parser b > (P p) >>= f = P (\inp -> concat [papply (f v) out | (v,out) <- p inp]) > instance MonadPlus Parser where > -- mzero :: Parser a > mzero = P (\inp -> []) > > -- mplus :: Parser a -> Parser a -> Parser a > (P p) `mplus` (P q) = P (\inp -> (p inp ++ q inp)) Primitive parsers (return v) defines a parser that "recognizes" v without consuming any input mzero is a parser that always "fails" the parser that returns input's first character: > item :: Parser Char > item = P (\inp -> case inp of > [] -> [] > (x:xs) -> [(x,xs)]) papply item "abc" Parser combinators mplus provides parallel (or) combination do-expressions provide sequential (and) combination Some simple parsers, recognizing ... a character satisfying a given predicate: > sat :: (Char -> Bool) -> Parser Char > sat p = do {x <- item; if p x then return x else mzero} papply (sat (\c -> c>'m')) "abc" papply (sat (\c -> c>'m')) "xyz" a given character: > char :: Char -> Parser Char > char x = sat (\y -> x == y) papply (char 'a') "abc" papply (char 'a') "bac" any digit character: > digit :: Parser Char > digit = sat isDigit papply digit "01234" a given String: > string :: String -> Parser String > string "" = return "" > string (x:xs) = do {char x; string xs; return (x:xs)} papply (string "xyz") "xyzpqr" papply (string "xyz") "1xyzpqr" Some parser combinators, recognizing ... either an item recognized by a given parser p or an item recognized by a given parser q, i.e., either a p or a q: > (+++) :: Parser a -> Parser a -> Parser a > p +++ q = first (p `mplus` q) papply (char 'x' +++ char 'y') "xyz" papply (char 'x' +++ char 'y') "yxz" papply (char 'x' +++ char 'y') "zyx" a sequence of 1 or more p's: > many1 :: Parser a -> Parser [a] > many1 p = do {x <- p; xs <- many p; return (x:xs)} papply (many1 digit) "x01234.5678" a sequence of 0 or more p's: > many :: Parser a -> Parser [a] > many p = many1 p +++ return [] papply (many digit) "01234.5678" papply (many digit) "x01234.5678" A simple expression evaluator: Expr ::= Num | App App ::= Ô(Õ Expr Op Expr Ô)Õ Num ::= [-] Digit { Digit } Op ::= + | - | * | / Digit ::= 0 | 1 | É | 9 > calc :: IO () > calc = do putStr "Expression evaluator" > run > run :: IO () > run > = do putStr "\n? " > s <- getLine > if s == quitSignal > then do { putStr "...bye"; return () } > else > do putStr (case papply expr s of > [(v,"")] -> " = "++ show v > _ -> " *** error") > run > where > expr :: Parser Int > expr = num +++ app > > app :: Parser Int > app = do char '(' > e1 <- expr > op <- char '+' +++ char '-' +++ > char '*' +++ char '/' > e2 <- expr > char ')' > case op of > '+' -> return (e1 + e2) > '-' -> return (e1 - e2) > '*' -> return (e1 * e2) > '/' -> if e2 /= 0 > then return (e1 `div` e2) > else mzero > nat :: Parser Int > nat = do d <- digit > s <- many digit > return (read (d:s)) > num = do char '-' > n <- nat > return (-n) > +++ nat > quitSignal = "#" More-powerful parsing tools a sequence of 0 or more p's separated by sep's: > sepby :: Parser a -> Parser b -> Parser [a] > p `sepby` sep = (p `sepby1` sep) +++ return [] a sequence of 1 or more p's separated by sep's: > sepby1 :: Parser a -> Parser b -> Parser [a] > p `sepby1` sep = do {x <- p; xs <- many (do {sep; p}); return (x:xs)} a p, preceded by an open and followed by a close: > bracket :: Parser a -> Parser b -> Parser c -> Parser b > bracket open p close = do {open; x <- p; close; return x} papply (bracket (char '<') (many digit `sepby` (char ',')) (char '>')) "<12,34,56>" a sequence of p's with meaningful left-associative separators: > chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a > chainl p op v = (p `chainl1` op) +++ return v > > chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a > p `chainl1` op = do {x <- p; rest x} > where > rest x = do {f <- op; y <- p; rest (x `f` y)} > +++ return x a sequence of p's with meaningful right-associative separators: > chainr :: Parser a -> Parser (a -> a -> a) -> a -> Parser a > chainr p op v = (p `chainr1` op) +++ return v > > chainr1 :: Parser a -> Parser (a -> a -> a) -> Parser a > p `chainr1` op = do {x <- p; rest x} > where > rest x = do {f <- op; y <- p `chainr1` op; return (x `f` y)} > +++ return x an operator symbol in a table, returning the corresponding function > ops :: [(Parser a, b)] -> Parser b > ops xs = foldr1 (+++) [do {p; return op} | (p,op) <- xs] > operators :: [(Parser Char, (Int->Int->Int))] > operators = [ (char '+', (+)) > , (char '-', (-)) > , (char '*', (*)) > , (char '/', div) > ] papply (nat `chainl1` (ops operators)) "1+4*3/2" papply (nat `chainr1` (ops operators)) "1+4*3/2" Utility parsers consumes spaces: > spaces :: Parser () > spaces = do {many1 (sat isSpace); return ()} consumes Haskell-style one-line comments: > comment :: Parser () > comment = do {string "--"; many (sat (\x -> x /= '\n')); return ()} consumes spaces and comments: > junk :: Parser () > junk = do {many (spaces +++ comment); return ()} consumes junk and then applies a given parser: > parse :: Parser a -> Parser a > parse p = do {junk; p} applies a given parser and consumes any subsequent junk: > token :: Parser a -> Parser a > token p = do {v <- p; junk; return v} recognizes a given string and consumes any subsequent junk: > symbol :: String -> Parser String > symbol xs = token (string xs) recognizes an identifier: > ident :: Parser String > ident = do {x <- lower; xs <- many alphanum; return (x:xs)} recognizes an identifier provided it is not in a given list of reserved names: > identifier :: [String] -> Parser String > identifier ks = token (do {x <- ident; if not (elem x ks) then return x > else mzero}) > nest :: Parser Char -> Parser Char -> Parser Int > nest left right = do left > n <- nest left right > right > return (n+1) > +++ > return 0 For more on monads: http://www.haskell.org/haskellwiki/Books_and_tutorials#Using_Monads Once you're up to speed on monads, have a look at arrows http://www.haskell.org/haskellwiki/Books_and_tutorials#Using_Arrows (of which monads are a special case).