In the previous post, we have written a toy example program in Type Level Haskell. This was hardly satisfying; time to write something more sophisticated - a parser. Since Haskellers know better than to handwrite parsers, we will make a small parser combinator library, taking inspiration from parsec.
This post is Literate Haskell. You can copy’n’paste it and watch your CPU heat up while GHC checks these bloated types.
Boilerplate out of the way!
{-# LANGUAGE DataKinds, TypeOperators, TypeInType, TypeFamilies, RankNTypes,
UndecidableInstances, GADTs, ConstraintKinds, TypeApplications #-}
module TypoParsec where
import Prelude hiding (Char)
import GHC.TypeLits
type Char = Nat
type family a && b where
True && b = b
False && _ = False
infixr 3 &&
type family If cond a b where
If True a _ = a
If False _ b = b
Parser
Our parser definition is very simple. There are three basic actions (consume a character, produce a value, fail) and two ways to combine them: sequential and parallel. We will not bother with error messages.
data Parser a where
Consume :: Parser Nat
Return :: a -> Parser a
Fail :: Parser a
(:>>=) :: Parser b -> (b ~> Parser a) -> Parser a
(:<|>) :: Parser a -> Parser a -> Parser a
type family RunParser (p :: Parser a) (s :: [Char]) :: Maybe (a, [Char])
The basic actions are easy:
type instance RunParser Consume (x ': xs) = Just '(x, xs)
type instance RunParser Consume '[] = Nothing
type instance RunParser (Return x) xs = Just '(x, xs)
type instance RunParser Fail _ = Nothing
For sequential composition, we need the continuations to be defunctionalized. But since this is a library, it would be unwise to use a closed data type. Let’s define generic defunctionalized functions:
data a ~> b where F :: sym -> (a ~> b)
type family Apply (f :: from ~> to) (x :: from) :: to
type family MkFun a b sym :: a ~> b where MkFun a b sym = F sym
For example, let’s define a function that will be useful later - that checks if a character is in range:
type IsBetween lo hi = MkFun Char Bool (IsBetween_ lo hi)
data IsBetween_ lo hi
type instance Apply (F (IsBetween_ lo hi)) c = lo <=? c && c <=? hi
Unit tests
This function is non-trivial, we should test it. Unit tests in TLH are easy:
-- like 'Dict' from ekmett's constraints library
data Test c where Test :: c => Test c
= Test @
isBetweenTests Apply (IsBetween 2 4) 1 ~ False
( Apply (IsBetween 2 4) 2 ~ True
, Apply (IsBetween 2 4) 3 ~ True
, Apply (IsBetween 2 4) 4 ~ True
, Apply (IsBetween 2 4) 5 ~ False
, )
That’s it. Use of the Test
data constructor will force the compiler to evaluate these constraints. If they can’t be satisfied, the code won’t even compile!
Back to Parser
.
Sequential composition
type instance RunParser (p :>>= k) s = RunParserSeqHandleResult (RunParser p s) k
type family RunParserSeqHandleResult result k where
RunParserSeqHandleResult (Just '(x, xs)) k = RunParser (Apply k x) xs
RunParserSeqHandleResult Nothing _ = Nothing
Does it work? Basic test:
type BasicTest = Consume :>>= Const Consume Char
= Test @( RunParser BasicTest [1,2,3] ~ Just '( 2 , '[3]) ) basicTest
To make it useful, we need some standark monadic combinators:
-- Const x b :: a ~> b where x :: a
-- Apply (Const x t) y = x
-- where y :: t
type family Const x b where Const (x :: a) (b :: k) = MkFun b a (Const_ x)
data Const_ x
type instance Apply (F (Const_ x)) _ = x
-- p1 *> p2 -- parse p1 and p2, return result of p2
type family p1 *> p2 where (p1 :: Parser a) *> (p2 :: Parser b) = p1 :>>= Const p2 a
-- p1 <* p2 -- parse p1 and p2, return result of p1.
type family p1 <* p2 where
p1 :: Parser a) <* (p2 :: Parser b) = p1 :>>= MkFun a (Parser a) (FollowedByCont_ a b p2)
(
data FollowedByCont_ a b p2
type instance Apply (F (FollowedByCont_ a b p2)) x = p2 *> Return x
= Test @( RunParser (CharP 1 <* CharP 2) '[1,2] ~ Just '( 1, '[] ) ) followedByTest
-- Fmap :: (a -> b) -> Parser a -> Parser b
type Fmap (f :: a -> b) (p :: Parser a) = p :>>= FmapCont f
type FmapCont (f :: a -> b) = MkFun a (Parser b) (FmapCont_ a b f)
data FmapCont_ a b (f :: a -> b)
type instance Apply (F (FmapCont_ a b f)) (x :: a) = Return (f x)
= Test @( RunParser (Just `Fmap` Return 1) '[] ~ Just '( Just 1 , '[]) ) fmapTest
-- Ap :: Parser (a -> b) -> Parser a -> Parser b
type Ap (pf :: Parser (a -> b)) (pa :: Parser a) = pf :>>= MkFun (a -> b) (Parser b) (ApCont0 a b pa)
data ApCont0 a b pa; type instance Apply (F (ApCont0 a b pa)) (f :: a -> b) = Fmap f pa
We can now write Applicative-style parsers:
= Test @( RunParser ( '(,) `Fmap` Consume `Ap` Consume ) [1,2,3]
fmapApTest ~ Just '( '(1, 2), '[3] ) )
Single-character utilities
-- Satisfy :: (Char ~> Bool) -> Parser Char
-- Consume character satisfying given predicate and return it
type Satisfy pred = Consume :>>= MkFun Char (Parser Char) (SatisfyCont pred)
data SatisfyCont pred
type instance Apply (F (SatisfyCont pred)) (c :: Char) =
If (Apply pred c)
Return c)
(Fail
-- CharP :: Char -> Parser Char
-- Consume exactly the given character
type CharP c = Satisfy (IsBetween c c)
= Test @( RunParser (CharP 1) '[1] ~ Just '( 1, '[] ) )
charTest1 = Test @( RunParser (CharP 1) '[2] ~ Nothing ) charTest2
Alternative
type instance RunParser (a :<|> b) s = RunParserAltHandleResult (RunParser a s) b s
type family RunParserAltHandleResult result b s where
RunParserAltHandleResult (Just r) _ _ = Just r
RunParserAltHandleResult Nothing b s = RunParser b s
= Test @
alternativeTests RunParser (Return 1 :<|> Return 2) '[] ~ Just '( 1, '[] )
( RunParser (Fail :<|> Return 1) '[] ~ Just '( 1, '[] )
, -- failed branch shouldn't consume input
RunParser ((Consume *> Fail) :<|> Consume) '[1] ~ Just '( 1, '[] )
, )
This lets us implement the many
combinator:
-- Many p -- parse p until it fails, return list of results.
type family Many p where Many (p :: Parser a) = (p :>>= ManyCont a p) :<|> Return '[]
type ManyCont a (p :: Parser a) = MkFun a (Parser [a]) (ManyCont_ a p)
data ManyCont_ a p; type instance Apply (F (ManyCont_ a p)) x = Fmap ('(:) x) (Many p)
= Test @( RunParser (Many Consume) '[] ~ Just '( '[], '[] ) )
manyTest1 = Test @( RunParser (Many Consume) '[1] ~ Just '( '[1], '[] ) )
manyTest2 = Test @( RunParser (Many Consume) '[1,2] ~ Just '( '[1,2], '[] ) )
manyTest3 = Test @( RunParser (Many (CharP 1)) '[1,1,1,2] ~ Just '( '[1,1,1], '[2] ) ) manyTest4
type Many1 p = '(:) `Fmap` p `Ap` Many p
A real example
For a test of this machinery, let’s build a s-expression parser.
data SExpr = Atom [Char] | List [SExpr]
For simplicity, atoms will be strings of lowercase Latin letters.
type IsAlpha = IsBetween 97 122 -- 'a'..'z'
type AtomP = Fmap Atom (Many1 (Satisfy IsAlpha))
= Test @( RunParser AtomP '[97,98,32] ~ Just '( Atom '[97,98], '[32] ) ) atomTest1
type IsSpace = IsBetween 32 32 -- ' '
type WhiteSpace = Many (Satisfy IsSpace)
type LeftParenC = 40 -- '('
type RightParenC = 41 -- ')'
type ListP = Fmap List (CharP LeftParenC *> Many ( WhiteSpace *> LazySExprP ) <* CharP RightParenC)
= Test @( RunParser ListP '[40,97,32,98,99,41]
listTest1 ~ Just '( List '[Atom '[97], Atom '[98,99]], '[] ) )
= Test @( RunParser ListP '[40,41] ~ Just '( List '[], '[] ) )
listTest2 = Test @( RunParser ListP '[40,40,41,32,40,41,41]
listTest3 ~ Just '( List '[List '[], List '[]], '[] ) )
type family SExprP where SExprP = AtomP :<|> ListP
Tying the knot
Notice we used LazySExprP
in the definition of ListP
. Otherwise we would create a cycle in type family declarations (using type families would put GHC in an infinite loop). We need to introduce another combinator, Lazy
.
-- Lazy p -- works the same as `Apply p '()`.
type Lazy (p :: () ~> Parser a) = Return '() :>>= p
= Test @( RunParser (Lazy (Const (Return 42) ())) '[] ~ Just '( 42, '[] ) ) lazyTest1
type family LazySExprP :: Parser SExpr where LazySExprP = Lazy (F "SExprP")
type instance Apply (F "SExprP") '() = SExprP
Final test:
-- "(a (c d) j kl)" -> List [Atom "a", List [Atom "c", Atom "d"], Atom "j", Atom "kl"]
= Test @( RunParser SExprP '[40,97,32,40,99,32,100,41,32,32,106,107,41]
sexprTest1 ~ Just '( List '[Atom '[97], List '[Atom '[99], Atom '[100]], Atom '[106,107]], '[] ) )
-- "(a b" -> failure
= Test @( RunParser SExprP '[40,97,32,98] ~ Nothing )
sexprTest2
-- "1" -> failure
= Test @( RunParser SExprP '[49] ~ Nothing ) sexprTest3
It works!
Conclusions
TLH is a very powerful language. We can write parsers with little effort, just like in Haskell. It also offers very good testing facilities - tests are ran automatically during compilation.