Template Haskell tutorial
Published on December 24, 2017, last updated October 31, 2021
- Motivation
-
The
Qmonad - Splicing
- Limitations of TH
- Quotation
- Typed expressions
-
A few words about
runQ - Names
- Retrieving information about things
- Example 1: instance generation
- Viewing the generated code
- Lifting Haskell values into TH expressions
- Example 2: creating refined values at compile time
-
Running
IOinQ -
Example 3: the
file-embedpackage - Conclusion
The tutorial aims to introduce the reader to Template Haskell (TH)—the language extension that adds meta-programming capabilities to the Haskell language. Here I assume some familiarly with Haskell, perhaps beginner or intermediate level, although these terms are rather nebulous and subjective. To express the prerequisites in a more tangible form: if you know what a monad is, you should probably be OK.
TH has the reputation of being an expert-level topic that mere mortals are not prepared to comprehend. I don’t think this is so. The ideas behind TH are simple and make sense, while specific details can be always looked up in the Haddocks.
The tutorial cannot possibly cover every use of TH, and so it is structured in such a way so we only get to see the most common, conventional, and benign uses of this GHC feature.
Motivation
One of the main difficulties with TH is perhaps deciding whether it is the best solution to a problem at hand. Writing code that generates code is generally considered an indication that the tools of expression provided by the language and/or programmer’s imagination have failed to address a particular problem and meta-programming is used as a last resort to get things done. True or not, TH is quite popular and so knowing your way around it is a valuable skill that can be used to do things that often cannot be achieved otherwise.
Let’s list some uses of TH:
-
Automatic deriving of type class instances is still perhaps the most common use case for TH. Even though the same problem can often be addressed by generics, they are known to make compilation times longer (compared to TH-based solutions), so TH is still the preferred method of automatic instance derivation in libraries like
aesonandlens. -
Creation of TH DSLs that are integrated into systems built in Haskell. Examples of such DSLs are the language for model declaration used in
persistent, and various other mini-languages used in theyesodweb framework. -
Compile-time construction of values of refined types that turns invalid inputs into compilation failures.
-
Compile-time loading and processing of data from external files, which is very useful sometimes. Even though this involves running
IOduring compilation, it’s a relatively innocent use case of that dangerous feature.
Reasons not to use TH:
-
TH helpers are often viewed as black boxes that do “magic”. It is not clear at all what a thing of the type
Q [Dec]does, it might do anything (we will see that any code that generates declarations has the sameQ [Dec]type, no matter what sort of declarations it generates). Documentation becomes the main source of information about semantics of TH code. -
TH imposes restrictions on where the user should define TH functions themselves and sometimes also how to order definitions in files where TH functions are used.
The Q monad
Generation of code requires certain features to be available to us:
-
The ability to generate new unique names that cannot be captured.
-
The ability to retrieve information about a thing by its name. Usually we want to know about functions and types, but there are also ways to learn about a module, get collection of instances of a particular type class, etc.
-
The ability to put and get some custom state that is then shared by all TH code in the same module.
-
The ability to run
IOduring compilation, so we can e.g. read something from a file.
These features are usually achieved through monads in Haskell, and so it
should not come as a surprise that there is a special monad called Q
(short for “quotation”) that hosts all functions provided by TH.
Splicing
The only purpose of having a value of the type Q a is to use a in a
Haskell program somehow. a can be anything in intermediate monadic
expressions, but when we’re about to insert the generated code into a
Haskell source file, there are only five options:
-
Declaration
Dec, which includes the top-level things like function and data type definitions. In fact, we would like to be able to generate several declarations at a time, so the type that is actually used (and expected by the interpolating machinery) is[Dec]. -
Expression
Exp, such asx + 1or\x -> x + 1. It is probably the most common thing to generate. -
Typed expression
TExp, which is identical to expressionExp, but has a phantom type tag corresponding to the type of the expression inside. For example,TExp Intmeans that the expression evaluates to anInt. -
Type
Typesuch asIntorMaybe Intor justMaybe. The type doesn’t have to be saturated (i.e. may have any kind), so it may be pretty much anything one can encounter on the type level. -
Pattern
Patthat we use for pattern-matching.
I suggest you follow the links in the list above and glance at the
definitions of Dec, Exp, TExp, Type, and Pat. Note the naming
convention: the data constructors are suffixed with letters that hint about
the data type they belong to: Dec constructors end with a “D”, Exp
constructors end with an “E”, Type constructors end with a “T”, and Pat
constructors end with a “P”. This makes it easy to distinguish e.g. an
expression variable VarE from a pattern variable VarP.
Using the data types, slowly but surely, we can indeed construct an expression:
myFunc :: Q Exp
myFunc = do
x <- newName "x" -- generate a unique variable name, we'll cover names later
return $ LamE -- lambda expression
[VarP x] -- pattern matching on 'x'
(InfixE (Just (VarE x)) (VarE '(+)) (Just (LitE (IntegerL 1))))
-- here we have an infix expression: we apply (+) to 'x' and integer
-- literal 1
The TemplateHaskell language extension enables the special syntax $(exp)
where exp is an arbitrary expression producing Q [Dec], Q Exp, Q Type, or Q Pat. This allows us to interpolate the generated code into
normal Haskell source code.
For example, I can now use myFunc like this:
λ> :set -XTemplateHaskell -- don't forget to enable the extension
λ> $(myFunc) 3
4
-- The parentheses are not necessary if 'myFunc' doesn't take any arguments.
-- If it did, it would be something like '$(myFunc arg) 3'. In other words,
-- parentheses are only needed around expressions.
λ> $myFunc 3
4
λ> let f = (* 2) . $myFunc
λ> f 10
22
This is called splicing. The expression following after the dollar sign is
called a splice. A splice can occur in place of an expression, a pattern,
a type, or as a top-level declaration. It’s worth noting that declarations
may be spliced without the preceding $ because they live on the
top-level and there is no syntactical ambiguity. makeLens from the lens
package is a common example:
makeLens ''MyRecord -- Yes, we'll get to this quoting style too!
-- the same:
$(makeLens ''MyRecord)
Note that the $ symbol now has an additional meaning and so ambiguity is
possible in some cases. When $ is used in splices, there must be no space
between $ and the identifier or opening parenthesis that follows after it.
To use ($)—the application operator, be sure to add at least one space
between the operator and the following code.
Limitations of TH
Using TH currently has some limitations:
-
Staging restriction, which means that inside a splice one can only use functions that are already compiled, i.e. defined in other modules, not in the same module that contains the splice. This is a pretty nasty limitation that makes developers have a separate module for TH code, typically called
TH. -
TH often makes you order your definitions in a particular way. To quote the GHC user manual:
Top-level declaration splices break up a source file into declaration groups. A declaration group is the group of declarations created by a top-level declaration splice, plus those following it, down to but not including the next top-level declaration splice. N.B. only top-level splices delimit declaration groups, not expression splices. The first declaration group in a module includes all top-level definitions down to but not including the first top-level declaration splice.
Each declaration group is mutually recursive only within the group. Declaration groups can refer to definitions within previous groups, but not later ones.
Let’s see an example of this. Suppose we want to use the lens library to
generate some lenses. We could have code like this:
data MyRecord = MyRecord -- <<< first declaration group
{ _myRecordFoo :: Foo --
, _myRecordBar :: Bar --
, _myRecordBaz :: Baz --
} --
--
getRecordFoo :: MyRecord -> Foo --
getRecordFoo = view myRecordFoo --
--
makeLenses ''MyRecord -- <<< second declaration group
-- ^ Generates lenses: 'myRecordFoo', 'myRecordBar' and 'myRecordBaz'.
Sadly, this code won’t compile. The first declaration group includes the
definitions of MyRecord and getRecordFoo, but not the generated lenses.
This means that myRecordFoo is out of scope in getRecordFoo.
We could fix this by placing getRecordFoo after the makeLenses ''MyRecord splice:
data MyRecord = MyRecord -- <<< first declaration group
{ _myRecordFoo :: Foo --
, _myRecordBar :: Bar --
, _myRecordBaz :: Baz --
} --
--
makeLenses ''MyRecord -- <<< second declaration group
--
getRecordFoo :: MyRecord -> Foo -- can see 'MyRecord' from the
getRecordFoo = view myRecordFoo -- previous group
The first declaration group, consisting of just MyRecord now cannot see
getRecordFoo, and in case you need it, you’ll be forced to move all the
code that uses getRecordFoo into the second declaration group, after
makeLenses ''MyRecord. In most cases this is not a big deal (after all, in
many languages you must define a function before you use it), but
nevertheless we’re used to the fact that Haskell does not care about
ordering of our definitions, so this limitation is a pity.
Quotation
As we have seen, the Haskell AST that TH can build and manipulate is not small and not easy to work with at all. Unfortunately, it’s also possible to produce an AST of a correct shape that does not represent a Haskell program that compiles. In other words, manual construction of AST is tedious and error-prone.
Luckily, there is a way to get AST of arbitrary Haskell code by using
quotation. There are four types of quotations that are enabled by the
TemplateHaskell language extension:
| Thing produced | Quotation syntax | Type |
|---|---|---|
| Declaration | [d| … |] | Q [Dec] |
| Expression | [e| … |] | Q Exp |
| Typed expression | [|| … ||] | Q (TExp a) |
| Type | [t| … |] | Q Type |
| Pattern | [p| … |] | Q Pat |
Indeed, we need several different quoters because the same code may mean different things is different contexts, for example:
λ> runQ [e| Just x |] -- an expression
AppE (ConE GHC.Base.Just) (UnboundVarE x)
λ> runQ [p| Just x |] -- a pattern
ConP GHC.Base.Just [VarP x_0]
Since most of the time we work with expressions, the more lightweight quote
syntax [| … |] is equivalent to [e| … |]:
λ> runQ [| Just x |] -- an expression again
AppE (ConE GHC.Base.Just) (UnboundVarE x)
Not only quotation can be used to quickly discover representation of a piece of Haskell code, it can be used in place of manually constructed ASTs:
myFunc :: Q Exp
myFunc = [| \x -> x + 1 |]
I think you’ll agree that this version of myFunc is shorter and easier to
understand. The most wonderful thing about quoters is that we can actually
use splicing inside them:
add2 :: Q Exp
add2 = [| $myFunc . $myFunc |]
This way we can write the code we want to generate almost as usual, using splicing just to vary pieces of code that need to change algorithmicly. Note though that as of GHC 8.2.2, splicing of declarations inside declaration quoters does not work yet.
Let’s try add2:
λ> $add2 10
12
λ> runQ add2
InfixE
(Just (LamE [VarP x_2] -- lambda
(InfixE (Just (VarE x_2))
(VarE GHC.Num.+)
(Just (LitE (IntegerL 1))))))
(VarE GHC.Base..) -- functional composition
(Just (LamE [VarP x_3] -- lambda
(InfixE (Just (VarE x_3))
(VarE GHC.Num.+)
(Just (LitE (IntegerL 1))))))
It seems to work.
Typed expressions
Quotation for typed expressions is a bit special: it is the only way to
create values of the type TExp a, i.e. it’s the introduction form for
TExp. This way the compiler can ensure that the phantom type always
corresponds to what is inside. For example, let’s try and re-write myFunc
using quotation for typed expression splices:
myFuncTyped :: Q (TExp a)
myFuncTyped = [|| \x -> x + 1 ||]
I left a there on purpose to check what GHC will propose as inferred type:
Couldn’t match type
awithInteger -> Integer
Thus:
myFuncTyped :: Q (TExp (Integer -> Integer))
myFuncTyped = [|| \x -> x + 1 ||]
It appears that returning something polymorphic is not yet possible:
myFuncTyped :: Q (TExp (Num a => a -> a))
myFuncTyped = [|| \x -> x + 1 ||]
GHC says:
Illegal qualified type:
Num a => a -> a
GHC doesn’t yet support impredicative polymorphism
Impredicative polymorphism is when you try replace a polymorphic variable
with an expression which itself contains a forall. In the case above,
there is an implicit forall before the Num a constraint.
Further, there is a special syntax for splicing of typed expressions. Let’s
try to write a typed version of add2:
add2Typed :: Q (TExp (Integer -> Integer))
add2Typed = [|| $$myFuncTyped . $$myFuncTyped ||]
Normal splices cannot be used in quotations for typed expressions and vice
versa—typed splices cannot be used in quotations for untyped expressions.
This is way we simply had to start by writing a typed version of myFunc!
When using the double dollar syntax the compiler will make sure that we’re splicing our typed expression in a correct context so there won’t be type errors.
Apart from splicing, there is another way to eliminate a value of type TExp a—just use unType:
unType :: TExp a -> Exp
A bit more information about typed expressions can be found in this blog post.
A few words about runQ
What is that runQ thing though? In GHCi we work in the IO monad, so it’s
natural to assume from the examples above that it should have the type:
runQ :: Q a -> IO a
-- ^ ^
-- | |
-- we have but we want
-- this this
runQ is usually used just to play with TH in GHCi (we’ll see the reason
behind this shortly), so for that purpose we can safely assume that it has
this type indeed. If you are a beginner or you just don’t want to know
additional (and quite optional) details, just skip to the next
section now.
For those who want to dig it further, we can see that things are a bit more complicated:
runQ :: Quasi m => Q a -> m a
Quasi is the type class for monads that provide all the
capabilities for meta-programming we have mentioned in the beginning when we
introduced Q. You can click that link and take a look for yourself.
In fact, Q a is just a wrapper around Quasi m => m a under the hood:
newtype Q a = Q { unQ :: forall m. Quasi m => m a }
runQ :: Quasi m => Q a -> m a
runQ (Q m) = m
There are two instances of Quasi that are visible to users: Q and IO.
The instance for Q is trivial and the instance of IO is simply very
limited in functionality: from the numerous methods of Quasi it only
supports four: newName, runIO, reportError and reportWarning,
throwing exceptions when any other method is called. So IO can’t be used
to run any non-trivial TH code, only for debugging purposes we have just
seen.
Such definition of Q suggests that the authors of TH wanted us to work in
a concrete monad and at the same time they wanted to leave themselves an
opportunity to define the instance of Quasi that actually does all the
work somewhere else (it’s apparently not for us to see).
Names
As we know, the same name can refer to different things depending on the context where it is used. This is why working with names has its own subtleties we’re going to discuss now.
When we generate or manipulate code, we work with two types of names:
-
Names that mean something in the current context. Current context may be the context of meta-program that generates code we’re going to splice, or it may be the context where we do splicing. In both cases we may want to just name a thing that is currently in scope and then do something with it.
-
Names that do not correspond to anything in current context. For example, if we generate a lambda expression, we may want to bind its arguments and for that we need such “new” names.
This second group of names can be divided into two subgroups:
-
Names that can be captured. This means that that after we do splicing we end up with generated code that contains capturable names that can be actually bound or used in the enclosing lexical context.
-
Names that cannot be captured.
-
First of all, there is the syntax for quoting names of functions and types
(it’s also enabled by the TemplateHaskell extension):
-
To quote a function name, add a single quote in the front of it:
id→'id. -
To quote a type, add two single quotes in the front of it:
MyRecord→''MyRecord. The quoting convention follows from the fact that Haskell has different name spaces for values and types, and so we must be able to quote a data constructor'MyRecordas well as type constructor''MyRecordwithout ambiguity.
This method always produces names that refer to the thing that is currently
in scope. We saw this in the example with makeLenses :: Name -> Q [Dec]
where we passed it the name of our record ''MyRecord. Similarly, we saw it
in the first definition of myFunc, in the AST for the infix expression
involving the quoted (+) function:
InfixE (Just (VarE x)) (VarE '(+)) (Just (LitE (IntegerL 1)))
-- ^^^^
When we defined myFunc, (+) that comes from Prelude was in the scope
and so we were able to refer to it as '(+).
When we use quotation, it works absolutely the same. Every name in a TH quote is looked up in the current scope. In other words, the scope we’re operating in when we use one of the quotes determines directly what we will get in the resulting AST:
λ> runQ [| x |]
UnboundVarE x
x is not defined in this GHCi session and I get UnboundVarE x. However,
if I define x first and then run the same code, the result will be
different:
λ> let x = 42
λ> runQ [| x |]
VarE Ghci4.x
This Ghci4.x is the name of the variable, it is bound, and it cannot be
captured:
λ> let withX = it -- 'it' is bound to the result of last evaluated
-- expression in GHCi
λ> let x = 99 in $(return withX) -- binding 'x' has no effect on the result
42
The quoted Haskell code produces the same AST that the code placed in the
same module/scope/context would produce. If we modify the last example so
that the x that is bound to 99 is in scope when we quote x, we’ll get
an expression referring to that x:
λ> let x = 99 in $( [| x |] )
99
Even though the quotes lookup everything from the current scope, it does not mean that new names cannot be generated this way:
λ> runQ [| \x -> x + 1 |]
LamE [VarP x_4]
(InfixE (Just (VarE x_4))
(VarE GHC.Num.+)
(Just (LitE (IntegerL 1))))
This x_4 name was generated automatically for us. This is the same sort of
name we introduced with the newName :: String -> Q Name function in the
first implementation of myFunc. It’s new and it cannot be captured.
One way to introduce a capturable name is via the mkName :: String -> Name
function:
λ> runQ [| $(varE (mkName "x")) + 1 |]
InfixE (Just (VarE x)) (VarE GHC.Num.+) (Just (LitE (IntegerL 1)))
λ> let xPlus1 = it
λ> let x = 99 in $(return xPlus1) -- value of variable named 'x' influences
-- the result of evaluation
100
The Language.Haskell.TH.Lib module contains helper functions
that take and return AST values in the Q monad, which sometimes helps
produce shorter code, because these helpers compose well with quotation and
splicing. Here we used varE :: Name -> Q Exp instead of VarE :: Name -> Exp.
Another way to introduce a capturable name is apparently by using an unbound name in a quote:
λ> withZ <- runQ [| z + 1 |]
λ> withZ
InfixE (Just (UnboundVarE z)) (VarE GHC.Num.+) (Just (LitE (IntegerL 1)))
λ> let z = 100 in $(return withZ)
101
But this approach seems quite fragile for my taste. (What if we later define
z somewhere in the same module?)
Capturable names are sometimes useful. For example, the hamlet
template system allows to use this syntax #{name} to refer to values in a
template. The template then generates Haskell code where such names come out
as capturable names, so they can be bound. The resulting effect is that
values that are bound in the context where a template is used can be
accessed in templates, which is pretty cool.
Retrieving information about things
Now that we know a little about names, we can go on to learn how to lookup information about named things.
There are quite a few “reifying” functions that allow to do that:
-
reify :: Name -> Q Infois the most commonly used one. It allows us to look up general informationInfoabout a thing. -
extsEnabled :: Q [Extension]returns the list of all enabled language extensions at the splicing site. -
isExtEnabled :: Extension -> Q Boolallows us to check whether a particular language extension is enabled. -
reifyInstances :: Name -> [Type] -> Q [InstanceDec]returns a list of visible instances ofName(type class name) for types[Type]. -
There are more of them, for more rare use cases:
reifyFixity,reifyRoles,reifyAnnotations,reifyConStrictness.
Reifying functions take Names, but there is one more question to ask about
a name: does it name a thing that is in scope when we write our meta-program
or does it name a thing that is in scope when we execute the meta-program at
splicing site? So far the names were looked up in the scope of meta-program,
not in the scope of splicing site. If we need to access a thing from the
latter scope, there are two ways to do that:
-
We could take the name as an argument, like the
makeLensesfunction does. In that case we construct theNameat the splicing site (e.g. by quoting it) and it ends up naming a thing from that scope. -
We can use the
lookupTypeNameandlookupValueNamefunctions, which lookup names at the splicing site.
Note the signatures of these functions:
lookupTypeName :: String -> Q (Maybe Name)
lookupValueName :: String -> Q (Maybe Name)
Name itself cannot change meaning depending on context. When you have a
Name, it names one specific thing, always. So it makes sense that
lookupValueName and lookupTypeName take Strings and return Names.
Let’s now use the reifying functions for something more practical.
Example 1: instance generation
This example is going to be a little contrived. The aim is to show how all the tools we have seen so far work together, but without throwing a “wall of code” at the reader.
Suppose we want to know how many different non-bottom values inhabit a type. We could start first without TH like this:
class Countable a where
count :: Proxy a -> Integer
The Proxy is needed here because methods of a type class cannot lack a
“connection” to the type that is an instance of the type class. In other
words, there must be an a somewhere in the signature of count. We are
not interested in a particular value of the type a, but still we must
clarify which a we mean by passing Proxy a. If this doesn’t make any
sense, it’s OK, you can still continue reading.
How do we write the instances? It looks like we could leverage the existing
Enum and Bounded type classes, which already solve the problem, but only
for a limited set of types. If a type is an instance of both Enum and
Bounded, then we can define count like so:
instance (Enum a, Bounded a) => Countable a where
count Proxy = fromIntegral $
1 + fromEnum (maxBound :: a) - fromEnum (minBound :: a)
This is not going to work though if we want to be able to define instances
of Countable for more complex product and sum types. The reason is that
the instance above already defines Countable for any a possible, but
with this additional constraint (Enum a, Bounded a) added. In other words,
when Haskell searches for an instance, it only looks at the right-hand side
ignoring the constraints, and so a matches everything.
We could do better by writing a TH helper that handles two cases:
-
If a type is an instance of
EnumandBounded, then generate the instance like the one we have just seen, but for a concrete type. -
Otherwise analyze the type to figure out if it’s a product or sum type (or indeed something mixed) and use arithmetic to calculate the number of non-bottom values in the assumption that
Countableis defined for the types inside such a composite type.
Let’s solve the first part of the task:
deriveCountableSimple :: Name -> Q [Dec]
deriveCountableSimple name = [d|
instance Countable $a where
count Proxy = fromIntegral $
1 + fromEnum (maxBound :: $a) - fromEnum (minBound :: $a)
|]
where
a = conT name
conT is just return . ConT and ConT is a data constructor of Type
that represents a type constructor. Quoting and splicing go well together
and defining deriveCountableSimple was indeed simple.
To try this out, I derived a few instances this way:
deriveCountableSimple ''Bool
deriveCountableSimple ''Word8
deriveCountableSimple ''Char
We can try it now:
λ> count (Proxy :: Proxy Bool)
2
λ> count (Proxy :: Proxy Word8)
256
λ> count (Proxy :: Proxy Char)
1114112
Looks reasonable. Let’s handle the second case:
deriveCountableComposite :: Name -> Q [Dec]
deriveCountableComposite name = do
TyConI (DataD _ _ _ _ cons' _) <- reify name
[d|
instance Countable $(conT name) where
count Proxy = $(foldr addE [| 0 |] $ f <$> cons')
|]
where
f (NormalC _ ts) = handleCon (snd <$> ts)
f (RecC _ ts) = handleCon (thd <$> ts)
f _ = fail "unsupported data type"
handleCon ts = foldr mulE [| 1 |] (countTypeE <$> ts)
countTypeE t = [| count (Proxy :: Proxy $(return t)) |]
addE x y = [| $x + $y |]
mulE x y = [| $x * $y |]
thd (_,_,x) = x
Let’s see what is going on:
-
We first
reifythe givennameand get information about it. We are only interested in type constructors, so we pattern onTyConI. Further, from the information thatTyConIcontains we’re only interested in the collection of data constructorscons'. -
For every constructor we take every sub-type and construct an expression that counts the number of values that inhabit the type, this is done in
countTypeE. -
For
NormalCandRecCconstructors we just multiply expressions we’ve got for the individual types, this is done inhandleCon(we handle product type this way). -
Finally, we add together the expressions for all data constructors—this way we handle sum types.
Let’s play with it now:
data Foo
= Foo Bool Bool
deriveCountableComposite ''Foo
λ> count (Proxy :: Proxy Foo)
4 -- = 2 + 2
This makes sense, let’s see if it can handle a sum type:
data Foo
= Foo Bool Bool
| Bar Word8 Bool
deriveCountableComposite ''Foo
λ> count (Proxy :: Proxy Foo)
516 -- = 2 * 2 + 256 * 2
It works! Let’s combine the two cases into the single deriveCountable
helper:
deriveCountable :: Name -> Q [Dec]
deriveCountable name = do
let ts = [ConT name]
hasEnum <- isInstance ''Enum ts
hasBounded <- isInstance ''Bounded ts
if hasEnum && hasBounded
then deriveCountableSimple name
else deriveCountableComposite name
Done, now we can use deriveCountable in both cases and it’ll figure out
what to do on its own.
Viewing the generated code
Sometimes it is helpful to be able to see the code we’re generating at
splice sites. GHC allows us to do that with the
-ddump-splices flag. Stack seems to eat that output though, so I had to
add also -ddump-to-file and search for a file with the suffix -splices
in the .stack-work/dist directory.
Here is what I’ve got:
src/Main.hs:22:1-22: Splicing declarations
deriveCountable ''Bool
======>
instance Countable Bool where
count Proxy
= (fromIntegral
$ ((1 + (fromEnum (maxBound :: Bool)))
- (fromEnum (minBound :: Bool))))
src/Main.hs:23:1-23: Splicing declarations
deriveCountable ''Word8
======>
instance Countable Word8 where
count Proxy
= (fromIntegral
$ ((1 + (fromEnum (maxBound :: Word8)))
- (fromEnum (minBound :: Word8))))
src/Main.hs:24:1-22: Splicing declarations
deriveCountable ''Char
======>
instance Countable Char where
count Proxy
= (fromIntegral
$ ((1 + (fromEnum (maxBound :: Char)))
- (fromEnum (minBound :: Char))))
src/Main.hs:25:1-21: Splicing declarations
deriveCountable ''Foo
======>
instance Countable Foo where
count Proxy
= (((count (Proxy :: Proxy Bool))
* ((count (Proxy :: Proxy Bool)) * 1))
+ (((count (Proxy :: Proxy Word8))
* ((count (Proxy :: Proxy Bool)) * 1))
+ 0))
This is a useful debugging tool.
Lifting Haskell values into TH expressions
So far we have been constructing expressions manually or by using quotation.
What about getting an expression that “re-constructs” a value we already
have? This could be useful to deliver values generated in the Q monad to
the outside world.
The solution comes naturally in the form of the Lift type class:
class Lift t where
lift :: t -> Q Exp
lift takes a value and returns an expression that re-constructs it. We
could define some instances like so:
instance Lift Integer where
lift x = return (LitE (IntegerL x))
instance Lift Int where
lift x = return (LitE (IntegerL (fromIntegral x)))
instance Lift Char where
lift x = return (LitE (CharL x))
The template-haskell package defines Lift instances for all common data
types. GHC also knows how to define Lift for new types. It is enough to
enable the DeriveLift language extension and we’re done (example from the
Haddocks):
{-# LANGUAGE DeriveLift #-}
module Main (main) where
import Language.Haskell.TH.Syntax
data Bar a
= Bar1 a (Bar a)
| Bar2 String
deriving Lift
However, sometimes we want to lift values of types that do not define or
derive Lift and so we risk introducing orphan
instances. Even worse, even if we were OK with defining
orphan instances for things like Text and ByteString, we can’t (at least
not by using the DeriveLift extension):
{-# LANGUAGE DeriveLift #-}
{-# LANGUAGE StandaloneDeriving #-}
module Main (main) where
import Language.Haskell.TH
import Language.Haskell.TH.Syntax (Lift)
deriving instance Lift Text
This produces the following compilation error:
• Can't make a derived instance of ‘Lift Text’:
The data constructors of ‘Text’ are not all in scope
so you cannot derive an instance for it
• In the stand-alone deriving instance for ‘Lift Text’
The text and bytestring libraries do not expose the data constructors,
so DeriveLift refuses to do its magic.
Not everything is lost though. It so happens that the Data class provides
enough introspection capabilities for lifting, so TH has the following
helper:
liftData :: Data a => a -> Q Exp
If something is an instance of the Data type class, we can just lift it
with the liftData function. This is great because no orphan instances are
necessary, let’s try it out.
In one module we define a function that takes Text and generates an
expression that appends "!" to it:
foo :: Text -> Q Exp
foo txt = [| $(liftData txt) <> "!" |]
In another module we try to use it:
main :: IO ()
main = TIO.putStrLn $(foo "Here goes the text")
And this is when it blows up:
• Can't find interface-file declaration for variable Data.Text.Internal.pack
Probable cause: bug in .hi-boot file, or inconsistent .hi file
Use -ddump-if-trace to get an idea of which file caused the error
This is scary. I’ll save your time and we won’t go into the internals of
liftData here. Suffice it to say that liftData uses
toConstr internally which returns pack for Text. The rest
of the machinery apparently expects this pack function to be in the same
module the data type is defined, Data.Text.Internal, but pack is defined
in Data.Text, thus we get the error.
Text is a pretty common type, how do we lift it? The first step would be
to define a lifting function manually:
liftText :: Text -> Q Exp
liftText txt = AppE (VarE 'T.pack) <$> lift (T.unpack txt)
Here we first lift a String and then apply T.pack (assuming we have
imported Data.Text qualified as T). Now we could use liftText directly
to lift a Text value, but what if it’s inside a data structure? The
general solution to lifting involving Text seems to be the following:
foo :: Text -> Q Exp
foo txt = [| $e <> "!" |]
where
e = dataToExpQ (fmap liftText . cast) txt
This dataToExpQ function in combination with
cast (that comes from Data.Typeable) does the trick.
Let’s see what dataToExpQ does:
dataToExpQ :: Data a => (forall b. Data b => b -> Maybe (Q Exp)) -> a -> Q Exp
dataToExpQ works just like liftData but it allows us to overwrite
lifting for the values for which forall b. Data b => b -> Maybe (Q Exp)
returns Just. Don’t be afraid of the rank-2-type here. The forall
quantification of b inside that function in parentheses means that the
function literally works for all b, but the choice of b is made not at
the call site of dataToExpQ, but at the call site of this forall b. Data b => b -> Maybe (Q Exp) function. Similarly, the choice of a is made at
the call site of dataToExpQ which also has an implicit forall a. at the
beginning of its type signature. See the symmetry? (If you’re a beginner,
you may not understand rank-N-types immediately, in that case don’t
despair.)
cast performs type-safe casting between two types:
cast :: (Typeable a, Typeable b) => a -> Maybe b
Here, if a has the same type representation (which the Typeable type
class allows us to query via the typeRep function) as b, we get a b
value inside Just.
We can use cast here because Typeable is a superclass of Data:
class Typeable a => Data a where
-- …
If something from the above is not clear, it’s OK. Just grab this trick and
use it next time you need to lift data that contains Text or similar
types. One more thing: you need at least GHC 8 to use dataToExpQ.
Example 2: creating refined values at compile time
Now we are prepared to write a TH helper that allows us to construct values of refined types at compile time turning invalid inputs into compilation errors.
Our practical example will be taken (although in a simplified form) from an
existing library I wrote, called modern-uri. In the library
we have a function that takes Text representing a URI as input and outputs
Maybe URI:
data URI = URI
{ uriScheme :: Maybe (RText 'Scheme)
, uriAuthority :: Either Bool Authority
, uriPath :: [RText 'PathPiece]
, uriQuery :: [QueryParam]
, uriFragment :: Maybe (RText 'Fragment)
} deriving (Show, Eq, Ord, Data, Typeable, Generic)
mkURI :: Text -> Maybe URI
Nothing means that the input was not a correct URI. Our task thus
becomes:
-
Run
mkURIat compile time. -
If the returned value is
Nothing, signal a compile-time error. Otherwise lift the entireURIdata structure we have parsed.
By now we know how to tackle every part of the task. Here is the complete TH helper:
mkURI' :: Text -> Q Exp
mkURI txt =
case mkURI txt of
-- Instead of 'fail' we could also use 'reportError'. There is also
-- 'reportWarning' just in case you ever want to report warnings.
Nothing -> fail "The input does not contain a valid URI"
Just uri -> dataToExpQ (fmap liftText . cast) uri -- the same trick
We could finish the section here, but there is a nicer way, syntax-wise, to
make use of such a validating helper. The feature we’re going to explore is
called quasi-quotes. It turns out that TH allows us to define our own
custom quasi-quoters that are like d, e, t, and p we saw earlier.
Defining a quasi-quoter is easy. It is enough to import the
QuasiQuoter data type from Langauge.Haskell.TH.Quote:
data QuasiQuoter = QuasiQuoter
{ quoteExp :: String -> Q Exp
, quotePat :: String -> Q Pat
, quoteType :: String -> Q Type
, quoteDec :: String -> Q [Dec]
}
A quasi quoter may be used in the four familiar contexts, so it has four
corresponding functions that take the String from the quote and return
something to splice.
Usually we only want to use a quasi quoter in one context, so the others are
either omitted or undefined or replaced by errors. These failures will
be at compile time, so it’s OK to do the following:
uri :: QuasiQuoter
uri = QuasiQuoter
{ quoteExp = \str ->
case mkURI (T.pack str) of
Nothing -> fail "The input does not contain a valid URI"
Just x -> dataToExpQ (fmap liftText . cast) x
, quotePat = error "Usage as a parttern is not supported"
, quoteType = error "Usage as a type is not supported"
, quoteDec = error "Usage as a declaration is not supported" }
I like to use error with helpful messages instead of undefined some
people use.
To use our new quasi-quoter we need to enable the QuasiQuotes language
extension:
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where
import TH
import qualified Data.Text.IO as TIO
import qualified Text.URI as URI
main :: IO ()
main = TIO.putStrLn (URI.render x)
where
x = [uri| https://markkarpov.com |]
If the string inside the uri quasi-quoter is not a valid URI, the
compilation will fail. One more type of error is caught at complie time!
Running IO in Q
Using IO in TH generally makes the compilation process dependent on
external conditions that may contribute to unexpected compilation failures.
Thus, it makes sense to think twice before running IO from TH.
That said, the function that lifts IO into Q is called simply
runIO:
runIO :: IO a -> Q a
Needless to say, one can do a lot with such a tool, for good or for evil.
One example of a good use is the gitrev package which allows us
to insert information about active branch and last commit of code that is
being compiled. It works by literally running the git executable at
complie time and then lifting the fetched data.
A far more common use case for IO in Q is reading from files. In that
case compilation usually starts to depend on contents of the file being
read, and so it’s a good idea to tell GHC that changes in that file should
cause re-compilation of the module where the file-reading TH helper is
spliced. This is done via the addDependentFile
function:
addDependentFile :: FilePath -> Q ()
(Why it lives in Language.Haskell.TH.Syntax and not in
Language.Haskell.TH is beyond me. What does it have to do with syntax?)
Example 3: the file-embed package
Finally, in our last example, let’s re-implement (in a simplified form) the
popular package file-embed, which allows to load contents of
a file and splice them as a IsString a => a value (the type of string
literals in Haskell in the presence of the OverloadedStrings language
extension).
If we have this in TH.hs file:
{-# LANGUAGE TemplateHaskell #-}
module TH
( embedFile )
where
import Data.String (IsString (..))
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
embedFile :: FilePath -> Q Exp
embedFile path = do
str <- runIO (readFile path)
addDependentFile path
-- We lift the 'String' literal to the polymorphic 'IsString a => a' form.
[| fromString str |]
Then we can use it like this:
{-# LANGUAGE TemplateHaskell #-}
module Main (main) where
import TH
import qualified Data.Text.IO as TIO
main :: IO ()
main = TIO.putStrLn $(embedFile "src/Main.hs")
The program outputs its own source code. No src/Main.hs file is expected
to exist when we run the binary, the source code is stored in the executable
itself. Note how the IsString a => a value was instantiated to Text
automatically because Text is an instance of IsString.
Conclusion
This is by no means a complete TH tutorial, some more rarely used tools and functions have not been covered. Still, the tutorial should get you up to speed and give a taste of what meta-programming in Haskell looks like. For further information refer directly to the Haddocks:
https://hackage.haskell.org/package/template-haskell
The new th-abstraction package may also be of interest.
It normalizes variations in the interface for inspecting datatype
information between different versions of the template-haskell library.
Good luck!