GHC.Generics
============
The term “generic programming” means wildly different things in different
contexts (ironically). The “generics” of GHC.Generics means the following
specifically:
GHC.Generics enables writing a general algorithm that is instantiable to any
ADT, if the generality is achieved by informing the algorithm of the ADT
definition, so the algorithm just needs to work through the given ADT structure
mechanically. Some examples are serialization such as to/from JSON, re-inventing
automatic implementation of `Show`, `Read`, `Eq` instances, and extending this
technique to user-defined classes.
Common ADT Representation
-------------------------
All ADTs are different, but they all fit the same framework of sums of products
of fields. To enable writing an algorithm that works for every possibility in
this framework, naturally one follows and posits a
common representation format that covers all possibilities! Then the algorithm
just needs to input and output this representation.
The common representation comes not as one type however, but several types, each
standing for one aspect of an ADT; then they are composed together for the whole
thing. Some aspects come to mind immediately:
* The sum structure, if there are at least 2 constructors.
This is represented by a binary sum type `:+:`, similar to `Either`; its data
constructors are `L1` and `R1`. If your ADT has more than 2 constructors, the
representation nests `:+:`.
(If there are no constructors—the ADT is an empty type—this is represented by
an empty type `V1`, “V” for “void”.)
* Each product structure, if a constructor takes at least 2 fields.
This is represented by a binary tuple type `:*:`, similar to `(,)`; its data
constructor is `:*:`. If a constructor has more than 2 fields, the
representation nests `:*:`.
(If a constructor takes no fields, this is represented by a unit type `U1`,
“U” for “unit”.)
* Each field type and value.
This is represented by a newtype wrapper `Rec0` wrapping the actual field; its
data constructor is `K1`. Further detail: `Rec0` is synonym for `K1 R`, in
which `K1` is the newtype wrapper, `R` is a hardcoded phantom type parameter
that is a historical relic. So for example `Rec0 Char` means `K1 R Char`.
This additional wrapping is actually helpful: Your algorithm, when it comes
down to individual fields, just has to handle or produce a `Rec0` wrapper
specifically, as opposed to expressing “any type that is not `:+:`, `:*:`,
`U1`, or `V1`” which would be fraught with troubles in Haskell.
At this point it may be useful to show an example; but beware that it is a
preview, not a complete example, because the story of the common representation
is not done yet. For the following ADT example
```
data SoCD = Str String | MkCD Char Double
```
the representation goes like
```
-- Mock code, incomplete story.
-- Representation type:
Rec0 String
:+:
Rec0 Char :*: Rec0 Double
-- Representation value for Str "hi":
L1 (K1 "hi")
-- Representation value for D 'x' 4.5:
R1 (K1 'x' :*: K1 4.5)
```
The remaining aspects of an ADT are metadata: ADT name, constructor names, field
names (if record syntax is given), etc. They are needed by some algorithms such
as making/parsing JSON. These are carried by extra newtype wrappers at various
levels, all of the newtype `M1` (“M” for “meta”; data constructor `M1`) but with
different phantom type parameters indicating different levels:
* At the outermost, a `type D1 = M1 D` wrapper (“D” for “data type”) carries the
name of the ADT, which module it's from, which package it's from (“main” is
the default package), and whether it is a newtype or not. This wraps around
the sum representation.
(If only one data constructor, around the `C1` wrapper below; if no data
constructor, around `V1`.)
* For each constructor, a `type C1 = M1 C` wrapper (“C” for “constructor”)
carries the name, fixity (prefix vs infix, and if infix, associativity and
precedence level), and whether record syntax is given or not. This wraps
around the respective product representation.
(If only one field, around the `S1` wrapper below; If no field, around `U1`.)
* Around each field, an `type S1 = M1 S` wrapper (“S” for “selector”) carries
field annotations: Field name (if record syntax is given), strictness, and
unpackedness. This wraps around the `Rec0` field wrapper.
Strictness and unpackedness come in two versions: what the ADT definition says
(“source”) using `!` and `{-# UNPACK #-}`, and what the compiler settles for
(“decided”). They can differ due to code optimization levels and the
StrictData extension.
All metadata are carried at the type level (e.g., type-level strings, type-level
booleans), and still queryable at the term level.
Using `SoCD` again as an example, and the module name is “ADTs”, the complete
representation is:
```
-- Representation type
D1 ('MetaData "SoCD" "ADTs" "main" 'False) -- False: not newtype
( C1 ('MetaCons "S" 'PrefixI 'False) -- False: not record syntax
(S1 ('MetaSel 'Nothing -- Nothing: no field name
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 String))
:+:
C1 ('MetaCons "MkCD" 'PrefixI 'False)
( (S1 ('MetaSel 'Nothing
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 Char))
:*:
(S1 ('MetaSel 'Nothing
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 Double))))
-- Representation value for Str "hi"
-- The comment outlines the type, to help you pair with the data constructors
-- D1 (:+: (C1 (S1 (Rec0 ))))
M1 (L1 (M1 (M1 (K1 "hi"))))
-- Representation value for MkCD 'x' 4.5
-- D1 (:+: (C1 (S1 (Rec0 ) :*: S1 (Rec0 ))))
M1 (R1 (M1 (M1 (K1 'x') :*: M1 (K1 4.5)))))
```
Here is another example, to show what happens to another ADT, record syntax, and
infix constructors, and what doesn't happen to a recursive ADT. The example ADT
is:
```
data Expr = I{int :: Integer} | Expr :^ Expr
infixr 8 :^
```
The representation is:
```
-- Representation type
D1 ('MetaData "Expr" "ADTs" "main" 'False)
( C1 ('MetaCons "I" 'PrefixI 'True) -- True: record syntax
(S1
('MetaSel ('Just "int")
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 Integer))
:+:
C1 ('MetaCons ":^" ('InfixI 'RightAssociative 8) 'False)
( S1 ('MetaSel 'Nothing
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 Expr)
:*:
S1 ('MetaSel 'Nothing
'NoSourceUnpackedness
'NoSourceStrictness
'DecidedLazy)
(Rec0 Expr)))
-- Representation value for I 4
-- D1 (:+: (C1 (S1 (Rec0 ))))
M1 (L1 (M1 (M1 (K1 4))))
-- Representation value for I 4 :^ I 5
-- D1 (:+: (C1 (S1 (Rec0 ) :*: S1 (Rec0 ))))
M1 (R1 (M1 (M1 (K1 (I 4)) :*: M1 (K1 (I 5)))))
```
Note that the representation does not recursively expand nested `Expr` in the
type or the values. The representation scheme is for exposing the sum-of-product
structure by just one unfolding. It's up to your algorithm whether and how to do
recursion.
Another note is that different ADTs get formally different representation types,
in fact there is enough type-level information to uniquely identify the ADT and
describe its structure. A benefit is statically known correspondence: An ADT and
its representation type are in bijection. A downside is that an algorithm cannot
be expressed as one single function of one type signature, but instead a type
class has to be defined, and the algorithm has to be split across instances, one
instance for `:+:`, one instance for `:*:`, one instance for `Rec0`, etc.
Lastly, a technicality. All representation types take one more type parameter
than shown, e.g., the `M1` type actually takes 4 type parameters, but only 3 are
shown: 1st for `D` or `C` or `S`, 2nd for the metadata, and 3rd for the wrapped
representation. Here are the definitions of the representation types in full
glory:
```
newtype M1 i c f p = M1{unM1 :: f p} -- i ∈ {D, C, S}
-- c carries metadata
-- f is wrapped representation
data (:+:) f g p = L1 (f p)
| R1 (g p)
data V1 p
data (:*:) f g p = f p :*: g p
data U1 p = U1
newtype K1 i t p = K1{unK1 :: t} -- i = R
-- t is wrapped field
```
Throughout, there is one last type variable `p`. It is an unused dummy when
representing ADTs of kind `Type`. Note that `p` must be omitted when wrapping
one thing inside another, e.g., it is `K1 R Char`, not `K1 R Char p`, that is
wrapped inside `M1 S ... (K1 R Char)`, and generally anything that takes the
place of `f` or `g` above.
(Later, these representation types are also used for representing ADTs of kind
`Type -> Type`, enabling general algorithms like `fmap`. Then `p` is necessary.)
The `Generic` Class
-------------------
Representation types and conversion functions are members of the `Generic` type
class:
```
class Generic a where
type family Rep a :: * -> *
from :: a -> Rep a p
to :: Rep a p -> a
```
Since different ADTs have formally different representation types, but we need a
uniform way to refer to them, `Rep` is made a type synonym family, so that for
example `Rep SoCD` is a synonym for `SoCD`'s representation type. Also recall
that there is one last dummy type parameter `p` that is better omitted in most
contexts, so it is `Rep SoCD p` when referring to types for values.
Instances are auto-generated by GHC when the DeriveGeneric extension is turned
on and the ADT in question has `deriving Generic` (at definition site or
standalone deriving). For example, deriving `Generic` for `SoCD` yields the
representation type as shown in the last section, and the methods `from` and `to`
convert between `SoCD` values and representation values as shown.
The file contains `SoCD`, `Expr`, a unit type (one constructor, no
field), and an empty type. As an exercise, please load it into ghci, use `:info
SoCD` or `:kind! Rep SoCD` to see the representation type, `from` to see sample
representation values. Similarly for the other types.
If an ADT has 3 or more constructors, what will be the auto-generated `:+:` tree
shape? Is it left leaning, right leaning, or somewhat balanced? Likewise if a
constructor has 3 or more fields, what will be the `:*:` tree shape? Answer:
Implementation defined—different GHC versions can give different tree shapes. So
try to write algorithms to be independent of this detail.
Basic Example: Show but prefix (Polish) notation and list of pieces
-------------------------------------------------------------------