GHC.Generics ============ The term “generic programming” means wildly different things in different contexts (ironically). The “generics” of GHC.Generics means the following specifically: GHC.Generics enables writing a general algorithm that is instantiable to any ADT, if the generality is achieved by informing the algorithm of the ADT definition, so the algorithm just needs to work through the given ADT structure mechanically. Some examples are serialization such as to/from JSON, re-inventing automatic implementation of `Show`, `Read`, `Eq` instances, and extending this technique to user-defined classes. Common ADT Representation ------------------------- All ADTs are different, but they all fit the same framework of sums of products of fields. To enable writing an algorithm that works for every possibility in this framework, naturally one follows and posits a common representation format that covers all possibilities! Then the algorithm just needs to input and output this representation. The common representation comes not as one type however, but several types, each standing for one aspect of an ADT; then they are composed together for the whole thing. Some aspects come to mind immediately: * The sum structure, if there are at least 2 constructors. This is represented by a binary sum type `:+:`, similar to `Either`; its data constructors are `L1` and `R1`. If your ADT has more than 2 constructors, the representation nests `:+:`. (If there are no constructors—the ADT is an empty type—this is represented by an empty type `V1`, “V” for “void”.) * Each product structure, if a constructor takes at least 2 fields. This is represented by a binary tuple type `:*:`, similar to `(,)`; its data constructor is `:*:`. If a constructor has more than 2 fields, the representation nests `:*:`. (If a constructor takes no fields, this is represented by a unit type `U1`, “U” for “unit”.) * Each field type and value. This is represented by a newtype wrapper `Rec0` wrapping the actual field; its data constructor is `K1`. Further detail: `Rec0` is synonym for `K1 R`, in which `K1` is the newtype wrapper, `R` is a hardcoded phantom type parameter that is a historical relic. So for example `Rec0 Char` means `K1 R Char`. This additional wrapping is actually helpful: Your algorithm, when it comes down to individual fields, just has to handle or produce a `Rec0` wrapper specifically, as opposed to expressing “any type that is not `:+:`, `:*:`, `U1`, or `V1`” which would be fraught with troubles in Haskell. At this point it may be useful to show an example; but beware that it is a preview, not a complete example, because the story of the common representation is not done yet. For the following ADT example ``` data SoCD = Str String | MkCD Char Double ``` the representation goes like ``` -- Mock code, incomplete story. -- Representation type: Rec0 String :+: Rec0 Char :*: Rec0 Double -- Representation value for Str "hi": L1 (K1 "hi") -- Representation value for D 'x' 4.5: R1 (K1 'x' :*: K1 4.5) ``` The remaining aspects of an ADT are metadata: ADT name, constructor names, field names (if record syntax is given), etc. They are needed by some algorithms such as making/parsing JSON. These are carried by extra newtype wrappers at various levels, all of the newtype `M1` (“M” for “meta”; data constructor `M1`) but with different phantom type parameters indicating different levels: * At the outermost, a `type D1 = M1 D` wrapper (“D” for “data type”) carries the name of the ADT, which module it's from, which package it's from (“main” is the default package), and whether it is a newtype or not. This wraps around the sum representation. (If only one data constructor, around the `C1` wrapper below; if no data constructor, around `V1`.) * For each constructor, a `type C1 = M1 C` wrapper (“C” for “constructor”) carries the name, fixity (prefix vs infix, and if infix, associativity and precedence level), and whether record syntax is given or not. This wraps around the respective product representation. (If only one field, around the `S1` wrapper below; If no field, around `U1`.) * Around each field, an `type S1 = M1 S` wrapper (“S” for “selector”) carries field annotations: Field name (if record syntax is given), strictness, and unpackedness. This wraps around the `Rec0` field wrapper. Strictness and unpackedness come in two versions: what the ADT definition says (“source”) using `!` and `{-# UNPACK #-}`, and what the compiler settles for (“decided”). They can differ due to code optimization levels and the StrictData extension. All metadata are carried at the type level (e.g., type-level strings, type-level booleans), and still queryable at the term level. Using `SoCD` again as an example, and the module name is “ADTs”, the complete representation is: ``` -- Representation type D1 ('MetaData "SoCD" "ADTs" "main" 'False) -- False: not newtype ( C1 ('MetaCons "S" 'PrefixI 'False) -- False: not record syntax (S1 ('MetaSel 'Nothing -- Nothing: no field name 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 String)) :+: C1 ('MetaCons "MkCD" 'PrefixI 'False) ( (S1 ('MetaSel 'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Char)) :*: (S1 ('MetaSel 'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Double)))) -- Representation value for Str "hi" -- The comment outlines the type, to help you pair with the data constructors -- D1 (:+: (C1 (S1 (Rec0 )))) M1 (L1 (M1 (M1 (K1 "hi")))) -- Representation value for MkCD 'x' 4.5 -- D1 (:+: (C1 (S1 (Rec0 ) :*: S1 (Rec0 )))) M1 (R1 (M1 (M1 (K1 'x') :*: M1 (K1 4.5))))) ``` Here is another example, to show what happens to another ADT, record syntax, and infix constructors, and what doesn't happen to a recursive ADT. The example ADT is: ``` data Expr = I{int :: Integer} | Expr :^ Expr infixr 8 :^ ``` The representation is: ``` -- Representation type D1 ('MetaData "Expr" "ADTs" "main" 'False) ( C1 ('MetaCons "I" 'PrefixI 'True) -- True: record syntax (S1 ('MetaSel ('Just "int") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Integer)) :+: C1 ('MetaCons ":^" ('InfixI 'RightAssociative 8) 'False) ( S1 ('MetaSel 'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Expr) :*: S1 ('MetaSel 'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Expr))) -- Representation value for I 4 -- D1 (:+: (C1 (S1 (Rec0 )))) M1 (L1 (M1 (M1 (K1 4)))) -- Representation value for I 4 :^ I 5 -- D1 (:+: (C1 (S1 (Rec0 ) :*: S1 (Rec0 )))) M1 (R1 (M1 (M1 (K1 (I 4)) :*: M1 (K1 (I 5))))) ``` Note that the representation does not recursively expand nested `Expr` in the type or the values. The representation scheme is for exposing the sum-of-product structure by just one unfolding. It's up to your algorithm whether and how to do recursion. Another note is that different ADTs get formally different representation types, in fact there is enough type-level information to uniquely identify the ADT and describe its structure. A benefit is statically known correspondence: An ADT and its representation type are in bijection. A downside is that an algorithm cannot be expressed as one single function of one type signature, but instead a type class has to be defined, and the algorithm has to be split across instances, one instance for `:+:`, one instance for `:*:`, one instance for `Rec0`, etc. Lastly, a technicality. All representation types take one more type parameter than shown, e.g., the `M1` type actually takes 4 type parameters, but only 3 are shown: 1st for `D` or `C` or `S`, 2nd for the metadata, and 3rd for the wrapped representation. Here are the definitions of the representation types in full glory: ``` newtype M1 i c f p = M1{unM1 :: f p} -- i ∈ {D, C, S} -- c carries metadata -- f is wrapped representation data (:+:) f g p = L1 (f p) | R1 (g p) data V1 p data (:*:) f g p = f p :*: g p data U1 p = U1 newtype K1 i t p = K1{unK1 :: t} -- i = R -- t is wrapped field ``` Throughout, there is one last type variable `p`. It is an unused dummy when representing ADTs of kind `Type`. Note that `p` must be omitted when wrapping one thing inside another, e.g., it is `K1 R Char`, not `K1 R Char p`, that is wrapped inside `M1 S ... (K1 R Char)`, and generally anything that takes the place of `f` or `g` above. (Later, these representation types are also used for representing ADTs of kind `Type -> Type`, enabling general algorithms like `fmap`. Then `p` is necessary.) The `Generic` Class ------------------- Representation types and conversion functions are members of the `Generic` type class: ``` class Generic a where type family Rep a :: * -> * from :: a -> Rep a p to :: Rep a p -> a ``` Since different ADTs have formally different representation types, but we need a uniform way to refer to them, `Rep` is made a type synonym family, so that for example `Rep SoCD` is a synonym for `SoCD`'s representation type. Also recall that there is one last dummy type parameter `p` that is better omitted in most contexts, so it is `Rep SoCD p` when referring to types for values. Instances are auto-generated by GHC when the DeriveGeneric extension is turned on and the ADT in question has `deriving Generic` (at definition site or standalone deriving). For example, deriving `Generic` for `SoCD` yields the representation type as shown in the last section, and the methods `from` and `to` convert between `SoCD` values and representation values as shown. The file contains `SoCD`, `Expr`, a unit type (one constructor, no field), and an empty type. As an exercise, please load it into ghci, use `:info SoCD` or `:kind! Rep SoCD` to see the representation type, `from` to see sample representation values. Similarly for the other types. If an ADT has 3 or more constructors, what will be the auto-generated `:+:` tree shape? Is it left leaning, right leaning, or somewhat balanced? Likewise if a constructor has 3 or more fields, what will be the `:*:` tree shape? Answer: Implementation defined—different GHC versions can give different tree shapes. So try to write algorithms to be independent of this detail. Basic Example: Show but prefix (Polish) notation and list of pieces -------------------------------------------------------------------