What's new with GHC generics in 8.0
GHC 8.0 will be released soon, and with it come many new additions and improvements to the base library. In particular, the API in the GHC.Generics module (and the underlying machinery provided by the DeriveGeneric GHC extension) has undergone quite a few improvements. However, these changes haven’t been very well advertised outside of the GHC dev community, so hopefully this blog post will spread awareness of some of the new bells and whistles in GHC generics.
Type-level metadata
When you type deriving Generic, GHC will spit out a Generic instance when compiling. You can observe this for yourself by passing -ddump-deriv as an option when compiling. For example, compiling the following code:
with ghc -ddump-deriv will spit out something like this in GHC 7.10 and earlier (after some cleanup):
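As a small, hedged illustration of what a derived instance gives you, the sketch below round-trips a value through its generic representation. The constructor and fields of Foo here are an assumption made for the example, and roundTrip is a hypothetical helper.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic, from, to)

-- Stand-in datatype; its fields are an assumption made for this sketch.
data Foo = Foo Int Bool
  deriving (Eq, Show, Generic)

-- Convert a value to its generic representation and back again.
-- In GHC 7.10 and earlier, Rep Foo has roughly the shape
--   D1 D1Foo (C1 C1_0Foo (... fields ...))
-- where D1Foo and C1_0Foo are generated empty datatypes.
roundTrip :: Foo -> Foo
roundTrip = to . from

main :: IO ()
main = print (roundTrip (Foo 1 True))
```

Compiling this file with ghc -ddump-deriv prints the derived instance alongside the usual output.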
But that’s not the only thing it generates. It also generates some empty datatypes and typeclass instances for those datatypes:
What’s going on here? The reason GHC generics does this is to provide programmers with a datatype’s metadata. For example, the Datatype typeclass gives you the ability to reify the name of a datatype and the module in which it is defined. This can be used when creating generic instances, since the generated Generic instance uses D1Foo and C1_0Foo (the generated datatypes) in Rep Foo. For example, Niklas Hambüchen uses Datatype to come up with a datatype’s name for use in an error message:
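A minimal sketch of that idea follows. It is not Niklas Hambüchen’s actual code: typeName and decodeError are hypothetical names, and the fields of Foo are assumed.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.Generics

-- Stand-in datatype; its fields are an assumption made for this sketch.
data Foo = Foo Int Bool
  deriving (Show, Generic)

-- Reify the name of a datatype from the outermost M1 D layer of its Rep.
typeName :: (Generic a, Rep a ~ M1 D d f, Datatype d) => a -> String
typeName x = datatypeName (from x)

-- Hypothetical error message that mentions the datatype's name.
decodeError :: (Generic a, Rep a ~ M1 D d f, Datatype d) => a -> String
decodeError x = "Could not decode a value of type " ++ typeName x

main :: IO ()
main = putStrLn (decodeError (Foo 1 True))
```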
But encoding metadata this way has some drawbacks:
- This information can only be accessed at runtime. In particular, there’s no way to, say, disallow creating instances of a typeclass for a datatype named Kerfuffle, or only allow instances for newtypes, at compile-time. The best you could do is check the value of datatypeName or isNewtype at runtime, and throw an error if those criteria aren’t met.
- There is a measurable compilation performance hit because of this approach. In particular, you have to generate a datatype and typeclass instance for every datatype, constructor, and record selector in your module that has Generic derived for it.
Luckily, we can do better. José Pedro Magalhães devised a clever way to encode all of this metadata at the type level using DataKinds. The key is to first define new datatypes:
With Meta, we can encode all of the properties of a datatype (MetaData), constructor (MetaCons), or selector (MetaSel) that we wish. Now our derived Generic instance from earlier will look a little different:
The implementations of from and to are the same, but D1 and C1 now use the promoted Meta type to represent Foo’s metadata. This alleviates the need to generate extra datatypes.
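To make the promoted encoding concrete, here is a hedged sketch that writes a piece of datatype metadata by hand and reads it back with the Datatype methods. It assumes GHC 8.0 or later; FooMeta and fooRep are hypothetical names, and the module and package strings are illustrative rather than what GHC would necessarily generate.

```haskell
{-# LANGUAGE DataKinds #-}

import GHC.Generics

-- Hand-written metadata: datatype name, module name, package name,
-- and whether the type is a newtype. All four values are illustrative.
type FooMeta = 'MetaData "Foo" "Main" "main" 'False

-- A representation value carrying that metadata in its type.
fooRep :: M1 D FooMeta U1 ()
fooRep = M1 U1

main :: IO ()
main = do
  putStrLn (datatypeName fooRep)
  putStrLn (moduleName fooRep)
  print (isNewtype fooRep)
```

No auxiliary datatype is declared here: the metadata lives entirely in the type-level strings and Boolean.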
Now comes the amazing part. Before, we had to generate several Datatype, Constructor, and Selector instances for every deriving Generic line we used. But now, there are only three such instances we will ever need!
That’s it! We no longer need to generate any auxiliary datatypes or typeclass instances, because the above three instances will work for any possible Rep that GHC generates. Don’t worry if you don’t understand how it’s implemented: it uses quite a bit of trickery inspired by the singletons library to produce values from their type-level equivalents. (The full source for this can be found here.)
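From the user’s side nothing changes: generic code keeps calling conName and friends. The sketch below, which assumes GHC 8.0 or later, collects the constructor names of a datatype from its Rep; the ConNames class and the Shape type are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Hypothetical example datatype.
data Shape = Circle Double | Square Double | Triangle
  deriving Generic

-- Walk a representation type and list its constructor names.
class ConNames (f :: Type -> Type) where
  conNames :: Proxy f -> [String]

-- Skip the datatype-level metadata wrapper.
instance ConNames f => ConNames (M1 D d f) where
  conNames _ = conNames (Proxy :: Proxy f)

-- A sum contributes the constructors of both branches, left to right.
instance (ConNames f, ConNames g) => ConNames (f :+: g) where
  conNames _ = conNames (Proxy :: Proxy f) ++ conNames (Proxy :: Proxy g)

-- conName only inspects the type, so an undefined value is safe here.
instance Constructor c => ConNames (M1 C c f) where
  conNames _ = [conName (undefined :: M1 C c f ())]

main :: IO ()
main = print (conNames (Proxy :: Proxy (Rep Shape)))
```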
Of course, if you wish, you can use this new type-level encoding at compile-time instead of at runtime. For example, Ben Gamari has defined a type family for determining whether a datatype is strict in all its fields by examining its generic Rep. More on this in a bit.
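As a simpler compile-time example (this is not Ben Gamari’s type family), the sketch below reads the newtype flag out of the promoted MetaData and uses it to restrict a function to newtypes. It assumes GHC 8.0 or later; IsNewtype, describeNewtype, Age, and Pair are all hypothetical names.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import GHC.Generics

-- Extract the "is this a newtype?" flag from a representation type.
type family IsNewtype (rep :: Type -> Type) :: Bool where
  IsNewtype (M1 D ('MetaData n m p nt) f) = nt

newtype Age = Age Int
  deriving (Show, Generic)

data Pair = Pair Int Int
  deriving (Show, Generic)

-- Only typechecks when the argument's type was declared with newtype.
describeNewtype :: (Generic a, IsNewtype (Rep a) ~ 'True, Show a) => a -> String
describeNewtype x = "newtype value: " ++ show x

main :: IO ()
main = putStrLn (describeNewtype (Age 30))
-- describeNewtype (Pair 1 2) is rejected by the typechecker,
-- because IsNewtype (Rep Pair) reduces to 'False.
```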
More metadata
You might have noticed in the definition of Meta earlier that there was quite a bit of new information when compared to what Datatype, Constructor, and Selector have in GHC 7.10. That is no coincidence: GHC 8.0 enriches generics with more metadata. Here is a full list of new additions:
Package names
The Datatype class has a new method (added by Oleg Grenrus):
As its name suggests, this tells you the name of the package a datatype is defined in. As an example of its utility, you can use packageName to generically define instances for the Lift typeclass from template-haskell.
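A hedged sketch of packageName in use, assuming GHC 8.0 or later: the hypothetical helper qualifiedName builds a package:Module.Type string for any type with a Generic instance. The exact package and module reported for a library type such as Maybe depend on the GHC version, so no output is shown.

```haskell
{-# LANGUAGE TypeFamilies #-}

import GHC.Generics

-- Build "package:Module.Type" from a datatype's metadata.
-- packageName takes the same kind of argument as datatypeName.
qualifiedName :: (Generic a, Rep a ~ M1 D d f, Datatype d) => a -> String
qualifiedName x = packageName r ++ ":" ++ moduleName r ++ "." ++ datatypeName r
  where
    r = from x

main :: IO ()
main = putStrLn (qualifiedName (Just 'x'))
```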
Selector strictness
The Selector class has three new methods:
where SourceUnpackedness, SourceStrictness, and DecidedStrictness are defined as follows:
These methods allow you to determine strictness properties of a datatype’s fields. SourceUnpackedness tells you whether a field is marked with an {-# UNPACK #-} pragma, a {-# NOUNPACK #-} pragma, or neither. Similarly, SourceStrictness tells you whether a field is marked with a strictness annotation (a.k.a. a bang, or a !), a lazy annotation (a ~, which was introduced due to Adam Sandberg Ericsson’s work on the -XStrict extension), or neither.
Whereas SourceUnpackedness and SourceStrictness reflect what is written in the source code, the actual strictness that GHC decides on for a particular field is slightly more complex, since it takes into account things like -funbox-strict-fields and -XStrict. For example, consider the following datatype:
The fields of ExampleConstructor will have different DecidedStrictness depending on what flags are passed to GHC when compiling:
- If compiled without optimization or other language extensions, then the fields of ExampleConstructor will have DecidedStrict, DecidedStrict, and DecidedLazy, respectively.
- If compiled with -XStrict enabled, then the fields will have DecidedStrict, DecidedStrict, and DecidedStrict, respectively.
- If compiled with -O2 enabled, then the fields will have DecidedUnpack, DecidedStrict, and DecidedLazy, respectively.
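The three cases above can be checked with a sketch like the following, which prints the strictness metadata of each field. It assumes GHC 8.0 or later. The declaration of E follows the example in the GHC.Generics documentation and is assumed to match the datatype discussed here; the Fields class and FieldInfo synonym are hypothetical.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Assumed to match the datatype discussed in the text.
data E = ExampleConstructor {-# UNPACK #-} !Int !Int Int
  deriving Generic

type FieldInfo = (SourceUnpackedness, SourceStrictness, DecidedStrictness)

-- Collect strictness metadata for every field of a single-constructor type.
class Fields (f :: Type -> Type) where
  fields :: Proxy f -> [FieldInfo]

instance Fields f => Fields (M1 D d f) where
  fields _ = fields (Proxy :: Proxy f)

instance Fields f => Fields (M1 C c f) where
  fields _ = fields (Proxy :: Proxy f)

instance (Fields f, Fields g) => Fields (f :*: g) where
  fields _ = fields (Proxy :: Proxy f) ++ fields (Proxy :: Proxy g)

-- The Selector methods only look at the type, so undefined is safe.
instance Selector s => Fields (M1 S s f) where
  fields _ = [(selSourceUnpackedness m, selSourceStrictness m, selDecidedStrictness m)]
    where
      m = undefined :: M1 S s f ()

main :: IO ()
main = mapM_ print (fields (Proxy :: Proxy (Rep E)))
```

The first two components of each tuple are fixed by the source text, while the third changes with the flags used, for instance when comparing a plain build against one with -O2.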
Unlifted type representations
Previously, Generic couldn’t be derived at all for any datatype containing unlifted arguments (e.g., Int# or Double#). This made GHC generics quite poor in comparison to GHC’s other derivable classes (e.g., you can derive Eq and Show for some unlifted argument types).
To achieve feature parity, GHC generics was enhanced with a new data family for unlifted types. Currently, there are six data instances, corresponding to those unlifted types which at least one other derivable class can handle:
Now, the following datatype can have a derived Generic instance:
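As a hedged sketch of how such a representation can be consumed (assuming GHC 8.0 or later), the example below derives Generic for a type with an Int# field and reads the field back through the UInt data instance of URec. The IntHash type is an assumption made for the example, and unwrapGeneric is a hypothetical helper.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#)
import GHC.Generics

-- A datatype with an unlifted field; assumed for this sketch.
data IntHash = IntHash Int#
  deriving Generic

-- Peel off the three metadata layers and the UInt wrapper,
-- then box the raw Int# so that it can be printed.
unwrapGeneric :: IntHash -> Int
unwrapGeneric x = case from x of
  M1 (M1 (M1 (UInt i))) -> I# i

main :: IO ()
main = print (unwrapGeneric (IntHash 42#))
```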
Other improvements
Along with major API changes came some other improvements and bugfixes. They include:
- Thanks to Oliver Charles, Ben Gamari, and others, the datatypes in GHC.Generics now have many more typeclass instances, including Enum, Bounded, Ix, Functor, Applicative, Monad, MonadFix, MonadPlus, MonadZip, Foldable, Traversable, Generic1, and Data.
- Thanks to Simon Peyton Jones, DeriveAnyClass no longer crashes when used with a multi-parameter typeclass. DeriveAnyClass now fills in associated type defaults.
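The associated type default behaviour can be sketched as follows, assuming GHC 8.0 or later. The HasLabel class and Widget type are hypothetical. The type signature on widgetLabel only typechecks if Label Widget was filled in as String by the derived instance.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE TypeFamilies #-}

-- A class with an associated type default and a default method.
class HasLabel a where
  type Label a
  type Label a = String

  label :: a -> String
  label _ = "unlabelled"

-- The derived instance is empty, so every default is used,
-- including the associated type default.
data Widget = Widget
  deriving HasLabel

-- Requires Label Widget to reduce to String.
widgetLabel :: Label Widget
widgetLabel = label Widget

main :: IO ()
main = putStrLn widgetLabel
```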
Things to come
Unfortunately, I wanted to make some more changes to GHC generics before the final 8.0.1 release, but I simply ran out of time. Here are some things to look forward to in future releases:
Poly-kinded Generic1
Previously, the definition of Generic1 was entirely monomorphic with respect to the kind of its argument:
But if you look closely, you’ll notice that this is too restrictive! The definition of Generic1 permits it to range over even more types, which we can achieve with a little bit of PolyKinds:
Similarly, we can kind-generalize most of the datatypes in the GHC.Generics module:
(The exception being Par1, of course, since its type parameter is forced to be of kind *.)
With this, we can derive Generic1 for more datatypes than we could before. For example, Derek Elkins uses GHC generics to automatically define Authenticated instances for a datatype that is parameterized over a type that uses DataKinds in this example.
The above changes are slated to land in GHC 8.2.
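A hedged sketch of what the kind-generalized class allows: the example below derives Generic1 for a type whose last parameter has a DataKinds-promoted kind rather than kind *. It requires a GHC in which Generic1 is poly-kinded (8.2 or later, per the plan above). Role, Token, and tokenText are hypothetical names and are unrelated to Derek Elkins’s example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE KindSignatures #-}

import GHC.Generics

-- Promoted to a kind by DataKinds.
data Role = Admin | Guest

-- The last type parameter has kind Role, not kind *.
newtype Token (r :: Role) = Token String
  deriving Generic1

-- Read the payload back through the Generic1 representation.
tokenText :: Token r -> String
tokenText t = case from1 t of
  M1 (M1 (M1 (K1 s))) -> s

main :: IO ()
main = putStrLn (tokenText (Token "secret" :: Token 'Admin))
```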
Generics compilation speed
Unfortunately, recent GHC releases seem to have regressed with respect to how long it takes to compile generic typeclass instances defined for large datatypes using DefaultSignatures. It’s suspected that there is some quadratic blowup with respect to code size, and there are currently several GHC Trac tickets (this, this, and this) about the issue.
There have been several notable Haskell libraries that have been bitten by this issue. Two noteworthy examples are binary and aeson, both of which use GHC generics and DefaultSignatures to allow users to define Binary and ToJSON/FromJSON instances easily. While there is a workaround to alleviate compilation times (see these pull requests for binary and aeson), it’s not a robust solution.
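For readers unfamiliar with the pattern in question, here is a minimal sketch of a class whose default method is implemented via GHC generics and DefaultSignatures. It is a toy, not the code used by binary or aeson: the Encode and GEncode classes and the list-of-Int encoding are hypothetical.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

-- User-facing class with a generic default implementation.
class Encode a where
  encode :: a -> [Int]
  default encode :: (Generic a, GEncode (Rep a)) => a -> [Int]
  encode = gencode . from

-- Worker class defined over representation types.
class GEncode f where
  gencode :: f p -> [Int]

instance GEncode U1 where
  gencode U1 = []

instance (GEncode f, GEncode g) => GEncode (f :*: g) where
  gencode (a :*: b) = gencode a ++ gencode b

-- Tag each branch of a sum so the constructors stay distinguishable.
instance (GEncode f, GEncode g) => GEncode (f :+: g) where
  gencode (L1 a) = 0 : gencode a
  gencode (R1 b) = 1 : gencode b

instance GEncode f => GEncode (M1 i c f) where
  gencode (M1 x) = gencode x

instance Encode c => GEncode (K1 i c) where
  gencode (K1 x) = encode x

instance Encode Int where
  encode n = [n]

-- Empty instance bodies: the generic default does all the work.
instance Encode Bool

data Pair = Pair Int Bool
  deriving Generic

instance Encode Pair

main :: IO ()
main = print (encode (Pair 7 True))
```

Every empty instance such as Encode Pair makes GHC instantiate the generic worker at that type’s full Rep, and this is the kind of instance definition whose compile time is at issue for large datatypes.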
I plan to investigate this compilation speed regression further in the future. If you wish to help, feel free to talk to me (RyanGlScott) on the #ghc IRC channel on freenode.