At the term level, Core resembles a explicitly-typed polymorphic lambda calculus (F_{ω}), with the addition of local `let`

bindings, algebraic type definitions, constructors, and `case`

expressions, and primitive types, literals and operators. Its type system is richer than that of System F, supporting explicit type equality coercions and type functions.[system-fc]

In this section we concentrate on the less obvious points about Core.

Core programs are organized into *modules*, corresponding directly to source-level Haskell modules. Each module has a identifying name *mident*. A module identifier consists of a *package name* followed by a module name, which may be hierarchical: for example, `base:GHC.Base`

is the module identifier for GHC’s Base module. Its name is `Base`

, and it lives in the GHC hierarchy within the `base`

package. Section 5.8 of the GHC users’ guide explains package names [ghc-user-guide]. In particular, note that a Core program may contain multiple modules with the same (possibly hierarchical) module name that differ in their package names. In some of the code examples that follow, we will omit package names and possibly full hierarchical module names from identifiers for brevity, but be aware that they are always required.^{[18]}

Each module may contain the following kinds of top-level declarations:

- Algebraic data type declarations, each defining a type constructor and one or more data constructors;
- Newtype declarations, corresponding to Haskell
`newtype`

declarations, each defining a type constructor and a coercion name; and - Value declarations, defining the types and values of top-level variables.

No type constructor, data constructor, or top-level value may be declared more than once within a given module. All the type declarations are (potentially) mutually recursive. Value declarations must be in dependency order, with explicit grouping of potentially mutually recursive declarations.

Identifiers defined in top-level declarations may be *external* or *internal*. External identifiers can be referenced from any other module in the program, using conventional dot notation (e.g., `base:GHC.Base.Bool`

, `base:GHC.Base.True`

). Internal identifiers are visible only within the defining module. All type and data constructors are external, and are always defined and referenced using fully qualified names (with dots).

A top-level value is external if it is defined and referenced using a fully qualified name with a dot (e.g., `main:MyModule.foo = ...`

); otherwise, it is internal (e.g., `bar = ...`

). Note that Core’s notion of an external identifier does not necessarily coincide with that of 「exported」 identifier in a Haskell source module. An identifier can be an external identifier in Core, but not be exported by the original Haskell source module.^{[19]} However, if an identifier was exported by the Haskell source module, it will appear as an external name in Core.

Core modules have no explicit import or export lists. Modules may be mutually recursive. Note that because of the latter fact, GHC currently prints out the top-level bindings for every module as a single recursive group, in order to avoid keeping track of dependencies between top-level values within a module. An external Core tool could reconstruct dependencies later, of course.

There is also an implicitly-defined module `ghc-prim:GHC.Prim`

, which exports the 「built-in」 types and values that must be provided by any implementation of Core (including GHC). Details of this module are in the Primitive Module section.

A Core *program* is a collection of distinctly-named modules that includes a module called main:Main having an exported value called `main:ZCMain.main`

of type `base:GHC.IOBase.IO a`

(for some type `a`

). (Note that the strangely named wrapper for `main`

is the one exception to the rule that qualified names defined within a module `m`

must have module name `m`

.)

Many Core programs will contain library modules, such as `base:GHC.Base`

, which implement parts of the Haskell standard library. In principle, these modules are ordinary Haskell modules, with no special status. In practice, the requirement on the type of `main:Main.main`

implies that every program will contain a large subset of the standard library modules.

There are five distinct namespaces:

- module identifiers (
`mident`

), - type constructors (
`tycon`

), - type variables (
`tyvar`

), - data constructors (
`dcon`

), - term variables (
`var`

).

Spaces (1), (2+3), and (4+5) can be distinguished from each other by context. To distinguish (2) from (3) and (4) from (5), we require that data and type constructors begin with an upper-case character, and that term and type variables begin with a lower-case character.

Primitive types and operators are not syntactically distinguished.

Primitive *coercion* operators, of which there are six, *are* syntactically distinguished in the grammar. This is because these coercions must be fully applied, and because distinguishing their applications in the syntax makes typechecking easier.

A given variable (type or term) may have multiple definitions within a module. However, definitions of term variables never 「shadow」 one another: the scope of the definition of a given variable never contains a redefinition of the same variable. Type variables may be shadowed. Thus, if a term variable has multiple definitions within a module, all those definitions must be local (let-bound). The only exception to this rule is that (necessarily closed) types labelling `%external`

expressions may contain `tyvar`

bindings that shadow outer bindings.

Core generated by GHC makes heavy use of encoded names, in which the characters `Z`

and `z`

are used to introduce escape sequences for non-alphabetic characters such as dollar sign `$`

(`zd`

), hash `#`

(`zh`

), plus `+`

(`zp`

), etc. This is the same encoding used in `.hi`

files and in the back-end of GHC itself, except that we sometimes change an initial `z`

to `Z`

, or vice-versa, in order to maintain case distinctions.

Finally, note that hierarchical module names are z-encoded in Core: for example, `base:GHC.Base.foo`

is rendered as `base:GHCziBase.foo`

. A parser may reconstruct the module hierarchy, or regard `GHCziBase`

as a flat name.

In Core, all type abstractions and applications are explicit. This make it easy to typecheck any (closed) fragment of Core code. An full executable typechecker is available separately.

Types are described by type expressions, which are built from named type constructors and type variables using type application and universal quantification. Each type constructor has a fixed arity ≥ 0. Because it is so widely used, there is special infix syntax for the fully-applied function type constructor (`->`

). (The prefix identifier for this constructor is `ghc-prim:GHC.Prim.ZLzmzgZR`

; this should only appear in unapplied or partially applied form.)

There are also a number of other primitive type constructors (e.g., `Intzh`

) that are predefined in the `GHC.Prim`

module, but have no special syntax. `%data`

and `%newtype`

declarations introduce additional type constructors, as described below. Type constructors are distinguished solely by name.

A type may also be built using one of the primitive coercion operators, as described in the Namespaces section. For details on the meanings of these operators, see the System FC paper [system-fc]. Also see the Newtypes section for examples of how GHC uses coercions in Core code.

As described in the Haskell definition, it is necessary to distinguish well-formed type-expressions by classifying them into different *kinds* [haskell98, p. 41]. In particular, Core explicitly records the kind of every bound type variable.

In addition, Core’s kind system includes equality kinds, as in System FC [system-fc]. An application of a built-in coercion, or of a user-defined coercion as introduced by a `newtype`

declaration, has an equality kind.

Semantically, a type is *lifted* if and only if it has bottom as an element. We need to distinguish them because operationally, terms with lifted types may be represented by closures; terms with unlifted types must not be represented by closures, which implies that any unboxed value is necessarily unlifted. We distinguish between lifted and unlifted types by ascribing them different kinds.

Currently, all the primitive types are unlifted (including a few boxed primitive types such as `ByteArrayzh`

). Peyton-Jones and Launchbury [pj:unboxed] described the ideas behind unboxed and unlifted types.

Every type constructor has a kind, depending on its arity and whether it or its arguments are lifted.

Term variables can only be assigned types that have base kinds: the base kinds are `*`

, `#`

, and `?`

. The three base kinds distinguish the liftedness of the types they classify: `*`

represents lifted types; `#`

represents unlifted types; and `?`

is the 「open」 kind, representing a type that may be either lifted or unlifted. Of these, only `*`

ever appears in Core type declarations generated from user code; the other two are needed to describe certain types in primitive (or otherwise specially-generated) code (which, after optimization, could potentially appear anywhere).

In particular, no top-level identifier (except in `ghc-prim:GHC.Prim`

) has a type of kind `#`

or `?`

.

Nullary type constructors have base kinds: for example, the type `Int`

has kind `*`

, and `Int#`

has kind `#`

.

Non-nullary type constructors have higher kinds: kinds that have the form k_{1}`->`

k_{2}, where k_{1} and k_{2} are kinds. For example, the function type constructor `->`

has kind `* -> (* -> *)`

. Since Haskell allows abstracting over type constructors, type variables may have higher kinds; however, much more commonly they have kind `*`

, so that is the default if a type binder omits a kind.

There is no mechanism for defining type synonyms (corresponding to Haskell `type`

declarations).

Type equivalence is just syntactic equivalence on type expressions (of base kinds) modulo:

- alpha-renaming of variables bound in
`%forall`

types; - the identity a
`->`

b ≡`ghc-prim:GHC.Prim.ZLzmzgZR`

a b

Each data declaration introduces a new type constructor and a set of one or more data constructors, normally corresponding directly to a source Haskell `data`

declaration. For example, the source declaration

data Bintree a = Fork (Bintree a) (Bintree a) | Leaf a

might induce the following Core declaration

%data Bintree a = { Fork (Bintree a) (Bintree a); Leaf a)}

which introduces the unary type constructor Bintree of kind `*->*`

and two data constructors with types

Fork :: %forall a . Bintree a -> Bintree a -> Bintree a Leaf :: %forall a . a -> Bintree a

We define the *arity* of each data constructor to be the number of value arguments it takes; e.g. `Fork`

has arity 2 and `Leaf`

has arity 1.

For a less conventional example illustrating the possibility of higher-order kinds, the Haskell source declaration

data A f a = MkA (f a)

might induce the Core declaration

%data A (f::*->*) a = { MkA (f a) }

which introduces the constructor

MkA :: %forall (f::*->*) a . (f a) -> (A f) a

GHC (like some other Haskell implementations) supports an extension to Haskell98 for existential types such as

data T = forall a . MkT a (a -> Bool)

This is represented by the Core declaration

%data T = {MkT @a a (a -> Bool)}

which introduces the nullary type constructor T and the data constructor

MkT :: %forall a . a -> (a -> Bool) -> T

In general, existentially quantified variables appear as extra universally quantified variables in the data contructor types. An example of how to construct and deconstruct values of type `T`

is shown in the Expression Forms section.

Each Core `%newtype`

declaration introduces a new type constructor and an associated representation type, corresponding to a source Haskell `newtype`

declaration. However, unlike in source Haskell, a `%newtype`

declaration does not introduce any data constructors.

Each `%newtype`

declaration also introduces a new coercion (syntactically, just another type constructor) that implies an axiom equating the type constructor, applied to any type variables bound by the `%newtype`

, to the representation type.

For example, the Haskell fragment

newtype U = MkU Bool u = MkU True v = case u of MkU b -> not b

might induce the Core fragment

%newtype U ZCCoU = Bool; u :: U = %cast (True) ((%sym ZCCoU)); v :: Bool = not (%cast (u) ZCCoU);

The `newtype`

declaration implies that the types `U`

and `Bool`

have equivalent representations, and the coercion axiom `ZCCoU`

provides evidence that `U`

is equivalent to `Bool`

. Notice that in the body of `u`

, the boolean value `True`

is cast to type `U`

using the primitive symmetry rule applied to `ZCCoU`

: that is, using a coercion of kind `Bool :=: U`

. And in the body of `v`

, `u`

is cast back to type `Bool`

using the axiom `ZCCoU`

.

Notice that the `case`

in the Haskell source code above translates to a `cast`

in the corresponding Core code. That is because operationally, a `case`

on a value whose type is declared by a `newtype`

declaration is a no-op. Unlike a `case`

on any other value, such a `case`

does no evaluation: its only function is to coerce its scrutinee’s type.

Also notice that unlike in a previous draft version of External Core, there is no need to handle recursive newtypes specially.

Variables and data constructors are straightforward.

Literal (lit) expressions consist of a literal value, in one of four different formats, and a (primitive) type annotation. Only certain combinations of format and type are permitted; see the Primitive Module section. The character and string formats can describe only 8-bit ASCII characters.

Moreover, because the operational semantics for Core interprets strings as C-style null-terminated strings, strings should not contain embedded nulls.

In Core, value applications, type applications, value abstractions, and type abstractions are all explicit. To tell them apart, type arguments in applications and formal type arguments in abstractions are preceded by an `@ symbol`

. (In abstractions, the `@`

plays essentially the same role as the more usual Λ symbol.) For example, the Haskell source declaration

f x = Leaf (Leaf x)

might induce the Core declaration

f :: %forall a . a -> BinTree (BinTree a) = \ @a (x::a) -> Leaf @(Bintree a) (Leaf @a x)

Value applications may be of user-defined functions, data constructors, or primitives. None of these sorts of applications are necessarily saturated.

Note that the arguments of type applications are not always of kind `*`

. For example, given our previous definition of type `A`

:

data A f a = MkA (f a)

the source code

MkA (Leaf True)

becomes

(MkA @Bintree @Bool) (Leaf @Bool True)

Local bindings, of a single variable or of a set of mutually recursive variables, are represented by `%let`

expressions in the usual way.

By far the most complicated expression form is `%case`

. `%case`

expressions are permitted over values of any type, although they will normally be algebraic or primitive types (with literal values). Evaluating a `%case`

forces the evaluation of the expression being tested (the 「scrutinee」). The value of the scrutinee is bound to the variable following the `%of`

keyword, which is in scope in all alternatives; this is useful when the scrutinee is a non-atomic expression (see next example). The scrutinee is preceded by the type of the entire `%case`

expression: that is, the result type that all of the `%case`

alternatives have (this is intended to make type reconstruction easier in the presence of type equality coercions).

In an algebraic `%case`

, all the case alternatives must be labeled with distinct data constructors from the algebraic type, followed by any existential type variable bindings (see below), and typed term variable bindings corresponding to the data constructor’s arguments. The number of variables must match the data constructor’s arity.

For example, the following Haskell source expression

case g x of Fork l r -> Fork r l t@(Leaf v) -> Fork t t

might induce the Core expression

%case ((Bintree a)) g x %of (t::Bintree a) Fork (l::Bintree a) (r::Bintree a) -> Fork @a r l Leaf (v::a) -> Fork @a t t

When performing a `%case`

over a value of an existentially-quantified algebraic type, the alternative must include extra local type bindings for the existentially-quantified variables. For example, given

data T = forall a . MkT a (a -> Bool)

the source

case x of MkT w g -> g w

becomes

%case x %of (x’::T) MkT @b (w::b) (g::b->Bool) -> g w

In a `%case`

over literal alternatives, all the case alternatives must be distinct literals of the same primitive type.

The list of alternatives may begin with a default alternative labeled with an underscore (`%_`

), whose right-hand side will be evaluated if none of the other alternatives match. The default is optional except for in a case over a primitive type, or when there are no other alternatives. If the case is over neither an algebraic type nor a primitive type, then the list of alternatives must contain a default alternative and nothing else. For algebraic cases, the set of alternatives need not be exhaustive, even if no default is given; if alternatives are missing, this implies that GHC has deduced that they cannot occur.

`%cast`

is used to manipulate newtypes, as described in the Newtype section. The `%cast`

expression takes an expression and a coercion: syntactically, the coercion is an arbitrary type, but it must have an equality kind. In an expression `(cast e co)`

, if `e :: T`

and `co`

has kind `T :=: U`

, then the overall expression has type `U`

[ghc-fc-commentary]. Here, `co`

must be a coercion whose left-hand side is `T`

.

Note that unlike the `%coerce`

expression that existed in previous versions of Core, this means that `%cast`

is (almost) type-safe: the coercion argument provides evidence that can be verified by a typechecker. There are still unsafe `%cast`

s, corresponding to the unsafe `%coerce`

construct that existed in old versions of Core, because there is a primitive unsafe coercion type that can be used to cast arbitrary types to each other. GHC uses this for such purposes as coercing the return type of a function (such as error) which is guaranteed to never return:

case (error "") of True -> 1 False -> 2

becomes:

%cast (error @ Bool (ZMZN @ Char)) (%unsafe Bool Integer);

`%cast`

has no operational meaning and is only used in typechecking.

A `%note`

expression carries arbitrary internal information that GHC finds interesting. The information is encoded as a string. Expression notes currently generated by GHC include the inlining pragma (`InlineMe`

) and cost-center labels for profiling.

A `%external`

expression denotes an external identifier, which has the indicated type (always expressed in terms of Haskell primitive types). External Core supports two kinds of external calls: `%external`

and `%dynexternal`

. Only the former is supported by the current set of stand-alone Core tools. In addition, there is a `%label`

construct which GHC may generate but which the Core tools do not support.

The present syntax for externals is sufficient for describing C functions and labels. Interfacing to other languages may require additional information or a different interpretation of the name string.

The dynamic semantics of Core are defined on the type-erasure of the program: for example, we ignore all type abstractions and applications. The denotational semantics of the resulting type-free program are just the conventional ones for a call-by-name language, in which expressions are only evaluated on demand. But Core is intended to be a call-by-*need* language, in which expressions are only evaluated once. To express the sharing behavior of call-by-need, we give an operational model in the style of Launchbury [launchbury93natural].

This section describes the model informally; a more formal semantics is separately available as an executable interpreter.

To simplify the semantics, we consider only 「well-behaved」 Core programs in which constructor and primitive applications are fully saturated, and in which non-trivial expresssions of unlifted kind (`#`

) appear only as scrutinees in `%case`

expressions. Any program can easily be put into this form; a separately available preprocessor illustrates how. In the remainder of this section, we use 「Core」 to mean 「well-behaved」 Core.

Evaluating a Core expression means reducing it to *weak-head normal form (WHNF)*, i.e., a primitive value, lambda abstraction, or fully-applied data constructor. Evaluating a program means evaluating the expression `main:ZCMain.main`

.

To make sure that expression evaluation is shared, we make use of a *heap*, which contains *heap entries*. A heap entry can be:

- A
*thunk*, representing an unevaluated expression, also known as a suspension. - A
*WHNF*, representing an evaluated expression. The result of evaluating a thunk is a WHNF. A WHNF is always a closure (corresponding to a lambda abstraction in the source program) or a data constructor application: computations over primitive types are never suspended.

*Heap pointers* point to heap entries: at different times, the same heap pointer can point to either a thunk or a WHNF, because the run-time system overwrites thunks with WHNFs as computation proceeds.

The suspended computation that a thunk represents might represent evaluating one of three different kinds of expression. The run-time system allocates a different kind of thunk depending on what kind of expression it is:

- A thunk for a value definition has a group of suspended defining expressions, along with a list of bindings between defined names and heap pointers to those suspensions. (A value definition may be a recursive group of definitions or a single non-recursive definition, and it may be top-level (global) or
`let`

-bound (local)). - A thunk for a function application (where the function is user-defined) has a suspended actual argument expression, and a binding between the formal argument and a heap pointer to that suspension.
- A thunk for a constructor application has a suspended actual argument expression; the entire constructed value has a heap pointer to that suspension embedded in it.

As computation proceeds, copies of the heap pointer for a given thunk propagate through the executing program. When another computation demands the result of that thunk, the thunk is *forced*: the run-time system computes the thunk’s result, yielding a WHNF, and overwrites the heap entry for the thunk with the WHNF. Now, all copies of the heap pointer point to the new heap entry: a WHNF. Forcing occurs only in the context of

- evaluating the operator expression of an application;
- evaluating the scrutinee of a
`case`

expression; or - evaluating an argument to a primitive or external function application

When no pointers to a heap entry (whether it is a thunk or WHNF) remain, the garbage collector can reclaim the space it uses. We assume this happens implicitly.

With the exception of functions, arrays, and mutable variables, we intend that values of all primitive types should be held *unboxed*: they should not be heap-allocated. This does not violate call-by-need semantics: all primitive types are *unlifted*, which means that values of those types must be evaluated strictly. Unboxed tuple types are not heap-allocated either.

Certain primitives and `%external`

functions cause side-effects to state threads or to the real world. Where the ordering of these side-effects matters, Core already forces this order with data dependencies on the pseudo-values representing the threads.

An implementation must specially support the `raisezh`

and `handlezh`

primitives: for example, by using a handler stack. Again, real-world threading guarantees that they will execute in the correct order.

A possible improvement to the Core syntax would be to add explicit import lists to Core modules, which could be used to specify abbrevations for long qualified names. This would make the code more human-readable.

Two examples of such identifiers are: data constructors, and values that potentially appear in an unfolding. For an example of the latter, consider

`Main.foo = ... Main.bar ...`

, where `Main.foo`

is inlineable. Since `bar`

appears in `foo`

’s unfolding, it is defined and referenced with an external name, even if `bar`

was not exported by the original source module.