第10章 An External Representation for the GHC Core Language (For GHC 6.10)

目次

10.1. Introduction
10.2. External Grammar of Core
10.3. Informal Semantics
10.3.1. Program Organization and Modules
10.3.2. Namespaces
10.3.3. Types and Kinds
10.3.3.1. Types
10.3.3.2. Coercions
10.3.3.3. Kinds
10.3.3.4. Lifted and Unlifted Types
10.3.3.5. Type Constructors; Base Kinds and Higher Kinds
10.3.3.6. Type Synonyms and Type Equivalence
10.3.4. Algebraic data types
10.3.5. Newtypes
10.3.6. Expression Forms
10.3.7. Expression Evaluation
10.4. Primitive Module
10.4.1. Non-concurrent Back End
10.4.2. Literals

Andrew Tolmach, Tim Chevalier ({apt,tjc}@cs.pdx.edu) and The GHC Team

This chapter provides a precise definition for the GHC Core language, so that it can be used to communicate between GHC and new stand-alone compilation tools such as back-ends or optimizers.[15] The definition includes a formal grammar and an informal semantics. An executable typechecker and interpreter (in Haskell), which formally embody the static and dynamic semantics, are available separately.

10.1. Introduction

The Glasgow Haskell Compiler (GHC) uses an intermediate language, called Core, as its internal program representation within the compiler’s simplification phase. Core resembles a subset of Haskell, but with explicit type annotations in the style of the polymorphic lambda calculus (Fω).

GHC’s front-end translates full Haskell 98 (plus some extensions) into Core. The GHC optimizer then repeatedly transforms Core programs while preserving their meaning. A Core Lint pass in GHC typechecks Core in between transformation passes (at least when the user enables linting by setting a compiler flag), verifying that transformations preserve type-correctness. Finally, GHC’s back-end translates Core into STG-machine code [stg-machine] and then into C or native code.

Two existing papers discuss the original rationale for the design and use of Core [ghc-inliner,comp-by-trans-scp], although the (two different) idealized versions of Core described therein differ in significant ways from the actual Core language in current GHC. In particular, with the advent of GHC support for generalized algebraic datatypes (GADTs) [gadts] Core was extended beyond its previous Fω-style incarnation to support type equality constraints and safe coercions, and is now based on a system known as FC [system-fc].

Researchers interested in writing just part of a Haskell compiler, such as a new back-end or a new optimizer pass, might like to use GHC to provide the other parts of the compiler. For example, they might like to use GHC’s front-end to parse, desugar, and type-check source Haskell, then feeding the resulting code to their own back-end tool. As another example, they might like to use Core as the target language for a front-end compiler of their own design, feeding externally synthesized Core into GHC in order to take advantage of GHC’s optimizer, code generator, and run-time system. Without external Core, there are two ways for compiler writers to do this: they can link their code into the GHC executable, which is an arduous process, or they can use the GHC API [ghc-api] to do the same task more cleanly. Both ways require new code to be written in Haskell.

We present a precisely specified external format for Core files. The external format is text-based and human-readable, to promote interoperability and ease of use. We hope this format will make it easier for external developers to use GHC in a modular way.

It has long been true that GHC prints an ad-hoc textual representation of Core if you set certain compiler flags. But this representation is intended to be read by people who are debugging the compiler, not by other programs. Making Core into a machine-readable, bi-directional communication format requires:

  1. precisely specifying the external format of Core;
  2. modifying GHC to generate external Core files (post-simplification; as always, users can control the exact transformations GHC does with command-line flags);
  3. modifying GHC to accept external Core files in place of Haskell source files (users will also be able to control what GHC does to those files with command-line flags).

The first two facilities will let developers couple GHC’s front-end (parser, type-checker, desugarer), and optionally its optimizer, with new back-end tools. The last facility will let developers write new Core-to-Core transformations as an external tool and integrate them into GHC. It will also allow new front-ends to generate Core that can be fed into GHC’s optimizer or back-end.

However, because there are many (undocumented) idiosyncracies in the way GHC produces Core from source Haskell, it will be hard for an external tool to produce Core that can be integrated with GHC-produced Core (e.g., for the Prelude), and we don’t aim to support this. Indeed, for the time being, we aim to support only the first two facilities and not the third: we define and implement Core as an external format that GHC can use to communicate with external back-end tools, and defer the larger task of extending GHC to support reading this external format back in.

This document addresses the first requirement, a formal Core definition, by proposing a formal grammar for an external representation of Core, and an informal semantics.

GHC supports many type system extensions; the External Core printer built into GHC only supports some of them. However, External Core should be capable of representing any Haskell 98 program, and may be able to represent programs that require certain type system extensions as well. If a program uses unsupported features, GHC may fail to compile it to Core when the -fext-core flag is set, or GHC may successfully compile it to Core, but the external tools will not be able to typecheck or interpret it.

Formal static and dynamic semantics in the form of an executable typechecker and interpreter are available separately in the GHC source tree [16] under utils/ext-core.

References

[ghc-user-guide] The GHC Team [FAMILY Given]. The Glorious Glasgow Haskell Compilation System User's Guide, Version 6.8.2」. 2008. http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html.

[ghc-fc-commentary] GHC Wiki [FAMILY Given]. System FC: equality constraints and coercions」. 2006. http://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC.

[ghc-api] Haskell Wiki [FAMILY Given]. Using GHC as a library」. 2007. http://haskell.org/haskellwiki/GHC/As_a_library.

[haskell98] Peyton-Jones Simon [FAMILY Given]. Haskell 98 Language and Libraries: The Revised Report」. Cambridge University Press. Cambridge> UK . 2003.

[system-fc] Sulzmann Martin [FAMILY Given], Chakravarty Manuel M.T. [FAMILY Given], Peyton-Jones Simon [FAMILY Given], 、 Donnelly Kevin [FAMILY Given]. System F with type equality coercions」. ACM. New York NY USA . 53-66. 2007. http://portal.acm.org/citation.cfm?id=1190324.

[gadts] Peyton-Jones Simon [FAMILY Given], Vytiniotis Dimitrios [FAMILY Given], Weirich Stephanie [FAMILY Given], 、 Washburn Geoffrey [FAMILY Given]. Simple unification-based type inference for GADTs」. ACM. New York NY USA . 50-61. 2006. http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm.

[Launchbury94] Launchbury John [FAMILY Given] 、 Peyton-Jones Simon L. [FAMILY Given]. Lazy Functional State Threads」. 24-35. 1994. http://citeseer.ist.psu.edu/article/launchbury93lazy.html.

[pj:unboxed] Peyton-Jones Simon L. [FAMILY Given], Launchbury John [FAMILY Given], 、 Hughes J. [FAMILY Given]. Unboxed Values as First Class Citizens in a Non-strict Functional Language」. Springer-Verlag LNCS523. Cambridge Massachussetts USA . 636-666. 1991, August 26-28. http://citeseer.ist.psu.edu/jones91unboxed.html.

[ghc-inliner] Peyton-Jones Simon [FAMILY Given] 、 Marlow Simon [FAMILY Given]. Secrets of the Glasgow Haskell Compiler inliner」. 1999. Paris France . http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf.

[comp-by-trans-scp] Peyton-Jones Simon L. [FAMILY Given] 、 Santos A. L. M. [FAMILY Given]. A transformation-based optimiser for Haskell」. Science of Computer Programming. 32. 1-3. 3-47. 1998. http://citeseer.ist.psu.edu/peytonjones98transformationbased.html.

[stg-machine] Peyton-Jones Simon L. [FAMILY Given]. Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine」. Journal of Functional Programming. 2. 2. 127-202. 1992. http://citeseer.ist.psu.edu/peytonjones92implementing.html.

[launchbury93natural] Launchbury John [FAMILY Given]. A Natural Semantics for Lazy Evaluation」. 144-154. Charleston South Carolina . 1993. http://citeseer.ist.psu.edu/launchbury93natural.html.

[ghcprim] The GHC Team [FAMILY Given]. Library documentation: GHC.Prim」. 2008. http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html.

[15]

This is a draft document, which attempts to describe GHC’s current behavior as precisely as possible. Working notes scattered throughout indicate areas where further work is needed. Constructive comments are very welcome, both on the presentation, and on ways in which GHC could be improved in order to simplify the Core story.

Support for generating external Core (post-optimization) was originally introduced in GHC 5.02. The definition of external Core in this document reflects the version of external Core generated by the HEAD (unstable) branch of GHC as of May 3, 2008 (version 6.9), using the compiler flag -fext-core. We expect that GHC 6.10 will be consistent with this definition.