Brainfuck on the C preprocessor

This is a Brainfuck interpreter implemented on top of the C preprocessor.

$ cat hello.txt
# include "bfi.h"

BFI_RUN1(x x x x x x x x x (b x x x x x x x x b x x x x x x x x x x x b x x x x x d d d _) b w b x x w x x x x x x x w w x x x w b _ w
    _ _ _ _ _ _ _ _ _ _ _ _ w d x x x x x x x x w _ _ _ _ _ _ _ _ w x x x w _ _ _ _ _ _ w _ _ _ _ _ _ _ _ w b x w)
$ cpp -P hello.txt


Hello, world!

Usage

The body of the interpreter is a macro named BFI_RUN, defined in "bfi.h". Call it with a brainfuck program as the first argument, and a character sequence as the second argument. It will excute the program and pretty prints its output. The supplied character sequence will be fed to the program as the standard input. When you don't want to give any input to the program, use the BFI_RUN1 macro instead.

Since the C preprocessor cannot parse a BF program written in the standard notation with the +-><.,[] characters, you must instead use x_bdwrLR, respectively. For example, consider the following program which copies stdin to stdout.

,[.[-],]

This program should be presented:

r L w L _ R r R

In places of L and R, you can also use ( and ), the ordinary parentheses. These might be more readable. Note that in this form, the sequence between the two matching parentheses must not be empty. Using this notation, the above program will be:

r (w (_) r)

ppsyntax.c is a script that converts a BF program from the usual notation into the PP syntax.

The second argument of BFI_RUN describes the content of the standard input. It is in the form of a byte sequence, with each byte represented in a 2-digit lowercase hexadecimal. Characters are assumed to be encoded in ASCII.

BFI_RUN(r (w (_) r), 0x61 0x62 0x63) /* => abc */

You can direlctly use alphanumeric characters (including the underscore) inside the sequence to indicate the corresponding byte. You can also use parentheses, as long as they nest properly and the contents are not empty. Moreover, there are number of shorthands defined, such as 0comma for comma or 0_n for newline. See lex.h for details.

BFI_RUN(r (w (_) r), a b c (F 0comma 0space G) ) /* => abc(F, G) */

Specs and limitations

Mechanics

BFI_RUN is defined roughly as follows.

# define BFI_RUN(code, input) \
  BFI_FORMAT_PRETTY(          \
    BFI_EXEC(                 \
      BFI_BUILD(              \
        BFI_OPTIMIZE(         \
          BFI_GEN_FLAT(       \
            BFI_LEX(code)))), \
      BFI_LEX(input)))

I use Daniel B. Cristofani's "Hello, world" as an example.

++++++++[>++++[>++>+++>+++>+<<<<-]>+>->+>>+[<]<-]>>.>
>---.+++++++..+++.>.<<-.>.+++.------.--------.>+.>++.

In the PP syntax:

x x x x x x x x (b x x x x (b x x b x x x b x x x b x d d d d _) b x b _ b x b b x (d) d _) b b w b
b _ _ _ w x x x x x x x w w x x x w b w d d _ w b w x x x w _ _ _ _ _ _ w _ _ _ _ _ _ _ _ w b x w b x x w

BFI_LEX outputs a sequence consisting of ASCII code.

BFI_LEX(
  x x x x x x x x (b x x x x (b x x b x x x b x x x b x d d d d _) b x b _ b x b b x (d) d _) b b w b
  b _ _ _ w x x x x x x x w w x x x w b w d d _ w b w x x x w _ _ _ _ _ _ w _ _ _ _ _ _ _ _ w b x w b x x w)
  /* =>
      (0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x28)(0x62)(0x78)(0x78)(0x78)(0x78)(0x28)(0x62)(0x78)
      (0x78)(0x62)(0x78)(0x78)(0x78)(0x62)(0x78)(0x78)(0x78)(0x62)(0x78)(0x64)(0x64)(0x64)(0x64)(0x5f)(0x29)
      (0x62)(0x78)(0x62)(0x5f)(0x62)(0x78)(0x62)(0x62)(0x78)(0x28)(0x64)(0x29)(0x64)(0x5f)(0x29)(0x62)(0x62)
      (0x77)(0x62)(0x62)(0x5f)(0x5f)(0x5f)(0x77)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x77)(0x77)(0x78)
      (0x78)(0x78)(0x77)(0x62)(0x77)(0x64)(0x64)(0x5f)(0x77)(0x62)(0x77)(0x78)(0x78)(0x78)(0x77)(0x5f)(0x5f)
      (0x5f)(0x5f)(0x5f)(0x5f)(0x77)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x77)(0x62)(0x78)(0x77)
      (0x62)(0x78)(0x78)(0x77)(0eof) */

BFI_GEN_FLAT transforms the byte sequence into intermediate code.

BFI_GEN_FLAT(
  (0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x28)(0x62)(0x78)(0x78)(0x78)(0x78)(0x28)(0x62)(0x78)
  (0x78)(0x62)(0x78)(0x78)(0x78)(0x62)(0x78)(0x78)(0x78)(0x62)(0x78)(0x64)(0x64)(0x64)(0x64)(0x5f)(0x29)
  (0x62)(0x78)(0x62)(0x5f)(0x62)(0x78)(0x62)(0x62)(0x78)(0x28)(0x64)(0x29)(0x64)(0x5f)(0x29)(0x62)(0x62)
  (0x77)(0x62)(0x62)(0x5f)(0x5f)(0x5f)(0x77)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x78)(0x77)(0x77)(0x78)
  (0x78)(0x78)(0x77)(0x62)(0x77)(0x64)(0x64)(0x5f)(0x77)(0x62)(0x77)(0x78)(0x78)(0x78)(0x77)(0x5f)(0x5f)
  (0x5f)(0x5f)(0x5f)(0x5f)(0x77)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x5f)(0x77)(0x62)(0x78)(0x77)
  (0x62)(0x78)(0x78)(0x77)(0eof)
) /* =>
      (6add(0x08))(6loop)(6next)(6add(0x04))(6loop)(6next)(6add(0x02))(6next)(6add(0x03))(6next)(6add(0x03))
      (6next)(6inc)(6prev)(6prev)(6prev)(6prev)(6dec)(6end(6loop))(6next)(6inc)(6next)(6dec)(6next)(6inc)(6next)
      (6next)(6inc)(6loop)(6prev)(6end(6loop))(6prev)(6dec)(6end(6loop))(6next)(6next)(6write)(6next)(6next)
      (6add(0xfd))(6write)(6add(0x07))(6write)(6write)(6add(0x03))(6write)(6next)(6write)(6prev)(6prev)(6dec)
      (6write)(6next)(6write)(6add(0x03))(6write)(6add(0xfa))(6write)(6add(0xf8))(6write)(6next)(6inc)(6write)
      (6next)(6add(0x02))(6write) */

At this stage, consecutive + and - instructions are glued together.

Next, BFI_OPTIMIZE optimizes the intermediate code.

BFI_OPTIMIZE(
  (6add(0x08))(6loop)(6next)(6add(0x04))(6loop)(6next)(6add(0x02))(6next)(6add(0x03))(6next)(6add(0x03))
  (6next)(6inc)(6prev)(6prev)(6prev)(6prev)(6dec)(6end(6loop))(6next)(6inc)(6next)(6dec)(6next)(6inc)(6next)
  (6next)(6inc)(6loop)(6prev)(6end(6loop))(6prev)(6dec)(6end(6loop))(6next)(6next)(6write)(6next)(6next)
  (6add(0xfd))(6write)(6add(0x07))(6write)(6write)(6add(0x03))(6write)(6next)(6write)(6prev)(6prev)(6dec)
  (6write)(6next)(6write)(6add(0x03))(6write)(6add(0xfa))(6write)(6add(0xf8))(6write)(6next)(6inc)(6write)
  (6next)(6add(0x02))(6write)
) /* =>
      (6add(0x08))(6loop)(6next1)(6add(0x04))(6senter)(6next1)(6sadd(0x02))(6next1)(6sadd(0x03))(6next1)
      (6sadd(0x03))(6next1)(6sadd(0x01))(6prevA(0x04))(6sexit)(6next1)(6inc)(6next1)(6dec)(6next1)(6inc)
      (6nextA(0x02))(6inc)(6loop)(6prev1)(6end(6loopA))(6prev1)(6dec)(6end(6loopA))(6nextA(0x02))(6writeA)
      (6nextA(0x02))(6add(0xfd))(6writeA)(6add(0x07))(6writeA)(6writeB)(6add(0x03))(6writeA)(6next1)(6writeA)
      (6prevA(0x02))(6dec)(6writeA)(6next1)(6writeA)(6add(0x03))(6writeA)(6add(0xfa))(6writeA)(6add(0xf8))
      (6writeA)(6next1)(6inc)(6writeA)(6next1)(6add(0x02))(6writeA) */

Now a simple inner loop is eliminated.

BFI_BUILD generates virtual machine code. The virtual machine code contains nests, whereas the intermediate code does not.

BFI_BUILD(
  (6add(0x08))(6loop)(6next1)(6add(0x04))(6senter)(6next1)(6sadd(0x02))(6next1)(6sadd(0x03))(6next1)
  (6sadd(0x03))(6next1)(6sadd(0x01))(6prevA(0x04))(6sexit)(6next1)(6inc)(6next1)(6dec)(6next1)(6inc)
  (6nextA(0x02))(6inc)(6loop)(6prev1)(6end(6loopA))(6prev1)(6dec)(6end(6loopA))(6nextA(0x02))(6writeA)
  (6nextA(0x02))(6add(0xfd))(6writeA)(6add(0x07))(6writeA)(6writeB)(6add(0x03))(6writeA)(6next1)(6writeA)
  (6prevA(0x02))(6dec)(6writeA)(6next1)(6writeA)(6add(0x03))(6writeA)(6add(0xfa))(6writeA)(6add(0xf8))
  (6writeA)(6next1)(6inc)(6writeA)(6next1)(6add(0x02))(6writeA)
) /* =>
      (6add,0x08)(6loopA,(6next1,~)(6add,0x04)(6senter,~)(6next1,~)(6sadd,0x02)(6next1,~)(6sadd,0x03)(6next1,~)
      (6sadd,0x03)(6next1,~)(6sadd,0x01)(6prevA,0x04)(6sexit,~)(6next1,~)(6inc,0x01)(6next1,~)(6dec,0xff)
      (6next1,~)(6inc,0x01)(6nextA,0x02)(6inc,0x01)(6loopA,(6prev1,~))(6prev1,~)(6dec,0xff))(6nextA,0x02)
      (6writeA,~)(6nextA,0x02)(6add,0xfd)(6writeA,~)(6add,0x07)(6writeA,~)(6writeB,~)(6add,0x03)(6writeA,~)
      (6next1,~)(6writeA,~)(6prevA,0x02)(6dec,0xff)(6writeA,~)(6next1,~)(6writeA,~)(6add,0x03)(6writeA,~)
      (6add,0xfa)(6writeA,~)(6add,0xf8)(6writeA,~)(6next1,~)(6inc,0x01)(6writeA,~)(6next1,~)(6add,0x02)(6writeA,~) */

Now we use BFI_EXEC to execute the code and obtain the output sequence of bytes.

BFI_EXEC(
  (6add,0x08)(6loopA,(6next1,~)(6add,0x04)(6senter,~)(6next1,~)(6sadd,0x02)(6next1,~)(6sadd,0x03)(6next1,~)
  (6sadd,0x03)(6next1,~)(6sadd,0x01)(6prevA,0x04)(6sexit,~)(6next1,~)(6inc,0x01)(6next1,~)(6dec,0xff)
  (6next1,~)(6inc,0x01)(6nextA,0x02)(6inc,0x01)(6loopA,(6prev1,~))(6prev1,~)(6dec,0xff))(6nextA,0x02)
  (6writeA,~)(6nextA,0x02)(6add,0xfd)(6writeA,~)(6add,0x07)(6writeA,~)(6writeB,~)(6add,0x03)(6writeA,~)
  (6next1,~)(6writeA,~)(6prevA,0x02)(6dec,0xff)(6writeA,~)(6next1,~)(6writeA,~)(6add,0x03)(6writeA,~)
  (6add,0xfa)(6writeA,~)(6add,0xf8)(6writeA,~)(6next1,~)(6inc,0x01)(6writeA,~)(6next1,~)(6add,0x02)(6writeA,~),
  (0eof)
) /* => (0x48)(0x65)(0x6c)(0x6c)(0x6f)(0x20)(0x57)(0x6f)(0x72)(0x6c)(0x64)(0x21)(0x0a)(0eof) */

BFI_FORMAT_PRETTY gives the final result.

BFI_FORMAT_PRETTY((0x48)(0x65)(0x6c)(0x6c)(0x6f)(0x20)(0x57)(0x6f)(0x72)(0x6c)(0x64)(0x21)(0x0a)(0eof))
  /* => Hello, World!'\n' */

The idea of implementing a virtual machine on the C preprocessor to construct a interpreter, and an efficient implementation of such machine is largely taken from Order Preprocessor.