Pseudo C++ is a compact compiler frontend implemented using Flex and Bison. It translates a minimal subset of C++-like syntax into standard C++, then compiles and runs it inside a Linux (amd64) container.
Running the whole toolchain in an amd64 container keeps it compatible with the planned x86-64 NASM backend, even on macOS or ARM hosts.
.
├── .gitignore
├── 01-grammar.txt    # EBNF grammar reference
├── Dockerfile        # amd64 Ubuntu 24.04 build environment
├── run_dev.sh        # Entry script for macOS (host-side)
└── to_scan/
    ├── build.sh      # Build script executed inside Docker
    ├── input.txt     # Sample Pseudo C++ program
    ├── parser.y      # Bison grammar
    └── scanner.l     # Flex lexical analyzer
This section explains the Pseudo C++ grammar in plain English. It is a minimal subset of C/C++, designed to preserve the essential structure of imperative programming (variables, expressions, control flow, and simple I/O) while staying simple enough to parse and translate directly.
Identifiers
- Must start with a letter or underscore (_), followed by any combination of letters, digits, or underscores.
- Examples: a var1 _temp counter_2
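As a rough sketch of the identifier rule, the check below walks a candidate string character by character. It is plain C++ rather than the project's Flex pattern, it assumes ASCII-only identifiers, and the name isIdentifier is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the identifier rule:
// the first character is a letter or '_', the rest are letters, digits, or '_'.
// Assumes ASCII input (std::isalpha/std::isalnum in the default "C" locale).
bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}
```

Keywords such as int, if, and while also satisfy this rule. In a Flex scanner, listing the keyword rules before the identifier rule resolves the overlap, because when two rules match the same number of characters, Flex picks the one that appears first.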
Numbers
- Only integers are supported.
- Floating-point numbers are not recognized by the grammar.
- Examples: 0 42 1001
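A matching sketch for numbers, assuming the scanner treats an integer literal as one or more decimal digits (the equivalent of the Flex pattern [0-9]+). The name isIntegerLiteral is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: an integer literal is one or more decimal digits.
// No character class here accepts '.', so "3.14" is not a single number token.
bool isIntegerLiteral(const std::string& s) {
    if (s.empty()) return false;
    for (char ch : s) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) return false;
    }
    return true;
}
```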
String Literals
- Written inside double quotes ("...").
- Can contain letters, digits, spaces, underscores, and simple punctuation such as . , ! ?
- Strings are not first-class objects; they are only static text passed to prints().
- Examples: "hello" "result = " "a, b!"
Expressions
Expressions describe arithmetic computations and can include:
- Identifiers
- Integer constants
- Parentheses for grouping
- Binary operators: +, -, *, /
Examples:
a + 1
x * (y - 3)
n / 2 + 10
All arithmetic is integer-only: there are no booleans or floating-point types.
Operator precedence follows the usual C convention: * and / have higher precedence than + and -.
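To make the precedence rule concrete, here is a small recursive-descent evaluator for the expression grammar. It is a substitute technique for illustration, not the project's Bison parser (Bison grammars commonly express precedence with %left declarations instead), and the names ExprEvaluator and evaluate are hypothetical. Each precedence level gets its own function, so * and / are applied before + and -; the loops make all four operators left-associative, assuming the usual C-style associativity. Division uses C++ int division, which truncates toward zero, just as the generated C++ would.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Recursive-descent sketch of the expression grammar (illustrative only):
//   expr   := term   { ('+' | '-') term }
//   term   := factor { ('*' | '/') factor }
//   factor := NUMBER | IDENT | '(' expr ')'
class ExprEvaluator {
public:
    ExprEvaluator(std::string src, std::map<std::string, int> vars)
        : src_(std::move(src)), vars_(std::move(vars)) {}

    int evaluate() {
        int value = expr();
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string src_;
    std::map<std::string, int> vars_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consumes `c` if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // Lowest precedence: + and -, applied left to right.
    int expr() {
        int value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // Higher precedence: * and /, applied left to right.
    int term() {
        int value = factor();
        for (;;) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                int divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;  // C++ int division truncates toward zero
            } else {
                return value;
            }
        }
    }

    // Integer literal, variable, or parenthesized sub-expression.
    int factor() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        std::size_t start = pos_;  // accept() has already skipped spaces
        if (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            return std::stoi(src_.substr(start, pos_ - start));
        }
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_')) ++pos_;
        if (pos_ == start) throw std::runtime_error("expected a number, identifier, or '('");
        std::string name = src_.substr(start, pos_ - start);
        auto it = vars_.find(name);
        if (it == vars_.end()) throw std::runtime_error("undeclared variable: " + name);
        return it->second;
    }
};

// Convenience wrapper: evaluates `src` with the given variable values.
int evaluate(const std::string& src, const std::map<std::string, int>& vars = {}) {
    return ExprEvaluator(src, vars).evaluate();
}
```

For example, evaluate("n / 2 + 10", {{"n", 7}}) yields 13, because 7 / 2 truncates to 3 before the addition.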
Conditions
A condition compares two arithmetic expressions using one of these operators:
> < >= <= == !=
Examples:
a > b
x + 1 <= 10
n != 0
A condition evaluates to true or false, but the result cannot be stored, since there is no boolean type; conditions appear only in if and while statements.
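A condition reduces to a single comparison between two int values. The hypothetical compare function below covers the six operators; its bool result only ever drives an if or while decision, consistent with the language having no boolean variables.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: applies one of the six comparison operators.
// The bool result is consumed by if/while and is never stored.
bool compare(int lhs, const std::string& op, int rhs) {
    if (op == ">")  return lhs > rhs;
    if (op == "<")  return lhs < rhs;
    if (op == ">=") return lhs >= rhs;
    if (op == "<=") return lhs <= rhs;
    if (op == "==") return lhs == rhs;
    if (op == "!=") return lhs != rhs;
    throw std::invalid_argument("unknown comparison operator: " + op);
}
```

A scanner must read >=, <=, ==, and != as single tokens rather than, say, > followed by =. Flex handles this automatically, because it always prefers the rule that matches the longest input.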
Statements
A statement is a single action or declaration. The grammar defines six kinds of statements: declaration, assignment, if, while, print, and prints.
Declaration
Declares one or more integer variables, optionally with initialization.
int a;
int x = 5;
int i = 0, j = 10, k;
All variables are of type int. No arrays, structs, or pointers are supported.
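Because every variable is an int, a translator's symbol table only needs to track names and, optionally, whether a value has been assigned. The sketch below is hypothetical (it is not taken from parser.y) and assumes a single flat scope: redeclaring a name is rejected, as g++ would reject it within one scope, and assigning to an undeclared name is an error, since assignment targets must be existing variables.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical symbol table for a single flat scope of int variables.
// A declaration without an initializer leaves the value unset.
class SymbolTable {
public:
    void declare(const std::string& name, std::optional<int> init = std::nullopt) {
        if (vars_.count(name) != 0) throw std::runtime_error("redeclaration of " + name);
        vars_[name] = init;
    }

    void assign(const std::string& name, int value) {
        auto it = vars_.find(name);
        if (it == vars_.end()) throw std::runtime_error("assignment to undeclared " + name);
        it->second = value;
    }

    bool isDeclared(const std::string& name) const { return vars_.count(name) != 0; }

    bool isInitialized(const std::string& name) const {
        auto it = vars_.find(name);
        return it != vars_.end() && it->second.has_value();
    }

private:
    std::map<std::string, std::optional<int>> vars_;
};
```

Declaring int i = 0, j = 10, k; would then be three declare calls, with k left uninitialized until its first assignment.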
Assignment
Assigns a computed value to an existing (already declared) variable.
a = 10;
b = a + 3;
total = x * y;
if statement
Executes a block only if a condition is true.
if (a > b) {
    prints("a is larger");
}
There is currently no else clause. Nested if blocks are allowed.
while statement
Repeats a block while a condition remains true.
while (n > 0) {
    print(n);
    n = n - 1;
}
print statement
Prints an integer value followed by a newline.
print(a);
print(a + b);
Translated internally to:
printf("%d\n", value);
prints statement
Prints a string literal as-is (no formatting), followed by a newline.
prints("hello world");
prints("a, b values:");
Translated internally to:
puts("hello world");
Program
A complete program is simply a list of statements, executed sequentially:
int a;
a = 5;
prints("result:");
print(a);
Formally, this corresponds to:
Program → { Statement }
Each statement ends with a semicolon (;), except block statements (if, while), which use { ... }.
Restrictions
- No float, double, char, or string types
- No function definitions or calls (other than print and prints)
- No arrays, structs, or classes
- No else, for, or break
- No preprocessor directives (#include, #define) in source input
These restrictions keep the language small and predictable, which makes it well suited to direct translation into either C++ or x86 assembly.
Pseudo C++ preserves the feel of C syntax while remaining lightweight enough to serve as an educational compiler frontend.
The goal is not to replicate full C semantics, but to maintain a minimal imperative language that can be easily lowered to assembly.
This approach ensures:
- Easy mapping from high-level constructs to NASM instructions
- Deterministic, single-pass translation
- Predictable symbol and scope management
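As an illustration of mapping high-level constructs to NASM instructions, the sketch below lowers if (lhs OP rhs) { ... } to x86-64 assembly text. It assumes both operands have already been evaluated into rax and rbx as signed 64-bit values; the function names and the if_end_N label scheme are hypothetical. Because the body must be skipped when the condition is false, each operator maps to the conditional jump for its negation.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// For `if (lhs OP rhs)`, the body is skipped when the condition is FALSE,
// so each operator maps to the signed x86 jump for its negation.
std::string skipJumpFor(const std::string& op) {
    static const std::map<std::string, std::string> negated = {
        {">", "jle"}, {"<", "jge"}, {">=", "jl"},
        {"<=", "jg"}, {"==", "jne"}, {"!=", "je"},
    };
    auto it = negated.find(op);
    if (it == negated.end()) throw std::invalid_argument("unknown operator: " + op);
    return it->second;
}

// Hypothetical lowering of `if (lhs OP rhs) { body }` to NASM text.
// Assumes lhs is already in rax, rhs in rbx, and bodyAsm holds the body's code.
std::string lowerIf(const std::string& op, const std::string& bodyAsm, int labelId) {
    const std::string endLabel = "if_end_" + std::to_string(labelId);
    return "    cmp rax, rbx\n"
           "    " + skipJumpFor(op) + " " + endLabel + "\n" +
           bodyAsm +
           endLabel + ":\n";
}
```

A while loop follows the same pattern with one extra label: place a loop label before the comparison and end the body with a jmp back to it.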
The build is driven by two scripts:
- run_dev.sh (host-side)
  - Builds and runs a Docker container with --platform=linux/amd64
  - Mounts to_scan/ into /workspace inside the container
  - Automatically executes /workspace/build.sh
- build.sh (inside the container)
  - Runs Flex → Bison → g++ → execute
  - Translates input.txt → generated.cc → program
Because to_scan/ is mounted from the host, the container sees the host's file permissions:
- Inside Docker, build.sh runs as root, but even root can only execute a script directly if it has an execute bit set, and that bit comes from the host file.
- On the host, both run_dev.sh and to_scan/build.sh must therefore be made executable before running:
  chmod +x run_dev.sh to_scan/build.sh
Otherwise, launching the scripts fails with a Permission denied error.
The current Pseudo C++ subset supports:
- int declarations and assignments
- if/while control flow
- Integer arithmetic and comparison expressions
- Output via print(expr); and prints("literal");
These map to:
#define print(d) printf("%d\n", d)
#define prints(s) puts(s)
All compilation happens in a Linux amd64 environment, selected with:
--platform=linux/amd64
Even on Apple Silicon (ARM64) hosts, where Docker emulates amd64, this provides:
- Compatibility with the future x86-64 NASM backend
- Consistent system ABI and calling conventions
- A straightforward transition to native nasm + ld builds later
To build and run everything, execute on the host:
/bin/zsh ./run_dev.sh
This will:
- Build the container (Ubuntu amd64)
- Compile the translator (Flex + Bison)
- Translate input.txt to generated.cc
- Compile and run the generated C++ program
Example output:
var1, var2
9
1
var1, var2
7
2
...
The next phase will:
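- Replace the C++ backend with a NASM x86-64 code generator
- Emit .asm files directly, assemble them into .o object files with nasm, and link them with ld into native ELF executables
This will remove g++ from the compilation of Pseudo C++ programs; a C/C++ compiler is still needed to build the Flex/Bison translator itself.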
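A common starting point for such a code generator is stack-machine lowering: each subexpression pushes its value, and each operator pops two operands and pushes the result. The sketch below applies this to Pseudo C++ arithmetic. The AST type, the helper names, and the assumption that each variable lives in a 64-bit memory slot labeled with its own name (for example, x: dq 0 in a .data section) are all hypothetical, not the planned design.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: a leaf (integer literal or variable name)
// when op == 0, otherwise a binary + - * / node.
struct Expr {
    char op = 0;
    std::string text;
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> makeLeaf(const std::string& text) {
    auto e = std::make_unique<Expr>();
    e->text = text;
    return e;
}

std::unique_ptr<Expr> makeBinary(char op, std::unique_ptr<Expr> lhs, std::unique_ptr<Expr> rhs) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Appends NASM x86-64 instructions that leave the value of `e` on the stack.
void emitExpr(const Expr& e, std::string& out) {
    if (e.op == 0) {
        assert(!e.text.empty() && "a leaf needs a literal or a variable name");
        if (std::isdigit(static_cast<unsigned char>(e.text[0]))) {
            out += "    mov rax, " + e.text + "\n    push rax\n";  // integer literal
        } else {
            out += "    push qword [" + e.text + "]\n";           // variable in memory
        }
        return;
    }
    emitExpr(*e.lhs, out);
    emitExpr(*e.rhs, out);
    out += "    pop rbx\n    pop rax\n";  // rbx = right operand, rax = left operand
    switch (e.op) {
        case '+': out += "    add rax, rbx\n"; break;
        case '-': out += "    sub rax, rbx\n"; break;
        case '*': out += "    imul rax, rbx\n"; break;
        case '/': out += "    cqo\n    idiv rbx\n"; break;  // rdx:rax / rbx, quotient in rax
        default: assert(false && "unsupported operator");
    }
    out += "    push rax\n";
}
```

Keeping intermediate values on the stack avoids register allocation entirely, at the cost of extra push/pop pairs, a trade-off that suits a first, deterministic single-pass backend.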