
ISEP-Projects-JH/Pseudo_Cpp


🧩 Pseudo C++: Minimal Compiler Frontend

Overview

Pseudo C++ is a compact compiler frontend implemented using Flex and Bison. It translates a minimal subset of C++-like syntax into standard C++, then compiles and runs it inside a Linux (amd64) container.

Running everything in an amd64 container keeps the toolchain compatible with a planned x86-64 NASM backend, even when the host is macOS or an ARM machine.


📁 Project Structure

.
├── .gitignore
├── 01-grammar.txt        # EBNF grammar reference
├── Dockerfile            # amd64 Ubuntu 24.04 build environment
├── run_dev.sh            # Entry script for macOS (host-side)
└── to_scan/
    ├── build.sh          # Build script executed inside Docker
    ├── input.txt         # Sample Pseudo C++ program
    ├── parser.y          # Bison grammar
    └── scanner.l         # Flex lexical analyzer

🧮 Language Grammar Explained (Human Readable)

This section explains the Pseudo C++ grammar in plain English. It is a minimal subset of C/C++, designed to preserve the essential structure of imperative programming (variables, expressions, control flow, and simple I/O) while staying simple enough to parse and translate directly.


🔠 Identifiers and Literals

Identifiers

  • Must start with a letter or underscore (_), followed by any combination of letters, digits, or underscores.

  • Example:

    a
    var1
    _temp
    counter_2
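
The identifier rule above can be checked by hand. The following is a minimal sketch in C++, assuming ASCII-only input; the helper name is_identifier is hypothetical and is not taken from scanner.l.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true when `text` matches the identifier rule
// described above (a letter or underscore first, then any mix of letters,
// digits, or underscores).
bool is_identifier(const std::string& text) {
    if (text.empty()) return false;

    const auto first = static_cast<unsigned char>(text.front());
    if (!(std::isalpha(first) || first == '_')) return false;

    for (std::size_t i = 1; i < text.size(); ++i) {
        const auto c = static_cast<unsigned char>(text[i]);
        if (!(std::isalnum(c) || c == '_')) return false;
    }
    return true;
}
```

Note that words such as int, if, and while also match this pattern, so a scanner would normally have to recognize keywords before falling back to the identifier rule.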
    

Numbers

  • Only integers are supported.

  • Floating-point numbers are not recognized by the grammar.

  • Example:

    0
    42
    1001
    

String Literals

  • Written inside double quotes ("...").

  • Can contain letters, digits, spaces, underscores, and simple punctuation such as . , ! ?.

  • Strings are not first-class objects; they are just static text passed to prints().

  • Example:

    "hello"
    "result = "
    "a, b!"
    

➕ Expressions

Expressions describe arithmetic computations and can include:

  • Identifiers
  • Integer constants
  • Parentheses for grouping
  • Binary operators: +, -, *, /

Examples:

a + 1
x * (y - 3)
n / 2 + 10

All arithmetic is integer-only: there are no boolean or floating-point types. Operator precedence follows the usual C-style convention: * and / have higher precedence than + and -.
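
To make the precedence rule concrete, here is a small recursive-descent evaluator in C++. It is an illustration only, not the project's Bison grammar: the class name ExprEvaluator and the function evaluate are hypothetical, and the sketch handles integer constants and parentheses but not identifiers.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar assumed for this sketch:
//   expr   := term   { ('+' | '-') term }
//   term   := factor { ('*' | '/') factor }
//   factor := INTEGER | '(' expr ')'
class ExprEvaluator {
public:
    explicit ExprEvaluator(std::string text) : text_(std::move(text)) {}

    int evaluate() {
        const int value = parse_expr();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Consumes `c` if it is the next non-space character.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // Lowest precedence: + and -, left-associative.
    int parse_expr() {
        int value = parse_term();
        for (;;) {
            if (accept('+')) value += parse_term();
            else if (accept('-')) value -= parse_term();
            else return value;
        }
    }

    // Higher precedence: * and /, left-associative, integer division.
    int parse_term() {
        int value = parse_factor();
        for (;;) {
            if (accept('*')) {
                value *= parse_factor();
            } else if (accept('/')) {
                const int divisor = parse_factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    int parse_factor() {
        if (accept('(')) {
            const int value = parse_expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skip_spaces();
        if (pos_ >= text_.size() ||
            !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::runtime_error("expected integer");
        int value = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    std::string text_;
    std::size_t pos_ = 0;
};

int evaluate(const std::string& text) { return ExprEvaluator(text).evaluate(); }
```

Because parse_term is called from inside parse_expr, multiplication and division bind tighter than addition and subtraction, so evaluate("2 + 3 * 4") yields 14 while evaluate("(2 + 3) * 4") yields 20.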


⚖️ Comparisons and Conditions

A condition compares two arithmetic expressions using:

>   <   >=   <=   ==   !=

Examples:

a > b
x + 1 <= 10
n != 0

Every condition produces a boolean result (true/false), used primarily in if and while statements.
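
As a rough sketch of how the six operators behave once both sides have been evaluated to integers, the following C++ helper maps an operator token to its result. The name evaluate_condition is hypothetical, and the sketch assumes the comparisons follow ordinary C++ int semantics, which seems likely given that the frontend emits C++.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: applies one of the six comparison operators listed
// above to two already-evaluated integer operands.
bool evaluate_condition(int lhs, const std::string& op, int rhs) {
    if (op == ">")  return lhs > rhs;
    if (op == "<")  return lhs < rhs;
    if (op == ">=") return lhs >= rhs;
    if (op == "<=") return lhs <= rhs;
    if (op == "==") return lhs == rhs;
    if (op == "!=") return lhs != rhs;
    throw std::invalid_argument("unknown comparison operator: " + op);
}
```

For the condition x + 1 <= 10 with x equal to 10, the left side evaluates to 11 and evaluate_condition(11, "<=", 10) returns false.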


🧱 Statements

A statement is a single action or declaration. The grammar defines six kinds of statements:


1️⃣ Declaration

Declares one or more integer variables, optionally with initialization.

int a;
int x = 5;
int i = 0, j = 10, k;

All variables are of type int. No arrays, structs, or pointers are supported.


2️⃣ Assignment

Assigns a computed value to an existing variable.

a = 10;
b = a + 3;
total = x * y;
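
Since an assignment targets an existing variable, a frontend could track declarations in a small symbol table. The C++ sketch below is hypothetical: the README does not say whether the translator performs this check itself or leaves it to g++ when the generated code is compiled, and the names SymbolTable, declare, and can_assign are invented for illustration.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical symbol table: records which int variables have been declared.
class SymbolTable {
public:
    // Records `name`; returns false if it was already declared.
    bool declare(const std::string& name) {
        return declared_.insert(name).second;
    }

    // An assignment target must have been declared earlier.
    bool can_assign(const std::string& name) const {
        return declared_.count(name) != 0;
    }

private:
    std::unordered_set<std::string> declared_;
};
```

With this table, int a; followed by a = 10; passes both checks, while b = a + 3; without a prior int b; would be rejected by can_assign.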

3️⃣ If Statement

Executes a block only if a condition is true.

if (a > b) {
    prints("a is larger");
}

Currently, there is no else clause. Nested if blocks are allowed.


4️⃣ While Statement

Repeats a block while a condition remains true.

while (n > 0) {
    print(n);
    n = n - 1;
}

5️⃣ Print Statement

Prints an integer value followed by a newline.

print(a);
print(a + b);

Translated internally to:

printf("%d\n", value);

6️⃣ Prints Statement

Prints a string literal (no formatting, newline included).

prints("hello world");
prints("a, b values:");

Translated internally to:

puts("hello world");
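
One way a translator could produce these output lines is with small string-building helpers. The C++ sketch below is an illustration under that assumption; the names emit_print and emit_prints are hypothetical, and the project may instead rely on preprocessor macros in the generated file.

```cpp
#include <string>

// Hypothetical emitter for print(expr): `expr` is assumed to be an
// already-validated arithmetic expression in source form.
std::string emit_print(const std::string& expr) {
    return "printf(\"%d\\n\", " + expr + ");";
}

// Hypothetical emitter for prints("..."): `literal` is assumed to include
// its surrounding double quotes, exactly as written in the source.
std::string emit_prints(const std::string& literal) {
    return "puts(" + literal + ");";
}
```

Calling emit_print("a + b") produces the text printf("%d\n", a + b); and emit_prints("\"hello world\"") produces puts("hello world");.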

🧩 Program Structure

A complete program is simply a list of statements, executed sequentially:

int a;
a = 5;
prints("result:");
print(a);

Formally, this corresponds to:

Program → { Statement }

Each simple statement ends with a semicolon (;); block statements (if, while) end with their closing brace instead.


🚫 Unsupported Features (by design)

  • No float, double, char, or string types
  • No function definitions or calls (other than print, prints)
  • No arrays, structs, or classes
  • No else, for, or break
  • No preprocessor directives (#include, #define) in source input

These restrictions make the language small and predictable, which is ideal for translating directly into either C++ or x86 assembly.


💡 Design Intent

Pseudo C++ preserves the feel of C syntax while remaining lightweight enough to serve as an educational compiler frontend.

The goal is not to replicate full C semantics, but to maintain a minimal imperative language that can be easily lowered to assembly.

This approach ensures:

  • Easy mapping from high-level constructs to NASM instructions
  • Deterministic, single-pass translation
  • Predictable symbol and scope management

⚙️ Build Pipeline

  1. run_dev.sh (host-side)

    • Builds and runs a Docker container with --platform=linux/amd64
    • Mounts to_scan/ into /workspace inside the container
    • Automatically executes /workspace/build.sh
  2. build.sh (inside container)

    • Runs Flex → Bison → g++ → execute
    • Translates input.txt → generated.cc → program

🧱 Execution Notes

🔒 Permissions

Because of volume mapping between host and container:

  • Inside Docker, build.sh runs as root, so it already has full permission.

  • On the host, both run_dev.sh and to_scan/build.sh must be executable before running:

    chmod +x run_dev.sh to_scan/build.sh

    Otherwise the bind mount carries the missing execute bit into the container, and running the scripts fails with Permission denied or bad interpreter errors.


🧩 Language Summary

The current Pseudo C++ subset supports:

  • int declarations and assignments

  • if / while control flow

  • Integer arithmetic and comparison expressions

  • Output via:

    print(expr);
    prints("literal");

These map to:

#define print(d)  printf("%d\n", d)
#define prints(s) puts(s)

🧠 Target Architecture

All compilation happens in a Linux amd64 environment:

--platform=linux/amd64

Even on Apple Silicon (ARM64), this guarantees:

  • Compatibility with future x86 NASM backend
  • Consistent system ABI and calling conventions
  • Seamless transition to native nasm + ld compilation later

🚀 Run Example (macOS Host)

/bin/zsh ./run_dev.sh

This will:

  1. Build the container (Ubuntu amd64)
  2. Compile the translator (Flex + Bison)
  3. Translate input.txt to generated.cc
  4. Compile and run the generated C++ program

Example output:

var1, var2
9
1
var1, var2
7
2
...

🔮 Future Work

The next phase will:

  • Replace the C++ backend with a NASM x86-64 code generator
  • Produce .asm and .o files directly
  • Assemble with nasm and link with ld into native ELF executables

This will remove the dependency on g++ entirely.
