Aburiscript Compiler Architecture

This document provides a high-level overview of the Aburiscript compiler's architecture, outlining how source code is transformed into an executable or object file.

Pipeline Overview

The compiler follows a traditional pipeline design: the frontend lowers source code through several distinct stages into LLVM IR, which the LLVM backend then optimizes and turns into assembly or object code.

1. Driver Setup (main.cpp)

  • Parses command-line arguments.
  • Determines compilation mode (e.g., -E, -S, -c, or full compilation/linking).
  • Configures language options (-std, -x) and target ABIs.
  • Routes input files: C/C++ files go to the frontend, object files go to the linker, and files in unsupported languages can be passed through to the system compiler.
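
The mode selection and input routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.cpp: the names Mode, Route, selectMode, extensionOf, and routeInput are hypothetical, and both the "last mode flag wins" rule and the exact extension list are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; the real driver in main.cpp may differ.
enum class Mode { PreprocessOnly, EmitAssembly, CompileOnly, CompileAndLink };
enum class Route { Frontend, Linker, SystemCompiler };

// Pick the compilation mode from the flags. Letting the last mode flag win
// is an assumption of this sketch.
Mode selectMode(const std::vector<std::string>& args) {
    Mode mode = Mode::CompileAndLink;  // default: full compilation and linking
    for (const std::string& arg : args) {
        if (arg == "-E") mode = Mode::PreprocessOnly;
        else if (arg == "-S") mode = Mode::EmitAssembly;
        else if (arg == "-c") mode = Mode::CompileOnly;
    }
    return mode;
}

// Text after the last '.' in the file name, or "" when there is none.
std::string extensionOf(const std::string& path) {
    const std::size_t slash = path.find_last_of('/');
    const std::size_t nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos || dot < nameStart) return "";
    return path.substr(dot + 1);
}

// Decide where an input file goes. The extension list is an assumption.
Route routeInput(const std::string& path) {
    assert(!path.empty());  // the driver is assumed never to route an empty path
    const std::string ext = extensionOf(path);
    if (ext == "c" || ext == "cc" || ext == "cpp" || ext == "cxx") return Route::Frontend;
    if (ext == "o") return Route::Linker;
    return Route::SystemCompiler;  // unsupported language: pass through
}
```

With these definitions, routeInput("src/a.cpp") selects the frontend and routeInput("a.o") selects the linker, while a file such as lib.f90 falls through to the system compiler.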

2. Preprocessing (preprocessor.h, preprocessor.cpp)

  • Resolves #include directives.
  • Handles conditional compilation (#ifdef, #if, etc.).
  • Expands macros.
  • Converts source text into a stream of tokens for the parser.
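
As a rough illustration of the last two steps, the sketch below tokenizes a line of text and expands object-like macros. It is not taken from preprocessor.cpp: Tokens, MacroTable, tokenize, and expand are hypothetical names, and function-like macros, #include, and conditionals are left out. The "active" set approximates the standard C rule that a macro name is not expanded again inside its own expansion.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for this sketch; tokens are kept as plain strings.
using Tokens = std::vector<std::string>;
using MacroTable = std::map<std::string, Tokens>;

// Split text into identifier/number tokens and single-character punctuators.
Tokens tokenize(const std::string& text) {
    Tokens out;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_')) {
                ++i;
            }
        } else {
            ++i;  // any other character is a one-character token
        }
        out.push_back(text.substr(start, i - start));
    }
    return out;
}

// Recursively expand object-like macros. A macro that is currently being
// expanded is copied through unchanged, which prevents infinite recursion.
void expandInto(const Tokens& in, const MacroTable& macros,
                std::set<std::string>& active, Tokens& out) {
    for (const std::string& tok : in) {
        const auto it = macros.find(tok);
        if (it == macros.end() || active.count(tok) != 0) {
            out.push_back(tok);
            continue;
        }
        active.insert(tok);
        expandInto(it->second, macros, active, out);
        active.erase(tok);
    }
}

Tokens expand(const Tokens& in, const MacroTable& macros) {
    std::set<std::string> active;
    Tokens out;
    expandInto(in, macros, active, out);
    assert(active.empty());  // every expansion that started has finished
    return out;
}
```

Given macros N defined as 4 and SIZE defined as N * 2, expanding the tokens of "int a[SIZE];" yields the token sequence int a [ 4 * 2 ] ; which is what the parser would then consume.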

3. Parsing and Semantic Analysis (parser/*, collect/*, ast/*)

  • Parser: Consumes the token stream and builds a Clang-like Abstract Syntax Tree (AST) using recursive descent. It uses tentative parsing to resolve ambiguities (especially prevalent in C++).
  • Sema (Collect): Tightly coupled with the parser, it performs on-the-fly semantic checks, type resolution, scoping, and AST node creation. If there is a semantic error, compilation halts with diagnostic messages.
  • AST Layer (ast/*): Owns the AST node definitions, clone helpers, side tables, and the tightly coupled type/symbol support used by parsing, semantic analysis, and codegen.
  • Deep Dive: Read Parser and Collect Internals for a detailed look at scope management, rollback, and C++ resolution.
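
Tentative parsing with rollback can be illustrated with the classic C++ statement ambiguity: T(a); is a declaration, while T(a).m; is an expression that starts with the same tokens. The sketch below is an assumption-laden toy, not the project's parser: TentativeParser, StmtKind, and the tiny grammar are hypothetical, and the set of known type names stands in for the feedback the parser gets from semantic analysis.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class StmtKind { Declaration, Expression, Error };

// Hypothetical toy parser over string tokens.
class TentativeParser {
public:
    TentativeParser(std::vector<std::string> tokens, std::set<std::string> typeNames)
        : tokens_(std::move(tokens)), typeNames_(std::move(typeNames)) {}

    // Try the declaration reading first; on failure, rewind and try again
    // as an expression statement.
    StmtKind parseStatement() {
        assert(pos_ <= tokens_.size());
        const std::size_t checkpoint = pos_;
        if (parseDeclaration()) return StmtKind::Declaration;
        pos_ = checkpoint;  // roll back everything the failed attempt consumed
        if (parseExpressionStatement()) return StmtKind::Expression;
        pos_ = checkpoint;
        return StmtKind::Error;
    }

    std::size_t position() const { return pos_; }

private:
    bool atEnd() const { return pos_ >= tokens_.size(); }

    bool accept(const std::string& text) {
        if (atEnd() || tokens_[pos_] != text) return false;
        ++pos_;
        return true;
    }

    bool acceptTypeName() {
        if (atEnd() || typeNames_.count(tokens_[pos_]) == 0) return false;
        ++pos_;
        return true;
    }

    // An identifier is a name that is not a known type.
    bool acceptIdentifier() {
        if (atEnd()) return false;
        const std::string& tok = tokens_[pos_];
        const unsigned char first = static_cast<unsigned char>(tok[0]);
        if (!(std::isalpha(first) || first == '_')) return false;
        if (typeNames_.count(tok) != 0) return false;
        ++pos_;
        return true;
    }

    // declaration := type-name '(' identifier ')' ';'
    bool parseDeclaration() {
        return acceptTypeName() && accept("(") && acceptIdentifier() &&
               accept(")") && accept(";");
    }

    // primary := type-name '(' identifier ')' | identifier
    bool parsePrimary() {
        if (acceptTypeName()) return accept("(") && acceptIdentifier() && accept(")");
        return acceptIdentifier();
    }

    // expression-statement := primary ('.' identifier)* ';'
    bool parseExpressionStatement() {
        if (!parsePrimary()) return false;
        while (accept(".")) {
            if (!acceptIdentifier()) return false;
        }
        return accept(";");
    }

    std::vector<std::string> tokens_;
    std::set<std::string> typeNames_;
    std::size_t pos_ = 0;
};
```

For the tokens T ( a ) . m ; the declaration attempt consumes four tokens before failing on the dot; the cursor is then restored and the same tokens are re-read as an expression. A real parser also has to undo any scope or symbol-table changes made during the failed attempt, which this sketch does not model.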

4. Constant Evaluation (constexpr/*)

  • Evaluates compile-time expressions (e.g., array sizes, _Static_assert predicates, enum values).
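
A constant evaluator of this kind can be pictured as a recursive walk that either folds an expression to a value or reports that it is not a constant. The C++17 sketch below is only illustrative: Expr, literal, variable, binary, and evaluate are hypothetical names, only integer arithmetic is covered, and overflow checks are omitted.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical miniature AST; the real node types live in ast/*.
struct Expr {
    enum class Kind { Literal, Variable, Add, Sub, Mul, Div };
    Kind kind = Kind::Literal;
    long long value = 0;             // used by Literal only
    std::unique_ptr<Expr> lhs, rhs;  // used by binary operators only
};

std::unique_ptr<Expr> literal(long long v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Literal;
    e->value = v;
    return e;
}

// Stands in for any operand whose value is unknown at compile time.
std::unique_ptr<Expr> variable() {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    return e;
}

std::unique_ptr<Expr> binary(Expr::Kind kind, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Returns the folded value, or std::nullopt when the expression is not a
// constant (runtime operand or division by zero). Overflow is not checked.
std::optional<long long> evaluate(const Expr& e) {
    if (e.kind == Expr::Kind::Literal) return e.value;
    if (e.kind == Expr::Kind::Variable) return std::nullopt;

    assert(e.lhs && e.rhs);  // binary nodes always carry both operands
    const std::optional<long long> l = evaluate(*e.lhs);
    const std::optional<long long> r = evaluate(*e.rhs);
    if (!l || !r) return std::nullopt;

    switch (e.kind) {
        case Expr::Kind::Add: return *l + *r;
        case Expr::Kind::Sub: return *l - *r;
        case Expr::Kind::Mul: return *l * *r;
        case Expr::Kind::Div:
            if (*r == 0) return std::nullopt;  // not a valid constant expression
            return *l / *r;
        default: return std::nullopt;
    }
}
```

An array size written as 4 * (2 + 1) folds to 12, whereas n + 1 with a runtime n yields no value, which is the point at which a compiler would report that a constant expression was required.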

5. LLVM Code Generation (ast2llvm/*, abi/*)

  • Translates the typed, semantic AST into LLVM Intermediate Representation (IR).
  • Manages target-specific ABI lowering, including struct layouts, bitfields, and mangling conventions (targeting the Itanium C++ ABI).
  • Lowers complex control flow (such as exceptions and switch statements) into LLVM basic blocks.
  • Deep Dive: Read Code Generation Internals for a detailed look at the C++ object model lowering, vtable emission, and exception handling.
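
To give a feel for the Itanium mangling convention mentioned above, the sketch below mangles only the simplest case: a free function in the global namespace whose parameters are all builtin types. The helper names mangleBuiltin and mangleFunction are hypothetical and do not come from ast2llvm/* or abi/*. Namespaces, classes, pointers, templates, and substitutions are deliberately not handled.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One-letter Itanium codes for a few builtin types. Anything else is
// rejected, because it would need rules this sketch does not implement.
char mangleBuiltin(const std::string& type) {
    static const std::map<std::string, char> codes = {
        {"void", 'v'},  {"bool", 'b'},         {"char", 'c'},
        {"short", 's'}, {"int", 'i'},          {"unsigned int", 'j'},
        {"long", 'l'},  {"unsigned long", 'm'}, {"long long", 'x'},
        {"float", 'f'}, {"double", 'd'},
    };
    const auto it = codes.find(type);
    if (it == codes.end()) throw std::invalid_argument("unsupported type: " + type);
    return it->second;
}

// Mangle a global-namespace function: "_Z", then the name prefixed by its
// length, then one code per parameter ("v" for an empty parameter list).
std::string mangleFunction(const std::string& name,
                           const std::vector<std::string>& params) {
    assert(!name.empty());
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) return out + "v";
    for (const std::string& param : params) out += mangleBuiltin(param);
    return out;
}
```

Under these restrictions, mangleFunction("foo", {"int"}) produces _Z3fooi and a parameterless foo produces _Z3foov. The return type is not encoded for ordinary functions, which is why it does not appear as an argument.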

6. Backend and Linking

  • Runs LLVM's optimization passes over the generated IR, at the level selected by the -O flag.
  • Emits target assembly (.s) or object code (.o).
  • Invokes the system linker through the system compiler driver (e.g., /usr/bin/cc) to produce the final executable, unless linking is disabled (for example by -E, -S, or -c).
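
The final link step amounts to assembling an argument list and running it. The sketch below shows only the argument list and is an assumption, not the project's code: buildLinkCommand is a hypothetical helper, /usr/bin/cc is the example path from the text, and -o is the conventional cc option for naming the output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the argument vector for the link step. Process spawning and any
// extra library or search-path flags are left out of this sketch.
std::vector<std::string> buildLinkCommand(const std::vector<std::string>& objects,
                                          const std::string& output) {
    assert(!objects.empty());  // linking needs at least one object file
    std::vector<std::string> command = {"/usr/bin/cc"};
    command.insert(command.end(), objects.begin(), objects.end());
    command.push_back("-o");
    command.push_back(output);
    return command;
}
```

For two object files a.o and b.o and an output named app, this yields the command /usr/bin/cc a.o b.o -o app.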