Parser Component Documentation

Overview

The parser component transforms source code text into an Abstract Syntax Tree (AST). It is built on the nom parser combinator library and splits the parsing logic across several specialized modules.

Architecture

The parser is organized into the following modules:

  • parser.rs: The main entry point that coordinates the parsing process
  • parser_common.rs: Common parsing utilities and shared functions
  • parser_expr.rs: Expression parsing functionality
  • parser_type.rs: Type system parsing
  • parser_stmt.rs: Statement and control flow parsing

Module Responsibilities and Public Interface

1. parser.rs

The main parser module that provides the entry point for parsing complete programs:

pub fn parse(input: &str) -> IResult<&str, Vec<Statement>>

2. parser_common.rs

Common parsing utilities used across other modules:

pub fn is_string_char(c: char) -> bool
pub fn separator<'a>(sep: &'static str) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
pub fn keyword<'a>(kw: &'static str) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
pub fn identifier(input: &str) -> IResult<&str, &str>
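
To make the intent of keyword and identifier concrete, here is a minimal, dependency-free sketch. It does not use nom and is not the project's implementation: the names sketch_identifier and sketch_keyword, the identifier rule (an ASCII letter or underscore followed by letters, digits, or underscores), and the word-boundary check are all assumptions made for illustration.

```rust
/// Returns `(rest, identifier)` if `input` starts with an identifier.
/// Assumed rule (not confirmed by the project): [A-Za-z_][A-Za-z0-9_]*
fn sketch_identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // The first character must be a letter or underscore.
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return None,
    }
    // The identifier ends at the first character that cannot continue a word.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some((&input[end..], &input[..end]))
}

/// Returns `(rest, kw)` if `input` starts with `kw` as a whole word.
fn sketch_keyword<'a>(kw: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    let rest = input.strip_prefix(kw)?;
    // Reject `iffy` when looking for `if`: the next char must not continue a word.
    match rest.chars().next() {
        Some(c) if c.is_ascii_alphanumeric() || c == '_' => None,
        _ => Some((rest, &input[..kw.len()])),
    }
}
```

Both helpers return the remaining input first and the matched text second, mirroring the (rest, value) order of nom's IResult.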

3. parser_expr.rs

Expression parsing functionality:

pub fn parse_expression(input: &str) -> IResult<&str, Expression>
pub fn parse_actual_arguments(input: &str) -> IResult<&str, Vec<Expression>>

4. parser_type.rs

Type system parsing:

pub fn parse_type(input: &str) -> IResult<&str, Type>

5. parser_stmt.rs

Statement and control flow parsing:

pub fn parse_statement(input: &str) -> IResult<&str, Statement>

Parser Features

Statement Parsing

The parser supports various types of statements:

  • Variable declarations and assignments
  • Control flow (if-else, while, for)
  • Function definitions
  • Assert statements
  • ADT (Algebraic Data Type) declarations

For statement semantics (iterables and scoping)

  • Syntax: for <identifier> in <expression>: <block> end
  • Supported iterables (the parser accepts any expression here; whether the value is iterable is resolved later, during type checking or at runtime):
    • Lists: [e1, e2, ...]
    • Strings: "abc" (iterates over characters as 1-length strings)
    • Tuples: (e1, e2, ...) (see tuple literals below)
  • Scoping: the loop variable is bound in an inner scope created for the loop body. It is not visible outside the loop.
    • The variable is considered immutable by the type checker; each iteration rebinds it.
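
The scoping rule above can be illustrated with a small scope stack. This is a sketch under stated assumptions, not the project's interpreter or type checker: the Scopes type, its methods, and run_for_over_string are hypothetical names, values are modeled as plain strings, and immutability of the loop variable is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical scope stack: the innermost scope is the last frame.
struct Scopes {
    frames: Vec<HashMap<String, String>>,
}

impl Scopes {
    fn new() -> Self {
        Scopes { frames: vec![HashMap::new()] }
    }

    fn push(&mut self) {
        self.frames.push(HashMap::new());
    }

    fn pop(&mut self) {
        self.frames.pop();
    }

    /// Binds `name` in the innermost scope, replacing any previous binding there.
    fn bind(&mut self, name: &str, value: String) {
        if let Some(frame) = self.frames.last_mut() {
            frame.insert(name.to_string(), value);
        }
    }

    /// Looks `name` up from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<&String> {
        self.frames.iter().rev().find_map(|f| f.get(name))
    }
}

/// Models `for <var> in "<text>": ... end` and records the value the body
/// would observe on each iteration (one 1-length string per character).
fn run_for_over_string(scopes: &mut Scopes, var: &str, text: &str) -> Vec<String> {
    let mut seen = Vec::new();
    scopes.push(); // inner scope created for the loop body
    for ch in text.chars() {
        scopes.bind(var, ch.to_string()); // each iteration rebinds the variable
        if let Some(v) = scopes.lookup(var) {
            seen.push(v.clone());
        }
    }
    scopes.pop(); // the loop variable is no longer visible after this point
    seen
}
```

After run_for_over_string returns, looking up the loop variable in the outer scope yields nothing, which matches the rule that it is not visible outside the loop.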

Expression Parsing

Handles different types of expressions:

  • Arithmetic expressions
  • Boolean expressions
  • Function calls
  • Variables
  • Literals (numbers, strings, booleans)
  • ADT constructors and pattern matching

Tuple literals and parenthesis grouping

The parser supports tuple literals and distinguishes them from parenthesized groupings:

  • Empty tuple: () -> Expression::Tuple([])
  • Single-element tuple: (x,) -> Expression::Tuple([x])
  • Multi-element tuple: (x, y, z) -> Expression::Tuple([x, y, z])
  • Grouping (no comma): (expr) -> parsed as expr (not a tuple)

Tuples may be nested, e.g., ((1, 2), (3, 4)).
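
The comma rule above is easy to get wrong, so here is a dependency-free sketch of it. It is not the project's nom-based parser: the Expr type, parse_expr, and parse_paren_body are hypothetical, only integer literals are supported as leaf expressions, and accepting a trailing comma in multi-element tuples is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Tuple(Vec<Expr>),
}

/// Parses an integer or a parenthesized form; returns `(rest, expr)`.
fn parse_expr(input: &str) -> Option<(&str, Expr)> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix('(') {
        return parse_paren_body(rest);
    }
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let value: i64 = input[..end].parse().ok()?;
    Some((&input[end..], Expr::Int(value)))
}

/// Parses what follows an opening `(` up to and including the matching `)`.
fn parse_paren_body(mut input: &str) -> Option<(&str, Expr)> {
    let mut items = Vec::new();
    let mut saw_comma = false;
    loop {
        input = input.trim_start();
        if let Some(rest) = input.strip_prefix(')') {
            input = rest;
            break;
        }
        let (rest, item) = parse_expr(input)?;
        items.push(item);
        input = rest.trim_start();
        if let Some(rest) = input.strip_prefix(',') {
            saw_comma = true;
            input = rest;
        } else if !input.starts_with(')') {
            return None; // expected `,` or `)`
        }
    }
    // Exactly one item and no comma means grouping, not a tuple.
    if items.len() == 1 && !saw_comma {
        Some((input, items.remove(0)))
    } else {
        Some((input, Expr::Tuple(items)))
    }
}
```

The deciding factor is whether a comma was seen, not how many items were collected: (7,) and (7) both contain one item, but only the first is a tuple.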

Type System

The type parser supports the following:

  • Basic types (Int, Real, Boolean, String, Unit, Any)
  • Complex types (List, Tuple, Maybe)
  • ADT declarations
  • Function types

nom Parser Combinators

The key nom combinators used throughout the parser are listed below.

Basic Combinators

  • tag: Matches exact string patterns
  • char: Matches single characters
  • digit1: Matches one or more digits
  • alpha1: Matches one or more ASCII alphabetic characters
  • space0/space1: Matches zero or more/one or more spaces and tabs (multispace0/multispace1 also match line breaks)

Sequence Combinators

  • tuple: Combines multiple parsers in sequence
  • preceded: Matches a prefix followed by a value
  • terminated: Matches a value followed by a suffix
  • delimited: Matches a value between two delimiters

Branch Combinators

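  • alt: Tries multiple parsers in order and returns the first success
  • map: Transforms the output of a parser (a general combinator from nom::combinator, not a branch combinator)
  • opt: Makes a parser optional, yielding None when it does not match (also from nom::combinator)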

Multi Combinators

  • many0/many1: Matches zero or more/one or more occurrences
  • separated_list0: Matches items separated by a delimiter
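
To show how such combinators compose, the sketch below re-implements a few of them by hand using only the standard library. These are simplified stand-ins that mimic the behavior described above. They are not nom's actual implementations or signatures, and PResult and alt2 are names invented for this example.

```rust
/// Result shape loosely mirroring nom's `IResult`: remaining input plus parsed value.
type PResult<'a, T> = Result<(&'a str, T), String>;

/// Like `tag`: match an exact string.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// Like `digit1`: match one or more ASCII digits.
fn digit1<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err("expected a digit".to_string())
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

/// Like `preceded`: run `first`, discard its output, then run `second`.
fn preceded<'a, A, B>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    second: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = first(input)?;
        second(rest)
    }
}

/// Like `alt` restricted to two alternatives: try `left`, fall back to `right`.
fn alt2<'a, T>(
    left: impl Fn(&'a str) -> PResult<'a, T>,
    right: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |input: &'a str| left(input).or_else(|_| right(input))
}

/// Like `map`: transform the output of a successful parse.
fn map<'a, A, B>(
    parser: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| parser(input).map(|(rest, a)| (rest, f(a)))
}
```

Each combinator takes parsers and returns a new parser, so a call such as preceded(tag("="), digit1) builds a parser for "an equals sign followed by digits" without writing any matching logic by hand.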

Example Usage

Here's an example of how the parser handles a simple assignment statement:

x = 42

This is parsed using the following combinators:

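// Imports assume nom 7 with its `complete` (non-streaming) parsers.
use nom::{bytes::complete::tag, character::complete::multispace0, combinator::map};
use nom::{sequence::{preceded, tuple}, IResult};

fn parse_assignment_statement(input: &str) -> IResult<&str, Statement> {
    map(
        // Each part may be preceded by optional whitespace, including line breaks.
        tuple((
            preceded(multispace0, identifier),
            preceded(multispace0, tag("=")),
            preceded(multispace0, parse_expression),
        )),
        // Discard the `=` token and build the AST node from the name and expression.
        |(var, _, expr)| Statement::Assignment(var.to_string(), Box::new(expr)),
    )(input)
}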

AST Structure

The parser produces an Abstract Syntax Tree (AST) with the following main types:

Statements

pub enum Statement {
    VarDeclaration(Name),
    ValDeclaration(Name),
    Assignment(Name, Box<Expression>),
    IfThenElse(Box<Expression>, Box<Statement>, Option<Box<Statement>>),
    While(Box<Expression>, Box<Statement>),
    For(Name, Box<Expression>, Box<Statement>),
    Block(Vec<Statement>),
    Assert(Box<Expression>, Box<Expression>),
    FuncDef(Function),
    Return(Box<Expression>),
    ADTDeclaration(Name, Vec<ValueConstructor>),
    // ... other variants
}
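
As an illustration of how these variants nest, the sketch below builds the tree for x = 42 and walks a small program. It uses a trimmed copy of Statement, and because the Expression enum is not shown in this document, the Expression variants here (CInt, Var, Add) are hypothetical placeholders. Name is assumed to be an alias for String, and count_assignments is an invented helper.

```rust
type Name = String; // assumption: `Name` is a `String` alias

// Hypothetical, trimmed-down expression type; the real `Expression` enum may differ.
#[derive(Debug, PartialEq, Clone)]
enum Expression {
    CInt(i32),
    Var(Name),
    Add(Box<Expression>, Box<Expression>),
}

// Subset of the `Statement` variants listed above.
#[derive(Debug, PartialEq, Clone)]
enum Statement {
    Assignment(Name, Box<Expression>),
    While(Box<Expression>, Box<Statement>),
    Block(Vec<Statement>),
}

/// Builds the tree for `x = 42`.
fn assignment_x_42() -> Statement {
    Statement::Assignment("x".to_string(), Box::new(Expression::CInt(42)))
}

/// Counts assignments anywhere inside a statement, recursing through nested bodies.
fn count_assignments(stmt: &Statement) -> usize {
    match stmt {
        Statement::Assignment(_, _) => 1,
        Statement::While(_, body) => count_assignments(body),
        Statement::Block(stmts) => stmts.iter().map(count_assignments).sum(),
    }
}
```

The Box wrappers are what allow the enums to be recursive: a While holds a boxed Statement as its body, which is typically a Block containing further statements.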

Types

pub enum Type {
    TInteger,
    TReal,
    TBool,
    TString,
    TList(Box<Type>),
    TTuple(Vec<Type>),
    TMaybe(Box<Type>),
    TResult(Box<Type>, Box<Type>),
    TFunction(Box<Option<Type>>, Vec<Type>),
    // ... other variants
}
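
The following sketch shows how composite types nest, using a trimmed copy of the enum and a hypothetical describe helper. Two things are assumptions rather than facts from the codebase: that the first field of TFunction is the optional return type and the second the argument types, and that a missing return type should be shown as Unit. The printed notation is invented for this example and is not the language's surface syntax.

```rust
// Subset of the `Type` variants listed above.
#[derive(Debug, PartialEq, Clone)]
enum Type {
    TInteger,
    TBool,
    TString,
    TList(Box<Type>),
    TTuple(Vec<Type>),
    TMaybe(Box<Type>),
    TFunction(Box<Option<Type>>, Vec<Type>),
}

/// Renders a type as text (illustrative notation only).
fn describe(ty: &Type) -> String {
    match ty {
        Type::TInteger => "Int".to_string(),
        Type::TBool => "Boolean".to_string(),
        Type::TString => "String".to_string(),
        Type::TList(inner) => format!("[{}]", describe(inner)),
        Type::TTuple(items) => {
            let parts: Vec<String> = items.iter().map(describe).collect();
            format!("({})", parts.join(", "))
        }
        Type::TMaybe(inner) => format!("Maybe {}", describe(inner)),
        Type::TFunction(ret, args) => {
            let parts: Vec<String> = args.iter().map(describe).collect();
            // Assumption: the first field is the optional return type.
            let ret = match &**ret {
                Some(t) => describe(t),
                None => "Unit".to_string(),
            };
            format!("({}) -> {}", parts.join(", "), ret)
        }
    }
}
```

For example, a function taking a list of strings and an (Int, Boolean) pair and returning Maybe Int is a TFunction whose argument vector holds a TList and a TTuple.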

Error Handling

The parser implements error handling through the nom error system:

pub enum ParseError {
    IndentationError(usize),
    UnexpectedToken(String),
    InvalidExpression(String),
}
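
One way such an enum could be surfaced to callers is by implementing the standard Display and Error traits, sketched below on a local copy of the enum. Whether the project does this is not stated here. The message wording is invented, and the meaning of the usize payload in IndentationError is not specified, so it is printed without interpretation.

```rust
use std::fmt;

// Local copy of the enum shown above, for a self-contained example.
#[derive(Debug, PartialEq)]
enum ParseError {
    IndentationError(usize),
    UnexpectedToken(String),
    InvalidExpression(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The payload's meaning is not documented; it is shown as-is.
            ParseError::IndentationError(n) => write!(f, "indentation error ({})", n),
            ParseError::UnexpectedToken(tok) => write!(f, "unexpected token: {}", tok),
            ParseError::InvalidExpression(msg) => write!(f, "invalid expression: {}", msg),
        }
    }
}

// Implementing `Error` lets the type be used as `Box<dyn std::error::Error>`.
impl std::error::Error for ParseError {}
```

With these impls in place, a ParseError can be returned through the ? operator from any function whose error type is Box<dyn std::error::Error>.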

Testing

The parser includes a comprehensive test suite in tests/parser_tests.rs that verifies:

  • Simple assignments
  • Complex expressions
  • Control flow structures
  • Type annotations
  • Complete programs
  • Error handling
  • Whitespace handling

Documentation Generation Note
This documentation was automatically generated by Claude (Anthropic), an AI assistant, through analysis of the codebase. While the content accurately reflects the implementation, it should be reviewed and maintained by the development team. Last generated: June 2025.