How the code works: a guide for students

This document explains how expression.py parses and evaluates a mathematical expression like a = 10±2. No programming expertise required.

Overview

The program follows three steps to understand and compute an expression:

"10±2 + 5"  →  Tokenize  →  Build RPN  →  Evaluate  →  15.00 ± 2.00

Tokenize: split the text into pieces (tokens)
Build RPN: rearrange the tokens to respect operator priorities (+, *, etc.) into what is called Reverse Polish Notation
Evaluate: compute the result by walking through the rearranged list

It's like reading a sentence: first you identify the words (tokenize), then you understand the grammatical structure (RPN), then you extract the meaning (evaluate).

Step 1: Tokenize (split into tokens)

A token is the smallest meaningful piece of an expression. For example:

"10±2 + 5"  →  [NUMBER(10), PLUSMINUS(±), NUMBER(2), ADD(+), NUMBER(5), END]

Each token has a type (what it is) and text (what was read). The possible types are:

Type	Examples	Description
`NUMBER`	`10`, `3.14`, `1e-3`	A number
`NUMBERS`	`[10, 11, 12]`	A list of numbers
`VAR`	`a`, `pi`, `x`	A variable name
`ADD`, `SUB`, `MUL`, `DIV`	`+`, `-`, `*`, `/`	Basic operators
`PLUSMINUS`	`±`	The uncertainty operator
`ASSIGN`	`=`	Variable assignment
`DEFINE`	`:=`	Definition (lazy)
`FUNCTION`	`sin`, `cos`	Function call
`LEFT_PAR`, `RIGHT_PAR`	`(`, `)`	Parentheses

How it works in the code

The next_token() method tries to match the beginning of the text with regular expressions (regex). A regex is a text pattern. For example, the regex for a number is:

r"(-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"

This means: "an optional negative sign, followed by digits, with an optional decimal point and exponent". It matches 10, 3.14, -2, 1.5e-3, etc. Regular expressions are to text what algebra is to mathematics: the best functional description of a textual expression.

The program tries each regex in order. The first one that matches determines the token type. Then it removes the matched text and starts over with the rest, until there's no text left.

"10±2 + 5"
  → match NUMBER "10", remainder: "±2 + 5"
  → match PLUSMINUS "±", remainder: "2 + 5"
  → match NUMBER "2", remainder: " + 5"
  → match ADD "+", remainder: " 5"
  → match NUMBER "5", remainder: ""
  → add END

Step 2: Build RPN (rearrange by priority)

We now have a list of tokens, but in the order we read them (left to right). The problem: 1 + 2 * 3 must give 7 (not 9), because * has priority over +.

The solution: rearrange the tokens into Reverse Polish Notation (RPN). In RPN, operators come after their operands:

Normal:  1 + 2 * 3
RPN:     1, 2, 3, *, +

Why is this useful? Because in RPN, you no longer need priorities or parentheses. You read left to right and apply each operator to the most recent values.

The algorithm

The code uses an operator stack (operator_stack) to sort. The rules are simple:

A number or variable goes directly into the RPN list
An operator is compared with what's on top of the stack:
- If its priority is lower or equal, the top operator goes into the RPN list
- Otherwise, it's pushed onto the stack
An opening parenthesis is pushed onto the stack
A closing parenthesis empties the stack until the opening parenthesis

Example with 1 + 2 * 3:

Token       Action                          RPN          Stack
1           → rpn                           [1]          []
+           → stack (empty)                 [1]          [+]
2           → rpn                           [1, 2]       [+]
*           → stack (* > +, push)           [1, 2]       [+, *]
3           → rpn                           [1, 2, 3]    [+, *]
END         → empty stack                   [1, 2, 3, *, +]  []

The result [1, 2, 3, *, +] reads: "take 2 and 3, multiply (=6), then take 1 and 6, add (=7)".

Priorities in the code

precedence = {
    NUMBER: 0,  VAR: 0,        # values have no priority
    ADD: 5,  SUB: 5,           # + and - have the same priority
    MUL: 6,  DIV: 6,           # * and / are higher priority
    FUNCTION: 7,               # functions are even higher
    PLUSMINUS: 5,              # ± has the same priority as + and -
    ASSIGN: 3,  DEFINE: 3,    # = and := are the lowest priority
}

Step 3: Evaluate (compute the result)

Now that we have the RPN list, the calculation is simple. We use a value stack (eval_stack):

Read each token from left to right
If it's a number: push it onto the stack
If it's a variable: look up its value in the dictionary and push it
If it's an operator: take values from the top of the stack, perform the operation, and push the result back

Example with [1, 2, 3, *, +]:

Token    Action                    Stack
1        push                      [1]
2        push                      [1, 2]
3        push                      [1, 2, 3]
*        pop 3 and 2, push 2*3    [1, 6]
+        pop 6 and 1, push 1+6    [7]

Result: 7. It's the only element left on the stack.

Uncertainty (Monte Carlo)

When the program encounters ±, it doesn't generate a single number. It generates an array of 500,000 random values following a Gaussian distribution:

# Instead of: result = 10
# We do:      result = [9.8, 10.3, 10.1, 9.7, 10.5, ...]  (500,000 values)
distribution = np.random.normal(loc=10, scale=2, size=500000)

Then, all operations are performed on these 500,000 values in parallel (thanks to numpy). At the end, we compute the mean and standard deviation of the resulting array.

The variable dictionary

The program keeps a dictionary global_variables that maps names to values:

global_variables = {
    "pi": 3.14159...,
    "e": 2.71828...,
    "format": 2,
}

When you type a = 10±2, the program:

Evaluates 10±2 (generates an array of 500,000 values)
Stores this array in global_variables["a"]

When you then type a + 5, the program:

Looks up a in the dictionary, finds the array
Adds 5 to each element of the array
Computes the mean and standard deviation of the result

Functions are stored in a separate dictionary functions. When you type f(x) := x*2, the complete Expression object is stored. When you call f(3), the expression is re-evaluated with x=3.

Complete flow summary

User input: "a + b"

1. Tokenize:  [VAR(a), ADD(+), VAR(b), END]

2. Build RPN:  [VAR(a), VAR(b), ADD]

3. Evaluate:
   - VAR(a) → look up in global_variables → array [9.8, 10.3, ...]
   - VAR(b) → look up in global_variables → array [19.5, 20.1, ...]
   - ADD → add the two arrays element by element
   - Result: array [29.3, 30.4, ...]
   - Mean: 30.00, Standard deviation: 2.24

4. Display: "30.00 ± 2.24"

Going further

Look at the tokenize() method to see how regexes are used
Look at build_rpn_list() for the priority sorting algorithm
Look at evaluate() for the stack-based calculation
Try adding a new mathematical function to the functions dictionary
Try to understand how ±% is handled (it's a combination of two tokens merged in the RPN)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How the code works: a guide for students

Overview

Step 1: Tokenize (split into tokens)

How it works in the code

Step 2: Build RPN (rearrange by priority)

The algorithm

Priorities in the code

Step 3: Evaluate (compute the result)

Uncertainty (Monte Carlo)

The variable dictionary

Complete flow summary

Going further

FilesExpand file tree

README-programming.md

Latest commit

History

README-programming.md

File metadata and controls

How the code works: a guide for students

Overview

Step 1: Tokenize (split into tokens)

How it works in the code

Step 2: Build RPN (rearrange by priority)

The algorithm

Priorities in the code

Step 3: Evaluate (compute the result)

Uncertainty (Monte Carlo)

The variable dictionary

Complete flow summary

Going further