Skip to content

Latest commit

 

History

History
198 lines (143 loc) · 7.49 KB

File metadata and controls

198 lines (143 loc) · 7.49 KB

Version française

How the code works: a guide for students

This document explains how expression.py parses and evaluates a mathematical expression like a = 10±2. No programming expertise required.

Overview

The program follows three steps to understand and compute an expression:

"10±2 + 5"  →  Tokenize  →  Build RPN  →  Evaluate  →  15.00 ± 2.00
  1. Tokenize: split the text into pieces (tokens)
  2. Build RPN: rearrange the tokens to respect operator priorities (+, *, etc.) into what is called Reverse Polish Notation
  3. Evaluate: compute the result by walking through the rearranged list

It's like reading a sentence: first you identify the words (tokenize), then you understand the grammatical structure (RPN), then you extract the meaning (evaluate).

Step 1: Tokenize (split into tokens)

A token is the smallest meaningful piece of an expression. For example:

"10±2 + 5"  →  [NUMBER(10), PLUSMINUS(±), NUMBER(2), ADD(+), NUMBER(5), END]

Each token has a type (what it is) and text (what was read). The possible types are:

Type Examples Description
NUMBER 10, 3.14, 1e-3 A number
NUMBERS [10, 11, 12] A list of numbers
VAR a, pi, x A variable name
ADD, SUB, MUL, DIV +, -, *, / Basic operators
PLUSMINUS ± The uncertainty operator
ASSIGN = Variable assignment
DEFINE := Definition (lazy)
FUNCTION sin, cos Function call
LEFT_PAR, RIGHT_PAR (, ) Parentheses

How it works in the code

The next_token() method tries to match the beginning of the text with regular expressions (regex). A regex is a text pattern. For example, the regex for a number is:

r"(-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"

This means: "an optional negative sign, followed by digits, with an optional decimal point and exponent". It matches 10, 3.14, -2, 1.5e-3, etc. Regular expressions are to text what algebra is to mathematics: the best functional description of a textual expression.

The program tries each regex in order. The first one that matches determines the token type. Then it removes the matched text and starts over with the rest, until there's no text left.

"10±2 + 5"match NUMBER "10", remainder: "±2 + 5"match PLUSMINUS "±", remainder: "2 + 5"match NUMBER "2", remainder: " + 5"match ADD "+", remainder: " 5"match NUMBER "5", remainder: ""add END

Step 2: Build RPN (rearrange by priority)

We now have a list of tokens, but in the order we read them (left to right). The problem: 1 + 2 * 3 must give 7 (not 9), because * has priority over +.

The solution: rearrange the tokens into Reverse Polish Notation (RPN). In RPN, operators come after their operands:

Normal:  1 + 2 * 3
RPN:     1, 2, 3, *, +

Why is this useful? Because in RPN, you no longer need priorities or parentheses. You read left to right and apply each operator to the most recent values.

The algorithm

The code uses an operator stack (operator_stack) to sort. The rules are simple:

  • A number or variable goes directly into the RPN list
  • An operator is compared with what's on top of the stack:
    • If its priority is lower or equal, the top operator goes into the RPN list
    • Otherwise, it's pushed onto the stack
  • An opening parenthesis is pushed onto the stack
  • A closing parenthesis empties the stack until the opening parenthesis

Example with 1 + 2 * 3:

Token       Action                          RPN          Stack
1           → rpn                           [1]          []
+           → stack (empty)                 [1]          [+]
2           → rpn                           [1, 2]       [+]
*           → stack (* > +, push)           [1, 2]       [+, *]
3           → rpn                           [1, 2, 3]    [+, *]
END         → empty stack                   [1, 2, 3, *, +]  []

The result [1, 2, 3, *, +] reads: "take 2 and 3, multiply (=6), then take 1 and 6, add (=7)".

Priorities in the code

precedence = {
    NUMBER: 0,  VAR: 0,        # values have no priority
    ADD: 5,  SUB: 5,           # + and - have the same priority
    MUL: 6,  DIV: 6,           # * and / are higher priority
    FUNCTION: 7,               # functions are even higher
    PLUSMINUS: 5,              # ± has the same priority as + and -
    ASSIGN: 3,  DEFINE: 3,    # = and := are the lowest priority
}

Step 3: Evaluate (compute the result)

Now that we have the RPN list, the calculation is simple. We use a value stack (eval_stack):

  • Read each token from left to right
  • If it's a number: push it onto the stack
  • If it's a variable: look up its value in the dictionary and push it
  • If it's an operator: take values from the top of the stack, perform the operation, and push the result back

Example with [1, 2, 3, *, +]:

Token    Action                    Stack
1        push                      [1]
2        push                      [1, 2]
3        push                      [1, 2, 3]
*        pop 3 and 2, push 2*3    [1, 6]
+        pop 6 and 1, push 1+6    [7]

Result: 7. It's the only element left on the stack.

Uncertainty (Monte Carlo)

When the program encounters ±, it doesn't generate a single number. It generates an array of 500,000 random values following a Gaussian distribution:

# Instead of: result = 10
# We do:      result = [9.8, 10.3, 10.1, 9.7, 10.5, ...]  (500,000 values)
distribution = np.random.normal(loc=10, scale=2, size=500000)

Then, all operations are performed on these 500,000 values in parallel (thanks to numpy). At the end, we compute the mean and standard deviation of the resulting array.

The variable dictionary

The program keeps a dictionary global_variables that maps names to values:

global_variables = {
    "pi": 3.14159...,
    "e": 2.71828...,
    "format": 2,
}

When you type a = 10±2, the program:

  1. Evaluates 10±2 (generates an array of 500,000 values)
  2. Stores this array in global_variables["a"]

When you then type a + 5, the program:

  1. Looks up a in the dictionary, finds the array
  2. Adds 5 to each element of the array
  3. Computes the mean and standard deviation of the result

Functions are stored in a separate dictionary functions. When you type f(x) := x*2, the complete Expression object is stored. When you call f(3), the expression is re-evaluated with x=3.

Complete flow summary

User input: "a + b"

1. Tokenize:  [VAR(a), ADD(+), VAR(b), END]

2. Build RPN:  [VAR(a), VAR(b), ADD]

3. Evaluate:
   - VAR(a) → look up in global_variables → array [9.8, 10.3, ...]
   - VAR(b) → look up in global_variables → array [19.5, 20.1, ...]
   - ADD → add the two arrays element by element
   - Result: array [29.3, 30.4, ...]
   - Mean: 30.00, Standard deviation: 2.24

4. Display: "30.00 ± 2.24"

Going further

  • Look at the tokenize() method to see how regexes are used
  • Look at build_rpn_list() for the priority sorting algorithm
  • Look at evaluate() for the stack-based calculation
  • Try adding a new mathematical function to the functions dictionary
  • Try to understand how ±% is handled (it's a combination of two tokens merged in the RPN)