This document explains how expression.py parses and evaluates a mathematical expression like a = 10±2. No programming expertise required.
The program follows three steps to understand and compute an expression:
"10±2 + 5" → Tokenize → Build RPN → Evaluate → 15.00 ± 2.00
- Tokenize: split the text into pieces (tokens)
- Build RPN: rearrange the tokens to respect operator priorities (+, *, etc.) into what is called Reverse Polish Notation
- Evaluate: compute the result by walking through the rearranged list
It's like reading a sentence: first you identify the words (tokenize), then you understand the grammatical structure (RPN), then you extract the meaning (evaluate).
A token is the smallest meaningful piece of an expression. For example:
"10±2 + 5" → [NUMBER(10), PLUSMINUS(±), NUMBER(2), ADD(+), NUMBER(5), END]
Each token has a type (what it is) and text (what was read). The possible types are:
| Type | Examples | Description |
|---|---|---|
NUMBER |
10, 3.14, 1e-3 |
A number |
NUMBERS |
[10, 11, 12] |
A list of numbers |
VAR |
a, pi, x |
A variable name |
ADD, SUB, MUL, DIV |
+, -, *, / |
Basic operators |
PLUSMINUS |
± |
The uncertainty operator |
ASSIGN |
= |
Variable assignment |
DEFINE |
:= |
Definition (lazy) |
FUNCTION |
sin, cos |
Function call |
LEFT_PAR, RIGHT_PAR |
(, ) |
Parentheses |
The next_token() method tries to match the beginning of the text with regular expressions (regex). A regex is a text pattern. For example, the regex for a number is:
r"(-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)"This means: "an optional negative sign, followed by digits, with an optional decimal point and exponent". It matches 10, 3.14, -2, 1.5e-3, etc. Regular expressions are to text what algebra is to mathematics: the best functional description of a textual expression.
The program tries each regex in order. The first one that matches determines the token type. Then it removes the matched text and starts over with the rest, until there's no text left.
"10±2 + 5"
→ match NUMBER "10", remainder: "±2 + 5"
→ match PLUSMINUS "±", remainder: "2 + 5"
→ match NUMBER "2", remainder: " + 5"
→ match ADD "+", remainder: " 5"
→ match NUMBER "5", remainder: ""
→ add ENDWe now have a list of tokens, but in the order we read them (left to right). The problem: 1 + 2 * 3 must give 7 (not 9), because * has priority over +.
The solution: rearrange the tokens into Reverse Polish Notation (RPN). In RPN, operators come after their operands:
Normal: 1 + 2 * 3
RPN: 1, 2, 3, *, +
Why is this useful? Because in RPN, you no longer need priorities or parentheses. You read left to right and apply each operator to the most recent values.
The code uses an operator stack (operator_stack) to sort. The rules are simple:
- A number or variable goes directly into the RPN list
- An operator is compared with what's on top of the stack:
- If its priority is lower or equal, the top operator goes into the RPN list
- Otherwise, it's pushed onto the stack
- An opening parenthesis is pushed onto the stack
- A closing parenthesis empties the stack until the opening parenthesis
Example with 1 + 2 * 3:
Token Action RPN Stack
1 → rpn [1] []
+ → stack (empty) [1] [+]
2 → rpn [1, 2] [+]
* → stack (* > +, push) [1, 2] [+, *]
3 → rpn [1, 2, 3] [+, *]
END → empty stack [1, 2, 3, *, +] []
The result [1, 2, 3, *, +] reads: "take 2 and 3, multiply (=6), then take 1 and 6, add (=7)".
precedence = {
NUMBER: 0, VAR: 0, # values have no priority
ADD: 5, SUB: 5, # + and - have the same priority
MUL: 6, DIV: 6, # * and / are higher priority
FUNCTION: 7, # functions are even higher
PLUSMINUS: 5, # ± has the same priority as + and -
ASSIGN: 3, DEFINE: 3, # = and := are the lowest priority
}Now that we have the RPN list, the calculation is simple. We use a value stack (eval_stack):
- Read each token from left to right
- If it's a number: push it onto the stack
- If it's a variable: look up its value in the dictionary and push it
- If it's an operator: take values from the top of the stack, perform the operation, and push the result back
Example with [1, 2, 3, *, +]:
Token Action Stack
1 push [1]
2 push [1, 2]
3 push [1, 2, 3]
* pop 3 and 2, push 2*3 [1, 6]
+ pop 6 and 1, push 1+6 [7]
Result: 7. It's the only element left on the stack.
When the program encounters ±, it doesn't generate a single number. It generates an array of 500,000 random values following a Gaussian distribution:
# Instead of: result = 10
# We do: result = [9.8, 10.3, 10.1, 9.7, 10.5, ...] (500,000 values)
distribution = np.random.normal(loc=10, scale=2, size=500000)Then, all operations are performed on these 500,000 values in parallel (thanks to numpy). At the end, we compute the mean and standard deviation of the resulting array.
The program keeps a dictionary global_variables that maps names to values:
global_variables = {
"pi": 3.14159...,
"e": 2.71828...,
"format": 2,
}When you type a = 10±2, the program:
- Evaluates
10±2(generates an array of 500,000 values) - Stores this array in
global_variables["a"]
When you then type a + 5, the program:
- Looks up
ain the dictionary, finds the array - Adds 5 to each element of the array
- Computes the mean and standard deviation of the result
Functions are stored in a separate dictionary functions. When you type f(x) := x*2, the complete Expression object is stored. When you call f(3), the expression is re-evaluated with x=3.
User input: "a + b"
1. Tokenize: [VAR(a), ADD(+), VAR(b), END]
2. Build RPN: [VAR(a), VAR(b), ADD]
3. Evaluate:
- VAR(a) → look up in global_variables → array [9.8, 10.3, ...]
- VAR(b) → look up in global_variables → array [19.5, 20.1, ...]
- ADD → add the two arrays element by element
- Result: array [29.3, 30.4, ...]
- Mean: 30.00, Standard deviation: 2.24
4. Display: "30.00 ± 2.24"
- Look at the
tokenize()method to see how regexes are used - Look at
build_rpn_list()for the priority sorting algorithm - Look at
evaluate()for the stack-based calculation - Try adding a new mathematical function to the
functionsdictionary - Try to understand how
±%is handled (it's a combination of two tokens merged in the RPN)