Welcome! If you're a Python, JavaScript, or PHP developer curious about how programming languages actually work, you're in the right place.
This guide is written specifically for you, using analogies from languages you already know.
# Python
print("Hello")
You write this, hit run, and it works. But what happens behind the scenes?
Your code goes through a pipeline (like a factory assembly line):
Your Code → Tokenizer → Parser → Compiler → Execution
📝 🔤 🧱 🔧 ⚡
Think of it like cooking:
- Recipe (Your Code): say("Hello");
- Reading ingredients (Tokenizer): Identify "say", "(", "Hello", ")", ";"
- Understanding steps (Parser): "Oh, this is a print command with a string"
- Preparing (Compiler): Convert to machine-friendly instructions
- Cooking (Execution): Actually run it!
In Python terms: Remember how Python knows that print("hello") has a function name, parentheses, and a string? The tokenizer does that!
Real-world analogy: Like reading a sentence and identifying: verb, noun, punctuation.
Your code:
say("Hello, World!");
What the tokenizer sees:
1. "say" → KEYWORD (like 'print' in Python)
2. "(" → OPENING PARENTHESIS
3. "Hello, World!" → STRING (text in quotes)
4. ")" → CLOSING PARENTHESIS
5. ";" → SEMICOLON (end of statement)
Computers don't understand text. The tokenizer converts your code into labeled pieces that the next stage can process.
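To make the labeling step concrete, here is a minimal tokenizer sketch in Python. It is not JECH's actual tokenizer (that one is written in C). The token names, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for illustration, and it only covers the pieces needed for a `say(...)` statement (no operators or braces).

```python
import re

# Hypothetical token names that mirror the list above. JECH's real tokenizer
# is written in C and may label things differently.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:say|keep|when)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn source text into a list of (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not kept
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize('say("Hello, World!");'):
    print(f"{kind:<10} {text}")
```

Each printed row is one labeled piece: the keyword, the opening parenthesis, the string, the closing parenthesis, and the semicolon.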
JavaScript equivalent:
// When you write this:
console.log("Hi");
// JavaScript internally breaks it into:
// - Identifier: "console"
// - Dot: "."
// - Identifier: "log"
// - Parenthesis: "("
// - String: "Hi"
// - Parenthesis: ")"
// - Semicolon: ";"
In Python terms: The parser is like Python checking that your indentation is consistent and that your if statements end with a colon.
Real-world analogy: A grammar checker that ensures your sentence makes sense.
Tokens from previous step:
say ( "Hello" ) ;
What the parser does:
✓ Checks: "say" is followed by "("
✓ Checks: Inside parentheses is a valid value
✓ Checks: Statement ends with ";"
✓ Creates: A "SAY instruction" node
Just because you have valid words doesn't mean they form a valid sentence!
Invalid examples:
say "Hello"; // ✗ Missing parentheses
say("Hello" // ✗ Missing closing )
say()Hello"; // ✗ Nonsense order
The parser catches these errors and tells you exactly what's wrong.
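The checks above can be sketched in a few lines of Python. This is an illustration only, assuming tokens arrive as (kind, text) pairs with made-up kind names. The `parse_say` function and its error messages are hypothetical and handle just the `say(...)` statement.

```python
VALUE_KINDS = {"STRING", "NUMBER", "IDENTIFIER"}


def parse_say(tokens):
    """Check tokens against the pattern: say ( value ) ;

    Returns a small dict standing in for a "SAY instruction" node.
    """
    expected = ["KEYWORD", "LPAREN", "VALUE", "RPAREN", "SEMICOLON"]
    for index, want in enumerate(expected):
        if index >= len(tokens):
            raise SyntaxError(f"expected {want}, but the statement ended")
        kind, text = tokens[index]
        matches = kind in VALUE_KINDS if want == "VALUE" else kind == want
        if not matches:
            raise SyntaxError(f"expected {want}, found {kind} {text!r}")
    if tokens[0][1] != "say":
        raise SyntaxError(f"expected the keyword 'say', found {tokens[0][1]!r}")
    if len(tokens) > len(expected):
        kind, text = tokens[len(expected)]
        raise SyntaxError(f"unexpected {kind} {text!r} after ';'")
    return {"type": "SAY", "value": tokens[2][1]}


good = [("KEYWORD", "say"), ("LPAREN", "("), ("STRING", '"Hello"'),
        ("RPAREN", ")"), ("SEMICOLON", ";")]
print(parse_say(good))

# say "Hello";  (missing parentheses)
missing_parens = [("KEYWORD", "say"), ("STRING", '"Hello"'), ("SEMICOLON", ";")]
try:
    parse_say(missing_parens)
except SyntaxError as error:
    print("Parse error:", error)
```

The valid token list produces a node. The one without parentheses stops at the first token that breaks the pattern and reports what was expected there.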
In Python terms: When Python reads if x > 5: print(x), it creates a mental model: "IF condition THEN action". That's an AST!
Real-world analogy: A family tree, but for your code.
Your code:
keep x = 10;
when (x > 5) {
say(x);
}
The AST (tree structure):
Program
├── KEEP
│ ├── name: "x"
│ └── value: 10
└── WHEN
├── condition: (x > 5)
└── body: SAY(x)
Trees make it easy to:
- Understand relationships: "This say is INSIDE the when block"
- Transform code: Optimize, compile, analyze
- Execute in order: Walk the tree from top to bottom
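Here is one way to picture that tree as data, sketched in Python. The node classes (`Program`, `Keep`, `When`, `Say`) and the `describe` walker are assumptions for illustration. JECH's real AST is a C data structure and may be shaped differently.

```python
from dataclasses import dataclass, field


# Hypothetical node classes that mirror the tree drawn above.
@dataclass
class Say:
    name: str


@dataclass
class Keep:
    name: str
    value: int


@dataclass
class When:
    left: str
    op: str
    right: int
    body: list = field(default_factory=list)


@dataclass
class Program:
    statements: list = field(default_factory=list)


def describe(node, depth=0):
    """Walk the tree top to bottom and return one indented line per node."""
    pad = "  " * depth
    if isinstance(node, Program):
        lines = [pad + "Program"]
        for statement in node.statements:
            lines += describe(statement, depth + 1)
        return lines
    if isinstance(node, Keep):
        return [f"{pad}KEEP {node.name} = {node.value}"]
    if isinstance(node, When):
        lines = [f"{pad}WHEN {node.left} {node.op} {node.right}"]
        for statement in node.body:
            lines += describe(statement, depth + 1)
        return lines
    if isinstance(node, Say):
        return [f"{pad}SAY {node.name}"]
    raise TypeError(f"unknown node: {node!r}")


tree = Program([Keep("x", 10), When("x", ">", 5, [Say("x")])])
print("\n".join(describe(tree)))
```

Because the `Say` node lives in the `body` list of the `When` node, the walker indents it one level deeper. That nesting is the "INSIDE" relationship the tree captures.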
JavaScript equivalent:
// When you write:
if (x > 5) {
console.log(x);
}
// JavaScript creates a similar tree internally:
// IfStatement
// ├── test: BinaryExpression (x > 5)
// └── consequent: CallExpression (console.log)
In Python terms: Python compiles your code to bytecode (and caches it in .pyc files). JECH compiles to bytecode too!
Real-world analogy: Like translating a recipe from English to a numbered list of steps.
Your code:
keep x = 10;
say(x);
Bytecode (simplified instructions):
[0] OP_KEEP name="x" value=10
[1] OP_SAY operand="x"
[2] OP_END
Bytecode is:
- Faster to execute than parsing text every time
- Portable (can be saved and run later)
- Optimizable (can be improved before running)
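As a rough sketch of the translation step, the Python below flattens a tiny list of statement nodes into a linear instruction list. The opcode names are copied from the listing above. The tuple layout and the `compile_program` function are hypothetical, not JECH's actual compiler.

```python
def compile_program(statements):
    """Flatten statement nodes into a linear list of instructions.

    Nodes are plain tuples here: ("keep", name, value) or ("say", name).
    """
    bytecode = []
    for node in statements:
        if node[0] == "keep":
            _, name, value = node
            bytecode.append(("OP_KEEP", name, value))
        elif node[0] == "say":
            _, name = node
            bytecode.append(("OP_SAY", name))
        else:
            raise ValueError(f"cannot compile node: {node!r}")
    bytecode.append(("OP_END",))  # marks the end of the program
    return bytecode


program = [("keep", "x", 10), ("say", "x")]
for index, instruction in enumerate(compile_program(program)):
    print(f"[{index}]", *instruction)
```

The result is an ordinary list, which is why bytecode is easy to save, inspect, or rewrite before anything runs.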
Python comparison:
# Your Python code
x = 10
print(x)
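# Gets compiled to bytecode (view it with the dis module).
# Simplified; exact opcodes vary by Python version:
# LOAD_CONST 10
# STORE_NAME x
# LOAD_NAME print
# LOAD_NAME x
# CALL (one argument; this opcode is named CALL_FUNCTION before Python 3.11)
In Python terms: The Python VM (part of CPython) reads bytecode and executes it. JECH has its own mini-VM!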
Real-world analogy: A chef following the numbered recipe steps.
Bytecode:
[0] OP_KEEP name="x" value=10
[1] OP_SAY operand="x"
What the VM does:
Step 0: OP_KEEP
→ Create variable "x"
→ Store value 10
→ Variables table: { x: 10 }
Step 1: OP_SAY
→ Look up "x" in variables table
→ Found: 10
→ Print to screen: "10"
The VM is the actual executor. Everything before this was preparation!
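The two steps traced above fit in a short loop. This Python sketch assumes instructions are tuples like `("OP_KEEP", "x", 10)`. That layout and the `run` function are made up for illustration, and JECH's real VM is written in C.

```python
def run(bytecode):
    """Execute instructions in order and return the lines that were printed."""
    variables = {}  # the "variables table"
    output = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(bytecode):
        op, *args = bytecode[pc]
        if op == "OP_KEEP":
            name, value = args
            variables[name] = value
        elif op == "OP_SAY":
            (name,) = args
            if name not in variables:
                raise NameError(f"variable {name!r} is not defined")
            line = str(variables[name])
            print(line)
            output.append(line)
        elif op == "OP_END":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return output


run([("OP_KEEP", "x", 10), ("OP_SAY", "x"), ("OP_END",)])
```

Notice that the loop never looks at source text. By the time the VM runs, only instructions and a variables table are left.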
say("Hello!");
Step-by-step:
1. Tokenizer breaks it into pieces:
   [say] [(] ["Hello!"] [)] [;]
2. Parser understands the pattern:
   "This is a SAY command with a string argument"
3. AST creates a tree node:
   SAY_NODE { value: "Hello!" }
4. Bytecode generates instructions:
   OP_SAY operand="Hello!"
5. VM executes:
   Print "Hello!" to screen
Output: Hello!
Compiled (like C):
Code → Compiler → Machine Code → Run
(Happens once) (Fast!)
Interpreted (how Python looks from the outside):
Code → Interpreter → Run
(Happens every time) (Flexible!)
JECH (hybrid, like Python):
Code → Tokenizer → Parser → Bytecode → VM
(Compile once) (Run fast!)
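You can see the "compile once, run many times" idea from Python itself, using only the built-in `compile()` and `exec()`. This is a small sketch of the difference, not a benchmark. The source string and the `"<example>"` file name are placeholders.

```python
source = "total = sum(range(10))"

# Handing over text every time: Python has to tokenize, parse, and compile
# the same string again on each pass through the loop.
for _ in range(3):
    namespace = {}
    exec(source, namespace)

# Hybrid style: compile to a code object once, then reuse it.
code = compile(source, "<example>", "exec")
for _ in range(3):
    namespace = {}
    exec(code, namespace)

print(namespace["total"])  # 45
```

Both loops compute the same value. The second one skips the text-processing stages on every run, which is the benefit the hybrid model aims for.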
Q: "Why is JECH written in C if I don't know C?"
A: C is like the "assembly language" underneath high-level languages. Many of the languages you already use are built with it (or with C++):
- Python (CPython) is written in C
- JavaScript (V8) is written in C++
- PHP is written in C
Learning how JECH works teaches you how your favorite languages work internally!
Python (dynamic):
x = 10 # x is a number
x = "hi" # now x is a string - no problem!
JECH (currently simple):
keep x = 10;
x = "hi"; // This works too! (for now)
JECH is designed to be simple first; complexity can be added later.
Now that you understand the concepts, dive into the detailed docs:
- Tokenizer Deep Dive - See actual C code with explanations
- Parser Deep Dive - Learn pattern matching
- VM Deep Dive - Understand execution
# Install JECH
git clone <repo>
cd jech
make
# Run the REPL (like Python's interactive mode)
./build/jech
# Try some code!
>>> keep x = 10;
>>> say(x);
10
>>> keep arr = [1, 2, 3];
>>> say(arr[0]);
1
You don't need to know C to follow along. The concepts are language-agnostic, and we explain everything in terms you already know.
Real languages work the same way. The principles are identical. Python uses:
- Tokenizer (lexer)
- Parser (generates AST)
- Compiler (generates bytecode)
- VM (CPython executes bytecode)
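You can poke at each of these stages from Python itself with the standard library modules `tokenize`, `ast`, and `dis`. The snippet below is a quick tour rather than a description of CPython's internals, and the printed token and opcode names differ between Python versions.

```python
import ast
import dis
import io
import tokenize

source = "x = 10\nprint(x)\n"

# 1. Tokenizer: source text -> tokens
for token in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[token.type], repr(token.string))

# 2. Parser: source text -> AST
tree = ast.parse(source)
print(ast.dump(tree))

# 3. Compiler: AST -> bytecode (stored in a code object)
code = compile(tree, "<example>", "exec")
dis.dis(code)

# 4. VM: run the bytecode
exec(code)
```

The last line prints 10, just like running the two-line program directly.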
You can contribute in many ways:
- Write example programs
- Improve documentation
- Report bugs
- Suggest features
- Write tests
Want to learn C? JECH is an excellent C learning project because:
- Small codebase (easy to read)
- Well-documented
- Practical application
- Modular structure
| Term | Python Equivalent | Simple Explanation |
|---|---|---|
| Token | Lexical unit | A labeled piece of code (keyword, number, string) |
| AST | ast.parse() result | Tree representation of your code |
| Bytecode | .pyc file content | Simplified instructions for the VM |
| VM | CPython interpreter | The thing that actually runs your code |
| Lexer | Tokenizer | Breaks code into pieces |
| Parser | Syntax checker | Validates code structure |
| Compiler | compile() | Converts code to bytecode |
| Interpreter | python command | Runs code directly |
Level 1: Understand the concepts (You are here!)
- Read this guide
- Try the REPL
- Run examples
Level 2: See it in action
- Read Architecture Overview
- Follow code examples
- Trace execution
Level 3: Understand implementation
- Read component docs (tokenizer, parser, VM)
- Look at actual C code
- Understand data structures
Level 4: Contribute
- Write tests
- Add features
- Improve docs
- Help others
Remember: Every expert was once a beginner. Python's creator didn't know everything when he started. Neither did JavaScript's or PHP's creators. You're learning the same fundamentals they used!
Welcome to the world of language design! 🚀