
[experiment] python re-implement #179

Draft
AlexandreYang wants to merge 14 commits into main from alex/python_re_implement



@AlexandreYang
Member

What does this PR do?

Motivation

Testing

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

AlexandreYang and others added 4 commits April 11, 2026 21:44
Adds a `python` builtin command that executes Python 3.4 source code
using the gpython pure-Go interpreter — no CPython installation required.

Usage: python [-c CODE] [-h] [SCRIPT | -] [ARG ...]

Security sandbox (enforced in builtins/internal/pyruntime/):
- os.system, os.popen, all exec/spawn/fork/write/delete functions removed
- open() replaced with read-only AllowedPaths-aware version; write/append
  modes raise PermissionError
- tempfile and glob modules neutered (functions removed)
- sys.exit() exit code propagated via closure variable before VM wraps error
- Source and file reads bounded at 1 MiB
- Context cancellation respected (goroutine + select on ctx.Done())

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…builtins, and keywords

Adds 61 new scenario tests across 10 categories to improve coverage of
the python builtin's gpython (3.4) interpreter:

- keywords: pass, del, assert, global, nonlocal, in/not-in, is/is-not, break, continue
- comprehensions: list, filtered list, dict, set, generator expression, nested
- generators: basic yield, generator.send(), yield from, StopIteration
- lambdas: basic, sorted key, map
- builtins: len, range, enumerate, zip, map, filter, sorted, all/any, min/max,
  sum, chr/ord, bin/hex/oct, isinstance, type constructors, repr,
  print kwargs, getattr/setattr/hasattr, abs/divmod/pow
- exceptions: try/finally, try/except/finally, bare raise, raise from,
  multiple except handlers
- operators: bitwise, augmented assignment, chained comparisons, ternary, boolean short-circuit
- data_structures: tuple unpacking, extended unpacking, set operations,
  string format (%), string methods
- functions: default args, *args, **kwargs
- os_module: os.getcwd(), os.environ

Tests account for gpython v0.2.0 limitations: no str.format(), no str.lower/upper,
no len(bytes), no frozenset(), no classmethod/staticmethod, no closures (free
variable capture without nonlocal), no integer dict keys, no enumerate(start=).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexandreYang
Member Author

Plan: Replace gpython with a Custom Pure-Go Python Interpreter

Context

The python builtin currently uses github.com/go-python/gpython — a pure-Go Python 3.4 interpreter — as its execution engine. The goal is to remove that dependency entirely and rewrite the interpreter layer from scratch in pure Go, while keeping the same user-facing interface (python -c, script files, stdin) and the same security sandbox (blocked dangerous ops, read-only open(), AllowedPaths enforcement).

The existing test suite has 129 scenario tests and a Go unit test suite that together validate:

  • Full Python 3 syntax: classes, generators, try/except, comprehensions, lambdas
  • stdlib modules: sys, math, os (read-only), binascii, string, time
  • Security sandbox: os.system/popen/exec blocked, write-mode open blocked, tempfile/glob ImportError
  • I/O: sandboxed open(), readline(), readlines(), with-statement, stdin/stdout/stderr
  • Error propagation: SyntaxError, RuntimeError, sys.exit(N), tracebacks to stderr

Recommended Approach: Custom Pure-Go Python Interpreter

Why not Starlark (github.com/google/starlark-go):

  • No class keyword (breaks ~30 tests: classes, inheritance, data structures)
  • No try/except (breaks ~20 error-handling tests)
  • No yield/generators (breaks 4 tests)
  • No import X syntax (breaks ALL tests that use sys, math, os, etc.)
  • Would require removing >40% of the existing test suite

Why a custom interpreter is the right call:

  • Preserves nearly all 129 existing tests unchanged
  • Gives full control over the security sandbox
  • Removes all external Python-related dependencies
  • The user explicitly said "the re-implementation can be a complex endeavour, this is fine, do it!"

Implementation Scope

The interpreter implements the Python 3 subset actually used by the tests. Out-of-scope: decorators, multiple-inheritance MRO edge cases, async/await, metaclasses, the full CPython stdlib.

Files to create (in builtins/internal/pyruntime/)

| File | Description | Est. lines |
|---|---|---|
| lexer.go | Tokenizer: keywords, operators, indentation (INDENT/DEDENT), string/number literals | ~600 |
| ast.go | AST node types for all statements and expressions | ~400 |
| parser.go | Recursive-descent parser: all statement forms, operator precedence | ~1800 |
| types.go | Python object model: Int, Float, Str, Bytes, Bool, None, List, Dict, Set, Tuple, Function, Class, Instance, Generator, Exception | ~1000 |
| eval.go | Tree-walking evaluator: scopes, exceptions, generators, context managers | ~2500 |
| builtins_py.go | Built-in functions: print, len, range, enumerate, zip, sorted, map, filter, sum, min, max, abs, chr, ord, bin, hex, oct, type, isinstance, repr, str, int, float, bool, list, dict, set, tuple, open (sandboxed) | ~700 |
| modules.go | Standard modules: sys (argv, exit, stdin, stdout, stderr), math, os (read-only subset), string, binascii, time | ~600 |
| sandbox.go | Security layer: blocked os functions, write-mode open rejection, blocked imports (tempfile, glob) | ~300 |
| pyruntime.go | Entry point: Run(ctx, RunOpts) int, same signature as current | ~200 |

Total estimate: ~8,100 lines (the current pyruntime.go is ~700 lines using gpython; the difference is the interpreter itself)

Files to modify

| File | Change |
|---|---|
| builtins/internal/pyruntime/pyruntime.go | Replace entirely (same Run() API, new pure-Go implementation) |
| builtins/python/python.go | No change; already delegates to pyruntime.Run() |
| go.mod | Remove github.com/go-python/gpython entry |
| go.sum | Remove gpython hashes (go mod tidy) |
| SHELL_FEATURES.md | Update Python description (remove "gpython", "Python 3.4") |
| builtins/tests/python/python_fuzz_test.go | Update gpython-specific comment in FuzzPythonSource |
| analysis/symbols_builtins_test.go | Update any gpython-specific allowlist entries |

Python features to implement

Core language:

  • All literals: int (decimal, hex, octal, binary), float, complex, string (all quote forms + raw), bytes, bool, None, ellipsis
  • Operators: arithmetic, bitwise, comparison, boolean, in/not in, is/is not
  • Assignments: simple, augmented (+=, etc.), tuple unpacking, starred assignment (a, *b, c = ...)
  • Statements: if/elif/else, for/in, while, break, continue, pass, del, return, yield, yield from, raise, try/except/else/finally, with, assert, global, nonlocal, import, from-import, class, def
  • Comprehensions: list, dict, set, generator expression
  • Lambda expressions
  • Slicing (a[1:3:2])
  • Attribute access, subscript
  • Starred calls (*args, **kwargs)
  • Class definitions: single inheritance, __init__, methods, __str__, __repr__, __enter__/__exit__, __iter__/__next__
  • Generators: yield, yield from, send(), StopIteration

Exception handling:

  • All standard exception types: BaseException, Exception, ValueError, TypeError, KeyError, IndexError, AttributeError, NameError, ZeroDivisionError, IOError, OSError, FileNotFoundError, PermissionError, StopIteration, RuntimeError, ImportError, MemoryError, SystemExit, SyntaxError, AssertionError, NotImplementedError
  • Exception chaining
  • Custom exception classes (subclassing Exception)
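The merged commit implements exception handling with Go panic/recover. The sketch below shows that pattern using a hypothetical `pyException` type. `raise` panics with the exception value. `try` recovers only Python exceptions whose type it handles and re-panics everything else, so unhandled exceptions propagate outward and genuine Go runtime errors still crash. For brevity, handlers here match the exact type name. Real `except` clauses also match subclasses.

```go
package main

import "fmt"

// pyException is a hypothetical stand-in for a raised Python exception object.
type pyException struct {
	typ string // e.g. "ValueError"
	msg string
}

func raise(typ, msg string) { panic(&pyException{typ: typ, msg: msg}) }

// try runs body. If body raises a Python exception whose type is listed in
// handled, try returns it (the except clause); otherwise the panic continues.
func try(body func(), handled ...string) (caught *pyException) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		exc, ok := r.(*pyException)
		if !ok {
			panic(r) // not a Python exception: a real Go bug, let it crash
		}
		for _, h := range handled {
			if exc.typ == h {
				caught = exc
				return
			}
		}
		panic(exc) // unhandled here: propagate to the enclosing try
	}()
	body()
	return nil
}

func main() {
	exc := try(func() { raise("ValueError", "bad value") }, "ValueError")
	fmt.Println(exc.typ + ": " + exc.msg) // ValueError: bad value

	// A KeyError passes through the inner handler and is caught by the outer one.
	outer := try(func() {
		try(func() { raise("KeyError", "'k'") }, "ValueError")
	}, "KeyError")
	fmt.Println(outer.typ) // KeyError
}
```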

Built-in functions (47): print, len, range, enumerate, zip, sorted, reversed, map, filter, sum, min, max, abs, chr, ord, bin, hex, oct, type, isinstance, issubclass, repr, str, int, float, bool, list, dict, set, tuple, open (sandboxed), hash, id, iter, next, callable, getattr, setattr, hasattr, delattr, dir, vars, all, any, round, divmod, pow

Modules:

  • sys: argv, exit, stdin, stdout, stderr, version, platform, path (empty)
  • math: floor, ceil, sqrt, log, log2, log10, exp, sin, cos, tan, asin, acos, atan, atan2, pi, e, inf, nan, fabs, factorial, gcd, isnan, isinf, isfinite, degrees, radians, hypot
  • os: listdir (AllowedPaths), getcwd, path.join, path.dirname, path.basename, path.exists, path.isfile, path.isdir, path.splitext, getenv, environ (read-only), sep, linesep, devnull
  • string: whitespace, ascii_letters, ascii_lowercase, ascii_uppercase, digits, hexdigits, octdigits, printable, punctuation
  • binascii: hexlify, unhexlify, b2a_hex, a2b_hex
  • time: time, sleep (limited), monotonic

Security sandbox (same as current):

  • Blocked os functions: system, popen, remove, unlink, mkdir, makedirs, rmdir, removedirs, rename, renames, replace, link, symlink, chmod, chown, chroot, execl/le/lp/lpe, execv/ve/vp/vpe, _exit, fork, forkpty, kill, killpg, popen2-4, spawnl/le/lp/lpe/v/ve/vp/vpe, startfile, truncate, write, putenv, unsetenv, walk (removed)
  • open() write modes rejected (PermissionError)
  • tempfile and glob imports blocked (ImportError)
  • File reads capped at 1 MiB
  • Source code capped at 1 MiB
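The import blocking can be done in the module registry lookup, before any module object is built. The sketch below assumes a registry of this shape. `module`, `registry`, and `importModule` are hypothetical names. The blocked set mirrors the one the merged commit lists (tempfile, glob, subprocess, socket, ctypes).

```go
package main

import "fmt"

// module is a hypothetical stand-in for a Python module object.
type module struct {
	name  string
	attrs map[string]any
}

// blockedModules are refused outright, whatever the registry contains.
var blockedModules = map[string]bool{
	"tempfile": true, "glob": true, "subprocess": true, "socket": true, "ctypes": true,
}

// registry maps importable module names to constructors (only math shown).
var registry = map[string]func() *module{
	"math": func() *module {
		return &module{name: "math", attrs: map[string]any{"pi": 3.141592653589793}}
	},
}

// importModule resolves `import name`, returning an ImportError-style error
// for blocked or unknown modules.
func importModule(name string) (*module, error) {
	if blockedModules[name] {
		return nil, fmt.Errorf("ImportError: module %q is blocked in this sandbox", name)
	}
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("ImportError: No module named %q", name)
	}
	return ctor(), nil
}

func main() {
	for _, name := range []string{"math", "tempfile", "nosuch"} {
		if m, err := importModule(name); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println("imported", m.name)
		}
	}
}
```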

Test strategy

  • Existing 129 scenario tests: should all pass unchanged
  • Existing Go unit tests: update only the gpython-specific comment in FuzzPythonSource
  • The interpreter runs in a goroutine; select on ctx.Done() for context cancellation
  • Memory limits: maxSourceBytes = 1 MiB, maxReadBytes = 1 MiB per file.read() call
  • Traceback format: Traceback (most recent call last):\n  File "name", line N\nExceptionType: msg

Verification

make fmt
go build ./...
go test ./builtins/... ./tests/... -timeout 120s
RSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash -timeout 120s  # skip; scenarios have skip_assert_against_bash: true
go run ./cmd/rshell --allow-all-commands -c 'python -c "print(\"hello\")"'
go run ./cmd/rshell --allow-all-commands -c 'help' | grep python

…eter

Replace github.com/go-python/gpython with a from-scratch Python 3
tree-walking interpreter (~12,000 lines) implemented across modular files
under builtins/internal/pyruntime/:

- ast.go: full AST node type definitions for Python 3 statements/expressions
- lexer.go: tokenizer with indent/dedent, string literals, number literals
- parser.go: recursive-descent parser covering the complete Python 3 grammar
  subset needed by the test suite
- types.go: Python object system (int, float, str, bytes, list, tuple, dict,
  set, class/instance, generator, exception hierarchy, module, file, scope)
- eval.go: tree-walking evaluator with generators via goroutine+channel,
  exception handling via Go panic/recover, context cancellation support,
  class definition with C3 MRO, closures, comprehensions, yield/yield from
- builtins_funcs.go: ~45 built-in functions (print, len, range, zip, map,
  filter, sorted, isinstance, type constructors, open, super, etc.)
- modules.go: module registry with sys, math, os (read-only), binascii,
  string; blocked modules (tempfile, glob, subprocess, socket, ctypes)
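The commit describes generators implemented with a goroutine and channels. The core of that technique is a handshake. The caller sends a value into the generator's goroutine, which corresponds to Python's send(). The goroutine replies with the next yielded value, or closes a channel when the body returns, which corresponds to StopIteration. This is a sketch with hypothetical names. A real implementation would also need to stop goroutines belonging to abandoned generators, for example via context cancellation, because a generator that is never exhausted leaks its goroutine here.

```go
package main

import "fmt"

// generator runs body in its own goroutine; each yield is a channel handshake.
type generator struct {
	resume chan any      // values passed in by send()
	yields chan any      // values produced by yield
	done   chan struct{} // closed when body returns (StopIteration)
}

// newGenerator does not run body until the first send, like a Python generator.
func newGenerator(body func(yield func(any) any) ) *generator {
	g := &generator{resume: make(chan any), yields: make(chan any), done: make(chan struct{})}
	go func() {
		defer close(g.done)
		<-g.resume // wait for the first send (Python requires send(None) first)
		body(func(v any) any {
			g.yields <- v
			return <-g.resume // value of the `yield` expression
		})
	}()
	return g
}

// send resumes the generator and returns its next yielded value;
// ok is false once the body has finished.
func (g *generator) send(v any) (out any, ok bool) {
	select {
	case g.resume <- v:
	case <-g.done:
		return nil, false
	}
	select {
	case out = <-g.yields:
		return out, true
	case <-g.done:
		return nil, false
	}
}

func main() {
	// A bounded running total: each sent value is added, then the total is yielded.
	gen := newGenerator(func(yield func(any) any) {
		total := 0
		for i := 0; i < 3; i++ {
			if x := yield(total); x != nil {
				total += x.(int)
			}
		}
	})
	for _, val := range []any{nil, 5, 10, 1} {
		v, ok := gen.send(val)
		fmt.Println(v, ok)
	}
}
```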

Remove github.com/go-python/gpython from go.mod (go mod tidy).
Update analysis symbol allowlists for new implementation.
All 40+ test packages pass including 129 Python scenario tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlexandreYang and others added 3 commits April 12, 2026 01:22
- Format analysis/symbols_internal.go and remove unused symbols from
  internalAllowedSymbols (bufio.Scanner, bytes.SplitAfter,
  hash/crc32.ChecksumIEEE, math.MaxFloat64, math.Round, math/big.NewFloat,
  unicode.Is{Space,Title,Upper}, unicode.To{Lower,Title,Upper},
  unicode/utf8.RuneError) that are not used by any builtins/internal file.
- Add missing copyright headers to pyruntime/parse_test.go and
  pyruntime/smoke_test.go.
- Fix data race in pyruntime.Run(): after ctx.Done() fires, wait for the
  goroutine running runInternal to finish before returning. Without this,
  the goroutine's defer (printTraceback → fmt.Fprintf to opts.Stderr) races
  with the caller reading opts.Stderr in the test. The evaluator checks
  ctx.Done() at each loop iteration so the goroutine terminates promptly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Block host env access:
- Remove os.Environ() and os.LookupEnv() from the Python os module.
  os.environ is now an empty dict and os.getenv() always returns its
  default argument. Python scripts must not be able to read process
  environment variables (API keys, tokens, etc.).
- Drop os.Environ and os.LookupEnv from the pyruntime symbol allowlists.
- Update scenario tests to verify PATH and other real env vars are
  invisible, and that os.environ is empty (len == 0).

Fix callObject data race:
- Replace the package-level callObject function variable with a
  goroutine-keyed sync.Map (goroutineCallFns). Each Python execution
  registers its evaluator's callObject at goroutine start and
  deregisters on return, so concurrent executions never share a
  function pointer. Previously, two parallel Python scenarios would
  race on the write at newEvaluator():50, causing test failures under
  -race. goroutineID() reads the goroutine number from runtime.Stack.
- Add runtime.Stack and sync.Map to the pyruntime symbol allowlists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…andbox

Python's os.listdir, os.path.exists, os.path.isfile, and os.path.isdir
were calling os.ReadDir/os.Stat directly, bypassing the AllowedPaths
sandbox. Route them through new Stat/ReadDir callbacks on RunOpts, wired
to callCtx.StatFile/callCtx.ReadDir in the python builtin.

Also remove os.Environ/os.LookupEnv from the symbol allowlist (removed
in prior commit) and add io/fs.FileInfo + io/fs.DirEntry in their place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlexandreYang and others added 3 commits April 12, 2026 01:47
…CodeQL

- lexer.go: guard rune() cast with unicode.MaxRune check for \U escapes
- parser.go: guard int64() cast with math.MaxInt64 check for uint64 literals;
  values exceeding int64 range now fall through to the big.Int path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ath leakage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…axInt64/unicode.MaxRune

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AlexandreYang added the verified/analysis label (Human-reviewed static analysis changes) on Apr 11, 2026
AlexandreYang and others added 3 commits April 12, 2026 01:56
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o builtins/python

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
"strings.TrimSpace", // 🟢 removes leading/trailing whitespace; pure function.
},
"python": {
"bufio.NewReader", // 🟢 wraps an io.Reader with buffering for readline support; no write capability.
Member Author


TODO: create symbols_python_builtins.go checks?
