[experiment] python re-implement by AlexandreYang · Pull Request #179 · DataDog/rshell

AlexandreYang · 2026-04-11T20:54:50Z

What does this PR do?

Motivation

Testing

Checklist

Tests added/updated
Documentation updated (if applicable)

Adds a `python` builtin command that executes Python 3.4 source code using the gpython pure-Go interpreter; no CPython installation is required.

Usage: python [-c CODE] [-h] [SCRIPT | -] [ARG ...]

Security sandbox (enforced in builtins/internal/pyruntime/):

- os.system, os.popen, and all exec/spawn/fork/write/delete functions removed
- open() replaced with a read-only, AllowedPaths-aware version; write/append modes raise PermissionError
- tempfile and glob modules neutered (functions removed)
- sys.exit() exit code propagated via a closure variable before the VM wraps the error
- Source and file reads bounded at 1 MiB
- Context cancellation respected (goroutine + select on ctx.Done())

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
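The read-only `open()` rule in the sandbox list can be illustrated with a small mode check. This is a minimal sketch, not the PR's code: `checkOpenMode` and `errPermission` are hypothetical names, and the sketch assumes that any mode containing `w`, `a`, `x`, or `+` counts as a write mode.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errPermission stands in for Python's PermissionError (hypothetical name).
var errPermission = errors.New("PermissionError: write access is not allowed")

// checkOpenMode accepts read-only modes such as "r", "rb", and "rt" and
// rejects any mode that can create, truncate, append to, or update a file.
func checkOpenMode(mode string) error {
	if mode == "" {
		mode = "r" // Python's default mode
	}
	if strings.ContainsAny(mode, "wax+") {
		return errPermission
	}
	if !strings.Contains(mode, "r") {
		return fmt.Errorf("ValueError: invalid mode: %q", mode)
	}
	return nil
}

func main() {
	for _, m := range []string{"r", "rb", "w", "a", "r+"} {
		fmt.Printf("%-3s -> %v\n", m, checkOpenMode(m))
	}
}
```

Checking the mode before touching the file system means a rejected call has no side effects; the AllowedPaths check would still have to run separately for the accepted modes.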

…builtins, and keywords

Adds 61 new scenario tests across 10 categories to improve coverage of the python builtin's gpython (3.4) interpreter:

- keywords: pass, del, assert, global, nonlocal, in/not-in, is/is-not, break, continue
- comprehensions: list, filtered list, dict, set, generator expression, nested
- generators: basic yield, generator.send(), yield from, StopIteration
- lambdas: basic, sorted key, map
- builtins: len, range, enumerate, zip, map, filter, sorted, all/any, min/max, sum, chr/ord, bin/hex/oct, isinstance, type constructors, repr, print kwargs, getattr/setattr/hasattr, abs/divmod/pow
- exceptions: try/finally, try/except/finally, bare raise, raise from, multiple except handlers
- operators: bitwise, augmented assignment, chained comparisons, ternary, boolean short-circuit
- data_structures: tuple unpacking, extended unpacking, set operations, string format (%), string methods
- functions: default args, *args, **kwargs
- os_module: os.getcwd(), os.environ

Tests account for gpython v0.2.0 limitations: no str.format(), no str.lower/upper, no len(bytes), no frozenset(), no classmethod/staticmethod, no closures (free variable capture without nonlocal), no integer dict keys, no enumerate(start=).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AlexandreYang · 2026-04-11T21:12:02Z

Plan: Replace gpython with a Custom Pure-Go Python Interpreter

Context

The python builtin currently uses github.com/go-python/gpython — a pure-Go Python 3.4 interpreter — as its execution engine. The goal is to remove that dependency entirely and rewrite the interpreter layer from scratch in pure Go, while keeping the same user-facing interface (python -c, script files, stdin) and the same security sandbox (blocked dangerous ops, read-only open(), AllowedPaths enforcement).

The existing test suite has 129 scenario tests and a Go unit test suite that together validate:

Full Python 3 syntax: classes, generators, try/except, comprehensions, lambdas
stdlib modules: sys, math, os (read-only), binascii, string, time
Security sandbox: os.system/popen/exec blocked, write-mode open blocked, tempfile/glob ImportError
I/O: sandboxed open(), readline(), readlines(), with-statement, stdin/stdout/stderr
Error propagation: SyntaxError, RuntimeError, sys.exit(N), tracebacks to stderr

Recommended Approach: Custom Pure-Go Python Interpreter

Why not Starlark (github.com/google/starlark-go):

No class keyword (breaks ~30 tests: classes, inheritance, data structures)
No try/except (breaks ~20 error-handling tests)
No yield/generators (breaks 4 tests)
No import X syntax (breaks ALL tests that use sys, math, os, etc.)
Would require removing >40% of the existing test suite

Why a custom interpreter is the right call:

Preserves nearly all 129 existing tests unchanged
Gives full control over the security sandbox
Removes all external Python-related dependencies
The user explicitly said "the re-implementation can be a complex endeavour, this is fine, do it!"

Implementation Scope

The interpreter implements the Python 3 subset actually used by the tests. Out-of-scope: decorators, multiple-inheritance MRO edge cases, async/await, metaclasses, the full CPython stdlib.

Files to create (in `builtins/internal/pyruntime/`)

| File | Description | Est. lines |
| --- | --- | --- |
| `lexer.go` | Tokenizer: keywords, operators, indentation (INDENT/DEDENT), string/number literals | ~600 |
| `ast.go` | AST node types for all statements and expressions | ~400 |
| `parser.go` | Recursive-descent parser: all statement forms, operator precedence | ~1800 |
| `types.go` | Python object model: Int, Float, Str, Bytes, Bool, None, List, Dict, Set, Tuple, Function, Class, Instance, Generator, Exception | ~1000 |
| `eval.go` | Tree-walking evaluator: scopes, exceptions, generators, context managers | ~2500 |
| `builtins_py.go` | Built-in functions: print, len, range, enumerate, zip, sorted, map, filter, sum, min, max, abs, chr, ord, bin, hex, oct, type, isinstance, repr, str, int, float, bool, list, dict, set, tuple, open (sandboxed) | ~700 |
| `modules.go` | Standard modules: sys (argv, exit, stdin, stdout, stderr), math, os (read-only subset), string, binascii, time | ~600 |
| `sandbox.go` | Security layer: blocked os functions, write-mode open rejection, blocked imports (tempfile, glob) | ~300 |
| `pyruntime.go` | Entry point: `Run(ctx, RunOpts) int`, same signature as current | ~200 |

Total estimate: ~8,100 lines (the current pyruntime.go is ~700 lines using gpython; the difference is the interpreter itself)
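The INDENT/DEDENT handling planned for `lexer.go` is the least conventional part of tokenizing Python, so a sketch may help. The function below is an illustration under simplifying assumptions (spaces only, no comments, no line continuations); `indentTokens` and the `LINE` placeholder token are hypothetical and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// indentTokens converts the leading-space width of each non-blank line into
// INDENT and DEDENT markers using a stack of open indentation levels.
// Each logical line is represented by a single LINE placeholder token.
func indentTokens(src string) ([]string, error) {
	stack := []int{0}
	var out []string
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimLeft(line, " ")
		if trimmed == "" {
			continue // blank lines do not affect indentation
		}
		width := len(line) - len(trimmed)
		top := stack[len(stack)-1]
		switch {
		case width > top:
			stack = append(stack, width)
			out = append(out, "INDENT")
		case width < top:
			// Pop every level deeper than the new width.
			for stack[len(stack)-1] > width {
				stack = stack[:len(stack)-1]
				out = append(out, "DEDENT")
			}
			if stack[len(stack)-1] != width {
				return nil, errors.New("IndentationError: unindent does not match any outer level")
			}
		}
		out = append(out, "LINE")
	}
	// Close every block still open at end of input.
	for len(stack) > 1 {
		stack = stack[:len(stack)-1]
		out = append(out, "DEDENT")
	}
	return out, nil
}

func main() {
	toks, err := indentTokens("if x:\n    y = 1\n    if y:\n        z = 2\nw = 3\n")
	fmt.Println(strings.Join(toks, " "), err)
}
```

Emitting explicit DEDENT tokens lets a recursive-descent parser treat an indented block like a brace-delimited one, which keeps the parser free of column arithmetic.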

Files to modify

| File | Change |
| --- | --- |
| `builtins/internal/pyruntime/pyruntime.go` | Replace entirely (same `Run()` API, new pure-Go implementation) |
| `builtins/python/python.go` | No change; already delegates to `pyruntime.Run()` |
| `go.mod` | Remove `github.com/go-python/gpython` entry |
| `go.sum` | Remove gpython hashes (`go mod tidy`) |
| `SHELL_FEATURES.md` | Update Python description (remove "gpython", "Python 3.4") |
| `builtins/tests/python/python_fuzz_test.go` | Update gpython-specific comment in `FuzzPythonSource` |
| `analysis/symbols_builtins_test.go` | Update any gpython-specific allowlist entries |

Python features to implement

Core language:

All literals: int (decimal, hex, octal, binary), float, complex, string (all quote forms + raw), bytes, bool, None, ellipsis
Operators: arithmetic, bitwise, comparison, boolean, in/not in, is/is not
Assignments: simple, augmented (+=, etc.), tuple unpacking, starred assignment (a, *b, c = ...)
Statements: if/elif/else, for/in, while, break, continue, pass, del, return, yield, yield from, raise, try/except/else/finally, with, assert, global, nonlocal, import, from-import, class, def
Comprehensions: list, dict, set, generator expression
Lambda expressions
Slicing (a[1:3:2])
Attribute access, subscript
Starred calls (*args, **kwargs)
Class definitions: single inheritance, __init__, methods, __str__, __repr__, __enter__/__exit__, __iter__/__next__
Generators: yield, yield from, send(), StopIteration
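To make the "tree-walking evaluator" idea behind this feature list concrete, here is a deliberately tiny sketch: each AST node knows how to evaluate itself against a scope. The node types (`Num`, `Name`, `BinOp`) and the integer-only scope are hypothetical simplifications, not the PR's object model. The `//` case shows one place where Python semantics differ from Go's: Python floors the quotient, while Go truncates toward zero.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is the interface every AST node in this sketch implements.
type Node interface {
	Eval(scope map[string]int64) (int64, error)
}

type (
	Num   struct{ Value int64 }
	Name  struct{ ID string }
	BinOp struct {
		Op          string
		Left, Right Node
	}
)

func (n Num) Eval(map[string]int64) (int64, error) { return n.Value, nil }

func (n Name) Eval(scope map[string]int64) (int64, error) {
	v, ok := scope[n.ID]
	if !ok {
		return 0, fmt.Errorf("NameError: name '%s' is not defined", n.ID)
	}
	return v, nil
}

func (b BinOp) Eval(scope map[string]int64) (int64, error) {
	l, err := b.Left.Eval(scope)
	if err != nil {
		return 0, err
	}
	r, err := b.Right.Eval(scope)
	if err != nil {
		return 0, err
	}
	switch b.Op {
	case "+":
		return l + r, nil
	case "-":
		return l - r, nil
	case "*":
		return l * r, nil
	case "//":
		if r == 0 {
			return 0, errors.New("ZeroDivisionError: integer division or modulo by zero")
		}
		q := l / r
		// Go truncates toward zero; adjust to Python's floor division.
		if l%r != 0 && (l < 0) != (r < 0) {
			q--
		}
		return q, nil
	}
	return 0, fmt.Errorf("unsupported operator %q", b.Op)
}

func main() {
	expr := BinOp{Op: "+", Left: Num{Value: 2}, Right: BinOp{Op: "*", Left: Name{ID: "x"}, Right: Num{Value: 10}}}
	fmt.Println(expr.Eval(map[string]int64{"x": 4}))
}
```

A real evaluator would return a general Python object rather than `int64`, but the recursive shape stays the same.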

Exception handling:

All standard exception types: BaseException, Exception, ValueError, TypeError, KeyError, IndexError, AttributeError, NameError, ZeroDivisionError, IOError, OSError, FileNotFoundError, PermissionError, StopIteration, RuntimeError, ImportError, MemoryError, SystemExit, SyntaxError, AssertionError, NotImplementedError
Exception chaining
Custom exception classes (subclassing Exception)
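The exception types above form a hierarchy, and an `except` clause must match subclasses as well as the named class. One way to model this in Go is to raise with `panic` and match in a deferred `recover`. The sketch below assumes that approach; `Class`, `PyError`, `raise`, and `tryExcept` are hypothetical names, and only a few classes are defined.

```go
package main

import "fmt"

// Class is a minimal single-inheritance class record.
type Class struct {
	Name string
	Base *Class
}

// IsSubclassOf walks the base chain, so a class is a subclass of itself.
func (c *Class) IsSubclassOf(other *Class) bool {
	for k := c; k != nil; k = k.Base {
		if k == other {
			return true
		}
	}
	return false
}

var (
	BaseException     = &Class{Name: "BaseException"}
	Exception         = &Class{Name: "Exception", Base: BaseException}
	SystemExit        = &Class{Name: "SystemExit", Base: BaseException}
	OSError           = &Class{Name: "OSError", Base: Exception}
	FileNotFoundError = &Class{Name: "FileNotFoundError", Base: OSError}
)

// PyError is the value carried by panic when Python code raises.
type PyError struct {
	Class *Class
	Msg   string
}

func raise(c *Class, msg string) { panic(&PyError{Class: c, Msg: msg}) }

// tryExcept runs body and returns the exception if its class matches catch.
// Anything else, including non-Python panics, keeps unwinding.
func tryExcept(body func(), catch *Class) (caught *PyError) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		e, ok := r.(*PyError)
		if !ok || !e.Class.IsSubclassOf(catch) {
			panic(r)
		}
		caught = e
	}()
	body()
	return nil
}

func main() {
	e := tryExcept(func() { raise(FileNotFoundError, "no such file") }, OSError)
	fmt.Printf("%s: %s\n", e.Class.Name, e.Msg)
}
```

Because `SystemExit` derives from `BaseException` rather than `Exception`, an `except Exception` handler in this model lets `sys.exit()` pass through, which matches Python's behavior.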

Built-in functions (28+): print, len, range, enumerate, zip, sorted, reversed, map, filter, sum, min, max, abs, chr, ord, bin, hex, oct, type, isinstance, issubclass, repr, str, int, float, bool, list, dict, set, tuple, open (sandboxed), hash, id, iter, next, callable, getattr, setattr, hasattr, delattr, dir, vars, all, any, round, divmod, pow

Modules:

sys: argv, exit, stdin, stdout, stderr, version, platform, path (empty)
math: floor, ceil, sqrt, log, log2, log10, exp, sin, cos, tan, asin, acos, atan, atan2, pi, e, inf, nan, fabs, factorial, gcd, isnan, isinf, isfinite, degrees, radians, hypot
os: listdir (AllowedPaths), getcwd, path.join, path.dirname, path.basename, path.exists, path.isfile, path.isdir, path.splitext, getenv, environ (read-only), sep, linesep, devnull
string: whitespace, ascii_letters, ascii_lowercase, ascii_uppercase, digits, hexdigits, octdigits, printable, punctuation
binascii: hexlify, unhexlify, b2a_hex, a2b_hex
time: time, sleep (limited), monotonic

Security sandbox (same as current):

Blocked os functions: system, popen, remove, unlink, mkdir, makedirs, rmdir, removedirs, rename, renames, replace, link, symlink, chmod, chown, chroot, execl/le/lp/lpe, execv/ve/vp/vpe, _exit, fork, forkpty, kill, killpg, popen2-4, spawnl/le/lp/lpe/v/ve/vp/vpe, startfile, truncate, write, putenv, unsetenv, walk (removed)
open() write modes rejected (PermissionError)
tempfile and glob imports blocked (ImportError)
File reads capped at 1 MiB
Source code capped at 1 MiB
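The 1 MiB caps in this list can be enforced with `io.LimitReader`. The sketch below reads one byte past the limit so that oversized input is detected rather than silently truncated. Whether the PR truncates or raises is not stated, so the error behavior here is an assumption, and `readBounded`, `readBoundedString`, and `errTooLarge` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxReadBytes = 1 << 20 // 1 MiB, the cap named in the plan

// errTooLarge is a hypothetical error for input that exceeds the cap.
var errTooLarge = errors.New("MemoryError: read exceeds limit")

// readBounded reads at most limit bytes from r. It asks for one extra byte so
// that input longer than the limit is reported instead of silently truncated.
func readBounded(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// readBoundedString is a convenience wrapper for in-memory input.
func readBoundedString(s string, limit int64) (string, error) {
	data, err := readBounded(strings.NewReader(s), limit)
	return string(data), err
}

func main() {
	src, err := readBoundedString(`print("hello")`, maxReadBytes)
	fmt.Printf("%q %v\n", src, err)

	_, err = readBoundedString(strings.Repeat("x", 11), 10)
	fmt.Println(err)
}
```

The memory cost is bounded at `limit+1` bytes regardless of how large the underlying file or stream is.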

Test strategy

Existing 129 scenario tests: should all pass unchanged
Existing Go unit tests: update only the gpython-specific comment in FuzzPythonSource
The interpreter runs in a goroutine; select on ctx.Done() for context cancellation
Memory limits: maxSourceBytes = 1 MiB, maxReadBytes = 1 MiB per file.read() call
Traceback format: Traceback (most recent call last):\n File "name", line N\nExceptionType: msg
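The traceback layout quoted in the last item can be produced by a single formatting helper. This sketch follows the format string exactly as written in the plan; `formatTraceback` is a hypothetical name, and a real traceback would usually carry one `File` line per call frame.

```go
package main

import "fmt"

// formatTraceback renders a single-frame traceback in the layout quoted in
// the plan: a header line, one File line, then "ExceptionType: msg".
func formatTraceback(file string, line int, excType, msg string) string {
	return fmt.Sprintf("Traceback (most recent call last):\n File \"%s\", line %d\n%s: %s",
		file, line, excType, msg)
}

func main() {
	fmt.Println(formatTraceback("script.py", 3, "ValueError", "bad input"))
}
```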

Verification

make fmt
go build ./...
go test ./builtins/... ./tests/... -timeout 120s
RSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash -timeout 120s  # skip; scenarios have skip_assert_against_bash: true
go run ./cmd/rshell --allow-all-commands -c 'python -c "print(\"hello\")"'
go run ./cmd/rshell --allow-all-commands -c 'help' | grep python

…eter

Replace github.com/go-python/gpython with a from-scratch Python 3 tree-walking interpreter (~12,000 lines) implemented across modular files under builtins/internal/pyruntime/:

- ast.go: full AST node type definitions for Python 3 statements/expressions
- lexer.go: tokenizer with indent/dedent, string literals, number literals
- parser.go: recursive-descent parser covering the complete Python 3 grammar subset needed by the test suite
- types.go: Python object system (int, float, str, bytes, list, tuple, dict, set, class/instance, generator, exception hierarchy, module, file, scope)
- eval.go: tree-walking evaluator with generators via goroutine+channel, exception handling via Go panic/recover, context cancellation support, class definition with C3 MRO, closures, comprehensions, yield/yield from
- builtins_funcs.go: ~45 built-in functions (print, len, range, zip, map, filter, sorted, isinstance, type constructors, open, super, etc.)
- modules.go: module registry with sys, math, os (read-only), binascii, string; blocked modules (tempfile, glob, subprocess, socket, ctypes)

Remove github.com/go-python/gpython from go.mod (go mod tidy). Update analysis symbol allowlists for new implementation. All 40+ test packages pass including 129 Python scenario tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
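The commit message mentions generators implemented "via goroutine+channel". The sketch below shows one way such a design can work, under the assumption that the generator body runs in its own goroutine and is parked between `next()` calls. `Generator`, `NewGenerator`, and `Next` are hypothetical names, and values are plain `int` for brevity.

```go
package main

import "fmt"

// Generator runs a body function in its own goroutine and hands values to the
// consumer one at a time, so the body only advances when Next is called.
type Generator struct {
	resume chan struct{} // consumer -> body: produce the next value
	values chan int      // body -> consumer: one yielded value
	done   chan struct{} // closed when the body returns
}

func NewGenerator(body func(yield func(int))) *Generator {
	g := &Generator{
		resume: make(chan struct{}),
		values: make(chan int),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(g.done)
		<-g.resume // do not start the body until the first Next call
		body(func(v int) {
			g.values <- v
			<-g.resume // park until the consumer asks again
		})
	}()
	return g
}

// Next resumes the body and returns the next yielded value. ok is false once
// the body has returned, which corresponds to StopIteration in Python.
func (g *Generator) Next() (v int, ok bool) {
	select {
	case g.resume <- struct{}{}:
	case <-g.done:
		return 0, false
	}
	select {
	case v = <-g.values:
		return v, true
	case <-g.done:
		return 0, false
	}
}

func main() {
	squares := NewGenerator(func(yield func(int)) {
		for i := 1; i <= 3; i++ {
			yield(i * i)
		}
	})
	for v, ok := squares.Next(); ok; v, ok = squares.Next() {
		fmt.Println(v)
	}
}
```

One caveat of this sketch: a generator that is abandoned before exhaustion leaves its goroutine parked forever, so a production design would also need a close or cancellation path.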

- Format analysis/symbols_internal.go and remove unused symbols from internalAllowedSymbols (bufio.Scanner, bytes.SplitAfter, hash/crc32.ChecksumIEEE, math.MaxFloat64, math.Round, math/big.NewFloat, unicode.Is{Space,Title,Upper}, unicode.To{Lower,Title,Upper}, unicode/utf8.RuneError) that are not used by any builtins/internal file.
- Add missing copyright headers to pyruntime/parse_test.go and pyruntime/smoke_test.go.
- Fix data race in pyruntime.Run(): after ctx.Done() fires, wait for the goroutine running runInternal to finish before returning. Without this, the goroutine's defer (printTraceback → fmt.Fprintf to opts.Stderr) races with the caller reading opts.Stderr in the test. The evaluator checks ctx.Done() at each loop iteration so the goroutine terminates promptly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
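The data-race fix described above comes down to never returning while the worker goroutine may still be writing. A minimal sketch of that pattern follows; `run` and `demo` are hypothetical stand-ins for `pyruntime.Run()` and its caller, and the exit code 1 on cancellation is an arbitrary choice for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// run executes work in a goroutine and returns its exit code. On cancellation
// it still waits for the goroutine to finish, so nothing the goroutine writes
// (for example a traceback) can race with the caller reading the output.
// It assumes work polls ctx and therefore returns promptly once ctx is done.
func run(ctx context.Context, work func(context.Context) int) int {
	done := make(chan int, 1)
	go func() { done <- work(ctx) }()
	select {
	case code := <-done:
		return code
	case <-ctx.Done():
		return <-done // wait for the worker instead of returning immediately
	}
}

// demo runs a loop that never finishes on its own and cancels it by timeout.
func demo(timeoutMillis int) (code int, log string) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	var buf strings.Builder // written only by the worker goroutine
	code = run(ctx, func(ctx context.Context) int {
		defer buf.WriteString("worker exited\n")
		for {
			select {
			case <-ctx.Done():
				return 1
			default:
				time.Sleep(time.Millisecond) // one interpreter step
			}
		}
	})
	// Safe to read: run returned only after the worker's deferred write ran.
	return code, buf.String()
}

func main() {
	code, log := demo(20)
	fmt.Printf("exit code %d, log %q\n", code, log)
}
```

The receive on `done` establishes a happens-before edge with everything the worker did, which is what makes the final read of `buf` race-free.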

Block host env access:

- Remove os.Environ() and os.LookupEnv() from the Python os module. os.environ is now an empty dict and os.getenv() always returns its default argument. Python scripts must not be able to read process environment variables (API keys, tokens, etc.).
- Drop os.Environ and os.LookupEnv from the pyruntime symbol allowlists.
- Update scenario tests to verify PATH and other real env vars are invisible, and that os.environ is empty (len == 0).

Fix callObject data race:

- Replace the package-level callObject function variable with a goroutine-keyed sync.Map (goroutineCallFns). Each Python execution registers its evaluator's callObject at goroutine start and deregisters on return, so concurrent executions never share a function pointer. Previously, two parallel Python scenarios would race on the write at newEvaluator():50, causing test failures under -race. goroutineID() reads the goroutine number from runtime.Stack.
- Add runtime.Stack and sync.Map to the pyruntime symbol allowlists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
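The goroutine-keyed registry described in the callObject fix can be sketched as follows. Only `goroutineID` and the use of `sync.Map` and `runtime.Stack` come from the commit message; `callFns`, `register`, `call`, and `runWorkers` are hypothetical names, and the registered functions are simplified to `func(string) string`. Parsing the stack header relies on its textual format ("goroutine N [running]:"), which is not a documented API.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// goroutineID parses the numeric ID from the first line of runtime.Stack,
// which has the form "goroutine 123 [running]:".
func goroutineID() uint64 {
	buf := make([]byte, 64)
	n := runtime.Stack(buf, false)
	fields := bytes.Fields(buf[:n])
	if len(fields) < 2 {
		return 0
	}
	id, _ := strconv.ParseUint(string(fields[1]), 10, 64)
	return id
}

// callFns maps a goroutine ID to the function registered by that goroutine.
var callFns sync.Map

// register stores fn for the calling goroutine and returns a cleanup func.
func register(fn func(string) string) (unregister func()) {
	id := goroutineID()
	callFns.Store(id, fn)
	return func() { callFns.Delete(id) }
}

// call invokes the function registered by the calling goroutine, if any.
func call(arg string) (string, bool) {
	v, ok := callFns.Load(goroutineID())
	if !ok {
		return "", false
	}
	return v.(func(string) string)(arg), true
}

// runWorkers starts n goroutines that each register and call their own function.
func runWorkers(n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := "worker " + strconv.Itoa(i)
			unregister := register(func(arg string) string { return name + ": " + arg })
			defer unregister()
			results[i], _ = call("ping")
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range runWorkers(3) {
		fmt.Println(r)
	}
}
```

Each worker only ever sees the function it registered itself, so concurrent executions do not share mutable state through a package-level variable.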

…andbox

Python's os.listdir, os.path.exists, os.path.isfile, and os.path.isdir were calling os.ReadDir/os.Stat directly, bypassing the AllowedPaths sandbox. Route them through new Stat/ReadDir callbacks on RunOpts, wired to callCtx.StatFile/callCtx.ReadDir in the python builtin. Also remove os.Environ/os.LookupEnv from the symbol allowlist (removed in prior commit) and add io/fs.FileInfo + io/fs.DirEntry in their place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
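The callback routing described above can be sketched like this. The `Stat` and `ReadDir` field names come from the commit message, but their signatures are assumptions, and `pathExists`, `listDir`, `optsFor`, and `demoFS` are hypothetical. An in-memory `fstest.MapFS` stands in for the sandboxed file system so the example runs without touching the host.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// RunOpts carries the file-system callbacks supplied by the host shell.
// The signatures are assumed for this sketch.
type RunOpts struct {
	Stat    func(path string) (fs.FileInfo, error)
	ReadDir func(path string) ([]fs.DirEntry, error)
}

// pathExists backs a sandboxed os.path.exists: it never calls os.Stat itself
// and treats every callback error as "does not exist".
func pathExists(opts RunOpts, path string) bool {
	if opts.Stat == nil {
		return false
	}
	_, err := opts.Stat(path)
	return err == nil
}

// listDir backs a sandboxed os.listdir and returns entry names only.
func listDir(opts RunOpts, path string) ([]string, error) {
	if opts.ReadDir == nil {
		return nil, fs.ErrPermission
	}
	entries, err := opts.ReadDir(path)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// optsFor wires the callbacks to an fs.FS, which rejects ".." path elements.
func optsFor(fsys fs.FS) RunOpts {
	return RunOpts{
		Stat:    func(p string) (fs.FileInfo, error) { return fs.Stat(fsys, p) },
		ReadDir: func(p string) ([]fs.DirEntry, error) { return fs.ReadDir(fsys, p) },
	}
}

// demoFS is an in-memory stand-in for the directories the sandbox allows.
var demoFS = fstest.MapFS{
	"data/a.txt": {Data: []byte("a")},
	"data/b.txt": {Data: []byte("b")},
}

func main() {
	opts := optsFor(demoFS)
	fmt.Println(pathExists(opts, "data/a.txt"), pathExists(opts, "../etc/passwd"))
	fmt.Println(listDir(opts, "data"))
}
```

Keeping the interpreter ignorant of the real file system means the sandbox policy lives in one place, the callbacks, rather than in every `os` function.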

…CodeQL

- lexer.go: guard rune() cast with unicode.MaxRune check for \U escapes
- parser.go: guard int64() cast with math.MaxInt64 check for uint64 literals; values exceeding int64 range now fall through to the big.Int path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
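Both guards follow the same rule: check the range before a narrowing conversion. The sketch below illustrates the two checks named in the commit; `runeFromEscape` and `parseIntLiteral` are hypothetical helpers rather than the PR's functions, and only decimal literals are handled.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
	"strconv"
	"unicode"
)

// runeFromEscape converts the hex digits of a \UXXXXXXXX escape into a rune,
// rejecting values above unicode.MaxRune before the narrowing conversion.
func runeFromEscape(hexDigits string) (rune, error) {
	v, err := strconv.ParseUint(hexDigits, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid \\U escape %q: %w", hexDigits, err)
	}
	if v > unicode.MaxRune {
		return 0, fmt.Errorf("\\U escape %q is out of range", hexDigits)
	}
	return rune(v), nil
}

// parseIntLiteral returns an int64 when the decimal literal fits and a
// *big.Int otherwise, so the uint64 to int64 conversion can never wrap.
func parseIntLiteral(lit string) (small int64, large *big.Int, err error) {
	if u, perr := strconv.ParseUint(lit, 10, 64); perr == nil && u <= math.MaxInt64 {
		return int64(u), nil, nil
	}
	b, ok := new(big.Int).SetString(lit, 10)
	if !ok {
		return 0, nil, fmt.Errorf("invalid integer literal %q", lit)
	}
	return 0, b, nil
}

func main() {
	fmt.Println(runeFromEscape("0001F600"))
	fmt.Println(runeFromEscape("00110000"))
	fmt.Println(parseIntLiteral("9223372036854775807"))
	fmt.Println(parseIntLiteral("9223372036854775808"))
}
```

Without the `math.MaxInt64` check, a literal such as 9223372036854775808 would parse as a valid `uint64` and then wrap to a negative `int64`.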

AlexandreYang · 2026-04-12T00:16:20Z

analysis/symbols_builtins.go

 		"strings.TrimSpace",  // 🟢 removes leading/trailing whitespace; pure function.
 	},
+	"python": {
+		"bufio.NewReader",                 // 🟢 wraps an io.Reader with buffering for readline support; no write capability.


TODO: create symbols_python_builtins.go checks?

AlexandreYang and others added 4 commits April 11, 2026 21:44

[experiment] python

d5a31e7

test(python): add comprehensive scenario tests for python builtin

897491c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-advanced-security bot found potential problems Apr 11, 2026

builtins/python/lexer.go (fixed)

builtins/python/parser.go (fixed)

AlexandreYang and others added 3 commits April 12, 2026 01:22

AlexandreYang commented Apr 11, 2026

analysis/symbols_internal.go (outdated, resolved)

AlexandreYang commented Apr 11, 2026

analysis/symbols_internal.go (outdated, resolved)

AlexandreYang and others added 3 commits April 12, 2026 01:47

feat(python): remove os.getcwd and os.path.expanduser to block host p…

f5235f8

…ath leakage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(analysis): remove os.Getwd/UserHomeDir from allowlist, add math.M…

6787523

…axInt64/unicode.MaxRune Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang added the verified/analysis Human-reviewed static analysis changes label Apr 11, 2026

AlexandreYang and others added 3 commits April 12, 2026 01:56

docs: replace Python 3.4 references with Python 3 in SHELL_FEATURES.md

6ac03b1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor(python): move interpreter from builtins/internal/pyruntime t…

0010b10

…o builtins/python Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

revert: cosmetic changes to analysis files

9198cb8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang wants to merge 14 commits into main from alex/python_re_implement

2 participants
