Scope & Coverage

DiffGate is diff-aware and deterministic. How deeply it analyzes a change depends on the file's language. JavaScript/TypeScript, Python, PHP, Go, Ruby, Java, C#, and Kotlin get a real AST; other common languages are covered by comment-aware pattern rules; everything else is analyzed at the text level. This page states exactly where each tier applies so you can calibrate how much to trust a clean result.


Language coverage

Tier | Languages | What runs
Deep (AST) | JavaScript, TypeScript, JSX/TSX (@babel); Python, PHP, Go, Ruby, Java, C#, Kotlin (tree-sitter) | Real AST: @babel/parser for JS/TS, tree-sitter (WASM, no native build) for Python, PHP, Go, Ruby, Java, C#, and Kotlin. Rules are structural: deprecated calls aren't matched inside comments or strings, exported-signature changes are detected by shape (JS/TS), and injection sinks are AST-precise. The Python/PHP sql-injection rule is sink-targeted (it flags only a dynamic SQL string reaching a query sink, not a SELECT in a log line; the sinks are .execute/text(...) in Python, and mysqli_query/$pdo->query/->prepare plus Laravel raw-fragment builders whereRaw/orderByRaw/… in PHP), parameter-aware (cur.execute("… %s", (uid,)), ->prepare("… ?"), and whereRaw("… > ?", [$x]) are safe), static-clearing (interpolating only a constant is safe), format-aware (Python .format(), PHP sprintf() into a query are caught), and sanitizer-aware (Python quote_ident/sql.Identifier, PHP (int)$id/mysqli_real_escape_string down-tier to review). PHP also knows single-quoted strings don't interpolate ('… $id' is literal), nowdoc (<<<'EOT') doesn't interpolate, and __DIR__/dirname()/const-built paths are static (not dynamic).
Python also has AST-precise XSS (mark_safe/Markup/render_template_string of a dynamic value; escape()-wrapped values down-tier), path traversal (open/send_file of request data; secure_filename/safe_join/basename wrappers down-tier), permissive CORS (flask-cors CORS(...)/@cross_origin, django-cors-headers allow-all, manual Access-Control-Allow-Origin: *), command-injection (os.system/os.popen/subprocess.getoutput, which always use a shell, and subprocess.run/Popen/… with shell=True and a dynamic arg; the argument-list form subprocess.run(["ls", x]) is safe; shlex.quote down-tiers), code-injection (eval/exec/compile of a dynamic value; a literal, and attribute calls like pandas df.eval/ast.literal_eval, are not flagged), and unsafe-deserialization (pickle/marshal/dill .load/.loads and yaml.load of a dynamic value; yaml.safe_load / Loader=SafeLoader down-tier, but FullLoader still blocks). Counting sql-injection, Python has seven AST classes, analyzed at the same depth as PHP's (PHP has an eighth, file-inclusion). Python module sinks (command/deserialization/SSRF) are import-alias-resolved: import subprocess as sp; sp.run(...), from os import system, and import pickle as p; p.loads(...) are caught, not only direct dotted calls.
PHP has the widest non-JS AST coverage, with eight sink classes, each sink-targeted, dynamic-aware, and sanitizer-down-tiering: sql-injection, command-injection (exec/shell_exec/passthru/system/proc_open/popen/backticks; escapeshellarg/escapeshellcmd down-tier; an argument-array proc_open(['ls',$f]) is safe), code-injection (eval/create_function/string-assert of a dynamic value; assert($x===1) is correctly not flagged), file-inclusion (LFI/RFI: include/require family of a dynamic path; basename() down-tier), unsafe-deserialization (unserialize() of a dynamic value; ['allowed_classes'=>false] down-tier), xss-sink (echo/print/printf and short-echo <?= … ?> of request superglobals $_GET/$_POST/…; htmlspecialchars/htmlentities/(int) down-tier), path-traversal (fopen/readfile/file_get_contents/unlink/copy/… of request data, checking only the path argument, so file_put_contents($safe, $reqData) is not flagged; file_get_contents($url) is also SSRF; basename/realpath down-tier), and permissive-cors (a wildcard header("Access-Control-Allow-Origin: *") or a request-reflected $_SERVER['HTTP_ORIGIN'], incl. framework $resp->headers->set(...) / ->withHeader(...); a generic ->set on a non-CORS header and an allowlisted origin are not flagged). The blocking PHP classes (sql/command/code/file-inclusion/deserialization) block on local evidence; xss, path-traversal, and permissive-cors are non-blocking advisories.
Go has four AST classes, the injection ones exploiting Go's lack of string interpolation (a dynamic query is built only by + concat or fmt.Sprintf): sql-injection (database/sql/sqlx/gorm sinks Query/Exec/Queryx/MustExec/Raw/…; a ?/$1 placeholder string is static and safe, fmt.Sprintf/concat into a sink blocks; the generic Get/Select names are deliberately excluded to avoid false-blocking caches), command-injection (exec.Command/exec.CommandContext; Go runs no shell, so the argument-vector form exec.Command("git", "checkout", branch) is correctly safe; it blocks only a static shell program with a dynamic argument (sh -c <dynamic>) or a request-tainted program name, staying clear of the noisy gosec-G204 "variable in subprocess" false-block), path-traversal (os.ReadFile/os.Open/http.ServeFile/… of request data via r.FormValue/r.URL.Query()/mux.Vars; filepath.Base down-tiers), and permissive-cors (a wildcard w.Header().Set("Access-Control-Allow-Origin", "*") or the request's own Origin reflected back, gin-contrib AllowAllOrigins: true/AllowOrigins: []string{"*"}/AllowOriginFunc: … { return true }/cors.Default(), rs/cors AllowedOrigins: {"*"}/cors.AllowAll(); an explicit origin allowlist, a real origin-checking func, and a generic .Set/.Default on non-CORS objects are not flagged). sql/command block; path-traversal and permissive-cors are advisory. Not yet covered for Go (honest gaps): text/template-vs-html/template XSS, and sqlx Get/Select.
Ruby has six AST classes, built around #{…} interpolation (a string/subshell with an interpolation of a non-constant is dynamic; interpolating only a Constant is static): sql-injection (ActiveRecord raw-SQL methods where/find_by_sql/exists?/order/group/joins/… and connection execute/exec_query/… — fragment sinks, so "age > #{x}" blocks; the ?-placeholder array where("age > ?", x), the value-bound where("id = ?", "#{x}"), and the hash where(age: x) forms are safe; connection.quote/sanitize_sql down-tier), command-injection (system/exec/spawn, backticks/%x{} subshell, IO.popen, Open3.*, Process.spawn — only the single-string form shells, so system("git", "checkout", x) is safe; Shellwords.escape down-tiers), code-injection (eval/instance_eval/class_eval/module_eval of a dynamic value — the block form is not flagged), unsafe-deserialization (Marshal.load/YAML.load/Oj.load of a dynamic value; YAML.safe_load is safe), xss-sink (raw(…)/.html_safe/safe_concat of a dynamic value; sanitize/h/html_escape down-tier), and permissive-cors (rack-cors origins '*', a wildcard set_header/headers['Access-Control-Allow-Origin'] = '*', or the request's own Origin reflected back — explicit allowlists and unrelated '*' arguments are not flagged). sql/command/code/deserialization block; xss and permissive-cors are advisory. Not yet covered for Ruby (honest gaps): path-traversal (File.read(params[:file]), send_file params[:path]), mass-assignment, open-redirect, render inline: SSTI, and dynamic send/constantize. 
Java (no string interpolation either; dynamic queries/commands are built by + or String.format) has five AST classes: sql-injection (JDBC executeQuery/executeUpdate/prepareStatement/prepareCall, JPA/Hibernate createQuery/createNativeQuery/createSQLQuery as fragment sinks so keyword-less HQL from User where … blocks, and JdbcTemplate query/update/… gated on a SQL keyword; a ?-placeholder prepareStatement is safe, cross-line query variables resolved), command-injection (Runtime.exec/ProcessBuilder with a concatenation/String.format-built or request-tainted argument; a bare opaque parameter is not flagged), unsafe-deserialization (ObjectInputStream.readObject/readUnshared on a receiver, XStream.fromXML of a dynamic value; the canonical native-deser gadget sink), path-traversal (new File/FileInputStream/Files.readAllBytes/Paths.get of request.getParameter/getHeader; FilenameUtils.getName down-tiers), and permissive-cors (a bare @CrossOrigin, since Spring's default allows ALL origins, or one whose only settings are e.g. maxAge, an explicit "*" origin, .allowedOrigins("*")/.addAllowedOrigin("*") and the …OriginPattern variants, and setHeader/addHeader wildcard or request-reflected Origin writes; an explicit origin, key or implicit-value form, is not flagged). sql/command/deserialization block; path-traversal and permissive-cors are advisory. Not yet covered for Java (honest gaps): StringBuilder-built SQL, and SpEL/OGNL expression injection.
C# (interpolation-aware — $"…{x}", plus concat and string.Format) has six AST classes: sql-injection (new SqlCommand(…), cmd.CommandText = …, Dapper Query/Execute (SQL-keyword gated), EF Core FromSqlRaw/ExecuteSqlRaw as fragment sinks — a @name-parameterized command and EF FromSqlInterpolated (which parameterizes the interpolation) are correctly safe; cross-line query variables and const-interpolation resolved), command-injection (Process.Start/ProcessStartInfo.Arguments/FileName with a concat/interpolation-built or request-tainted value — a bare opaque parameter is not flagged), unsafe-deserialization (BinaryFormatter/SoapFormatter/NetDataContractSerializer/LosFormatter .Deserialize, resolved through the receiver's new …() — a safe serializer is distinguished), path-traversal (File.*/new FileStream/StreamReader of Request.Query/Request.Form/…; Path.GetFileName down-tiers), xss-sink (@Html.Raw/Response.Write/new HtmlString of a dynamic value; HttpUtility.HtmlEncode down-tiers), and permissive-cors (.AllowAnyOrigin(), .WithOrigins("*"), and Response.Headers wildcard or request-reflected Origin writes via .Add/.Append/indexer — explicit origins and non-CORS headers are not flagged). sql/command/deserialization block; path-traversal, xss, and permissive-cors are advisory. Not yet covered for C# (honest gaps): Json.NET TypeNameHandling and LDAP injection. 
Kotlin runs on the JVM, so it shares Java's sink vocabulary, plus Kotlin string templates ("… $x" simple and "… ${expr}" braced; interpolating a non-constant is dynamic, interpolating a const val is static). It has five AST classes: sql-injection (JDBC/JPA fragment sinks + Android rawQuery/execSQL, JdbcTemplate query/execute keyword-gated; ?-placeholders and const templates safe, cross-line val query variables resolved, escapeSql/quoteIdentifier down-tier, matching Java), command-injection (Runtime.exec/ProcessBuilder with a template/concat-built or request-tainted value; a bare opaque parameter is not flagged), unsafe-deserialization (ObjectInputStream.readObject/readUnshared on a receiver), path-traversal (File(...)/FileInputStream/Files.readAllBytes of getParameter/@RequestParam/Ktor call.parameters; FilenameUtils.getName down-tiers), and permissive-cors (Ktor anyHost(), a bare or "*" Spring @CrossOrigin, .allowedOrigins("*") and variants, and setHeader/.headers.append wildcard or request-reflected Origin writes; explicit origins, allowHost(...), and non-CORS headers are not flagged). sql/command/deserialization block; path-traversal and permissive-cors are advisory. The community Kotlin grammar is field-less and represents a simple $x template as text fragments, so the rules use positional parsing. Not yet covered for Kotlin (honest gaps): XSS, the File(...).name property sanitizer, and Ktor sources beyond the common ones.
SSRF is a cross-language advisory (orange, non-blocking) shared by all eight Deep-AST languages: a request-tainted URL/host reaching an outbound-request sink (JS fetch/axios, Python requests/urllib, Go http.Get/NewRequest, Ruby Net::HTTP/HTTParty/URI.open, PHP curl/fsockopen/get_headers, Java new URL/RestTemplate, C# HttpClient/WebRequest, Kotlin URL(...)/OkHttp). It is library-qualified (a generic .get on a dict/cache is not an HTTP fetch) and only fires on tainted URLs, so a static or config URL is never flagged.
Python taint sources: request.args/form/values/json/get_json()/data/files/params and Django equivalents. Known gap: a two-hop chain through an intermediate dict (data = request.get_json(); url = data.get('url'); requests.get(url)) is not traced; only one-hop assignments (url = request.args.get('url')) and inline calls (requests.get(request.args.get('url'))) are caught.
XXE is a cross-language advisory on the JVM (Java, Kotlin) and .NET (C#): an XML parser created without disabling DOCTYPE/external-entity resolution. It is suppressed when the file shows recognized hardening (JVM: disallow-doctype-decl, FEATURE_SECURE_PROCESSING, SUPPORT_DTD=false, ACCESS_EXTERNAL_DTD, …; .NET: DtdProcessing.Prohibit, XmlResolver = null), and the modern safe-by-default APIs (XmlReader.Create, XmlSecureResolver) are not flagged.
On JS/TS, injection sinks additionally cover prototype pollution and NoSQL injection (these two are JS/TS-only: Node/Express/Mongo idioms), and JS/TS findings are eligible for code-graph taint confirmation.
Not yet covered for PHP (honest gaps): type-juggling (==), weak crypto (md5/sha1/mt_rand), header() injection / open redirect, and extract()/mass-assignment.
Pattern (regex, comment-aware) | C/C++, Rust, Swift, Scala, SQL, shell | Hardcoded secrets, schema/migration changes, auth/crypto-sensitive code, dynamic execution / shell-out, raw queries, outbound network calls, debug logging, TODO/FIXME. Commented-out code (# os.system(x)) isn't flagged; a secret committed inside a comment still is. Injection sinks run here too: a broad advisory sql-injection-candidate rule catches the idiomatic non-AST vectors (Ruby #{}, etc.) and dangerous-exec covers Go exec.Command, Ruby system/%x{}, etc. Those advisories escalate to blocking when CodeGraph confirms reachability from an HTTP/event handler (see below).
Text-only | YAML, Terraform, JSON, and any other text | Hardcoded secrets and TODO/FIXME markers.
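
To make the sink-targeted and parameter-aware wording in the Deep tier concrete, here is a minimal Python sketch of the three shapes the sql-injection rule is described as distinguishing. The function and table names are hypothetical, and it uses the standard-library sqlite3 driver, whose placeholder is ? rather than the %s style quoted in the table; that DiffGate treats the two placeholder styles alike is an assumption here, not something this page states.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_dynamic(conn: sqlite3.Connection, name: str) -> list:
    # A dynamic SQL string (f-string interpolation) reaching an .execute sink:
    # the shape the table describes as flagged.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()


def find_user_bound(conn: sqlite3.Connection, name: str) -> list:
    # A static SQL string with a bound parameter: the shape described as safe.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def log_lookup(name: str) -> str:
    # SQL-looking text that never reaches a query sink ("a SELECT in a log line").
    return f"SELECT requested for user {name}"
```

Passing a payload such as ' OR '1'='1 to find_user_dynamic returns every row, while find_user_bound treats the same payload as a literal name and returns nothing, which is the behavioral difference the rule is keyed to.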

The deterministic core is the trustworthy floor, not an exhaustive guarantee. On AST languages, an injection finding is AST-precise. On pattern/text languages, treat injection findings as a strong signal and clean results as "nothing matched the patterns," not "proven safe."
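
Even on AST languages a clean result has limits. The Python SSRF gap noted in the coverage table (one-hop assignments and inline calls are caught, a two-hop chain through an intermediate dict is not) can be illustrated with a toy tracker built on Python's own ast module. This is a sketch of the general idea only, not DiffGate's implementation: the function names are hypothetical, it looks only at top-level statements, and it treats any expression starting with request. as a source and requests.get as the only sink.

```python
import ast


def is_request_source(node: ast.AST) -> bool:
    """True if the expression reads directly from a Flask-style `request` object."""
    return ast.unparse(node).startswith("request.")


def tainted_fetches(code: str) -> list[int]:
    """Line numbers of requests.get(...) calls whose URL argument is tainted,
    tracking direct (one-hop) assignments only."""
    tainted: set[str] = set()
    hits: list[int] = []
    for stmt in ast.parse(code).body:
        # One-hop rule: `name = request.<...>` marks `name` as tainted.
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and is_request_source(stmt.value)):
            tainted.add(stmt.targets[0].id)
        # Look for the sink anywhere inside the statement.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and ast.unparse(node.func) == "requests.get"
                    and node.args):
                url = node.args[0]
                if is_request_source(url) or (
                        isinstance(url, ast.Name) and url.id in tainted):
                    hits.append(node.lineno)
    return hits


ONE_HOP = "url = request.args.get('url')\nrequests.get(url)\n"
INLINE = "requests.get(request.args.get('url'))\n"
TWO_HOP = "data = request.get_json()\nurl = data.get('url')\nrequests.get(url)\n"
STATIC = "requests.get('https://example.com/health')\n"
```

With these inputs the tracker reports the one-hop and inline cases and stays silent on the two-hop chain and the static URL. The silence on TWO_HOP is a miss, not a proof of safety, which is why a clean result should be read as "nothing matched".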

Reachability (community CodeGraph)

The non-JS injection advisories are deliberately broad: a broad regex buys recall, not precision, so on its own a candidate never blocks. Precision comes from the graph: when an optional CodeGraph index is present, DiffGate walks the (deterministic, AST-derived) call graph from the sink back toward untrusted entry points, meaning the HTTP/event handlers that the framework exposes. If a path exists, the finding escalates to a blocking orange with the entry point named (GET /user → …) and is labelled trust: "reachable". If no path exists it stays advisory and is labelled trust: "unreachable" ("verify before dismissing — coverage depends on the index"); it is never auto-cleared unless you opt in with graph.reachabilityDeescalate: true. No graph, or a graph that can't answer, leaves the advisory exactly as it was, so the gate behavior with no code graph is unchanged. This needs only the community CodeGraph edition (no Pro taint engine).
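
The escalation logic described above can be sketched as a backward breadth-first walk over a caller map. Everything in this example is an assumption made for illustration: the graph shape (a callers dict plus a set of entry points), the finding fields, and the tier names "advisory", "blocking", and "cleared" are hypothetical, and only the trust labels and the opt-in de-escalation behavior come from this page.

```python
from collections import deque
from typing import Optional


def path_to_entry(callers: dict, entry_points: set, sink: str) -> Optional[list]:
    """Walk callers backward from the sink; return [entry, ..., sink] or None."""
    queue = deque([[sink]])
    seen = {sink}
    while queue:
        path = queue.popleft()
        head = path[0]
        if head in entry_points:
            return path
        for caller in callers.get(head, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append([caller] + path)
    return None


def classify(finding: dict, graph: Optional[dict] = None,
             deescalate: bool = False) -> dict:
    """Apply the reachability rules to one advisory finding."""
    result = dict(finding)
    if graph is None:
        # No graph: the advisory is left exactly as it was.
        return result
    path = path_to_entry(graph["callers"], graph["entry_points"], finding["sink"])
    if path is not None:
        result.update(tier="blocking", trust="reachable", path=" → ".join(path))
    else:
        result["trust"] = "unreachable"
        if deescalate:
            # Only cleared when the caller opts in
            # (graph.reachabilityDeescalate: true in the text above).
            result["tier"] = "cleared"
    return result


GRAPH = {
    "callers": {
        "run_query": ["load_user"],
        "load_user": ["GET /user"],
        "cleanup": ["cron_job"],
    },
    "entry_points": {"GET /user"},
}
FINDING = {"rule": "sql-injection-candidate", "sink": "run_query", "tier": "advisory"}
```

Calling classify(FINDING, GRAPH) escalates the finding and names the path from GET /user, while a sink that is only called from cron_job keeps its advisory tier and gains the "unreachable" label. The sketch also shows why the result depends on index coverage: a caller missing from the map looks identical to a caller that does not exist.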

Raising coverage for your stack: add project-specific rules via customPatterns / orangePatterns in .diffgate.json — these apply to any language. See CONFIG.md.


What the code graph does — and doesn't

The optional code graph (CODE-GRAPH.md) is a precision layer, not a recall layer. When configured, it operates on the findings the base rules already produced and makes them more trustworthy and better-prioritized:

  • attaches cross-file blast radius — caller counts, suggested reviewers, untested call sites — and adjusts tier accordingly;
  • reachability (community edition): escalates a broad cross-language injection advisory to a blocking finding when the call graph proves the sink is reachable from an untrusted entry point — the community-tier precision source;
  • taint analysis (Pro): confirms a real source → sink path on injection findings (kept, with the data-flow trace) or proves none exists (down-tiered only if you opt in with graph.securityDeescalate: true).

It does not add new detections. If a base rule doesn't flag a sink, enabling the graph won't flag it either — there is nothing for the graph to escalate. Raising recall (e.g. catching more non-JS injection forms) is a rule-layer change — which is exactly what sql-injection-candidate is. The split is deliberate: rules buy recall, the graph buys the right to block. The graph is off by default and degrades to a complete no-op when absent.


Design intent

DiffGate is intentionally a low-noise, diff-scoped second pair of eyes — not a whole-repo SAST. It is tuned to the residue modern coding agents actually ship (see MEASUREMENT.md) rather than to maximize raw rule count. If you need exhaustive multi-language taint analysis across an entire codebase, pair DiffGate with a dedicated SAST tool; DiffGate's job is the fast, deterministic gate on the lines that just changed.