Indexer skips TypeScript .cts/.mts source files (extension→grammar map omits them); CommonJS-TS codebases index as empty #27

@trek-e

Summary

Memtrace's indexer does not parse TypeScript files with the .cts (and .mts) extensions. These are official, first-class TypeScript source extensions (introduced in TypeScript 4.7), syntactically identical to .ts — only their module-format semantics differ. The walker treats them as non-source and skips them before the TypeScript grammar ever runs, so a codebase authored in CommonJS-TypeScript (.cts) indexes as effectively empty: no symbols, no call edges, no communities for the real production code.

This is the analogue of #16 (detection "limited to declared web frameworks") but for source-file parsing: the language map recognizes .ts/.tsx but omits .cts/.mts.

Environment

  • Memtrace: 0.6.30 (no fix appears in the release notes through v0.6.46, the current latest: no cts/mts/extension/language change is mentioned in any release body from v0.6.30 through v0.6.46)
  • OS: macOS (darwin-arm64)
  • Node: v26.3.0
  • Target repo: a TypeScript project using --module nodenext, with 116 .cts source modules under src/ (CommonJS-TypeScript; tsc emits gitignored .cjs build artifacts to a separate dir)

Steps to reproduce

  1. Point index_directory at a repo whose source is .cts (e.g. src/*.cts, compiled via tsc with outDir elsewhere; the compiled .cjs is gitignored, as is standard).
  2. Inspect the result / get_repository_stats / find_symbol for any known exported function.

Observed:

  • style_fingerprint.by_language contains only javascript (from committed .cjs tests/scripts) and yaml; there is no typescript bucket at all, despite 116 .cts files on disk.
  • find_symbol for real exports (loadConfig, resolveModel, …) returns 0 matches → "Symbol not found in indexed graph".
  • find_central_symbols / find_bridge_symbols are dominated by test helpers and build scripts — the production spine is simply absent from the graph.

Proof the content parses fine — the problem is purely the extension gate, not the grammar:

  • Copy the same src/*.cts files to a temp dir, rename each to .ts (byte-identical content), and index_directory that.
  • Result: a typescript bucket appears with 1077 functions, 7268 edges, and 5 real source communities. Every symbol resolves.

The only variable changed was the file extension.
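For anyone who wants to repeat the rename experiment, here is a minimal sketch in Python. The helper name copy_cts_as_ts is made up for illustration and is not part of Memtrace; it only prepares the directory, and indexing it is still done with index_directory as described above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_cts_as_ts(src_dir: str) -> Path:
    """Copy every .cts file under src_dir into a fresh temp dir, renamed to .ts.

    Contents are copied byte-for-byte; only the extension changes.
    (Hypothetical helper, not a Memtrace API.)
    """
    src = Path(src_dir)
    dest = Path(tempfile.mkdtemp(prefix="cts-as-ts-"))
    for path in sorted(src.rglob("*.cts")):
        # Preserve the relative layout so directory structure stays the same.
        target = dest / path.relative_to(src).with_suffix(".ts")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
    return dest
```

Calling copy_cts_as_ts("src") returns the temp directory to index. Comparing its stats with the stats of the original repo isolates the extension as the only difference.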

Root cause (inferred from observable behavior)

The filesystem walker's extension→grammar map maps .ts, .tsx, .js, .jsx to a parser but has no entry for .cts/.mts (nor .d.cts/.d.mts). Files with those extensions fall through to the "incremental: skipping non-source file" path (the string is present in the shipped binary) and are never handed to the TypeScript parser, even though tree-sitter-typescript parses their contents identically (demonstrated above).

docs/indexing-and-ignore-rules.md documents the gitignore/.memtraceignore/binary-extension skip layers, but .cts/.mts are not being skipped by any of those — they are simply not in the source parse set, so there is no user-facing override (--no-ignore-vcs, .memtraceignore re-include) that recovers them.

Why .cts/.mts must be supported (authoritative references)

From the TypeScript docs (Context7 /microsoft/typescript-website):

  • TypeScript 4.7 release notes — "New File Extensions": "TypeScript introduces new source file extensions: .mts for ES modules and .cts for CommonJS modules. When TypeScript compiles .mts files, it emits them as .mjs… and .cts files are emitted as .cjs. TypeScript also supports new declaration file extensions, .d.mts and .d.cts." These are the explicit, format-pinned counterparts to Node's .mjs/.cjs.
  • Module Resolution (Node16/nodenext): the importing file's extension (.mts vs .cts) selects the import vs require resolution algorithm. .cts/.mts are load-bearing in modern Node module resolution, not stylistic.
  • TypeScript 5.6 release notes — "Enforcing File Extension Module Formats": .mts files never emit CommonJS and .cts files never emit ESM; format-specific extensions are now respected in all module modes. The ecosystem is moving toward these extensions, not away.
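The rules in the references above can be condensed into a small lookup. This is only an illustrative sketch in Python: the names EMITTED_EXTENSION and module_format are invented here, and the package_type parameter stands in for the "type" field of the nearest package.json, which is what decides the format of a plain .ts file under node16/nodenext.

```python
import os

# Emitted JavaScript extension for each TypeScript source extension (TS 4.7+).
EMITTED_EXTENSION = {".ts": ".js", ".mts": ".mjs", ".cts": ".cjs"}


def module_format(filename: str, package_type: str = "commonjs") -> str:
    """Return "esm" or "commonjs" for a TypeScript source file.

    Sketch of the node16/nodenext rule: .mts and .cts pin the format,
    while plain .ts follows the package.json "type" field.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".mts":
        return "esm"
    if ext == ".cts":
        return "commonjs"
    return "esm" if package_type == "module" else "commonjs"
```

The point for an indexer is that none of this affects syntax: the extension changes how imports resolve and what gets emitted, not how the file parses.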

In short: .cts/.mts have been standard TypeScript source since TS 4.7 (May 2022) and are increasingly the canonical way to author format-explicit modules under nodenext. A tool that indexes .ts but not .cts/.mts silently produces an empty graph for entire CommonJS-TypeScript codebases, with no error and no diagnostic. The failure mode is invisible until a user checks find_symbol.

Proposed fix

Add .cts, .mts, .d.cts, .d.mts to the TypeScript entry of the extension→grammar map (route .cts/.mts to the same tree-sitter-typescript grammar as .ts; .d.cts/.d.mts alongside .d.ts). The grammar already handles the content, so this is a map addition, not parser work — plausibly a good first issue.
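To make the size of the change concrete, here is a rough sketch of what such a table and lookup might look like, written in Python for brevity. Memtrace's real table is not visible from the outside, so GRAMMAR_BY_EXTENSION and grammar_for are hypothetical names, and the grammar labels are assumptions based on the grammars tree-sitter ships for these languages.

```python
import os

# Hypothetical extension-to-grammar table (illustrative, not Memtrace's code).
GRAMMAR_BY_EXTENSION = {
    ".ts": "typescript",
    ".tsx": "tsx",
    ".js": "javascript",
    ".jsx": "javascript",
    # Proposed additions: same grammar as .ts.
    ".cts": "typescript",
    ".mts": "typescript",
}


def grammar_for(filename: str):
    """Return the grammar name for a file, or None if it is not a source file."""
    # splitext keeps only the final suffix, so "index.d.cts" yields ".cts".
    ext = os.path.splitext(filename)[1].lower()
    return GRAMMAR_BY_EXTENSION.get(ext)
```

With a final-suffix lookup like this one, .d.cts and .d.mts are covered by the .cts/.mts entries. If the real walker matches on the full compound suffix instead, the two declaration extensions would need their own entries.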

Optional nicety: when a directory contains source files whose extensions are unrecognized but adjacent to recognized ones, surface a one-line count in memtrace status (e.g. Skipped (unrecognized source extension): 116 files) so this class of silent-empty-index failure becomes visible.
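A possible shape for that diagnostic, again as a Python sketch under assumptions: the watchlist LIKELY_SOURCE, the RECOGNIZED set, and both function names are invented for illustration, and a real implementation might instead infer "adjacent" extensions from the directory contents.

```python
import os
from collections import Counter

# Assumed sets for illustration only.
RECOGNIZED = {".ts", ".tsx", ".js", ".jsx"}
LIKELY_SOURCE = {".cts", ".mts"}


def count_skipped_sources(paths) -> Counter:
    """Count files that look like source but have no grammar mapping."""
    skipped = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext in LIKELY_SOURCE and ext not in RECOGNIZED:
            skipped[ext] += 1
    return skipped


def status_line(skipped: Counter) -> str:
    """Format the one-line summary suggested above."""
    total = sum(skipped.values())
    return f"Skipped (unrecognized source extension): {total} files"
```

Keeping the per-extension counts in a Counter would also let the status output name the offending extensions, which points the user straight at the cause.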

Impact

Any repo authored in .cts/.mts — common in modern Node libraries targeting nodenext and dual CJS/ESM packages — currently can't be indexed at all. The fix is small and unblocks an entire (growing) category of TypeScript projects.
