
[FEATURE] 158-language support via universal tree-sitter grammar loader #18

@Wolfvin


Problem

CodeLens currently supports ~10 languages, each with its own hand-written parser, so every new language requires a new parser file. This doesn't scale: agents working in Go, Java, C++, and other unsupported languages (or in Rust, beyond its current level of support) get no structural analysis.

Proposed Approach: Universal Grammar Loader

tree-sitter has grammars for 150+ languages available as npm packages. Instead of writing per-language parsers, write one generic extraction layer that works on any tree-sitter grammar:

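# Maps tree-sitter node type names -> CodeLens concepts.
# Trailing comments list example grammars that use each name.
UNIVERSAL_NODE_TYPES = {
    'function_definition': 'Function',   # Python, C, C++
    'function_declaration': 'Function',  # JavaScript, TypeScript, Go
    'method_definition': 'Method',       # JavaScript, TypeScript
    'class_declaration': 'Class',        # JavaScript, TypeScript, Java
    'class_definition': 'Class',         # Python
    'import_statement': 'Import',        # Python, JavaScript, TypeScript
    'call_expression': 'CallSite',       # JavaScript, TypeScript, Go, Rust, C, C++
    # ... etc
}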

Many tree-sitter grammars use similar node type names, so a universal mapper of roughly 100 lines could plausibly cover an estimated 80% of languages, with language-specific overrides handling the remaining 20%.
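A minimal sketch of the universal mapper with per-language overrides layered on top, assuming only that nodes expose .type, .children, and .start_point (as py-tree-sitter's Node does). LANGUAGE_OVERRIDES, node_map_for, Symbol, and extract are hypothetical names, and the override entries are illustrative: Python's grammar, for example, names call nodes call rather than call_expression, so the shared table alone would miss them. FakeNode stands in for a parsed tree so the example runs without any grammar installed.

```python
from dataclasses import dataclass, field
from typing import Any

# Shared table, mirroring UNIVERSAL_NODE_TYPES above.
UNIVERSAL_NODE_TYPES: dict[str, str] = {
    "function_definition": "Function",
    "function_declaration": "Function",
    "method_definition": "Method",
    "class_declaration": "Class",
    "class_definition": "Class",
    "import_statement": "Import",
    "call_expression": "CallSite",
}

# Hypothetical per-language overrides, applied on top of the shared table.
LANGUAGE_OVERRIDES: dict[str, dict[str, str]] = {
    "python": {"call": "CallSite", "import_from_statement": "Import"},
    "rust": {"function_item": "Function", "struct_item": "Class",
             "use_declaration": "Import"},
}


def node_map_for(lang: str) -> dict[str, str]:
    """Return the shared table with the language's overrides applied."""
    return {**UNIVERSAL_NODE_TYPES, **LANGUAGE_OVERRIDES.get(lang, {})}


@dataclass
class Symbol:
    kind: str       # CodeLens concept, e.g. "Function"
    node_type: str  # raw tree-sitter node type
    line: int       # 1-based start line


def extract(root: Any, lang: str) -> list[Symbol]:
    """Depth-first walk, in source order, over nodes with .type/.children/.start_point."""
    mapping = node_map_for(lang)
    found: list[Symbol] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kind = mapping.get(node.type)
        if kind is not None:
            found.append(Symbol(kind, node.type, node.start_point[0] + 1))
        # Push children in reverse so they are popped in source order.
        stack.extend(reversed(node.children))
    return found


# Stand-in for a py-tree-sitter Node so the example runs without a grammar.
@dataclass
class FakeNode:
    type: str
    start_point: tuple[int, int]  # (row, column), 0-based like tree-sitter
    children: list["FakeNode"] = field(default_factory=list)


tree = FakeNode("module", (0, 0), [
    FakeNode("function_definition", (0, 0), [FakeNode("call", (1, 4))]),
    FakeNode("class_definition", (3, 0)),
])
for sym in extract(tree, "python"):
    print(sym.kind, sym.node_type, sym.line)
# Function function_definition 1
# CallSite call 2
# Class class_definition 4
```

Walking .children is the simplest approach; for very large files, a TreeCursor obtained from node.walk() is usually the more efficient way to traverse a real tree.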

Implementation Steps

  1. Write universal_parser.py that loads any tree-sitter-{lang} grammar dynamically
  2. Define the node type mapping table
  3. Add language -> grammar package mapping (e.g. go -> tree-sitter-go)
  4. Install grammars on demand via npm/pip in setup.sh
  5. Add language-specific override files for languages whose node names differ (Ruby, Haskell, etc.)
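A minimal sketch of steps 1, 3, and 4 on the Python side, assuming each grammar comes from its PyPI package (tree-sitter-go imports as tree_sitter_go and exposes a language() function) and py-tree-sitter 0.23 or newer, where Language(...) wraps that function's result. GRAMMAR_ALIASES, ENTRY_POINTS, SHARED_PACKAGES, and the helper names are hypothetical; the pip fallback stands in for whatever setup.sh ends up doing. The TypeScript entries reflect that tree-sitter-typescript ships two grammars (typescript and tsx) in one package.

```python
import importlib
import subprocess
import sys
from typing import Any

# Hypothetical aliases: CodeLens language id -> grammar base name.
GRAMMAR_ALIASES: dict[str, str] = {
    "c++": "cpp",
    "c#": "c_sharp",
    "js": "javascript",
    "ts": "typescript",
}

# Grammars whose package exposes something other than language().
ENTRY_POINTS: dict[str, str] = {
    "typescript": "language_typescript",
    "tsx": "language_tsx",
}

# Grammars that live inside another grammar's package.
SHARED_PACKAGES: dict[str, str] = {"tsx": "typescript"}


def grammar_base(lang: str) -> str:
    """Normalize a language id, e.g. 'C++' -> 'cpp'."""
    lang = lang.lower()
    return GRAMMAR_ALIASES.get(lang, lang)


def module_name(lang: str) -> str:
    """Python import name, e.g. 'go' -> 'tree_sitter_go'."""
    base = grammar_base(lang)
    return "tree_sitter_" + SHARED_PACKAGES.get(base, base)


def pip_package(lang: str) -> str:
    """PyPI distribution name, e.g. 'go' -> 'tree-sitter-go'."""
    return module_name(lang).replace("_", "-")


def load_language(lang: str, install: bool = False) -> Any:
    """Import a grammar package (optionally pip-installing it) and wrap it."""
    from tree_sitter import Language  # assumes py-tree-sitter >= 0.23

    name = module_name(lang)
    try:
        module = importlib.import_module(name)
    except ImportError:
        if not install:
            raise
        subprocess.run(
            [sys.executable, "-m", "pip", "install", pip_package(lang)],
            check=True,
        )
        importlib.invalidate_caches()  # let the freshly installed package be found
        module = importlib.import_module(name)
    entry = ENTRY_POINTS.get(grammar_base(lang), "language")
    return Language(getattr(module, entry)())
```

With a grammar installed, parsing would look like Parser(load_language("go")).parse(b"package main\n"), and the resulting tree's root_node can be handed to the extraction layer. Keeping aliases and entry points as data tables means new languages are mostly one-line additions rather than new parser files.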

Custom Kernel Idea (long-term)

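For maximum performance, compile all grammars into a single shared library (.so file) using py-tree-sitter's Language.build_library(). This would replace per-grammar loading with one library load per process, and the result could be distributed as a wheel. Note that Language.build_library() was removed in py-tree-sitter 0.22, so this route needs either an older pinned release or a custom C build step.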

This is essentially what codebase-memory-mcp does in C: it vendors all 158 grammars into one binary. The Python equivalent is a compiled .so with every language baked in.
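A sketch of the bundled-library build, assuming grammar repositories are vendored under a hypothetical vendor/tree-sitter-<lang>/ layout, a py-tree-sitter release older than 0.22 (the last versions that still ship Language.build_library()), and a C/C++ compiler on the build machine. grammar_repo_dirs and build_bundle are illustrative names, not an existing API.

```python
from pathlib import Path
from typing import Iterable


def grammar_repo_dirs(vendor_root: str, langs: Iterable[str]) -> list[str]:
    """Hypothetical layout: one tree-sitter-<lang> checkout per language."""
    return [str(Path(vendor_root) / f"tree-sitter-{lang}") for lang in langs]


def build_bundle(output: str, vendor_root: str, langs: Iterable[str]) -> None:
    """Compile every vendored grammar into one shared library at `output`.

    Only works with py-tree-sitter < 0.22; build_library() was removed in 0.22.
    """
    from tree_sitter import Language  # imported lazily: legacy API

    Language.build_library(output, grammar_repo_dirs(vendor_root, langs))
```

In those same older releases, each grammar would then be loaded from the bundle by name, e.g. Language("build/codelens-grammars.so", "go"). Grammars that keep their sources in subdirectories (tree-sitter-typescript has typescript/ and tsx/) would need those subdirectory paths listed instead of the repository root.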

Metadata


Assignees: No one assigned
Labels: architecture (Core architecture change), enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
