Skip to content

feat(resolution): add C/C++ include path resolution#453

Merged
colbymchenry merged 2 commits into
colbymchenry:mainfrom
g122622:feat/cpp-include-resolution
May 27, 2026
Merged

feat(resolution): add C/C++ include path resolution#453
colbymchenry merged 2 commits into
colbymchenry:mainfrom
g122622:feat/cpp-include-resolution

Conversation

@g122622
Copy link
Copy Markdown
Contributor

@g122622 g122622 commented May 26, 2026

Add full import resolution pipeline for C and C++ #include directives, connecting extracted import nodes to actual header files in the project.

  • Add C/C++ extension resolution (.h, .hpp, .hxx, .cpp, .cc, .cxx)
  • Add system header filtering with ~80 C and ~80 C++ stdlib headers
  • Add extractCppImports() for #include import mapping extraction
  • Add compile_commands.json parsing for -I/-isystem include directories
  • Add heuristic include dir discovery (include/, src/, lib/, api/)
  • Add resolveCppIncludePath() for include directory search
  • Add C/C++ built-in symbol filtering (printf, malloc, std::*, etc.)
  • Wire getCppIncludeDirs into ResolutionContext
  • Add 13 new tests for C/C++ import resolution and extraction

Copilot AI review requested due to automatic review settings May 26, 2026 13:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds C/C++ awareness to the resolver so #include directives and common C/C++ symbols can be handled more accurately during indexing.

Changes:

  • Adds getCppIncludeDirs() to ResolutionContext and wires it into ReferenceResolver.
  • Implements C/C++ #include extraction and include-path resolution via discovered -I directories (compile DB / heuristic).
  • Adds C/C++ built-in symbol filtering plus unit tests for extraction and resolution.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/resolution/types.ts Extends resolver context to expose C/C++ include search directories.
src/resolution/index.ts Adds C/C++ built-in symbol filtering and exposes include-dir loading through context.
src/resolution/import-resolver.ts Adds #include parsing, stdlib header filtering, include-dir discovery/caching, and include-dir-based resolution.
tests/resolution.test.ts Adds test coverage for C/C++ include resolution and include-dir discovery.
tests/extraction.test.ts Adds test coverage for C/C++ include extraction into unresolved references.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +25 to +26
c: ['.h', '.c'],
cpp: ['.h', '.hpp', '.hxx', '.cpp', '.cc', '.cxx'],
Comment on lines +458 to +466

for (const dir of includeDirs) {
const normalizedDir = dir.replace(/\\/g, '/');
for (const ext of extensions) {
const candidate = normalizedDir + '/' + importPath + ext;
if (context.fileExists(candidate)) return candidate;
}
// Try as-is (already has extension)
const candidate = normalizedDir + '/' + importPath;
Comment on lines +163 to +169
// C/C++ standard library headers — both C-style (<stdio.h>) and
// C++-style (<cstdio>, <vector>) forms. Checked against the import
// path (which the extractor strips of <> or "" delimiters).
if (C_CPP_STDLIB_HEADERS.has(importPath)) return true;
// C++ headers without .h extension (e.g. "vector", "string")
const withoutExt = importPath.replace(/\.h$/, '');
if (C_CPP_STDLIB_HEADERS.has(withoutExt)) return true;
Comment on lines +377 to +408
* Handles double-quoted and single-quoted arguments.
*/
function shlexSplit(cmd: string): string[] {
const result: string[] = [];
let i = 0;
while (i < cmd.length) {
// Skip whitespace
while (i < cmd.length && /\s/.test(cmd[i]!)) i++;
if (i >= cmd.length) break;
const ch = cmd[i]!;
if (ch === '"') {
i++;
let arg = '';
while (i < cmd.length && cmd[i] !== '"') {
if (cmd[i] === '\\' && i + 1 < cmd.length) { i++; arg += cmd[i]; }
else { arg += cmd[i]; }
i++;
}
i++; // closing quote
result.push(arg);
} else if (ch === "'") {
i++;
let arg = '';
while (i < cmd.length && cmd[i] !== "'") { arg += cmd[i]; i++; }
i++; // closing quote
result.push(arg);
} else {
let arg = '';
while (i < cmd.length && !/\s/.test(cmd[i]!)) { arg += cmd[i]; i++; }
result.push(arg);
}
}
Comment thread src/resolution/index.ts Outdated
'size_t', 'ptrdiff_t', 'wchar_t', 'intptr_t', 'uintptr_t',
'int8_t', 'int16_t', 'int32_t', 'int64_t',
'uint8_t', 'uint16_t', 'uint32_t', 'uint64_t',
'FILE', 'FILE',
Comment on lines 763 to 766
export function clearImportMappingCache(): void {
importMappingCache.clear();
cppIncludeDirCache.clear();
}
g122622 and others added 2 commits May 26, 2026 19:42
Add full import resolution pipeline for C and C++ #include directives,
connecting extracted import nodes to actual header files in the project.

- Add C/C++ extension resolution (.h, .hpp, .hxx, .cpp, .cc, .cxx)
- Add system header filtering with ~80 C and ~80 C++ stdlib headers
- Add extractCppImports() for #include import mapping extraction
- Add compile_commands.json parsing for -I/-isystem include directories
- Add heuristic include dir discovery (include/, src/, lib/, api/)
- Add resolveCppIncludePath() for include directory search
- Add C/C++ built-in symbol filtering (printf, malloc, std::*, etc.)
- Wire getCppIncludeDirs into ResolutionContext
- Add 13 new tests for C/C++ import resolution and extraction

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PR landed the include-dir scan logic (loadCppIncludeDirs +
resolveCppIncludePath) but the indexer never reached it: imports
references with referenceName='X.h' fell into resolveViaImport's
symbol-lookup branch (matched extractCppImports' basename-without-ext
localName via .startsWith, then tried to find a symbol named like the
extension and failed). End result on bitcoin-core: 0 new file→file
imports vs main, despite the include-dir scan resolving paths correctly
when probed directly. resolveViaImport now has a C/C++ imports branch
that resolves the include path to the actual file node and returns
that — skipping the irrelevant symbol scan. Measured on bitcoin-core:
+2,059 newly resolved file→file imports (6,027 → 8,086, +34%).

The unconditional CPP_BUILT_INS / C_BUILT_INS filter also misfired:
C/C++ codebases routinely shadow stdlib names (bitcoin's mp::move,
custom allocators with free/malloc, stream classes with read/write/
close/open, logging libs wrapping printf). Filtering those names
killed legitimate edges — 1,179 → 0 for move(), 33 → 0 for free(),
149 → 7 for write() on bitcoin. The filter now defers to
hasAnyPossibleMatch: only filter when no user-defined symbol with the
name exists. std:: prefix stays unconditional (never user-shadowed in
practice). After: printf/free/open/close/read/write/swap all preserved
at main's counts; the std::move-binds-to-mp::move false-positives still
drop (correctly: −2,154 C/C++ calls).

Also: drop the duplicate 'FILE' in C_BUILT_INS; add an end-to-end test
that asserts `#include "X.h"` produces a file→file imports edge in the
real indexing pipeline (not just direct resolver probes); add a test
documenting the cross-language `.h` heuristic claim (Obj-C dirs are
intentionally allowed as C/C++ include dirs); add CHANGELOG entry
under [Unreleased] with measured numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@colbymchenry colbymchenry force-pushed the feat/cpp-include-resolution branch from cbe460f to 5f6830f Compare May 27, 2026 00:53
@colbymchenry
Copy link
Copy Markdown
Owner

Huge thanks for this @g122622 — C/C++ include resolution was a real gap and the bones of this PR were solid. The loadCppIncludeDirs design (compile_commands.json first, then heuristic probing of include//src//lib//api//inc/), the stdlib header list, and the built-ins concept were all the right shape.

I pushed a follow-up review commit (5f6830f) before merging to fix two things I caught with an A/B on bitcoin-core (1,989 files) + tinyxml2:

  • Wired the include-dir scan into the pipeline. Probed in isolation the scan resolved paths correctly, but during indexing the imports reference flowed through resolveViaImport's symbol-lookup branch (matched extractCppImports' basename-without-ext localName via .startsWith, then tried to find a symbol named like the file extension and failed). Added a C/C++ imports branch that resolves directly to the file node. Result on bitcoin: file→file imports 6,027 → 8,086 (+34%).
  • Guarded the built-ins filter with hasAnyPossibleMatch. Unconditional filtering of CPP_BUILT_INS/C_BUILT_INS killed legitimate edges to user-shadowed names (bitcoin's mp::move, custom allocators, stream wrappers — move() 1,179 → 0, free() 33 → 0, write() 149 → 7). Now it only filters when no user-defined symbol with that name exists. The std:: prefix stays unconditional. After: all the user-shadowed names preserved at main's edge counts; the genuine std::move-binds-to-mp::move false positives still drop (−2,154 calls, correctly).

Also added an end-to-end test that exercises #include "X.h" → file→file edge through the real indexing pipeline so the wiring can't silently regress again, a test documenting the cross-language .h heuristic (Obj-C dirs are intentionally allowed as C/C++ include dirs), dropped a duplicate 'FILE', and put a CHANGELOG entry under [Unreleased] with the measured numbers. Squash-merging — your authorship stays on the commit. 🙌

@colbymchenry colbymchenry merged commit 48eebe1 into colbymchenry:main May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants