Skip to content

Conversation

@jakebailey
Copy link
Member

@jakebailey jakebailey commented Dec 12, 2025

Fixes #1080

The old compiler did "package deduplication" through source file redirection. This allowed two duplicate packages in different parts of node_modules (but with the same name / package.json version) to be "compatible" by way of them being the same ASTs even though they are at different paths (and so at runtime, probably are not compatible).

The old compiler did this by essentially proxying the original SourceFile via Object.create(sourceFile) (the original file is the prototype 😨), then overriding symbol, fileName, etc, on that new instance.

The old compiler also chose the "winning"/canonical packages as they were walked, saving work and producing a deterministic result. The new compiler's loading is fully concurrent, so that is a challenge.

We can't really do the same thing as we did before. It is technically feasible to move data off SourceFile and embed a pointer to a data sidecar, which can be shared safely, but the problem is that we want the symbol to also be shared, and that's not movable due to the way nodes are laid out.

Instead, this PR lets loading proceed as "normal", no deduplication. Then, during collectFiles, it deduplicates packages. Deduplication is done by quite literally replacing duplicates with the canonical files' ASTs. This means looking them up later will give the "right" source file, with the "right" nodes and symbols, as before.

This seems to be sufficient; it's basically just what we did before, but less optimal, and without creating a completely new SourceFile node with different. It just means that if you ever query filesByPath, you might get a file with a different path than you expect. But, only in deps.

I've also made a flag that controls this behavior, --deduplicatePackages, defaulting to true.


I tried also reintroducing the concept of "redirected files" by moving data off of SourceFile into an embedded pointer, but that hit some walls where things didn't work. I'm not sure if it's worth doing if this method seems to work fine?

@jakebailey jakebailey changed the title Add "package deduplication", sorta Add "package deduplication" Dec 15, 2025
@jakebailey jakebailey changed the title Add "package deduplication" Add "package deduplication", --disablePackageDeduplication Dec 15, 2025
@jakebailey jakebailey marked this pull request as ready for review December 17, 2025 03:18
Copilot AI review requested due to automatic review settings December 17, 2025 03:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements package deduplication for the TypeScript compiler port to address issue #1080. When multiple packages with the same name and version are installed in different node_modules locations, the compiler now treats them as compatible by replacing duplicate ASTs with a canonical version.

Key changes:

  • Adds --disablePackageDeduplication flag (default: false) to control the behavior
  • Modifies package ID format to include the resolved file path within the package, making IDs more specific
  • Implements deduplication during file collection by replacing duplicate file references with canonical ones
  • Updates program incremental update logic to trigger full rebuilds when deduplicated files change

Reviewed changes

Copilot reviewed 74 out of 74 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/core/compileroptions.go Adds new DisablePackageDeduplication compiler option field
internal/tsoptions/parsinghelpers.go Adds parsing logic for the new option
internal/tsoptions/declscompiler.go Defines command-line option metadata with appropriate flags
internal/diagnostics/*.{json,go} Adds diagnostic message for the new option
internal/module/resolver.go Changes package ID generation to use resolved file path instead of package directory, making SubModuleName more specific
internal/compiler/program.go Implements redirect target tracking and updates incremental compilation logic to handle deduplicated files
internal/compiler/fileloader.go Adds data structures for tracking package deduplication mappings
internal/compiler/filesparser.go Implements core deduplication logic that tracks canonical files and replaces duplicates
internal/fourslash/tests/manual/duplicatePackageServices_fileChanges_test.go Adds test for language service behavior with file changes in deduplicated packages
internal/fourslash/_scripts/*.txt Updates test lists to reflect new passing tests
testdata/baselines/reference/**/*.{js,txt,diff,jsonc} Updates baseline files showing new package ID format and deduplicated file behavior

collectFiles(subTasks, seen)

if packageIdToCanonicalPath != nil {
for _, resolution := range task.resolutionsInFile {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like that this loop is like this, versus having the package ID before collect and then deduping with it, but there are some weird ordering constraints here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stared at this for a while and figured out how to actually make it behave the way I wanted without this loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this loop was working around the fact that packages get their IDs from incoming edges, so there was a race. Now it's way less weird

// node_modules locations), we need to rebuild the program because the redirect targets
// might need recalculation. A file is in a redirect group if it's either a canonical
// file that others redirect to, or if it redirects to another file.
if _, ok := p.deduplicatedPathMap[changedFilePath]; ok {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second reading, I think this is wrong; this mapping is always populated even if something was not deduped.

I will have to redo this to instead build a list of files that are actually a part of deduping, I think. Or, not bother. I have not written a test for this case, so...

@jakebailey jakebailey changed the title Add "package deduplication", --disablePackageDeduplication Add "package deduplication", --deduplicatePackages Jan 9, 2026
@jakebailey jakebailey changed the title Add "package deduplication", --deduplicatePackages Add package deduplication, --deduplicatePackages Jan 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deduplicate doubly-installed packages

3 participants