-
Notifications
You must be signed in to change notification settings - Fork 783
Add package deduplication, --deduplicatePackages #2361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...es/reference/submodule/fourslash/goToDefinition/duplicatePackageServices.baseline.jsonc.diff
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements package deduplication for the TypeScript compiler port to address issue #1080. When multiple packages with the same name and version are installed in different node_modules locations, the compiler now treats them as compatible by replacing duplicate ASTs with a canonical version.
Key changes:
- Adds
--disablePackageDeduplicationflag (default: false) to control the behavior - Modifies package ID format to include the resolved file path within the package, making IDs more specific
- Implements deduplication during file collection by replacing duplicate file references with canonical ones
- Updates program incremental update logic to trigger full rebuilds when deduplicated files change
Reviewed changes
Copilot reviewed 74 out of 74 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
internal/core/compileroptions.go |
Adds new DisablePackageDeduplication compiler option field |
internal/tsoptions/parsinghelpers.go |
Adds parsing logic for the new option |
internal/tsoptions/declscompiler.go |
Defines command-line option metadata with appropriate flags |
internal/diagnostics/*.{json,go} |
Adds diagnostic message for the new option |
internal/module/resolver.go |
Changes package ID generation to use resolved file path instead of package directory, making SubModuleName more specific |
internal/compiler/program.go |
Implements redirect target tracking and updates incremental compilation logic to handle deduplicated files |
internal/compiler/fileloader.go |
Adds data structures for tracking package deduplication mappings |
internal/compiler/filesparser.go |
Implements core deduplication logic that tracks canonical files and replaces duplicates |
internal/fourslash/tests/manual/duplicatePackageServices_fileChanges_test.go |
Adds test for language service behavior with file changes in deduplicated packages |
internal/fourslash/_scripts/*.txt |
Updates test lists to reflect new passing tests |
testdata/baselines/reference/**/*.{js,txt,diff,jsonc} |
Updates baseline files showing new package ID format and deduplicated file behavior |
...aselines/reference/submodule/compiler/declarationEmitForGlobalishSpecifierSymlink.types.diff
Show resolved
Hide resolved
internal/compiler/filesparser.go
Outdated
| collectFiles(subTasks, seen) | ||
|
|
||
| if packageIdToCanonicalPath != nil { | ||
| for _, resolution := range task.resolutionsInFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't particularly like that this loop is like this, versus having the package ID before collect and then deduping with it, but there are some weird ordering constraints here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stared at this for a while and figured out how to actually make it behave the way I wanted without this loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, this loop was working around the fact that packages get their IDs from incoming edges, so there was a race. Now it's way less weird
internal/compiler/program.go
Outdated
| // node_modules locations), we need to rebuild the program because the redirect targets | ||
| // might need recalculation. A file is in a redirect group if it's either a canonical | ||
| // file that others redirect to, or if it redirects to another file. | ||
| if _, ok := p.deduplicatedPathMap[changedFilePath]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second reading, I think this is wrong; this mapping is always populated even if something was not deduped.
I will have to redo this to instead build a list of files that are actually a part of deduping, I think. Or, not bother. I have not written a test for this case, so...
Fixes #1080
The old compiler did "package deduplication" through source file redirection. This allowed two duplicate packages in different parts of
node_modules(but with the same name / package.json version) to be "compatible" by way of them being the same ASTs even though they are at different paths (and so at runtime, probably are not compatible).The old compiler did this by essentially proxying the original
SourceFileviaObject.create(sourceFile)(the original file is the prototype 😨), then overridingsymbol,fileName, etc, on that new instance.The old compiler also chose the "winning"/canonical packages as they were walked, saving work and producing a deterministic result. The new compiler's loading is fully concurrent, so that is a challenge.
We can't really do the same thing as we did before. It is technically feasible to move data off
SourceFileand embed a pointer to a data sidecar, which can be shared safely, but the problem is that we want thesymbolto also be shared, and that's not movable due to the way nodes are laid out.Instead, this PR lets loading proceed as "normal", no deduplication. Then, during
collectFiles, it deduplicates packages. Deduplication is done by quite literally replacing duplicates with the canonical files' ASTs. This means looking them up later will give the "right" source file, with the "right" nodes and symbols, as before.This seems to be sufficient; it's basically just what we did before, but less optimal, and without creating a completely new SourceFile node with different. It just means that if you ever query
filesByPath, you might get a file with a different path than you expect. But, only in deps.I've also made a flag that controls this behavior,
--deduplicatePackages, defaulting totrue.I tried also reintroducing the concept of "redirected files" by moving data off of
SourceFileinto an embedded pointer, but that hit some walls where things didn't work. I'm not sure if it's worth doing if this method seems to work fine?