Commit 583af47

feat(PLT-2676): enhance caching strategy for monorepos with smart install detection (#174)
This PR enhances the setup-node-with-cache action with comprehensive cache integrity verification and verbose debugging to support the ongoing rollout of centralized GitHub Actions across frontend projects.
1 parent d4b9fd4 commit 583af47

2 files changed

Lines changed: 210 additions & 25 deletions

shared-actions/setup-node-with-cache/README.md

Lines changed: 97 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,8 @@ Standardized Node.js setup with enhanced yarn caching and GitHub packages regist
1010
- ✅ Restore-keys for fallback caching (85%+ hit rate)
1111
- ✅ GitHub packages registry configuration
1212
- ✅ Automatic dependency installation on cache miss
13+
- ✅ **Cache integrity verification** to prevent stale cache issues
14+
- ✅ **Comprehensive debug logging** for troubleshooting
1315
- ✅ Support for local testing with `act`
1416

1517
## Usage
@@ -46,8 +48,18 @@ None
4648
- `~/.cache/yarn` - Yarn global cache (always enabled for better performance)
4749
- `~/.asdf/installs` - asdf tool installations (when using asdf)
4850
5. **Uses restore-keys** for fallback caching when exact match not found
49-
6. **Logs cache status** for visibility (HIT/MISS)
50-
7. **Installs dependencies** automatically on cache miss
51+
6. **Verifies cache integrity** on cache hit:
52+
- Checks if `node_modules/` exists and has content
53+
- Validates `.yarn-integrity` file presence
54+
- Verifies workspace packages have `node_modules/`
55+
- Forces fresh install if cache is incomplete or corrupted
56+
7. **Logs cache status** with detailed debug information
57+
8. **Installs dependencies** automatically on cache miss or verification failure
58+
9. **Smart install detection** on cache hit (after verification):
59+
- **Yarn workspaces**: ALWAYS runs install (needs workspace symlink creation)
60+
- **Lerna monorepos**: Runs install (needs lerna bootstrap)
61+
- **Postinstall hooks**: Runs install (needs hook execution)
62+
- **Note**: Even Turbo monorepos need install for workspace linking
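
The detection rules in step 9 can be sketched as a standalone shell function for reproducing the decision locally. This is a minimal sketch, not the action's exact code: the function name `needs_install_on_cache_hit` and its directory argument are hypothetical, and the `grep` checks are plain text matches on `package.json` rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "true" if the project in the given directory
# matches any rule that forces `yarn install` on a cache hit, else "false".
needs_install_on_cache_hit() {
  local dir="${1:-.}"

  # Yarn workspaces: a "workspaces" key appears in the root package.json
  if grep -q '"workspaces"' "$dir/package.json" 2>/dev/null; then
    echo "true"
    return 0
  fi

  # Lerna monorepo: lerna.json sits next to package.json
  if [ -f "$dir/lerna.json" ]; then
    echo "true"
    return 0
  fi

  # Postinstall hook: a "postinstall" script is declared
  if grep -q '"postinstall"' "$dir/package.json" 2>/dev/null; then
    echo "true"
    return 0
  fi

  echo "false"
}
```

Running `needs_install_on_cache_hit .` from a repository root prints the decision for that project. Because the match is textual, a dependency whose name contains `"postinstall"` would also trigger it, which errs on the side of running install.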
5163

5264
## Cache Strategy
5365

@@ -68,17 +80,59 @@ This ensures:
6880
- **Monorepo support**: `**/yarn.lock` pattern handles nested workspaces
6981
- **Consistent keys**: Removed `.tool-versions` dependency for reliability
7082
83+
### Cache Integrity Verification
84+
85+
On cache hit, the action performs **automatic verification** to prevent stale cache issues:
86+
87+
1. **node_modules existence check**: Verifies directory exists and has packages
88+
2. **Yarn integrity validation**: Checks for `.yarn-integrity` file
89+
3. **Workspace structure validation**: Ensures workspace packages have `node_modules/`
90+
91+
If any verification fails, the action **forces a fresh install** to rebuild the cache correctly.
92+
93+
**Why this matters**: Prevents issues where:
94+
- Cache is restored but incomplete (network interruption during save)
95+
- Workspace dependencies added but not in cached `node_modules/`
96+
- Cache corruption or partial restoration
97+
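
The three checks above can be sketched as one function that is easy to run against a local checkout. This is an illustrative sketch under stated assumptions: the name `verify_cache_integrity` is hypothetical, the minimum entry count of 5 and the `packages/` and `apps/` workspace locations are taken from this action's script, and `.yarn-integrity` is the file Yarn classic writes into `node_modules/` after an install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the restored node_modules looks complete,
# 1 when a fresh install should be forced. Prints the reason on stdout.
verify_cache_integrity() {
  local dir="${1:-.}"
  local min_entries="${2:-5}"   # assumed threshold for "has content"
  local count pkgs mods

  # Check 1: node_modules exists and holds at least a few entries
  if [ ! -d "$dir/node_modules" ]; then
    echo "node_modules missing"
    return 1
  fi
  count=$(ls -1 "$dir/node_modules" 2>/dev/null | wc -l | tr -d ' ')
  if [ "$count" -lt "$min_entries" ]; then
    echo "node_modules nearly empty ($count entries)"
    return 1
  fi

  # Check 2: Yarn's integrity marker is present
  if [ ! -f "$dir/node_modules/.yarn-integrity" ]; then
    echo ".yarn-integrity missing"
    return 1
  fi

  # Check 3: if workspace packages exist, at least one has its own node_modules.
  # find and its fallback are grouped so wc -l always counts find's output.
  pkgs=$({ find "$dir"/packages/*/package.json "$dir"/apps/*/package.json -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
  mods=$({ find "$dir"/packages/*/node_modules "$dir"/apps/*/node_modules -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
  if [ "$pkgs" -gt 0 ] && [ "$mods" -eq 0 ]; then
    echo "workspace node_modules missing"
    return 1
  fi

  echo "cache verified"
}
```

A caller would typically use it as `verify_cache_integrity . || yarn install --frozen-lockfile`, so any failed check falls through to a fresh install.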
98+
### Monorepo Handling
99+
100+
The action intelligently handles different monorepo types (after cache verification):
101+
102+
| Monorepo Type | Cache Hit Behavior | Reason |
103+
|---------------|-------------------|---------|
104+
| **Yarn workspaces** | ⚠️ ALWAYS runs install | Workspace symlinks NOT preserved in cache |
105+
| **Lerna** (has `lerna.json`) | ⚠️ Runs install | Needs `lerna bootstrap` for package linking |
106+
| **Postinstall hooks** | ⚠️ Runs install | Needs to execute postinstall scripts |
107+
| **Failed verification** | ⚠️ Runs install | Cache incomplete or corrupted |
108+
109+
**Critical: Yarn Workspace Symlinks**
110+
111+
Even Turbo monorepos need `yarn install` on cache hit because:
112+
- GitHub Actions cache does NOT preserve symlinks
113+
- Yarn creates symlinks between workspace packages during install
114+
- Without symlinks, workspace dependencies aren't found (e.g., "jest: not found")
115+
- Turbo handles BUILD caching, not workspace linking
116+
117+
**Performance impact**:
118+
- Adds ~10-20 seconds to cache hits for workspace symlink creation
119+
- This is unavoidable for Yarn workspace monorepos
120+
- The install is fast because packages are already cached
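
To see whether workspace links are actually present after a cache restore, one option is to count the symbolic links near the top of `node_modules/`. The sketch below assumes Yarn classic's layout, where workspace packages are linked into the root `node_modules/` (directly or under an `@scope/` directory); the function name `count_workspace_links` is hypothetical, and `.bin/` is skipped because its entries are executable links rather than workspace links.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of symlinked packages found in
# <dir>/node_modules, looking one level deep and inside @scope/ directories.
count_workspace_links() {
  local dir="${1:-.}"

  # No node_modules at all means no links
  if [ ! -d "$dir/node_modules" ]; then
    echo 0
    return 0
  fi

  # -type l matches the links themselves (find does not follow them by default)
  find "$dir/node_modules" -mindepth 1 -maxdepth 2 -type l ! -path '*/.bin/*' \
    | wc -l | tr -d ' '
}
```

Running `count_workspace_links .` right after the cache step and again after `yarn install` gives a quick before/after comparison; a count of 0 before install in a workspace repository is consistent with the "jest: not found" symptom described above.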
121+
71122
## Performance Impact
72123
73124
| Scenario | Before | After | Improvement |
74125
|----------|--------|-------|-------------|
75-
| Cache hit (setup-node) | 2-3 min | 10-15 sec | 85% faster |
76-
| Cache hit (asdf) | 2-3 min | 5-10 sec | 90% faster |
126+
| Cache hit (setup-node) | 2-3 min | 30-40 sec | 75% faster |
127+
| Cache hit (asdf) | 2-3 min | 25-35 sec | 80% faster |
77128
| Cache miss | 2-3 min | 1-2 min | 40% faster (with yarn cache) |
78129
| Cache hit rate | 60% | 85%+ | 42% better |
79130
| asdf Node.js install | 1-2 min | 5 sec | 95% faster (when cached) |
80131
81-
**Note**: When using asdf-vm, the action caches both the asdf installations and node_modules, providing even better performance.
132+
**Note**:
133+
- Cache hits for Yarn workspaces include ~10-20s for workspace symlink creation
134+
- When using asdf-vm, the action caches both the asdf installations and node_modules
135+
- The install on cache hit is fast because packages are already cached
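
The percentages in the table can be sanity-checked with a little arithmetic. The sketch below assumes "faster" means the relative reduction in duration and "better" means the relative increase in hit rate, and it uses range midpoints (for example 150 s for "2-3 min"); the function names are hypothetical, and the table's figures appear to be rounded approximations.

```shell
#!/usr/bin/env bash
# Relative reduction in duration, as a rounded percentage: (before - after) / before
pct_reduction() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (b - a) * 100 / b }'
}

# Relative increase of a rate, as a rounded percentage: (after - before) / before
pct_increase() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (a - b) * 100 / b }'
}
```

With these, `pct_reduction 150 30` prints `80` (the asdf cache-hit row at its midpoints), `pct_reduction 150 35` prints `77` (close to the 75% quoted for setup-node), and `pct_increase 60 85` prints `42` (the hit-rate row).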
82136
83137
## Local Testing with act
84138
@@ -157,13 +211,51 @@ If you see "❌ Cache MISS" on every run:
157211
- Look for "✅ Cache HIT" or "❌ Cache MISS" in workflow logs
158212
- Check the cache key being used
159213

214+
### Cache Hit But Install Still Runs?
215+
216+
If the log shows a cache hit but install still runs, check the debug logs:
217+
218+
1. **Cache verification failed**:
219+
```
220+
⚠️ node_modules appears empty (N packages) - forcing install
221+
⚠️ Missing .yarn-integrity file - cache may be incomplete, forcing install
222+
⚠️ Workspace packages exist but no workspace node_modules found - forcing install
223+
```
224+
This means the cache was incomplete. The action will rebuild it correctly.
225+
226+
2. **Monorepo requires install**:
227+
```
228+
📦 Detected Yarn workspaces - install needed for workspace linking
229+
📦 Detected Lerna monorepo - install needed for lerna bootstrap
230+
🔧 Detected postinstall hook - install needed to execute it
231+
```
232+
This is expected behavior for Yarn workspace monorepos (including Turbo-based ones).
233+
The install recreates workspace symlinks that aren't preserved in cache.
234+
235+
3. **Review debug output**:
236+
The action logs detailed cache information:
237+
- Root `node_modules/` package count
238+
- Yarn integrity file status
239+
- Workspace package count
240+
- Workspace `node_modules/` count
241+
160242
### Still Slow After Cache Hit?
161243

162244
If cache hits but setup still takes >1 minute:
163245

164246
1. **Large node_modules**: Consider using artifacts instead of cache
165247
2. **Slow runner disk**: Check runner performance
166248
3. **Network latency**: Cache download may be slow
249+
4. **Verification forcing install**: Check debug logs for verification failures
250+
251+
### Stale Cache Issues?
252+
253+
If builds fail with "Cannot find module" after cache hit:
254+
255+
1. **Check debug logs** for cache verification results
256+
2. **Manually delete old caches** at: `Settings → Actions → Caches`
257+
3. **The action should auto-detect** incomplete caches and rebuild them
258+
4. **If issue persists**, open an issue with the debug logs
167259

168260
## Related Actions
169261

shared-actions/setup-node-with-cache/action.yml

Lines changed: 113 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -83,6 +83,44 @@ runs:
8383
restore-keys: |
8484
${{ runner.os }}-${{ runner.arch }}-yarn-
8585
86+
- name: Debug cache contents
87+
id: debug-cache
88+
if: ${{ !env.ACT && steps.yarn-cache.outputs.cache-hit == 'true' }}
89+
shell: bash
90+
run: |
91+
set +e # Don't exit on error - we want to continue even if find fails
92+
93+
echo "=== 🔍 Cache Debug Info ==="
94+
echo "Cache key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
95+
echo ""
96+
97+
echo "=== 📦 Root node_modules Status ==="
98+
if [ -d "node_modules" ]; then
99+
MODULE_COUNT=$(ls -1 node_modules 2>/dev/null | wc -l | tr -d ' ')
100+
echo "✅ node_modules exists"
101+
echo "📊 Package count: $MODULE_COUNT"
102+
echo "🔒 Yarn integrity: $([ -f 'node_modules/.yarn-integrity' ] && echo '✅ present' || echo '❌ missing')"
103+
else
104+
echo "❌ node_modules directory not found"
105+
fi
106+
echo ""
107+
108+
echo "=== 🏢 Workspace Structure ==="
109+
# Use find with || true to prevent failures when directories don't exist
110+
WORKSPACE_PACKAGES=$({ find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
111+
WORKSPACE_MODULES=$({ find packages/*/node_modules apps/*/node_modules -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
112+
echo "📦 Workspace packages: $WORKSPACE_PACKAGES"
113+
echo "🔗 Workspace node_modules: $WORKSPACE_MODULES"
114+
115+
if [ "$WORKSPACE_PACKAGES" -gt 0 ]; then
116+
echo ""
117+
echo "Workspace packages found:"
118+
{ find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | sed 's|/package.json||' | sed 's|^| - |'
119+
fi
120+
echo ""
121+
122+
set -e # Re-enable exit on error
123+
86124
- name: Check if yarn install needed on cache hit
87125
id: check-install-needed
88126
if: ${{ !env.ACT }}
@@ -94,48 +132,103 @@ runs:
94132
# - Cache restores node_modules, but doesn't run postinstall hooks
95133
# - Monorepos need workspace linking even with cached node_modules
96134
# - Some projects have critical postinstall scripts (e.g., building native modules)
135+
# - Cached node_modules might be incomplete or corrupted
97136
#
98-
# We detect three scenarios that require yarn install on cache hit:
137+
# We detect scenarios that require yarn install on cache hit:
99138
100139
NEEDS_INSTALL=false
140+
CACHE_HIT="${{ steps.yarn-cache.outputs.cache-hit }}"
101141
102-
# 1. Yarn workspaces: Need symlink creation between workspace packages
103-
if grep -q '"workspaces"' package.json 2>/dev/null; then
104-
echo "📦 Detected Yarn workspaces - install needed for workspace linking"
105-
NEEDS_INSTALL=true
106-
fi
107-
108-
# 2. Lerna monorepo: Need lerna bootstrap (usually in postinstall hook)
109-
if [ -f "lerna.json" ]; then
110-
echo "📦 Detected Lerna monorepo - install needed for lerna bootstrap"
111-
NEEDS_INSTALL=true
142+
# On cache hit, verify cache integrity before trusting it
143+
if [ "$CACHE_HIT" == "true" ]; then
144+
echo "=== 🔍 Verifying Cache Integrity ==="
145+
146+
# Verification 1: Check if node_modules exists and has content
147+
if [ ! -d "node_modules" ]; then
148+
echo "❌ node_modules directory missing - forcing install"
149+
NEEDS_INSTALL=true
150+
else
151+
MODULE_COUNT=$(ls -1 node_modules 2>/dev/null | wc -l | tr -d ' ')
152+
if [ "$MODULE_COUNT" -lt 5 ]; then
153+
echo "⚠️ node_modules appears empty ($MODULE_COUNT packages) - forcing install"
154+
NEEDS_INSTALL=true
155+
fi
156+
fi
157+
158+
# Verification 2: Check yarn integrity file
159+
if [ "$NEEDS_INSTALL" == "false" ] && [ ! -f "node_modules/.yarn-integrity" ]; then
160+
echo "⚠️ Missing .yarn-integrity file - cache may be incomplete, forcing install"
161+
NEEDS_INSTALL=true
162+
fi
163+
164+
# Verification 3: For workspaces, verify workspace packages have node_modules
165+
if [ "$NEEDS_INSTALL" == "false" ]; then
166+
WORKSPACE_PACKAGES=$({ find packages/*/package.json apps/*/package.json -type f 2>/dev/null || true; } | wc -l | tr -d ' ')
167+
if [ "$WORKSPACE_PACKAGES" -gt 0 ]; then
168+
WORKSPACE_MODULES=$({ find packages/*/node_modules apps/*/node_modules -maxdepth 0 -type d 2>/dev/null || true; } | wc -l | tr -d ' ')
169+
if [ "$WORKSPACE_MODULES" -eq 0 ]; then
170+
echo "⚠️ Workspace packages exist but no workspace node_modules found - forcing install"
171+
NEEDS_INSTALL=true
172+
fi
173+
fi
174+
fi
175+
176+
if [ "$NEEDS_INSTALL" == "false" ]; then
177+
echo "✅ Cache integrity verified"
178+
fi
112179
fi
113180
114-
# 3. Postinstall hook: Any project with postinstall needs it executed
115-
# Examples: building native modules, generating files, running setup scripts
116-
if grep -q '"postinstall"' package.json 2>/dev/null; then
117-
echo "🔧 Detected postinstall hook - install needed to execute it"
118-
NEEDS_INSTALL=true
181+
# Check for monorepo configurations that need install
182+
if [ "$NEEDS_INSTALL" == "false" ]; then
183+
# 1. Yarn workspaces: ALWAYS need symlink creation between workspace packages
184+
# CRITICAL: Workspace symlinks are NOT preserved in GitHub Actions cache!
185+
# Even Turbo monorepos need this - Turbo handles build caching, not workspace linking.
186+
if grep -q '"workspaces"' package.json 2>/dev/null; then
187+
echo "📦 Detected Yarn workspaces - install needed for workspace linking"
188+
echo " ⚠️ Workspace symlinks are not preserved in cache"
189+
echo " ⚠️ Skipping install will cause 'module not found' errors"
190+
NEEDS_INSTALL=true
191+
fi
192+
193+
# 2. Lerna monorepo: Need lerna bootstrap (usually in postinstall hook)
194+
if [ -f "lerna.json" ]; then
195+
echo "📦 Detected Lerna monorepo - install needed for lerna bootstrap"
196+
NEEDS_INSTALL=true
197+
fi
198+
199+
# 3. Postinstall hook: Any project with postinstall needs it executed
200+
# Examples: building native modules, generating files, running setup scripts
201+
if grep -q '"postinstall"' package.json 2>/dev/null; then
202+
echo "🔧 Detected postinstall hook - install needed to execute it"
203+
NEEDS_INSTALL=true
204+
fi
119205
fi
120206
121207
echo "needs-install=$NEEDS_INSTALL" >> $GITHUB_OUTPUT
122208
123-
- name: Log cache status
209+
- name: Log cache status and decision
124210
if: ${{ !env.ACT }}
125211
shell: bash
126212
run: |
213+
echo "=== 📊 Cache Status Summary ==="
127214
if [ "${{ steps.yarn-cache.outputs.cache-hit }}" == "true" ]; then
128215
echo "✅ Cache HIT - Dependencies restored from cache"
129216
echo "📦 Cache key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
217+
echo ""
130218
if [ "${{ steps.check-install-needed.outputs.needs-install }}" == "true" ]; then
131-
echo "🔗 Will run yarn install for workspace linking and/or postinstall hooks"
219+
echo "🔄 Decision: WILL run yarn install"
220+
echo "Possible reasons: cache verification failed, workspace or Lerna linking needed, or postinstall hook detected"
132221
else
133-
echo "⚡ Skipping yarn install - not a monorepo and no postinstall hooks"
222+
echo "⚡ Decision: SKIP yarn install"
223+
echo "Reason: Cache verified and no workspace linking or postinstall hooks required"
134224
fi
135225
else
136-
echo "❌ Cache MISS - Installing dependencies"
226+
echo "❌ Cache MISS - Fresh installation required"
137227
echo "🔍 Looking for key: ${{ runner.os }}-${{ runner.arch }}-yarn-${{ hashFiles('**/yarn.lock', '**/package.json') }}"
228+
echo ""
229+
echo "🔄 Decision: WILL run yarn install"
138230
fi
231+
echo ""
139232
140233
- name: Install Node.js dependencies
141234
if: ${{ !env.ACT && (steps.yarn-cache.outputs.cache-hit != 'true' || steps.check-install-needed.outputs.needs-install == 'true') }}
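
A note on the workspace counts used in these steps: in bash a pipeline binds more tightly than `||`, so `find … || true | wc -l` parses as `find … || (true | wc -l)` and `wc -l` never sees the output of `find`. A minimal sketch of a counting helper that groups `find` with its fallback first is shown below; the name `count_matches` is hypothetical, and the sketch assumes the step runs under `bash -eo pipefail`, which is what GitHub Actions uses for `shell: bash`.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: prints how many paths find reports for the given
# arguments, treating missing directories as zero matches instead of an error.
count_matches() {
  # The braces make "find || true" a single pipeline stage, so a non-zero
  # exit from find neither aborts the script nor bypasses wc -l.
  { find "$@" 2>/dev/null || true; } | wc -l | tr -d ' '
}
```

For example, `count_matches packages/*/package.json apps/*/package.json -type f` prints a single integer even when `apps/` does not exist, which keeps a later `[ "$COUNT" -gt 0 ]` test well-formed.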
