129 changes: 78 additions & 51 deletions scripts/migrate-candidates-to-v2/README.md
@@ -25,9 +25,8 @@ for the architectural context.
`BEGIN; UPDATE * N; COMMIT;` — a single transaction that applies all
v2 blobs to the prod DB.

Two migration-time fallbacks are applied to recover currently-dead data
(neither changes any normalizer behavior; both classes of project are
unreadable in the prod UI today):
Three migration-time fallbacks are applied to recover currently-dead data
(all classes are unreadable in the prod UI today):

- **Sole-algo fallback** (~18 projects). Pre-2026 saves omit the
per-entry `algorithm` field AND the `results[0].search_meta.algorithm`
@@ -37,44 +36,63 @@ unreadable in the prod UI today):
`/orgs/WHO/sources/ICD-11-WHO/` (no version in URL) saved before
`ocl_issues#2522` removed the silent `'HEAD'` default. Use `'HEAD'` as
the migrate-time version so the projectContext builds.
- **Orphan-algo result-derived recovery** (`ocl_issues#2555`, 9 projects,
731 codes). When an entry's algorithm tag isn't in the project's current
`algorithms[]` (reconfigured / mistagged project), the normalizer can't look
up a concept_identity, so it derives one from the RESULT itself: if the
result's source is the project's target source → anchor to the target
canonical+version (formal, visible/recommendable); otherwise key on its own
generated source (correctly NON-target, so the UI's target filter keeps it
inert rather than mis-surfacing it). See `vendored/normalizers.js`
`reference_source: 'result'`.
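
The orphan-algo recovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in `vendored/normalizers.js`: the function name `deriveIdentityFromResult` and the field names `sourceUrl`, `generatedCanonical`, and `isTarget` are hypothetical, and only the `reference_source: 'result'` tag comes from this README.

```javascript
// Hypothetical sketch of result-derived identity recovery.
// Assumption: `result` and `target` carry the fields used below; the real
// normalizer's shapes may differ.
function deriveIdentityFromResult(result, target) {
  if (result.sourceUrl === target.sourceUrl) {
    // The result comes from the project's target source: anchor it to the
    // target canonical + version so it stays visible and recommendable.
    return {
      canonical: target.canonical,
      version: target.version,
      isTarget: true,
      reference_source: 'result'
    }
  }
  // Otherwise key on the result's own generated source. It is correctly
  // NON-target, so a target filter keeps it inert.
  return {
    canonical: result.generatedCanonical,
    version: result.version ?? null,
    isTarget: false,
    reference_source: 'result'
  }
}
```

The point of the second branch is that a non-target result is still given an identity (so nothing is dropped), but one that can never be mistaken for a target-source concept.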

## Runbook

The migration is read-only against the source data (dump + normalize)
until the final `psql -f migrate.sql` step. Everything before that can
be inspected, diffed, and re-run safely.

> **IMPORTANT — read before running (verified 2026-06-02, see
> `verification-report.md`):**
> 1. **`migrate.mjs` is self-contained.** It imports the normalizer from
> `./vendored/` (pinned to the validated `315e9b0` version), so
> `node migrate.mjs` runs from any checkout — including `main`, where the
> v2-only PR (`9f8f4b3`) deleted `normalizeLegacyAllCandidates` from `src/`.
> No special checkout or worktree needed.
> 2. **Dump with `dump-projects.sql`** (not the bare query below) so concept_keys
> use the formal source canonical (matching the live app) instead of a
> generated `ns.openconceptlab.org` URL. Also apply `fix-algorithm-canonicals.sql`
> (`--rw`) so the live app stays consistent on re-run. See `ocl_issues#2555`.
> 3. **Validate** every run with `node validate.mjs` (0 hard failures expected;
> review the orphan-tag drop list — currently ~9 reconfigured/mistagged
> projects, separate from this fix).
> 4. **Retain the backup permanently** (not just for the rollback window). It is
> the only recoverable copy of the orphan-tag-dropped raw results.
> 5. **Never re-dump + re-run `migrate.mjs` after applying** — it throws
> `TypeError: (candidates || []) is not iterable` on v2 data. (`migrate.sql`
> re-apply is safe/idempotent.)
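
Point 5 could be enforced with a small pre-flight guard, sketched below. This is not part of `migrate.mjs`: `isIterable` and `assertLegacyShape` are hypothetical helpers, and the sketch assumes the value the normalizer loops over is iterable in legacy data and a plain object in v2 data (which is what the quoted `TypeError` suggests). Adapt the check to whichever value the loop actually iterates.

```javascript
// Hypothetical guard: refuse to normalize data that already looks migrated.
// Assumption: legacy candidate collections are iterable; v2 blobs are not.
function isIterable(value) {
  return value != null && typeof value[Symbol.iterator] === 'function'
}

function assertLegacyShape(projectId, candidates) {
  if (!isIterable(candidates)) {
    throw new Error(
      `project ${projectId}: candidates is not iterable; ` +
      'it looks already migrated, refusing to re-normalize'
    )
  }
}
```

Failing early with a project id is easier to act on during a maintenance window than a bare `TypeError` from inside the normalizer.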

### 1. Dry-run locally

Confirms current prod data shape against the live normalizer. Run any
time without coordination.

```bash
# From any workstation that can reach oclapi2 prod through ocl-psql:
mkdir -p scripts/migrate-candidates-to-v2/input

~/.ocl/bin/ocl-psql oclapi2 -t -A -c "
SELECT jsonb_build_object(
'id', id,
'name', name,
'target_repo_url', target_repo_url,
'owner_url', COALESCE(
'/orgs/' || (SELECT mnemonic FROM organizations WHERE id = organization_id) || '/',
'/users/' || (SELECT username FROM user_profiles WHERE id = user_id) || '/'
),
'algorithms', algorithms,
'candidates', candidates
)::text
FROM map_projects
WHERE candidates IS NOT NULL AND candidates::text NOT IN ('{}', '[]', 'null')
ORDER BY id;
" > scripts/migrate-candidates-to-v2/input/projects.ndjson

node scripts/migrate-candidates-to-v2/migrate.mjs
cd scripts/migrate-candidates-to-v2
mkdir -p input

# 1. Dump with resolved formal canonicals (read-only):
~/.ocl/bin/ocl-psql oclapi2 -t -A -f dump-projects.sql > input/projects.ndjson

# 2. Normalize to v2 (self-contained; runs from any checkout):
node migrate.mjs

# 3. Independently validate the transformation:
node validate.mjs
```
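
Before normalizing, a quick sanity pass over the dump can catch a truncated or duplicated export. The sketch below is an illustration only: `summarizeDump` is a hypothetical helper, and it relies on the `id` and `target_repo.canonical_url` fields that `dump-projects.sql` emits. A missing `target_repo` is not necessarily an error, since the dump leaves it `NULL` when no canonical resolves.

```javascript
// Hypothetical sanity check over NDJSON lines (one JSON object per line).
// Accepts any iterable of strings, so it works on an array or on buffered lines.
function summarizeDump(lines) {
  const projects = []
  for (const line of lines) {
    if (line.trim() === '') continue // tolerate blank lines
    projects.push(JSON.parse(line))
  }
  const ids = projects.map(p => p.id)
  return {
    count: projects.length,
    // ids that appear more than once (each extra occurrence is listed)
    duplicateIds: ids.filter((id, i) => ids.indexOf(id) !== i),
    // projects whose target canonical did not resolve in the dump
    missingTargetCanonical: projects
      .filter(p => !p.target_repo?.canonical_url)
      .map(p => p.id)
  }
}
```

For the real dump, which is several hundred MB, it is probably safer to feed lines from `node:readline` (as `migrate.mjs` does) than to read the whole file into one string.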

Inspect `scripts/migrate-candidates-to-v2/output/summary.json` and
`skipped.json`. Spot-check 3-4 generated `proj-<id>.v2.json` blobs
Inspect `output/summary.json`, `skipped.json`, and `output/validation-report.json`
(expect 0 hard failures). Spot-check 3-4 generated `proj-<id>.v2.json` blobs
against representative live projects:

- One bridge-heavy LOINC (e.g. proj 105, 130, 73 — Jussara / Top-200 / ARUP).
@@ -91,18 +109,20 @@ The migration window (~15-20 min) consists of:
```
T-0:00 Announce maintenance window (Slack/email to active users)
T-0:01 Block oclmap traffic at nginx (or 503 maintenance page)
T-0:02 Snapshot the table:
T-0:02 Snapshot the table (RETAIN PERMANENTLY, not just for rollback):
pg_dump --table=map_projects --data-only > map_projects.backup.sql
T-0:03 Re-dump the live candidates (in case a save landed since the dry run):
~/.ocl/bin/ocl-psql oclapi2 -t -A -c "<see Step 1 query>" \
> scripts/migrate-candidates-to-v2/input/projects.ndjson
T-0:04 Run migrate.mjs (regenerates v2 blobs + migrate.sql):
node scripts/migrate-candidates-to-v2/migrate.mjs
T-0:05 Apply the migration (single BEGIN/COMMIT transaction):
~/.ocl/bin/ocl-psql oclapi2 --rw \
-f scripts/migrate-candidates-to-v2/output/migrate.sql
T-0:06 Deploy the v2-only oclmap build to prod
T-0:15 Smoke test: open 4 representative projects (above) in the UI.
T-0:03 Fix the live algorithms config (formal canonicals; app re-run consistency):
~/.ocl/bin/ocl-psql oclapi2 --rw -f fix-algorithm-canonicals.sql
T-0:04 Re-dump the live candidates (in case a save landed since the dry run):
~/.ocl/bin/ocl-psql oclapi2 -t -A -f dump-projects.sql \
> input/projects.ndjson
T-0:05 Normalize to v2 + validate:
node migrate.mjs
node validate.mjs # confirm 0 hard failures
T-0:07 Apply the migration (single BEGIN/COMMIT transaction):
~/.ocl/bin/ocl-psql oclapi2 --rw -f output/migrate.sql
T-0:08 Deploy the v2-only oclmap build to prod
T-0:16 Smoke test: open 4 representative projects (above) in the UI.
Confirm candidates render correctly.
T-0:18 Unblock traffic
```
@@ -123,25 +143,32 @@ If smoke test fails on the v2-only oclmap:

## Sizes

Verified against the 2026-05-27 prod snapshot:
Verified against the 2026-06-02 prod snapshot:

- 68 projects with non-empty `candidates`
- v1 total: ~803 MB
- v2 total: ~1199 MB (+49% — structural expansion: bridge cascade targets
become explicit `bridge_child` Candidate + ConceptDefinition entries)
- Migration runtime: ~10 sec (Node) + ~30-60 sec (psql apply over the
tunnel)
- **69** projects with non-empty `candidates` (was 68 on 2026-05-27 — the set
drifts, so always re-dump immediately before the window)
- v1 total: ~860 MB · v2 total: ~1.2 GB (+~40% structural expansion)
- Migration runtime: ~10 sec (Node normalize) + ~30-60 sec (psql apply)
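
The expansion percentages follow directly from the totals. A small check, treating `~1.2 GB` as roughly 1200 MB (an approximation, since the README only gives rounded figures):

```javascript
// Relative growth from the v1 total to the v2 total, in percent.
function growthPercent(beforeMb, afterMb) {
  return ((afterMb - beforeMb) / beforeMb) * 100
}

console.log(growthPercent(803, 1199).toFixed(1)) // 49.3 (2026-05-27 snapshot)
console.log(growthPercent(860, 1200).toFixed(1)) // 39.5 (2026-06-02 snapshot)
```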

## Files

- `migrate.mjs` — the migration script (Node 22+, ESM).
- `migrate.mjs` — the migration script (Node 22+, ESM). Self-contained; runs from any checkout.
- `vendored/normalizers.js`, `vendored/conceptKey.js` — vendored normalizer (the
`315e9b0` version + the migration-only orphan-algo `reference_source: 'result'`
recovery, `ocl_issues#2555`), so `migrate.mjs` doesn't depend on `src/` (where the
v2-only PR deleted `normalizeLegacyAllCandidates`). Dir is `vendored/` rather than
`lib/` because the root `.gitignore` ignores `lib/`.
- `dump-projects.sql` — the dump query with resolved formal canonicals.
- `fix-algorithm-canonicals.sql` — one-time `--rw` fix of `algorithms` config so
the live app keys concepts on the same formal canonicals as the migrated data.
- `validate.mjs` — independent v1↔v2 equivalence + referential-integrity validator.
- `verification-report.md` — the pre-prod verification record (`ocl_issues#2555`).
- `README.md` — this file.
- `.gitignore` — excludes `input/`, `output/`, and `*.tmp` from version
control.
- `.gitignore` — excludes `input/`, `output/`, and `*.tmp` from version control.
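
The canonical-resolution rule shared by `dump-projects.sql` and `fix-algorithm-canonicals.sql` can be restated in JavaScript for readers who find the SQL dense. This is a sketch, not code from the PR: `parseRepoUrl`, `resolveTargetCanonical`, `resolveBridgeCanonical`, and the `registeredCanonicals` lookup (standing in for HEAD rows of the `sources` table, keyed by `owner/source`) are all hypothetical.

```javascript
const ICD11_CANONICAL = 'http://id.who.int/icd/release/11/mms'
const NS_BASE = 'https://ns.openconceptlab.org'
const DEFAULT_BRIDGE_REPO = '/orgs/CIEL/sources/CIEL/'

// Mirrors split_part(url, '/', 3) and split_part(url, '/', 5):
// '/orgs/WHO/sources/ICD-11-WHO/' -> owner 'WHO', source 'ICD-11-WHO'.
function parseRepoUrl(repoUrl) {
  const parts = repoUrl.split('/')
  return { owner: parts[2] ?? '', source: parts[4] ?? '' }
}

// Target rule: ICD-11 (any variant) -> formal WHO canonical;
// else the registered canonical; else null (caller generates the ns URL).
function resolveTargetCanonical(targetRepoUrl, registeredCanonicals) {
  if (/icd-11/i.test(targetRepoUrl)) return ICD11_CANONICAL // ILIKE '%ICD-11%'
  const { owner, source } = parseRepoUrl(targetRepoUrl)
  return registeredCanonicals[`${owner}/${source}`] ?? null
}

// Bridge rule: keep an existing bridge_repo.canonical_url; otherwise use the
// bridge source's registered canonical, falling back to the generated ns URL.
function resolveBridgeCanonical(algo, registeredCanonicals) {
  const existing = algo.bridge_repo?.canonical_url
  if (existing) return existing
  const repoUrl = algo.target_repo_url || DEFAULT_BRIDGE_REPO
  const { owner, source } = parseRepoUrl(repoUrl)
  return registeredCanonicals[`${owner}/${source}`] ?? NS_BASE + repoUrl
}
```

Note the asymmetry: an unresolved target stays `null`, while an unresolved bridge is written out as an explicit `ns.openconceptlab.org` URL.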

## After the migration

This whole folder can be deleted once the migration has run cleanly in
prod and the v2-only oclmap is stable. The script imports
`normalizers.js`'s `normalizeLegacyAllCandidates` function, which itself
is deleted in the v2-only oclmap PR.
This whole folder (including `vendored/`) can be deleted once the migration
has run cleanly in prod and the v2-only oclmap is stable. `vendored/normalizers.js`
is the `315e9b0` normalizer (`normalizeLegacyAllCandidates`, no longer in `src/`)
plus the migration-only orphan-algo recovery (`reference_source: 'result'`).
83 changes: 83 additions & 0 deletions scripts/migrate-candidates-to-v2/dump-projects.sql
@@ -0,0 +1,83 @@
-- ============================================================================
-- Migration dump (replaces the inline query in README §1). Produces
-- input/projects.ndjson for migrate.mjs, with formal source canonicals resolved
-- so the migrated concept_keys match what the live app generates.
-- ============================================================================
-- Self-contained: resolves canonicals inline so it works for dry-runs and the
-- real run alike, with NO dependency on fix-algorithm-canonicals.sql having run
-- first. It resolves three things:
-- (1) target_repo.canonical_url (migrate.mjs keys target_repo concepts on it)
-- (2) custom algo canonical_url (ICD-11 -> formal, overriding stray variants;
-- recovers custom algos that would otherwise drop)
-- (3) bridge algo bridge_repo.canonical_url (from the bridge source)
--
-- Rule: ICD-11 (any variant) -> http://id.who.int/icd/release/11/mms;
-- else source's registered canonical_url (CIEL/LOINC/SNOMED-GPS);
-- else NULL -> migrate.mjs generates the ns.openconceptlab.org URL (PIH).
--
-- (fix-algorithm-canonicals.sql applies the SAME custom/bridge resolution to the
-- live `algorithms` column so re-runs in the app stay consistent with the
-- migrated data. The app fetches target_repo live, so it needs no such fix.)
--
-- ocl-psql oclapi2 -t -A -f dump-projects.sql > input/projects.ndjson
-- ============================================================================
WITH resolved AS (
SELECT DISTINCT ON (mp.id)
mp.id, mp.name, mp.target_repo_url, mp.algorithms, mp.candidates,
mp.organization_id, mp.user_id,
CASE
WHEN mp.target_repo_url ILIKE '%ICD-11%' THEN 'http://id.who.int/icd/release/11/mms'
ELSE s.canonical_url
END AS resolved_canonical
FROM map_projects mp
LEFT JOIN organizations o ON o.mnemonic = split_part(mp.target_repo_url,'/',3)
LEFT JOIN user_profiles u ON u.username = split_part(mp.target_repo_url,'/',3)
LEFT JOIN sources s ON s.mnemonic = split_part(mp.target_repo_url,'/',5)
AND s.version = 'HEAD'
AND (s.organization_id = o.id OR s.user_id = u.id)
WHERE mp.candidates IS NOT NULL AND mp.candidates::text NOT IN ('{}','[]','null')
ORDER BY mp.id
)
SELECT jsonb_build_object(
'id', id,
'name', name,
'target_repo_url', target_repo_url,
'owner_url', COALESCE(
'/orgs/' || (SELECT mnemonic FROM organizations WHERE id = organization_id) || '/',
'/users/' || (SELECT username FROM user_profiles WHERE id = user_id) || '/'
),
'algorithms', (
SELECT jsonb_agg(
CASE
-- custom: set/override canonical (ICD-11 forced to formal; others filled)
WHEN a.elem->>'type' = 'custom' AND resolved_canonical IS NOT NULL
AND (COALESCE(a.elem->>'canonical_url','') = '' OR target_repo_url ILIKE '%ICD-11%')
THEN a.elem || jsonb_build_object('canonical_url', resolved_canonical)
-- bridge: set bridge_repo.canonical_url from the bridge source when missing
WHEN a.elem->>'type' IN ('ocl-bridge','ocl-ciel-bridge')
AND COALESCE(a.elem->'bridge_repo'->>'canonical_url','') = ''
THEN a.elem || jsonb_build_object('bridge_repo', jsonb_build_object('canonical_url',
COALESCE(
(SELECT bs.canonical_url FROM sources bs
JOIN organizations bo ON bo.id = bs.organization_id
WHERE bs.mnemonic = split_part(COALESCE(NULLIF(a.elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/'),'/',5)
AND bo.mnemonic = split_part(COALESCE(NULLIF(a.elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/'),'/',3)
AND bs.version='HEAD'
LIMIT 1),
'https://ns.openconceptlab.org' || COALESCE(NULLIF(a.elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/')
)))
ELSE a.elem
END
ORDER BY a.ord
)
FROM unnest(algorithms) WITH ORDINALITY AS a(elem, ord)
),
'candidates', candidates,
'target_repo', CASE
WHEN resolved_canonical IS NOT NULL
THEN jsonb_build_object('canonical_url', resolved_canonical)
ELSE NULL
END
)::text
FROM resolved
ORDER BY id;
73 changes: 73 additions & 0 deletions scripts/migrate-candidates-to-v2/fix-algorithm-canonicals.sql
@@ -0,0 +1,73 @@
-- ============================================================================
-- PROD PRE-STEP (run BEFORE the v1->v2 candidates migration): canonical fix
-- ============================================================================
-- Sets the formal canonical_url on map_projects.algorithms so BOTH the live app
-- and the migration produce identical, formal concept_keys. Without this, custom
-- ICD-11 algos drop entirely (no identity) and LOINC/CIEL/ICD-11 concepts split
-- between the formal canonical and a generated ns.openconceptlab.org URL.
--
-- Rules (per @paynejd 2026-06-02):
-- * ICD-11 (any variant, incl. ICD-11-WHO-Agent) -> http://id.who.int/icd/release/11/mms
-- * else -> the target/bridge source's registered canonical_url (CIEL, LOINC, SNOMED-GPS)
-- * else (no registered canonical, e.g. PIH) -> leave unset -> app/migration generate ns URL
--
-- WHY this is needed (not just the migration dump): the live app reads custom
-- algo canonical from algorithms[].canonical_url and bridge canonical from
-- algorithms[].bridge_repo.canonical_url (it does NOT fetch the bridge source).
-- target_repo, by contrast, the app fetches live, so the migration dump resolves
-- that one (see dump-projects.sql) and no config change is needed for it.
--
-- Validated read-only against prod 2026-06-02: 51 custom algos -> formal,
-- bridge -> CIELterminology.org, built-in types untouched.
--
-- SAFETY: single transaction; idempotent (re-running is a no-op on already-fixed
-- algos). Take the standard map_projects backup first. Apply with:
-- ocl-psql oclapi2 --rw -f fix-algorithm-canonicals.sql
-- ============================================================================
BEGIN;

UPDATE map_projects mp
SET algorithms = (
SELECT array_agg(
CASE
-- custom algos: force ICD-11 to formal (overrides stray '/icd11/mms');
-- fill others from the project target source canonical.
WHEN elem->>'type' = 'custom' AND r.resolved IS NOT NULL
AND (COALESCE(elem->>'canonical_url','') = '' OR mp.target_repo_url ILIKE '%ICD-11%')
THEN elem || jsonb_build_object('canonical_url', r.resolved)
-- bridge algos: set bridge_repo.canonical_url from the bridge source when absent.
WHEN elem->>'type' IN ('ocl-bridge','ocl-ciel-bridge')
AND COALESCE(elem->'bridge_repo'->>'canonical_url','') = ''
THEN elem || jsonb_build_object('bridge_repo', jsonb_build_object('canonical_url',
COALESCE(
(SELECT bs.canonical_url FROM sources bs
JOIN organizations bo ON bo.id = bs.organization_id
WHERE bs.mnemonic = split_part(COALESCE(NULLIF(elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/'),'/',5)
AND bo.mnemonic = split_part(COALESCE(NULLIF(elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/'),'/',3)
AND bs.version='HEAD' LIMIT 1),
'https://ns.openconceptlab.org' || COALESCE(NULLIF(elem->>'target_repo_url',''),'/orgs/CIEL/sources/CIEL/')
)))
ELSE elem
END ORDER BY ord
)
FROM unnest(mp.algorithms) WITH ORDINALITY AS t(elem, ord)
)
FROM (
SELECT mp2.id,
CASE WHEN mp2.target_repo_url ILIKE '%ICD-11%'
THEN 'http://id.who.int/icd/release/11/mms'
ELSE s.canonical_url END AS resolved
FROM map_projects mp2
LEFT JOIN organizations o ON o.mnemonic = split_part(mp2.target_repo_url,'/',3)
LEFT JOIN sources s ON s.mnemonic = split_part(mp2.target_repo_url,'/',5)
AND s.version='HEAD' AND s.organization_id = o.id
) r
WHERE mp.id = r.id
AND EXISTS (
SELECT 1 FROM unnest(mp.algorithms) e
WHERE e->>'type'='custom'
OR (e->>'type' IN ('ocl-bridge','ocl-ciel-bridge')
AND COALESCE(e->'bridge_repo'->>'canonical_url','')='')
);

COMMIT;
8 changes: 7 additions & 1 deletion scripts/migrate-candidates-to-v2/migrate.mjs
@@ -29,9 +29,15 @@ import { createInterface } from 'node:readline'
import { fileURLToPath } from 'node:url'
import { dirname, join, resolve } from 'node:path'

// Self-contained: the normalizer is vendored into ./vendored/ (pinned to the
// 315e9b0 version this migration was written + validated against) so the
// script runs from any checkout, including main where the v2-only PR
// (9f8f4b3) deleted normalizeLegacyAllCandidates from src/. The whole folder
// is deleted after the one-shot prod migration. (Not ./lib/ — root .gitignore
// ignores lib/.)
import {
normalizeLegacyAllCandidates
} from '../../src/components/map-projects/normalizers.js'
} from './vendored/normalizers.js'

const __dirname = dirname(fileURLToPath(import.meta.url))
const INPUT = resolve(process.env.INPUT || join(__dirname, 'input/projects.ndjson'))