docs: Add DSQL loader operations reference by amaksimo · Pull Request #176 · awslabs/agent-plugins

amaksimo · 2026-05-27T23:25:43Z

Add a comprehensive data-loading reference for the aurora-dsql-loader, covering:

- Fresh-vs-warm partition behavior and throughput expectations
- Resume/retry mechanics (--manifest-dir, --resume-job-id, --keep-manifest)
- Conflict handling with --on-conflict do-nothing
- CSV/TSV header handling (--header flag)
- Schema inference caveats and --dry-run validation
- Index count impact on throughput
- Diagnostic decision tree for slow loads

Also adds:

- Workflow 3 (Bulk Data Loading) to SKILL.md
- Data loading trigger keywords to skill description
- Trigger evals for data loading scenarios
- Cross-reference from connectivity-tools.md

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

amaksimo · 2026-05-28T22:30:54Z

Functional Eval Results (with-skill, live run)

Ran evals 10-12 with the skill loaded. All 11/11 expectations pass.

| Eval | Scenario | Pass Rate | Key Behaviors Verified |
|------|----------|-----------|------------------------|
| 10 | Loader stuck at 3K rec/s | 4/4 ✅ | Identifies partition-constrained fresh table, explains warming, does NOT recommend more workers |
| 11 | Loader crash, lost manifest | 4/4 ✅ | Identifies /tmp tmpfs, recommends --on-conflict do-nothing recovery, --manifest-dir prevention |
| 12 | Header row parse error | 3/3 ✅ | Identifies missing --header flag, explains default behavior, gives correct fix |

The skill teaches DSQL-loader-specific operational knowledge (partition warming, tmpfs defaults, header flag semantics) that the agent cannot infer from general training data.

anwesham-lab · 2026-05-29T18:46:23Z

Code review

DSQL skill PR adds references/data-loading.md (242 lines) + Workflow 3 (Bulk Data Loading) + 3 functional evals + 5 trigger evals. The reference content is well-structured, the SKILL.md routing is coherent, the cross-ref to connectivity-tools.md#data-loading-tools resolves cleanly, and the v3.0.0 --header default-flip is accurately documented. CI fully green at head 4161f369b4927e749153fa28621e44b1251da5d5 (24 checks). One blocking concern (workflow-number drift); rest are content tightening.

20-agent fleet ran per dsql-skill-author Workflow 2. Findings post 5-gate validation:

#	Confidence	Area	Finding	Suggestion
1	95	`references/dsql-lint.md#L70-L75` + `#L104-L108` — correctness / cross-ref drift	The new Workflow 3 (Bulk Data Loading) shifts every later workflow by +1 (3→4, 4→5, …, 8→9). SKILL.md updates its own self-reference at L111 ("Workflow 9 Phase 0") but `dsql-lint.md` still says "Workflow 6's confirmation gate" (L72) and "Use Table Recreation Pattern — see …and Workflow 6" (L106). Table Recreation is now Workflow 7, not 6. The agent will follow the wrong cross-reference for the destructive-DDL confirmation gate. Independently flagged by 4 sub-agents (slots 2, 3, 12, 18).	Update both `dsql-lint.md` references from "Workflow 6" to "Workflow 7". Also see #2 — same renumber issue affects the eval files. Long-term: cross-reference workflows by name (e.g., "the Table Recreation Pattern") rather than by number, so future inserts don't silently desync.
2	90	`tools/evals/databases-on-aws/README.md#L18-L25` + `#L156-L160`, `query_explainability_evals.json#L9-L11`, `run_query_explainability_evals.py#L191-L211` — eval / cross-ref drift	Five additional places still say "Workflow 8" for query plan explainability, which is now Workflow 9: README L20 (`# Workflow 8: query plan diagnostics`), L23 (`# Runner/grader for Workflow 8`), L158 (heading "Query Plan Explainability Functional Evals (Workflow 8)"), `query_explainability_evals.json` L10 (assertion text "Reads all four Workflow 8 reference files…"), and `run_query_explainability_evals.py` L192-193, L210 (comments + `evidence = "All four Workflow 8 references loaded"`). The eval still passes (the L10 assertion is descriptive evidence text, not gate logic), but the eval narrative misnames the workflow.	Bulk-replace "Workflow 8" → "Workflow 9" across all 5 sites, OR rename to slug ("Query Plan Explainability") to break the dependence on the integer.
3	85	`references/data-loading.md#L141-L164` — silent failure / guidance gap	The "Schema Inference Caveats" section lists three real failure modes (mixed nullability → TEXT, ZIP/phone leading-zero loss, non-ISO date → TEXT) but never tells the reader these produce a successful load with no error or warning. The agent treats `--dry-run` as optional and may dismiss the section as "the loader will fall back to TEXT, that's fine." Compare against the "Symptoms of a missing `--header`" section which explicitly states "the most common source of failure" — header errors are loud, schema-inference errors are silent, but the doc treats them with the same weight.	Add an explicit callout at the top of "Schema Inference Caveats": "These produce successful loads with no error or warning — verify via `--dry-run` against any new table." Mirror the structure of the `--header` "Symptoms" subsection.
4	80	`references/data-loading.md#L95-L108` + `#L226-L234` — silent failure / `--on-conflict` semantics	The `--on-conflict do-nothing` section says it skips rows "whose primary key already exists" and lists "typically a primary key" as the unique-constraint requirement. PostgreSQL `ON CONFLICT DO NOTHING` without a target catches any unique-constraint violation, not just the PK. If the target has additional UNIQUE indexes (email, slug, external_id), rows that conflict on those will silently drop. Worse, the recovery path at L226-234 (manifest gone → re-run with `--on-conflict do-nothing`) silently keeps old values if the source has been corrected/updated since the original run — no error, user thinks recovery succeeded.	Add: (a) explicit statement that `do-nothing` triggers on any unique constraint, not just the PK; (b) precondition for the recovery path: "safe only if the source has not changed since the original run; if rows were updated, `do-nothing` keeps the old values." Cross-link to the idempotency requirement in the conflict-handling section.
5	75	`references/data-loading.md#L132` + `#L170-L172`, `#L32` — rot risk / unsourced quantitative claims	Three pinned numbers without citation: (a) `--batch-size (typically 2000)` — pinned to a loader default that may change; (b) "~2× slower with 15 indexes vs 3" + "1 + num_indexes index-entry writes" formula — no source corroborates this in `auth/scaling-guide.md` or upstream loader docs; (c) "fresh table absorbs roughly 3-4K rec/s from a single client" — not in `scaling-guide.md` or DSQL public docs. The 3K rec/s number is even hard-coded into eval 10's grader, so future DSQL improvements that raise the ceiling would mark a correct agent answer wrong.	(a) Drop "(typically 2000)" — say "equal to your `--batch-size`" without the literal. (b) Either cite the source or hedge ("noticeably slower with many indexes"). (c) Soften eval 10 to credit "a few thousand rec/s, partition-constrained" rather than the literal "3K". Hard-coded magic numbers in agent-facing prose decay fastest.
6	75	`tools/evals/databases-on-aws/dsql/data_loading_eval_results.md#L4-L5` — eval rigor / baseline not actually run	The eval-results doc is internally contradictory: line 5 says "agent run with skill loaded vs. agent run without skill (subagent invocation with skill content loaded)" but immediately downgrades baseline to "represents typical LLM behavior without DSQL-loader-specific training data." No `iteration-N/without_skill/{transcript.md, grading.json}` artifacts ship in the diff — only the markdown summary. The dsql-skill-author harness (`eval-harnesses.md`) requires both with-skill and without-skill be subagent runs, with stored transcripts. The "baseline FAIL (2 errors)" verdicts in the table are author-estimated, not measured. The PR comment claiming "11/11 expectations pass" is correct for with-skill (4+4+3=11 verified) but the comparison delta is fictional.	Either (a) commit the actual baseline subagent transcript+grading per iteration under `iteration-N/without_skill/`, OR (b) reword the doc to remove the implied empirical comparison — say "expected baseline behavior" rather than "FAIL (2 errors)". For non-trivial changes like this, also add agent-efficacy benchmark coverage (`AxdbLLMsBenchmarkingExperiments` `CodingTask` + `checks.py` rules) — the skill prescribes both harnesses for substantive additions.
7	70	`tools/evals/databases-on-aws/README.md#L82` — count drift	README says "12 eval prompts, 43 assertions total." Actual count at head is 12 prompts and 42 assertions (verified: `4+4+4+4+4+2+2+4+3+4+4+3 = 42`).	Update "43" → "42".
8	70	`SKILL.md#L3` — trigger / description bloat	`description` grew from 539 to 644 chars (+19.5%) by adding 5 partly-redundant trigger phrases: `aurora-dsql-loader`, `bulk load DSQL`, `DSQL data loading`, `load CSV into DSQL`, plus `load data` in the capability sentence. Three of four contain "DSQL" verbatim, so disambiguation risk is contained, and the new should-not-trigger evals (RDS/Redshift COPY) are well-targeted defenses — but the description is now keyword-stuffed past the agentskills.io ~250-300 char guidance, and trigger-match scoring tends to dilute at this length. Same author received the same feedback on PR #157 (anwesham-lab review threads on redundant trigger phrases).	Drop `bulk load DSQL` and `DSQL data loading` (covered by the `bulk data loading` capability phrase + existing `Aurora DSQL` triggers). Keep `aurora-dsql-loader` and `load CSV into DSQL`.
9	70	`SKILL.md#L218-L225` + `references/data-loading.md#L56` + `#L118` — authoring style / RFC 2119 inconsistency	The rest of the skill uses bolded MUST / SHOULD / MAY densely; the new content uses softer "(strongly recommended)" / "Always set this explicitly" / "If your file has a header row, pass `--header`". Per `dsql-skill-author/authoring-style.md`, RFC 2119 keywords should be bolded and unambiguous — and the doc itself argues `--manifest-dir` and `--header` are critical. Same author was flagged for the same RFC 2119 lapse in past reviews on PR #157 and PR #168.	data-loading.md L56: "(strongly recommended)" → "(MUST set explicitly)". L118: "pass `--header`" → "MUST pass `--header`". connectivity-tools.md L66 "ALWAYS use the loader's schema inference" — bold and qualify, since data-loading.md's three-failure-mode caveats contradict an unqualified ALWAYS.
10	65	PR description / commit body — missing test plan + missing migration callout	PR body has no test plan section, no link to the `data_loading_eval_results.md` artifact added in the same diff, and no callout that Workflow 3-7 → 4-8 + Workflow 8 → 9 renumbering may break external cross-references (per #1, #2 above, this is real). The author's separate comment posts eval results, but a reviewer reading only the body cannot tell what was verified.	Add to PR body: (a) "Test plan" section listing trigger evals, functional evals 10-12, and link to `data_loading_eval_results.md`; (b) "Breaking" callout: "Workflow 3-8 renumbered to 4-9 — see findings #1, #2 for stale references that need follow-up."
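Findings #1 and #2 both come down to a workflow name sitting next to a number that no longer matches. A minimal sketch of a check for that kind of drift follows. The function name `find_stale_refs`, the `current` mapping, and the sample lines are hypothetical illustrations built from the text quoted in the findings; they are not part of the repository or its CI.

```python
import re

# Matches references of the form "Workflow 6", "Workflow 8", ...
WORKFLOW_REF = re.compile(r"Workflow (\d+)")


def find_stale_refs(lines, current_numbers):
    """Return (line_no, found, expected) for each line that mentions a known
    workflow name alongside a number other than that workflow's current one."""
    stale = []
    for line_no, line in enumerate(lines, start=1):
        for name, expected in current_numbers.items():
            if name.lower() not in line.lower():
                continue
            for match in WORKFLOW_REF.finditer(line):
                found = int(match.group(1))
                if found != expected:
                    stale.append((line_no, found, expected))
    return stale


# Assumed post-renumbering values, taken from findings #1 and #2.
current = {"Table Recreation": 7, "Query Plan Explainability": 9}

# Illustrative lines modelled on the quoted references, not the real files.
sample = [
    "Use Table Recreation Pattern and Workflow 6",
    "Query Plan Explainability Functional Evals (Workflow 8)",
    "See Workflow 7 for the Table Recreation Pattern",
]

print(find_stale_refs(sample, current))
# → [(1, 6, 7), (2, 8, 9)]
```

A check like this only catches lines where the name and the number appear together, which is one more argument for the long-term suggestion in #1: reference workflows by name so there is no number to drift.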

Items considered and dropped (audit trail)

Slot 11 claimed plugins/sagemaker-ai/README.md and tools/markdownlint-skill-length.cjs were in the diff — REFUTED. git diff origin/main...HEAD --name-only shows only the 7 expected files. Slot 11 hallucinated; finding dropped.
Slot 12 description-length numbers (391→559): corrected by slot 10 to 539→644 chars. Direction stands; finding folded into #8.
Slot 7 "Legacy behavior (v2.x)" rot warning — fair point but the v2/v3 framing is still useful for migrating users; not blocking.
Slot 8 "drop the TOC" simplification: TOC is useful for a 242-line reference file; author followed convention from PR #168 review feedback (TOCs required for files >150 lines). Drop.
Slot 18 "Part of [DSQL Development Guide]" header convention drift — minor, not blocking.
Slot 13 multi-tenant scoping note + /var/lib/dsql-loader/manifests permissions note — both nice-to-have but not blocking; loader is bulk-import, not query, and the directory permission angle is a thin attack surface.
Slot 14's deletion of "PostgreSQL Compatibility" + "CloudFormation Resource" links from SKILL.md — minor scope drift that's defensible as part of the description-tightening rewrite.
Slot 9 baseline-rigor concern: covered by finding #6.
Slot 17 second-person hedging in data-loading.md — minor, acceptable tone for descriptive-empirical content.

🤖 Generated with Claude Code — 20-agent fleet per dsql-skill-author Workflow 2 §1 roster, all findings 5-gate validated at head SHA 4161f369b4927e749153fa28621e44b1251da5d5. Confidence threshold ≥ 60.

If this code review was useful, please react with 👍. Otherwise, react with 👎.

Add references/data-loading.md covering aurora-dsql-loader operations:

- Fresh-vs-warm partition behavior and throughput expectations
- Resume/retry mechanics (--manifest-dir, --resume-job-id)
- Conflict handling (--on-conflict do-nothing)
- CSV/TSV header handling (--header flag, v3.0.0 default)
- Schema inference caveats and --dry-run validation
- Index count impact on throughput
- Diagnostic decision tree for slow loads

SKILL.md changes:

- Add Workflow 3: Bulk Data Loading with key constraints
- Add data loading to overview and Quick Start
- Add trigger keywords (aurora-dsql-loader, bulk load, etc.)
- Add data-loading.md reference entry with When/Contains
- Add cross-reference from connectivity-tools.md

Eval coverage:

- 3 should-trigger + 2 should-not-trigger entries in trigger_evals.json
- 3 functional evals (IDs 10-12) in evals.json with LLM judge grading
- data_loading_eval_results.md with expected with-skill vs baseline comparison demonstrating the skill teaches operational knowledge not in general training data (partition warming, tmpfs defaults, header flag semantics)
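The leading-zero caveat behind the "Schema inference caveats" item can be illustrated with a small sketch. The `naive_infer` function below is a hypothetical stand-in written for this example; it is not the loader's inference algorithm, and the ZIP values are made up. It only shows why this class of problem produces a clean load rather than an error.

```python
def naive_infer(values):
    """Hypothetical inference rule: INTEGER if every non-empty value parses
    as an int, otherwise TEXT. Not the loader's actual logic."""
    non_empty = [v for v in values if v != ""]
    try:
        for v in non_empty:
            int(v)
    except ValueError:
        return "TEXT"
    return "INTEGER"


# Made-up ZIP codes with leading zeros.
zip_codes = ["02134", "10001", "00501"]

inferred = naive_infer(zip_codes)

# Storing the column as INTEGER and reading it back drops the leading zeros,
# and nothing along the way raises an error.
round_tripped = (
    [str(int(v)) for v in zip_codes] if inferred == "INTEGER" else zip_codes
)

print(inferred, round_tripped)
# → INTEGER ['2134', '10001', '501']
```

Every value parses successfully, so a rule of this shape has no reason to warn. That is the case for inspecting inferred types with `--dry-run` before loading into a new table.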

- Condense data-loading.md from 242 to 166 lines (remove verbose explanations per reviewer feedback)
- Add RFC 2119 directives (MUST/SHOULD) for --manifest-dir, --header, --on-conflict preconditions, and schema inference validation
- Add silent-failure callout for schema inference caveats
- Fix --on-conflict semantics: triggers on any unique constraint, not just PK; add precondition that source must not have changed for crash recovery
- Rename 'Going Deeper' header to 'When to load the full reference' with clear agent trigger condition
- Fix 'slower than expected' to 'slow load times' in SKILL.md
- Trim redundant description triggers (bulk load DSQL, DSQL data loading)
- Fix workflow-number drift: Workflow 6 → 7 in dsql-lint.md, Workflow 8 → 9 in README/evals/runner
- Fix README assertion count 43 → 42
- Remove standalone 'Related References' section (cross-ref inlined at top)
- Add eval results with baseline vs with-skill comparison
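The corrected `--on-conflict` semantics (any unique constraint, not just the PK, and stale values kept on re-run) can be seen in a small experiment. This sketch uses Python's built-in `sqlite3` as a stand-in, on the assumption that its target-less `ON CONFLICT DO NOTHING` behaves like the PostgreSQL form described in finding #4. It does not exercise DSQL or the loader, and the `users` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)
# Row left behind by the "original run".
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'old name')")

rows = [
    (1, "b@example.com", "new name"),  # conflicts on the primary key
    (2, "a@example.com", "other"),     # conflicts only on the UNIQUE email
    (3, "c@example.com", "fresh"),     # no conflict
]
for row in rows:
    # No conflict target: any uniqueness violation causes a silent skip.
    conn.execute("INSERT INTO users VALUES (?, ?, ?) ON CONFLICT DO NOTHING", row)

result = conn.execute("SELECT id, email, name FROM users ORDER BY id").fetchall()
print(result)
# → [(1, 'a@example.com', 'old name'), (3, 'c@example.com', 'fresh')]
```

Two of the three rows disappear without an error: the corrected name for id 1 is never applied, and id 2 is dropped because of the email constraint alone. Both outcomes match the preconditions this commit adds for crash recovery.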

- Partition behavior: replace bullet list with summary paragraph + link to published DSQL primary keys documentation
- Common pitfall: rewrite as 'Agent guidance' with explicit action (advise user to check for duplicate PKs, recommend de-duplication)
- Index section: remove explanatory preamble, keep only the two actionable bullets

- Reconcile SKILL.md against main: keep main's structure (awslabs#176 added Workflow 3 Bulk Data Loading), add PR 168's PostgreSQL Migrations / ORM Guides / OCC Retry references and renumber the new workflows to 10 (Full PG to DSQL Schema Migration) and 11 (ORM Migration).
- Restore the correct MCP tool name `dsql_lint` (with underscore) across SKILL.md, dsql-lint.md, development-guide.md, and the new pg-migrations/* files. The PR had renamed it to `dsql-lint` (dash), which does not match the registered tool.
- Bring main's data-loading.md and other merge changes into the branch.

amaksimo requested review from a team, krokoko, scottschreckengaust and theagenticguy May 27, 2026 23:25

amaksimo requested review from a team as code owners May 27, 2026 23:25

amaksimo requested review from Benjscho, Morlej, anwesham-lab, gxjx-x, jaichabria, pkale and praba2210 May 27, 2026 23:25

amaksimo marked this pull request as draft May 27, 2026 23:30

amaksimo changed the title ~~Add DSQL loader operations reference~~ docs: Add DSQL loader operations reference May 28, 2026

amaksimo marked this pull request as ready for review May 28, 2026 22:13

amaksimo assigned anwesham-lab May 28, 2026

amaksimo force-pushed the improve-dsql-loader-docs branch 2 times, most recently from 90abd0d to 4161f36 Compare May 28, 2026 22:30

amaksimo mentioned this pull request May 28, 2026

feat(dsql): Add PostgreSQL schema conversion and migration references #168

Open

anwesham-lab reviewed May 29, 2026

Comment thread plugins/databases-on-aws/skills/dsql/references/auth/connectivity-tools.md Outdated

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

Comment thread plugins/databases-on-aws/skills/dsql/SKILL.md Outdated

anwesham-lab self-requested a review May 29, 2026 18:44

anwesham-lab reviewed May 29, 2026

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

anwesham-lab reviewed May 29, 2026

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

anwesham-lab force-pushed the improve-dsql-loader-docs branch from 4161f36 to d093231 Compare May 29, 2026 18:55

anwesham-lab reviewed May 29, 2026

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

amaksimo force-pushed the improve-dsql-loader-docs branch from 5687cab to 3e1d908 Compare May 29, 2026 19:20

amaksimo force-pushed the improve-dsql-loader-docs branch from 3e1d908 to 86ec0da Compare May 29, 2026 19:22

anwesham-lab reviewed Jun 1, 2026

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

amaksimo force-pushed the improve-dsql-loader-docs branch from b6a44e2 to 385b469 Compare June 1, 2026 17:59

anwesham-lab approved these changes Jun 1, 2026

amaksimo enabled auto-merge June 1, 2026 18:07

krokoko approved these changes Jun 1, 2026

amaksimo added this pull request to the merge queue Jun 1, 2026

Merged via the queue into main with commit aac1f4c Jun 1, 2026
24 checks passed

amaksimo deleted the improve-dsql-loader-docs branch June 1, 2026 20:40

amaksimo merged 3 commits into main from improve-dsql-loader-docs

3 participants
