Commit 2cc92c3

feat: add doccano-django sample (keploy postgres-v3 simple-Query bind regression)
Minimum reproducer for the polymorphic-resourcetype failure that motivated keploy/integrations#177. Wraps doccano v1.8.5 + django-rest-polymorphic + postgres 13.3-alpine — the same shape the bug originally surfaced on (keploy/enterprise PRs #1889 / #1964, pipelines 3556 / 3572).

Per the keploy-ci-debug skill, the sample owns ALL orchestration the lane scripts in keploy/integrations and keploy/enterprise need: the docker-compose, the admin-bootstrap flow, the API traffic loop, the noise filter (via keploy.yml.template), and a coverage-report helper. Future lanes that exercise the same backend re-use this directory; they don't redefine compose / bootstrap / traffic in their own scripts. The intent is to migrate enterprise/.ci/scripts/doccano-linux.sh from its current ~400-line inlined-everything shape down to a thin "clone sample → wrap in keploy → assert" wrapper in a follow-up PR.

Layout:

* `Dockerfile` — `FROM doccano/doccano:backend`. Wrapper exists so a future doccano patch (or a backport of an upstream fix that changes the bug-triggering shape) is a one-line edit here, not scattered across lane scripts.
* `docker-compose.yml` — postgres + doccano backend on a fixed subnet, every name fully env-driven (DOCCANO_BACKEND_CONTAINER / DOCCANO_DB_CONTAINER / DOCCANO_APP_PORT / DOCCANO_DB_IP / DOCCANO_NETWORK_SUBNET). Lane scripts running multiple matrix cells in parallel pass per-cell values so the cells don't collide on container names. Two-phase boot (DOCCANO_SKIP_BOOTSTRAP=0 → migrations + admin; named volume retained; DOCCANO_SKIP_BOOTSTRAP=1 → gunicorn-only against the populated volume) so record/replay see a deterministic state.
* `flow.sh` — four subcommands:
    bootstrap — log in as admin, install the deterministic authtoken_token row so record-time and replay-time Authorization headers match.
    record-traffic — drive the API: 16-call /v1/me warmup hammer (gunicorn worker contenttypes-cache warmup, necessary for the SIGINT-driven shutdown pattern lanes use), POST a polymorphic TextClassificationProject, GET / PATCH it, plus dependent category-types / examples / categories / metrics reads that exercise the multi-bind django_content_type lookups the fix targets. Fire-and-forget; keploy is the assertion layer at replay.
    coverage — walk the running backend's URL resolver (introspecting actual served methods, not Django's permissive http_method_names default) and the just-recorded keploy/test-set-* tests; emit a (method, path) coverage percentage for the v1/projects + accessory surface.
    list-routes — print the route table the coverage report uses as its denominator (diagnostic).
* `keploy.yml.template` — globalNoise filter for the inherently non-deterministic fields (Date/Expires headers, created_at/updated_at body fields). Centralised here so a future doccano version that adds another auto-timestamp field is one edit rather than a fan-out across lane scripts. Lane scripts envsubst this template into the per-cell run dir.
* `README.md` — bug shape, local-run instructions, lane pointers.

Sample is keploy-independent: `docker compose up && bash flow.sh bootstrap && bash flow.sh record-traffic` works against bare doccano.

Verified locally: 25/25 calls return expected status, polymorphic resourcetype is `TextClassificationProject` end-to-end. The route walker emits 144 (method, path) pairs for the v1/projects + /v1/me + /v1/users + /v1/health + /v1/auth surface; coverage matching against synthetic recorded tests rounds correctly.

Lanes that pin to this sample (pinned to the feat/doccano-django-sample branch via --branch until this PR merges):

* keploy/integrations `.woodpecker/doccano-postgres.yml` — three-way matrix (record-build × replay-build, record-latest × replay-build, record-build × replay-latest); depends_on prepare-and-run.
* keploy/enterprise `.woodpecker/doccano-linux.yml` — being migrated to consume this sample in a follow-up PR; today still uses inline compose generation.

Signed-off-by: Akash Kumar <meakash7902@gmail.com>
1 parent 57856de commit 2cc92c3

5 files changed

Lines changed: 641 additions & 0 deletions

doccano-django/Dockerfile

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Thin wrapper around doccano's official backend image at the version
# this sample tracks. Pinning here (rather than in each lane script
# under keploy/integrations / keploy/enterprise) means a future
# doccano release that changes the bug-triggering shape is a one-line
# retag in this repo, not a hunt across the CI tree.
#
# Upstream tag: doccano/doccano:backend (the rolling backend tag)
# Source pin: doccano/doccano @ v1.8.5
# https://github.com/doccano/doccano/releases/tag/v1.8.5
#
# v1.8.5 was the version exercised on keploy/enterprise pipeline 3556
# (PR #1889) and pipeline 3572 (PR #1964 minimal repro) where the
# bug originally manifested.
FROM doccano/doccano:backend

doccano-django/README.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# doccano-django — keploy postgres-v3 simple-Query bind regression sample

Minimal reproducer for the doccano polymorphic-resourcetype failure
that motivated [keploy/integrations#177](https://github.com/keploy/integrations/pull/177)
("fix(postgres-v3): extract simple-Query literals into bindValues").

The sample wraps doccano (Django + django-rest-polymorphic + psycopg2)
at version `v1.8.5` against postgres `13.3-alpine`. The shape under
test: a polymorphic Django model (`Project` with subclass
`TextClassificationProject`) created over the REST API and re-read via
DRF's polymorphic queryset. Without the integrations fix, every
`SELECT … FROM django_content_type WHERE app_label = $1 AND model = $2`
at replay returns the same recorded mock (the matcher's
`pickSessionFallback` FIFO-collapses every variant onto the first
recording when the bind signature is empty), so the polymorphic
serializer can't resolve the project's subclass and `resourcetype`
flips from `"TextClassificationProject"` to `"Project"`.

The bug is in keploy's recorder + replayer simple-Query path; doccano
is just a vehicle. Same pattern would reproduce on any Django app
that:

* Uses a polymorphic ORM (django-polymorphic / django-rest-polymorphic).
* Sends parameterised reads via psycopg2's simple-Query mode
  (literals interpolated into the SQL text rather than carried in a
  separate Bind packet).
* Exercises the polymorphic queryset across multiple HTTP requests
  against the same recorded backend.
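
The failure mode described above can be pictured with a small toy model. This is a sketch only, not keploy's matcher: `normalise`, `build_index`, `lookup`, the recorded SQL strings and the ids 7 and 9 are all hypothetical. It shows how keying mocks on the query shape alone (an empty bind signature) sends every variant to the first recording, while lifting the quoted literals into bind values keeps the variants apart.

```python
import re

# Hypothetical recorded mocks: (sql_text, response) pairs in recording order.
RECORDED = [
    ("SELECT id FROM django_content_type "
     "WHERE app_label = 'projects' AND model = 'project'", {"id": 7}),
    ("SELECT id FROM django_content_type "
     "WHERE app_label = 'projects' AND model = 'textclassificationproject'", {"id": 9}),
]

# A single-quoted SQL literal; '' inside the quotes is an escaped quote.
LITERAL = re.compile(r"'((?:[^']|'')*)'")


def normalise(sql):
    """Replace each quoted literal with $n and return (shape, bind_values)."""
    binds = []

    def repl(match):
        binds.append(match.group(1).replace("''", "'"))
        return f"${len(binds)}"

    return LITERAL.sub(repl, sql), tuple(binds)


def build_index(recorded, use_binds):
    """Group recorded responses by (shape, binds); binds are dropped if unused."""
    index = {}
    for sql, response in recorded:
        shape, binds = normalise(sql)
        key = (shape, binds if use_binds else ())
        index.setdefault(key, []).append(response)
    return index


def lookup(index, sql, use_binds):
    """Return the first recording stored under the query's key (FIFO)."""
    shape, binds = normalise(sql)
    return index[(shape, binds if use_binds else ())][0]
```

With `use_binds=False` both lookups share one key, so the subclass query receives the parent row's mock; with `use_binds=True` each literal pair gets its own key.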

## What's in here

* `Dockerfile` — thin wrapper around `doccano/doccano:backend` pinning
  the upstream version this sample tracks. Future doccano releases
  that change the bug-triggering shape are addressed by retagging
  here, not by scattering version pins across the lane scripts in
  `keploy/integrations` / `keploy/enterprise`.
* `docker-compose.yml` — the orchestration: postgres-13 alongside
  the doccano backend, on a fixed subnet so the lane scripts can
  rely on stable IPs across record/replay phases.
* `flow.sh` — the minimum reproducer traffic, ~10 HTTP calls. POST
  `/v1/projects` (creates a `TextClassificationProject`), then GET
  list / GET single / PATCH single / a few dependent reads. The
  GET / PATCH responses are what diverge under the bug — POST
  passes either way because the in-memory subclass instance shapes
  the response without consulting the DB.
* `keploy.yml.template` — keploy config skeleton (proxy port, DNS
  port, container name placeholders) that lane scripts in
  `keploy/integrations` and `keploy/enterprise` `envsubst` into a
  per-job copy.
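
For readers unfamiliar with `envsubst`, the substitution step can be approximated in Python as below. This is a sketch under assumptions: the template text, its keys and the `KEPLOY_PROXY_PORT` / `KEPLOY_DNS_PORT` variable names are hypothetical placeholders, not the contents of the real `keploy.yml.template`. Unlike `envsubst`, which replaces an unset variable with an empty string, `string.Template.substitute` raises `KeyError`, which is arguably the safer behaviour for a CI job.

```python
import string

# Hypothetical template text; the real keploy.yml.template is not reproduced here.
TEMPLATE = """\
containerName: ${DOCCANO_BACKEND_CONTAINER}
proxyPort: ${KEPLOY_PROXY_PORT}
dnsPort: ${KEPLOY_DNS_PORT}
"""


def render(template_text, env):
    """Substitute ${VAR} placeholders; raise KeyError if a variable is missing."""
    return string.Template(template_text).substitute(env)
```

A lane script would call `render(TEMPLATE, os.environ)` (after `import os`) and write the result into its per-job directory.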

## Running locally

```sh
# Bring doccano up + bootstrap the admin token (one-shot; the volume
# is reused for the actual record run).
docker compose up -d
./flow.sh bootstrap

# Record
keploy record \
  -c "docker compose up" \
  --container-name doccano_backend \
  --proxy-port 18081 --dns-port 18082

# (in another shell, while keploy record is up)
./flow.sh record-traffic
# → SIGINT keploy when traffic returns

# Replay
keploy test \
  -c "docker compose up" \
  --container-name doccano_backend \
  --api-timeout 60 --delay 20 \
  --proxy-port 18081 --dns-port 18082
```

Expected outcome with the integrations fix in place: 0 failures,
all `is_text_project: true` / `resourcetype: "TextClassificationProject"`
across the project-read responses.

Expected outcome **without** the fix: tests covering GET-after-POST
project reads fail with `is_text_project: true → false` and
`resourcetype: "TextClassificationProject" → "Project"`.
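
When reading a failed replay by hand, it can help to reduce a recorded and a replayed response body to just the two fields named above. The helper below is an illustrative sketch, not part of `flow.sh` or keploy; `diverging_fields` and `WATCHED` are hypothetical names, and the bodies are assumed to be JSON objects.

```python
import json

# The two fields the README says flip under the bug.
WATCHED = ("is_text_project", "resourcetype")


def diverging_fields(recorded_body, replayed_body, fields=WATCHED):
    """Return {field: (recorded, replayed)} for watched fields that differ."""
    recorded = json.loads(recorded_body)
    replayed = json.loads(replayed_body)
    return {
        name: (recorded.get(name), replayed.get(name))
        for name in fields
        if recorded.get(name) != replayed.get(name)
    }
```

An empty result corresponds to the "0 failures" outcome for these fields; under the bug both watched fields should show up in the returned mapping.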

## CI lanes that consume this sample

* `keploy/integrations`: `.woodpecker/doccano-postgres.yml` /
  `.ci/scripts/python/doccano/doccano-linux.sh`. Three-way matrix
  (record-build × replay-build, record-latest × replay-build,
  record-build × replay-latest) — the cross-binary cells stay red
  until both keploy releases pick up the bind-extraction fix.
* `keploy/enterprise`: `.woodpecker/doccano-linux.yml` /
  `.ci/scripts/doccano-linux.sh`. Same three-way matrix wired to
  the enterprise compat-matrix harness.

Both clone this directory at the branch / tag pinned by the
respective lane script.

## Related

* [keploy/integrations#177](https://github.com/keploy/integrations/pull/177) — the fix this sample exercises.
* [keploy/enterprise#1889](https://github.com/keploy/enterprise/pull/1889) — original failing PR where the bug surfaced.
* [django-rest-polymorphic](https://github.com/apirobot/django-rest-polymorphic) — the upstream library whose serialisation path the bug breaks.

doccano-django/docker-compose.yml

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# doccano-django sample compose. Postgres + doccano backend on a
# fixed subnet so the lane scripts in keploy/integrations and
# keploy/enterprise can pin the DB IP without runtime discovery.
#
# Two-phase boot pattern (used by the lane scripts but valid for
# local runs too):
#
#   1. DOCCANO_SKIP_BOOTSTRAP=0 → backend runs migrations, creates
#      the admin user, sets the auth token; once that returns we
#      `compose down` the stack but keep the named volume so the
#      DB state persists.
#   2. DOCCANO_SKIP_BOOTSTRAP=1 → backend re-launches in
#      gunicorn-only mode against the populated volume; recording /
#      replay run against this incarnation.
#
# The split is what gives the lane a deterministic DB starting state
# without paying the migration cost on every record/replay invocation.
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile
    # Every name is env-driven so multiple matrix cells can run in
    # parallel on the same docker daemon without colliding. Lane
    # scripts that spin up doccano per-cell pass cell-scoped values
    # (e.g. DOCCANO_BACKEND_CONTAINER=doccano_backend_<cell-slug>).
    # Local single-runs use the shorter defaults.
    container_name: ${DOCCANO_BACKEND_CONTAINER:-doccano_backend}
    init: true
    stop_grace_period: 5s
    ports:
      - "${DOCCANO_APP_PORT:-18080}:8000"
    environment:
      ADMIN_USERNAME: ${DOCCANO_ADMIN_USER:-admin}
      ADMIN_PASSWORD: ${DOCCANO_ADMIN_PASSWORD:-password}
      ADMIN_EMAIL: ${DOCCANO_ADMIN_EMAIL:-admin@example.com}
      DATABASE_URL: postgres://doccano:doccano@${DOCCANO_DB_IP:-172.34.0.10}:5432/doccano?sslmode=disable
      ALLOW_SIGNUP: "False"
      DEBUG: "False"
      DJANGO_SETTINGS_MODULE: config.settings.production
      DOCCANO_SKIP_BOOTSTRAP: "${DOCCANO_SKIP_BOOTSTRAP:-0}"
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - doccano-net

  postgres:
    image: postgres:13.3-alpine
    container_name: ${DOCCANO_DB_CONTAINER:-doccano_db}
    stop_grace_period: 5s
    environment:
      POSTGRES_USER: doccano
      POSTGRES_PASSWORD: doccano
      POSTGRES_DB: doccano
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U doccano -d doccano"]
      interval: 5s
      timeout: 5s
      retries: 20
    volumes:
      - doccano-db-data:/var/lib/postgresql/data
    networks:
      doccano-net:
        ipv4_address: ${DOCCANO_DB_IP:-172.34.0.10}

networks:
  doccano-net:
    driver: bridge
    ipam:
      config:
        - subnet: ${DOCCANO_NETWORK_SUBNET:-172.34.0.0/24}

volumes:
  doccano-db-data:
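
The compose comments say lane scripts pass cell-scoped values for every name. One way such values could be derived is sketched below. The scheme is an assumption made for illustration (one /24 per cell index under 172.34.x.0, a host port offset of 100 per cell, `cell_env` as a hypothetical helper); the actual lane scripts may compute these differently.

```python
import ipaddress
import re


def cell_env(cell_name, cell_index, skip_bootstrap):
    """Build the compose environment for one matrix cell.

    cell_index must be in 0..255 so the third octet stays valid; index 0
    reproduces the compose file's default subnet, DB IP and app port.
    """
    # Lowercase and collapse anything non-alphanumeric into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", cell_name.lower()).strip("-")
    subnet = ipaddress.ip_network(f"172.34.{cell_index}.0/24")
    return {
        "DOCCANO_BACKEND_CONTAINER": f"doccano_backend_{slug}",
        "DOCCANO_DB_CONTAINER": f"doccano_db_{slug}",
        "DOCCANO_APP_PORT": str(18080 + 100 * cell_index),
        "DOCCANO_NETWORK_SUBNET": str(subnet),
        "DOCCANO_DB_IP": str(subnet.network_address + 10),
        # Phase 1 (bootstrap) uses "0"; phase 2 (record/replay) uses "1".
        "DOCCANO_SKIP_BOOTSTRAP": "1" if skip_bootstrap else "0",
    }
```

The returned mapping can be merged into the environment of a `docker compose` subprocess, once with `skip_bootstrap=False` for the bootstrap phase and once with `skip_bootstrap=True` for record or replay.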
