PRG-06: Idempotent merge bundle (from #339 #340) by nap-liu · Pull Request #371 · dataelement/Clawith

nap-liu · 2026-04-10T03:57:19Z

This regrouped PR combines the original changes from:\n- #339 context-window-dynamic\n- #340 dingtalk-dedup-ack\n\nScope: idempotent merge and duplicate-ack handling.

…licate processing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a user messages via DingTalk bot, fetch their corp profile (mobile, email) from the DingTalk API and pass it to channel_user_service. This enables matching against existing users who registered via web SSO or other channels, preventing duplicate user records for the same person. Also enrich existing matched users with mobile/email/name from the channel API so future cross-channel lookups succeed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wisdomqin · 2026-04-13T15:34:51Z

Thanks for the regrouped PR. The overall direction makes sense: avoiding DingTalk retransmit duplicates and ACK timeouts is useful.

I do not think this is safe to merge as-is yet. A few issues need to be addressed first:

The dedup marker is written at the very beginning of process_dingtalk_message. If any later step fails, such as DB write, LLM processing, or replying through the session webhook, DingTalk retries will be treated as duplicates and skipped. That turns a transient processing failure into message loss. The dedup state should either be marked only after the message is durably accepted, or use a processing / done state model so failed attempts can be retried safely.
The new DingTalk unionid fetched from get_dingtalk_user_detail() does not actually participate in the primary OrgMember lookup. resolve_channel_user() still calls _find_org_member() with external_user_id=sender_staff_id, so existing org-sync records that are only matchable by real unionid may still be missed.
When a user is matched by email or mobile and an existing OrgMember is reused, the DingTalk identifiers are not backfilled. The current reuse path only updates Feishu-specific identifiers, so future DingTalk messages may keep falling back to API detail lookup and email/mobile matching instead of resolving directly.

There is also a secondary operational concern: this adds a DingTalk user-detail API call, plus access-token retrieval, on every incoming message. That may need caching or rate-limit protection before production use.

Could you update the PR to make dedup retry-safe and wire DingTalk unionid / external_id into the actual OrgMember matching and backfill paths?

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add `extra_ids` param to `resolve_channel_user` and refactor `_find_org_member` to accept a deduplicated `candidate_ids` list, enabling OR-matched lookups across staff_id and unionid for DingTalk. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Hoist `or_` import to module level (was inline inside _find_org_member). Add four tests that call _find_org_member with a _RecordingSession and assert the compiled WHERE clause contains the right column IN-clauses for dingtalk, feishu, wecom, and the empty-ids short-circuit path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… to avoid IntegrityError Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ures are visible Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nd wecom Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

nap-liu · 2026-04-14T04:01:38Z

Thanks for the thorough review. Addressed all four concerns, plus one bonus bugfix discovered during local end-to-end testing.

R1 — Retry-safe dedup

Replaced the single-SETNX-on-entry approach with a processing → done state machine in backend/app/api/dingtalk.py:

acquire_dedup_lock(message_id) sets processing with PROCESSING_TTL=180s
On successful processing → mark_dedup_done (DONE_TTL=600s)
On exception → release_dedup_lock deletes the key only if it is still processing (avoids racing with a concurrent done)
If a worker crashes before marking done, PROCESSING_TTL auto-releases so DingTalk retransmits can re-enter

Redis is used when available (atomic cross-process); falls back to per-process memory with periodic GC. The Redis client is now injected via module-level _get_redis_client so tests can monkeypatch it to force the memory path.

R2 — Real `unionid` in OrgMember lookup

ChannelUserService.resolve_channel_user now accepts extra_ids: list[str] | None. The DingTalk entry point fetches unionid from the corp user/get API and passes it (when different from sender_staff_id) as an extra candidate. _find_org_member was rewritten to OR-match all candidate ids against channel-specific columns (unionid/open_id/external_id) via sqlalchemy.or_ and .in_(), so org-sync records keyed by the real unionid resolve on first hit.

R3 — Backfill on reuse

Extracted _backfill_org_member_ids(member, channel_type, external_user_id, extra_info) — a synchronous idempotent method that only fills empty fields. Wired into both paths for all three channels:

Email/mobile match → _find_existing_org_member_for_user hits → backfill (previously feishu-only inline, dingtalk/wecom missed)
Shell OrgMember linked to new user → backfill

R4 — API rate

Added backend/app/services/dingtalk_cache.py: TTLCache with per-key asyncio.Lock (single-flight, double-checked against _store). Wrapped:

get_dingtalk_access_token → 7000s TTL (margin under DingTalk's 7200s expiry)
get_dingtalk_user_detail → 1800s TTL
Failure results (missing access_token, None/empty user detail) are actively invalidated so next call retries

Verified end-to-end: for 3 DingTalk messages the backend makes exactly 1 gettoken and 1 user/get request; subsequent messages are pure cache hits.

Bonus: enrichment IntegrityError

During local E2E testing, a real DingTalk message triggered UniqueViolationError on ix_identities_phone: _enrich_user_from_extra_info checked that the target identity's own phone was empty, but didn't check whether another globally-unique Identity already held that phone. The IntegrityError was swallowed by the Stream layer's fire-and-forget dispatch, leaving the user with no reply. This is a pre-existing issue from the original cross-channel auto-merge commit, but it fully blocked the feature in practice.

Fix (two-layer defence):

Pre-check in _enrich_user_from_extra_info: new _identity_field_in_use(column, value, exclude_id) query. Conflict → logger.warning + skip the field
Isolation at the call sites in resolve_channel_user: wrap enrichment in try/except, log, continue — enrichment failures can never crash the main flow

Also added logger.exception to the dedup wrapper so any future fire-and-forget failure surfaces in logs.

Tests & verification

New unit/integration tests: test_dingtalk_service_cache.py (6), test_channel_user_service_identity.py (18 including feishu/wecom reuse integration), test_dingtalk_dedup.py (7) — all 31 pass
Full-repo regression vs. PR base (ac236e2): 0 new failures (pre-existing password_reset_* / agent_delete failures predate this branch and are unrelated)
Real DingTalk messages: 3 messages, 1 gettoken, 1 user/get, 3 Redis done keys, enrichment skip logged correctly, LLM replies received

11 new commits on top of ac236e2. PTAL.

nap.liu and others added 2 commits April 10, 2026 11:28

feat: dingtalk stream immediate ACK + message_id dedup to prevent dup…

0cea1c0

…licate processing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nap.liu and others added 11 commits April 14, 2026 10:25

feat(dingtalk): add TTLCache helper with single-flight

8f801e3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

perf(dingtalk): cache access_token and user_detail to reduce API calls

22707a7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(dingtalk): do not cache empty user_detail results

74098dd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(dingtalk): pass real unionid as extra_ids to channel user resolver

0ae5422

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(channel-user): back-fill channel identifiers on org-member reuse …

b67ad24

…paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(dingtalk): make message dedup retry-safe via processing/done states

4e0b2dd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(channel-user): pre-check unique Identity fields before enrichment…

5760a76

… to avoid IntegrityError Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(dingtalk): log exception in dedup wrapper so fire-and-forget fail…

48a489f

…ures are visible Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test(channel-user): add reuse+backfill integration tests for feishu a…

ae0b1d0

…nd wecom Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PRG-06: Idempotent merge bundle (from #339 #340)#371

PRG-06: Idempotent merge bundle (from #339 #340)#371
nap-liu wants to merge 13 commits intodataelement:mainfrom
nap-liu:prg/06-idempotent-merge__from-p339-p340

nap-liu commented Apr 10, 2026

Uh oh!

wisdomqin commented Apr 13, 2026

Uh oh!

nap-liu commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nap-liu commented Apr 10, 2026

Uh oh!

wisdomqin commented Apr 13, 2026

Uh oh!

nap-liu commented Apr 14, 2026

R1 — Retry-safe dedup

R2 — Real unionid in OrgMember lookup

R3 — Backfill on reuse

R4 — API rate

Bonus: enrichment IntegrityError

Tests & verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

R2 — Real `unionid` in OrgMember lookup