Skip to content

PRG-06: Idempotent merge bundle (from #339 #340)#371

Open
nap-liu wants to merge 13 commits intodataelement:mainfrom
nap-liu:prg/06-idempotent-merge__from-p339-p340
Open

PRG-06: Idempotent merge bundle (from #339 #340)#371
nap-liu wants to merge 13 commits intodataelement:mainfrom
nap-liu:prg/06-idempotent-merge__from-p339-p340

Conversation

@nap-liu
Copy link
Copy Markdown

@nap-liu nap-liu commented Apr 10, 2026

This regrouped PR combines the original changes from:\n- #339 context-window-dynamic\n- #340 dingtalk-dedup-ack\n\nScope: idempotent merge and duplicate-ack handling.

nap.liu and others added 2 commits April 10, 2026 11:28
…licate processing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user messages via DingTalk bot, fetch their corp profile (mobile,
email) from the DingTalk API and pass it to channel_user_service. This
enables matching against existing users who registered via web SSO or
other channels, preventing duplicate user records for the same person.

Also enrich existing matched users with mobile/email/name from the
channel API so future cross-channel lookups succeed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Thanks for the regrouped PR. The overall direction makes sense: avoiding DingTalk retransmit duplicates and ACK timeouts is useful.

I do not think this is safe to merge as-is yet. A few issues need to be addressed first:

  1. The dedup marker is written at the very beginning of process_dingtalk_message. If any later step fails, such as DB write, LLM processing, or replying through the session webhook, DingTalk retries will be treated as duplicates and skipped. That turns a transient processing failure into message loss. The dedup state should either be marked only after the message is durably accepted, or use a processing / done state model so failed attempts can be retried safely.

  2. The new DingTalk unionid fetched from get_dingtalk_user_detail() does not actually participate in the primary OrgMember lookup. resolve_channel_user() still calls _find_org_member() with external_user_id=sender_staff_id, so existing org-sync records that are only matchable by real unionid may still be missed.

  3. When a user is matched by email or mobile and an existing OrgMember is reused, the DingTalk identifiers are not backfilled. The current reuse path only updates Feishu-specific identifiers, so future DingTalk messages may keep falling back to API detail lookup and email/mobile matching instead of resolving directly.

There is also a secondary operational concern: this adds a DingTalk user-detail API call, plus access-token retrieval, on every incoming message. That may need caching or rate-limit protection before production use.

Could you update the PR to make dedup retry-safe and wire DingTalk unionid / external_id into the actual OrgMember matching and backfill paths?

nap.liu and others added 11 commits April 14, 2026 10:25
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `extra_ids` param to `resolve_channel_user` and refactor
`_find_org_member` to accept a deduplicated `candidate_ids` list,
enabling OR-matched lookups across staff_id and unionid for DingTalk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hoist `or_` import to module level (was inline inside _find_org_member).
Add four tests that call _find_org_member with a _RecordingSession and
assert the compiled WHERE clause contains the right column IN-clauses
for dingtalk, feishu, wecom, and the empty-ids short-circuit path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… to avoid IntegrityError

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ures are visible

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd wecom

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nap-liu
Copy link
Copy Markdown
Author

nap-liu commented Apr 14, 2026

Thanks for the thorough review. Addressed all four concerns, plus one bonus bugfix discovered during local end-to-end testing.

R1 — Retry-safe dedup

Replaced the single-SETNX-on-entry approach with a processing → done state machine in backend/app/api/dingtalk.py:

  • acquire_dedup_lock(message_id) sets processing with PROCESSING_TTL=180s
  • On successful processing → mark_dedup_done (DONE_TTL=600s)
  • On exception → release_dedup_lock deletes the key only if it is still processing (avoids racing with a concurrent done)
  • If a worker crashes before marking done, PROCESSING_TTL auto-releases so DingTalk retransmits can re-enter

Redis is used when available (atomic cross-process); falls back to per-process memory with periodic GC. The Redis client is now injected via module-level _get_redis_client so tests can monkeypatch it to force the memory path.

R2 — Real unionid in OrgMember lookup

ChannelUserService.resolve_channel_user now accepts extra_ids: list[str] | None. The DingTalk entry point fetches unionid from the corp user/get API and passes it (when different from sender_staff_id) as an extra candidate. _find_org_member was rewritten to OR-match all candidate ids against channel-specific columns (unionid/open_id/external_id) via sqlalchemy.or_ and .in_(), so org-sync records keyed by the real unionid resolve on first hit.

R3 — Backfill on reuse

Extracted _backfill_org_member_ids(member, channel_type, external_user_id, extra_info) — a synchronous idempotent method that only fills empty fields. Wired into both paths for all three channels:

  • Email/mobile match → _find_existing_org_member_for_user hits → backfill (previously feishu-only inline, dingtalk/wecom missed)
  • Shell OrgMember linked to new user → backfill

R4 — API rate

Added backend/app/services/dingtalk_cache.py: TTLCache with per-key asyncio.Lock (single-flight, double-checked against _store). Wrapped:

  • get_dingtalk_access_token → 7000s TTL (margin under DingTalk's 7200s expiry)
  • get_dingtalk_user_detail → 1800s TTL
  • Failure results (missing access_token, None/empty user detail) are actively invalidated so next call retries

Verified end-to-end: for 3 DingTalk messages the backend makes exactly 1 gettoken and 1 user/get request; subsequent messages are pure cache hits.

Bonus: enrichment IntegrityError

During local E2E testing, a real DingTalk message triggered UniqueViolationError on ix_identities_phone: _enrich_user_from_extra_info checked that the target identity's own phone was empty, but didn't check whether another globally-unique Identity already held that phone. The IntegrityError was swallowed by the Stream layer's fire-and-forget dispatch, leaving the user with no reply. This is a pre-existing issue from the original cross-channel auto-merge commit, but it fully blocked the feature in practice.

Fix (two-layer defence):

  • Pre-check in _enrich_user_from_extra_info: new _identity_field_in_use(column, value, exclude_id) query. Conflict → logger.warning + skip the field
  • Isolation at the call sites in resolve_channel_user: wrap enrichment in try/except, log, continue — enrichment failures can never crash the main flow

Also added logger.exception to the dedup wrapper so any future fire-and-forget failure surfaces in logs.

Tests & verification

  • New unit/integration tests: test_dingtalk_service_cache.py (6), test_channel_user_service_identity.py (18 including feishu/wecom reuse integration), test_dingtalk_dedup.py (7) — all 31 pass
  • Full-repo regression vs. PR base (ac236e2): 0 new failures (pre-existing password_reset_* / agent_delete failures predate this branch and are unrelated)
  • Real DingTalk messages: 3 messages, 1 gettoken, 1 user/get, 3 Redis done keys, enrichment skip logged correctly, LLM replies received

11 new commits on top of ac236e2. PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants