Skip to content

Release 0.0.38: PII masking, next/previous fixes, how display fixes, behavior indicator#365

Merged
jbutcher21 merged 11 commits intomainfrom
363-jb-mask-pii
Apr 8, 2026
Merged

Release 0.0.38: PII masking, next/previous fixes, how display fixes, behavior indicator#365
jbutcher21 merged 11 commits intomainfrom
363-jb-mask-pii

Conversation

@jbutcher21
Copy link
Copy Markdown
Contributor

@jbutcher21 jbutcher21 commented Apr 7, 2026

Summary

Test plan

  • sz_explorer -M — verify masking with 2 visible chars
  • sz_explorer -M1 — verify masking with 1 visible char
  • show_last_call — verify raw JSON is also masked
  • why <id1> <id2> — verify ↓/↑ indicator appears when scoring behavior differs from config
  • how <entity_id> columnar view — verify features align with correct virtual entity columns
  • how <entity_id> on entity with 3+ records — verify INBOUND/CANDIDATE sides are correct
  • get <id> then next / previous — verify correct navigation
  • previous with no prior get — verify graceful warning, no crash
  • next / previous without direct database access — verify graceful warning
  • how on entity 23 (Maria Sentosa) with SDK 4.2.0 — verify warning instead of crash for phantom feature IDs

Resolves #363
Resolves #364
Resolves #341
Resolves #340

@jbutcher21 jbutcher21 requested a review from a team as a code owner April 7, 2026 07:42
@jbutcher21 jbutcher21 requested review from antaenc and docktermj April 7, 2026 07:43
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Now I have enough information to write the review.


PR Code Review: v0.0.38

Summary: This PR adds --mask_pii CLI flag for PII masking, scoring behavior upgrade/downgrade indicators, fixes next/previous navigation bugs, and fixes the columnar view feature reversal bug.


Code Quality

✅ No commented-out code — The diff actually removes previously commented-out code (e.g., # data_source = None, dead SQL branches in find).

✅ Meaningful variable namesbehavior_rank, fmt_behavior_change, mask_json_pii, ftype_behavior, etc. are all clear.

✅ DRY principle — PII masking logic is centralized in mask_pii/mask_json_pii; no duplication observed.

❌ Minor style issue — argparse flag uses underscore instead of hyphen
sz_explorer:4349 uses --mask_pii but POSIX/CLI convention (and argparse convention) is to use --mask-pii (hyphen). Argparse converts it to args.mask_pii automatically:

argParser.add_argument(
    "-M",
    "--mask-pii",   # <-- hyphen, not underscore
    ...
)

❌ Potential defect — find() doesn't guard operator == ">" when starting_id is None
sz_explorer:3903 sets starting_id = self.last_entity_id which is initialized to 0 at line 2997, so starting_id = 0 for a fresh session. The guard at line 3909 only checks not starting_id and operator == "<". For operator == ">", the SQL becomes RES_ENT_ID > 0 which works, but this relies on the initialization value of 0 being intentionally meaningful. The logic is technically correct but fragile — a comment would clarify the intent.

❌ Minor defect — mask_pii separator check uses string in instead of set membership
sz_explorer:92:

token not in ',;:|/\\-'

This works for single-char separators (substring check on a 7-char string), but is semantically misleading — it reads as substring containment, not set membership. Should use:

token not in {',', ';', ':', '|', '/', '\\', '-'}

✅ No logic errors in behavior_rank — The rank calculation correctly strips ES suffix before indexing into freq_rank, and checks for E/S in the stripped suffix portion.

✅ ROLE-based fix for columnar view (#341) — The switch from checking STEP_TYPE to checking entity_list[0]["ROLE"] is correct. The ROLE is assigned before sorting by MEMBER_COUNT, so it correctly preserves the original INBOUND/CANDIDATE assignment regardless of post-sort order.

find() fix is sound — Querying RES_ENT for ordering and verifying via RES_ENT_OKEY correctly skips deleted/orphaned entity IDs. The early return on operator == "<" with no starting_id is correct since entity IDs start at 1.

mask_json_pii mutates in-place before caching — Line 1270: masking happens before appending to last_call_cache, so cached data is consistently masked. No double-masking risk.


Testing

❌ No unit tests for new pure functionsmask_pii and mask_json_pii are stateless pure functions that would be straightforward to unit test (e.g., edge cases: empty string, single-char token, tokens with only separators, deeply nested JSON). No tests were added.

❌ No tests for behavior_rank / fmt_behavior_change — The ranking logic has several branches (F1, FF, FM, FVM, NAME, unknown, E/S modifiers) that would benefit from tests.

❌ No existing sz_explorer test coverage — No test files for sz_explorer exist in the repo. While this is pre-existing, the new features (PII masking, behavior indicators) add untested surface area.


Documentation

✅ CHANGELOG.md updated — All four fixes and two features are listed with issue references.

✅ Version bumpedpyproject.toml updated from 0.0.37 to 0.0.38.

✅ CLI help text provided — The --mask_pii argument includes a descriptive help= string.

✅ No inline comments needed — The PII masking functions have docstrings. behavior_rank logic is compact but readable given the domain context.


Security

✅ No hardcoded credentials

✅ No license files (.lic) checked in

✅ PII masking is additive/opt-in — The default is False; no data is masked unless explicitly requested.

⚠️ Note on PII masking scopePII_KEYS at line 77–83 covers ENTITY_NAME, FEAT_DESC, INBOUND_FEAT_DESC, CANDIDATE_FEAT_DESC, FELEM_VALUE. It does not cover raw source record fields like NAME_LAST, NAME_FIRST, ADDR_LINE1, etc. This may be intentional (masking only resolved/scored output, not raw records), but worth confirming the scope is sufficient for the intended use case.

✅ No SQL injection risk — The find() method uses sz_dbo.sql_prep() (parameterized queries) for both SQL statements, with starting_id passed as a bound parameter, not interpolated.


Summary

Category Result
Code Quality Minor issues (argparse flag naming, separator check style)
Testing No tests for new functions
Documentation ✅ Complete
Security ✅ No critical issues; PII scope worth confirming

Key items to address before merge:

  1. --mask_pii--mask-pii (CLI convention)
  2. Consider adding unit tests for mask_pii/mask_json_pii/behavior_rank
  3. token not in ',;:|/\\-' → use a set literal for clarity

Automated code review analyzing defects and coding standards

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Now I have enough context for a full review. Here's my analysis:


PR Code Review: v0.0.38

Code Quality

✅ No commented-out code — The diff also removes previously commented-out lines (e.g., data_source logic in find), which is a positive cleanup.

✅ Meaningful variable namesftype_behavior, behavior_rank, fmt_behavior_change, mask_pii, visible are all clear and descriptive.

✅ DRY principlemask_pii is reused correctly. fmt_behavior_change centralizes behavior arrow logic.

❌ Line lengthsz_tools/sz_explorer:3917 slightly approaches the 120-char limit and reads poorly:

sql = sz_dbo.sql_prep(f"select RES_ENT_ID from RES_ENT where RES_ENT_ID {operator} ? order by RES_ENT_ID {order}")

Consider breaking the f-string onto its own line.

⚠️ mask_json_pii mutates in-place AND returnssz_tools/sz_explorer:102-114. The function modifies data in place but also returns it. At every call site the return value is discarded. This is an inconsistent API — pick one convention: either return the modified value (and don't mutate) or mutate in-place and return None. As-is, it's misleading.

⚠️ find loop is unboundedsz_tools/sz_explorer:3919-3923. The method fetches all entities greater/less than starting_id with no LIMIT, then verifies each against RES_ENT_OKEY in a separate query per row. On a large database with many entities that lack OKEY entries, this could be very slow. A LIMIT N with retry or a JOIN would be safer.

⚠️ behavior_rank locals not extractedsz_tools/sz_explorer:1213-1221. freq_rank is rebuilt as a new dict on every call since it's defined inside the static method. Consider making it a module-level or class-level constant.

✅ ROLE fix is correct — The switch from step_type == "Create virtual entity" to step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] at sz_tools/sz_explorer:1550 is the right approach and aligns with the columnar view fix.

✅ Phantom feature ID guardsz_tools/sz_explorer:1513-1515. The guard for missing lib_feat_id in entity_features is correct and the warning message is useful.

✅ No SQL injection riskoperator in find only ever comes from internal callers passing ">" or "<", and order is derived from that. Low risk, but worth documenting with a comment if this function is ever refactored.


Testing

❌ No tests for new functionalitytests/example_test.py is only a placeholder. There are no unit tests for:

  • mask_pii (various inputs: empty string, tokens shorter than visible, tokens at boundary, multi-token strings with delimiters)
  • mask_json_pii (recursive structure, non-PII keys untouched, list traversal)
  • behavior_rank (known frequency strings, with/without E/S suffixes, unknown strings)
  • fmt_behavior_change (upgrade vs. downgrade arrow, matching behavior returns "")
  • The find fix (next/previous with no prior get, no DB access case)

These are particularly important for mask_pii and behavior_rank since they contain non-trivial logic.

❌ Test coverage — Coverage cannot exceed the placeholder test. This should be addressed.


Documentation

✅ CHANGELOG.md updated — Version 0.0.38 entry is correctly placed with added features and fixes, dated 2026-04-07.

pyproject.toml version bumped to 0.0.38.

✅ Inline docstrings present on mask_pii and mask_json_pii.

⚠️ CLI help textsz_tools/sz_explorer:4358: "mask PII data, optionally specify number of visible characters (default: 2)" doesn't mention that -M alone uses the default of 2 visible characters. Consider: "mask PII data; use -M alone for 2 visible chars, or -M N for N visible chars".


Security

✅ No hardcoded credentials.

✅ No sensitive data in logs — The warning at sz_tools/sz_explorer:1514 prints DATA_SOURCE and RECORD_ID (not PII field values), which is appropriate.

✅ No license files (.lic) detected.

⚠️ PII masking applied after SDK call cachingsz_tools/sz_explorer:1266-1270. The call cache (self.last_call_cache) is populated with the raw response before PII masking is applied (at line 1269-1270 the masking runs on response_data, and the cache append is at 1272). Since response_data is a dict and mask_json_pii mutates it in-place, the cached version is masked. However, this is subtle — a reader could incorrectly believe the cache stores pre-masked data. A comment clarifying the mutation behavior would help. If the intent were ever to cache raw data for debugging, this would silently mask it.


Summary

Area Status
Code style ✅ mostly clean, minor line-length and API-consistency issues
No commented-out code
DRY / naming
Defects ⚠️ unbounded DB loop in find, freq_rank rebuilt each call
Tests ❌ no tests for new/fixed logic
CHANGELOG / docs
Security ⚠️ subtle mutation-before-cache ordering (actually safe but fragile)

Blocking items: The lack of unit tests for mask_pii, behavior_rank, and fmt_behavior_change is a significant gap given the logic complexity. The find loop's potential performance issue on large databases is worth addressing before merge.

Automated code review analyzing defects and coding standards

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Now I have enough context to provide a thorough review.


PR Code Review: v0.0.38

Code Quality

✅ No commented-out code — The PR actually removes previously commented-out dead code (# data_source = None, # else: # data_source = parm.upper()).

✅ Meaningful variable namesmask_pii, visible, ftype_behavior, behavior_rank, fmt_behavior_change are all clear and descriptive.

✅ DRY principle — PII masking is centralized in mask_pii() and mask_json_pii(), reused consistently across call sites.

❌ Line length violations — The project enforces 120-char lines via Black, but several new lines exceed this:

  • sz_explorer:1525print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning") (~146 chars)
  • sz_explorer:1559"ROLE": "INBOUND" if step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] else "CANDIDATE", (~131 chars)
  • sz_explorer:3923sql = sz_dbo.sql_prep(f"select RES_ENT_ID from RES_ENT where RES_ENT_ID {operator} ? order by RES_ENT_ID {order}") (~123 chars)
  • sz_explorer:4391sdk_wrapper = EdaSdkWrapper(engine_config, debug_trace=args.debug_trace, webapp_url=args.webapp_url, mask_pii=args.mask_pii) (~132 chars)

❌ Input validation missing for --mask_pii — The argument accepts any int, including negative values. Calling mask_pii("hello", -1) produces "hell******" (Python slice token[:-1] + len(token) - (-1) stars), which is silently wrong. Should add type=int + a choices range or explicit guard:

# In argparser:
type=int,
metavar="N",
# Or add a validation after parse_args:
if args.mask_pii is not None and args.mask_pii < 0:
    argParser.error("--mask_pii must be >= 0")

✅ No defects found in core logic — The find() fix is correct: last_entity_id initializes to 0 (line 3006), so not starting_id and operator == "<" properly guards the "previous with no prior get" case, while > 0 correctly returns the first entity for next.

✅ Behavior rank logicbehavior_rank() correctly handles the rstrip("ES") suffix extraction. Empty/unknown behavior gracefully returns rank 990.

ℹ️ Minor: close() swallows all exceptions broadlysz_explorer:1212-1216. Silently ignoring any Exception is broad. If sz_factory was never assigned (some unforeseen init path), an AttributeError would be eaten. Consider at minimum catching SzError specifically or checking hasattr(self, 'sz_factory').

ℹ️ Minor: mask_json_pii mutates in place but also returns — The return value is unused at call sites (call_sdk, get_features). This is harmless but inconsistent; consider dropping the return data or using it at call sites for clarity.


Testing

❌ No unit tests for new functionsmask_pii(), mask_json_pii(), behavior_rank(), and fmt_behavior_change() are pure/static functions that are straightforward to unit test. The test suite currently contains only example_test.py with a trivial assertion. Key cases to cover:

  • mask_pii("John Smith", 2)"Jo** Sm***"
  • mask_pii("", 2)"" (empty passthrough)
  • mask_pii("Jo", 2)"Jo" (token length == visible)
  • mask_json_pii with nested dicts, missing PII keys, non-string values
  • behavior_rank("F1E") vs behavior_rank("FF") ordering
  • fmt_behavior_change when score_behavior == expected (returns "")

❌ Test coverage — No coverage measurement can currently verify the >80% threshold with only the example test.


Documentation

✅ CHANGELOG.md updated — Correct Keep a Changelog format with [0.0.38] - 2026-04-07 section, properly listing added features and fixes.

✅ pyproject.toml version bumped0.0.370.0.38.

✅ Inline comments — New sections use the existing # --- comment block style consistently.


Security

✅ No hardcoded credentials

✅ No sensitive data in logs — The warning message at sz_explorer:1525 prints DATA_SOURCE and RECORD_ID but not PII feature values. Appropriate.

✅ No license files (.lic) or AQAAAD-prefixed strings detected

✅ SQL parameterization — The new SQL in find() (line 3923-3924) uses parameterized queries (?) for user-supplied values. The operator (> / <) and order (ASC / DESC) are controlled by internal code only, not user input.

✅ PII masking applied before cachingmask_json_pii runs before last_call_cache.append(...) in call_sdk, so the debug export cache also contains masked data when PII masking is enabled.


Summary

Category Status
Code style (line length) ❌ 4 violations
Input validation (--mask_pii negative) ❌ Missing
Unit tests for new functions ❌ Missing
Commented-out code
DRY / naming
Logic correctness
Security
Documentation

The functional changes (PII masking, behavior indicator, next/previous fixes, SDK factory cleanup) are correct and well-structured. The main gaps are missing unit tests for the new pure functions and the unvalidated negative-integer edge case for --mask_pii.

Automated code review analyzing defects and coding standards

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Now I have enough context. Here is the full review:


Code Review: sz_explorer v0.0.38


Code Quality

✅ No commented-out code — The PR actually removes previously commented-out code from find(). Good cleanup.

✅ Meaningful variable namesftype_behavior, behavior_rank, mask_pii, score_display are all clear.

✅ DRY principle — PII masking is applied at appropriate call sites without unnecessary duplication.

❌ Bug: behavior_rank incorrectly handles "NAME" frequency

sz_tools/sz_explorer:1215

freq = behavior.rstrip("ES")

str.rstrip("ES") strips any combination of characters 'E' and 'S' from the right. The string "NAME" ends in 'E', so:

"NAME".rstrip("ES")  →  "NAM"
freq_rank.get("NAM", 99)  →  99   # should be 5

Any feature type with FTYPE_FREQ = "NAME" (with or without the E/S suffix) will receive the wrong rank (990+), causing the upgrade/downgrade arrow to point the wrong direction for name-type features.

Fix: parse the freq code as a known prefix instead of using rstrip:

@staticmethod
def behavior_rank(behavior):
    freq_rank = {"F1": 1, "FF": 2, "FM": 3, "FVM": 4, "NAME": 5}
    for freq, rank in sorted(freq_rank.items(), key=lambda x: -len(x[0])):
        if behavior.startswith(freq):
            suffix = behavior[len(freq):]
            r = rank * 10
            if "E" in suffix:
                r -= 2
            if "S" in suffix:
                r -= 1
            return r
    return 99 * 10

❌ Line length violations (project enforces 120 chars)

sz_tools/sz_explorer:1553 (~137 chars with indentation):

"ROLE": "INBOUND" if step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] else "CANDIDATE",

sz_tools/sz_explorer:1519 (~145 chars with indentation):

print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning")

⚠️ mask_json_pii mutates response_data before caching

sz_tools/sz_explorer:1269-1272

if self.mask_pii:
    mask_json_pii(response_data, self.mask_pii)

self.last_call_cache.append([caller, api_name, flags, parms, response_data])

mask_json_pii modifies the dict in-place, so the call cache (used for --debug/export) stores already-masked data. If the intent is to let developers export raw API responses for debugging while still masking display output, this is a defect. If masking the cache is acceptable, add a comment to make it explicit.

⚠️ -M 0 silently disables masking

sz_tools/sz_explorer:4350-4358 — With type=int and default=0, passing -M 0 sets self.mask_pii = 0, which is falsy, so masking is disabled rather than showing 0 visible characters. A user invoking -M 0 expecting fully-masked output (all asterisks) will be confused. Consider rejecting 0 or treating it as "mask all."

find() refactor logic is correctfetch_all_rows calls fetchall() internally (confirmed in _sz_database.py:446), so all rows are in memory before the inner fetch_row call runs — no cursor-in-cursor conflict.

✅ INBOUND/CANDIDATE role fix — Using ENTITY_LIST[0]["ROLE"] rather than STEP_TYPE is more robust and directly reflects what the SDK reports.


Testing

❌ No tests for new logicmask_pii, mask_json_pii, and behavior_rank are pure functions that are easy to unit test. The project has only a placeholder test (tests/example_test.py). The behavior_rank bug above would have been caught by a simple test:

assert EdaSdkWrapper.behavior_rank("NAME") == 50  # fails with current code → 988

Documentation

✅ CHANGELOG.md updated — Entries for 0.0.38 are accurate and well-structured.

✅ Version bumped in pyproject.toml.


Security

✅ No hardcoded credentials.

✅ No license files checked in.

✅ SQL in find()operator (> / <) and order (ASC / DESC) are both hardcoded internally, not user-supplied. No injection risk.

✅ PII masking feature itself is a positive security addition.


Summary

Severity Issue
High behavior_rank incorrectly handles "NAME" frequency — wrong arrows on name features
Medium Line length violations on 2 lines
Medium mask_json_pii mutates in-place before caching — debug cache contains masked data
Low -M 0 silently disables masking instead of masking all characters
Low No unit tests for pure functions (mask_pii, mask_json_pii, behavior_rank)

Automated code review analyzing defects and coding standards

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 Claude Code Review

Now I have enough context for a thorough review.


Code Review: PR to v0.0.38

Code Quality

✅ No commented-out code — The diff actually removes two blocks of commented-out code (good cleanup).

✅ Meaningful variable namesftype_behavior, behavior_rank, mask_json_pii, starting_id, candidate_id are all clear.

✅ DRY principle — PII masking logic is centralized in mask_pii / mask_json_pii. The fmt_behavior_change method is reused across why and how display paths.

❌ Line length violationsz_tools/sz_explorer:4385 is ~132 characters, exceeding the 120-char Black limit:

sdk_wrapper = EdaSdkWrapper(engine_config, debug_trace=args.debug_trace, webapp_url=args.webapp_url, mask_pii=args.mask_pii)

Should be wrapped (Black would split this across lines).

❌ Potential defect — MATCHED_FEAT_DESC not maskedPII_KEYS (sz_tools/sz_explorer:77-83) includes FEAT_DESC, INBOUND_FEAT_DESC, CANDIDATE_FEAT_DESC, but omits MATCHED_FEAT_DESC. This value is rendered directly into the why display at line 1767:

matched_desc = f"\u2514\u2500\u2500 {feature['MATCHED_FEAT_DESC']} ({score}){behavior_change}"

With --mask_pii enabled, this field will still show unmasked PII in the tree view.

❌ Edge case in behavior_rank with unknown feature typefmt_behavior_change (sz_tools/sz_explorer:1223) calls self.ftype_behavior.get(ftype_code, ""), which returns "" for unknown feature types. behavior_rank("") then returns 990, causing almost any score_behavior to appear as an "upgrade" (↑) regardless of actual behavior. Should guard:

if not score_behavior or not expected or score_behavior == expected:
    return ""

❌ Performance concern in findsz_tools/sz_explorer:3917-3923 scans RES_ENT row-by-row, verifying each against RES_ENT_OKEY in a loop. On large databases with many orphaned RES_ENT rows, this could be very slow. A single JOIN query would be more efficient:

SELECT r.RES_ENT_ID FROM RES_ENT r
JOIN RES_ENT_OKEY o ON r.RES_ENT_ID = o.RES_ENT_ID
WHERE r.RES_ENT_ID {operator} ? ORDER BY r.RES_ENT_ID {order} LIMIT 1

⚠️ Minor: -M 0 has no effectsz_tools/sz_explorer:4355 uses default=0 and the check if self.mask_pii: (line 1269) treats 0 as falsy. A user explicitly passing -M 0 would expect 0 visible chars but masking would be silently skipped. Low impact since 0 visible chars is not a useful setting, but worth noting.


Testing

❌ No unit tests for new functionsmask_pii, mask_json_pii, behavior_rank, and fmt_behavior_change have no tests. These are good candidates for unit tests since they're pure/near-pure functions with clear inputs and outputs:

  • mask_pii("John Smith", 2)"Jo** Sm***"
  • mask_pii("", 2)""
  • behavior_rank("F1E")8
  • fmt_behavior_change when behavior matches, is unknown, upgrades, downgrades

Current test coverage is essentially zero for this feature. Only tests/example_test.py exists.


Documentation

✅ CHANGELOG updated — Correctly documents what changed in 0.0.38.

✅ Version bumpedpyproject.toml updated from 0.0.37 → 0.0.38.

✅ Inline docstringsmask_pii and mask_json_pii have docstrings.


Security

✅ No hardcoded credentials.

✅ SQL parameterizationstarting_id (user input in find) is passed as a query parameter, not interpolated. The operator and order values in the f-string are not user-controlled (they're always >/< and ASC/DESC from hardcoded call sites).

✅ No sensitive data in logs — The print_message for phantom feature IDs at line 1514 logs only IDs (integers), not PII content.

✅ No license files (.lic) committed.


Summary

Severity Issue
❌ Bug MATCHED_FEAT_DESC not in PII_KEYS — PII leaks in why-tree view even with --mask_pii
❌ Bug behavior_rank("") edge case shows spurious upgrade arrows for unknown feature types
❌ Style Line 4385 exceeds 120 chars
❌ Tests No tests for any of the 4 new pure functions
⚠️ Perf find loop + N verify queries instead of 1 JOIN query
⚠️ Minor -M 0 silently no-ops

Automated code review analyzing defects and coding standards

@jbutcher21 jbutcher21 merged commit a9b016c into main Apr 8, 2026
76 checks passed
@jbutcher21 jbutcher21 deleted the 363-jb-mask-pii branch April 8, 2026 01:07
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

🤖 Claude Code Review

I'll proceed with the review based on what's visible in the diff and the project's CLAUDE.md.


Code Review: PR to v0.0.38

Code Quality

Code Style
Follows the project's 120-char line length, Black formatting, and isort conventions. Module-level constant PII_KEYS is uppercase. Docstrings on new public functions.

No Commented-Out Code
The PR actually removes commented-out dead code from find() (the data_source block). One pre-existing commented-out line (# self.ftype_code_lookup["AMBIGUOUS_ENTITY"]["SORT_ORDER"] = 1) near line ~1174 remains, but it predates this PR.

Meaningful Variable Names
mask_pii, visible, ftype_behavior, behavior_rank, behavior_change, score_display are all clear.

DRY Principle
No obvious duplication introduced.

Defects / Logic Issues ❌ (minor)

  1. mask_pii — negative visible argument produces incorrect output (sz_tools/sz_explorer, mask_pii function ~line 87):
    The argparse allows -M -1 (valid int). With visible=-1, token[:visible] becomes token[:-1] (all but last char), and len(token) - visible becomes len(token) + 1, producing extra *s and wrong output. A guard is needed:

    visible = max(0, visible)
  2. -M 0 disables masking instead of masking everything (sz_tools/sz_explorer ~line 4350):
    default=0 combined with if self.mask_pii: means passing -M 0 is indistinguishable from not passing -M at all. Users expecting full masking with 0 visible chars would be surprised. Consider using None as the default and checking self.mask_pii is not None, or document that 0 is not a valid value.

  3. fetch_all_rows in find() — potential performance concern (sz_tools/sz_explorer, find method):
    The new implementation iterates all rows from RES_ENT ordered by ID, verifying each in RES_ENT_OKEY. In the normal (non-phantom) case this stops immediately. In a degraded database with many phantom IDs it could iterate many rows in memory. This is acceptable for a tool, just worth documenting.

  4. separator check readability (sz_tools/sz_explorer, mask_pii ~line 89):

    token not in ',;:|/\\-'

    This works (substring check on single-char tokens), but intent is clearer as a set literal:

    token not in {',', ';', ':', '|', '/', '\\', '-'}
  5. fmt_behavior_change strip inconsistency: In the columnar view behavior_change.strip() is called (removes leading space), but in the why/how tree display behavior_change is appended directly with its leading space. This is fine functionally but is mildly asymmetric.

CLAUDE.md
Well-structured and not environment-specific.


Testing ❌

No test files are added or modified in this PR. New logic introduced includes:

  • mask_pii() — string masking with token splitting
  • mask_json_pii() — recursive dict/list masking
  • behavior_rank() — ranking calculation with suffix parsing
  • fmt_behavior_change() — behavioral change detection and formatting
  • Reworked find() — entity navigation with phantom-entity skipping

These are all independently testable pure/near-pure functions. mask_pii in particular has non-obvious edge cases (empty string, all-separator input, visible=0, Unicode). Unit tests are missing.


Documentation

CHANGELOG.md ✅ — Updated with clear entries for all changes.

Version bump ✅ — pyproject.toml updated to 0.0.38.

README — Cannot verify from the diff if --mask_pii / -M is documented in the README or any usage docs. If the tool has a usage section in the README, it should be updated.

Inline comments ✅ — mask_pii and mask_json_pii have docstrings. The PII_KEYS frozenset is self-documenting.

Markdown formatting ✅ — CHANGELOG additions follow the existing format.


Security

No hardcoded credentials

PII masking feature ✅ — Correctly applied at the API response level (_call_sdk) and at the direct DB access level (feat_cache). Both paths are covered.

Warning message exposes identifiers — The new warning at sz_tools/sz_explorer ~line 1516:

print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning")

This logs DATA_SOURCE and RECORD_ID — these are identifiers, not PII values, so this is acceptable. Worth noting that this message is emitted even when --mask_pii is active, but record IDs and data source names are not PII.

No .lic files

Input validation — See defect #1 above (negative visible).


Summary

Category Status
Code style
No commented-out code
DRY / naming
Defects ❌ Negative visible bug; -M 0 disables rather than fully-masks
Unit tests ❌ None added for new functions
Documentation ✅ (README should be verified)
Security

Must fix: The visible < 0 bug in mask_pii. Recommend also addressing the -M 0 ambiguity and adding at minimum a few unit tests for mask_pii and behavior_rank.

Automated code review analyzing defects and coding standards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants