Release 0.0.38: PII masking, next/previous fixes, how display fixes, behavior indicator by jbutcher21 · Pull Request #365 · senzing-garage/sz-python-tools

jbutcher21 · 2026-04-07T07:42:50Z

Summary

sz_explorer: add --mask_pii flag for PII data masking #363 Add --mask_pii (-M) CLI flag to sz_explorer for PII data masking with configurable visible characters (-M defaults to 2, -M1 for 1, -M3 for 3)
sz_explorer: show behavior downgrade indicator on why/how #364 Show scoring behavior downgrade/upgrade indicator (↓/↑) on why/how displays
sz_explorer: how sometimes has features reversed on decision tree #341 Fix how columnar view showing features reversed — use INBOUND_VIRTUAL_ENTITY_ID from SDK response instead of assuming VIRTUAL_ENTITY_1 is always inbound
sz_explorer next/previous issue #340 Fix previous command going to beginning instead of previous entity (add ORDER BY, use RES_ENT table with RES_ENT_OKEY verification)
sz_explorer next/previous issue #340 Fix next/previous crash with no prior get and when direct database access is not available
Guard against phantom feature IDs in record FEATURE_IDS (SDK 4.2.0-4.2.1 bug where record-level FEATURE_IDS references LIB_FEAT_IDs not present in entity-level FEATURES or LIB_FEAT table)

Test plan

Resolves #363
Resolves #364
Resolves #341
Resolves #340

github-actions · 2026-04-07T07:45:11Z

🤖 Claude Code Review

Now I have enough information to write the review.

PR Code Review: v0.0.38

Summary: This PR adds --mask_pii CLI flag for PII masking, scoring behavior upgrade/downgrade indicators, fixes next/previous navigation bugs, and fixes the columnar view feature reversal bug.

Code Quality

✅ No commented-out code — The diff actually removes previously commented-out code (e.g., # data_source = None, dead SQL branches in find).

✅ Meaningful variable names — behavior_rank, fmt_behavior_change, mask_json_pii, ftype_behavior, etc. are all clear.

✅ DRY principle — PII masking logic is centralized in mask_pii/mask_json_pii; no duplication observed.

❌ Minor style issue — argparse flag uses underscore instead of hyphen
sz_explorer:4349 uses --mask_pii but POSIX/CLI convention (and argparse convention) is to use --mask-pii (hyphen). Argparse converts it to args.mask_pii automatically:

argParser.add_argument(
    "-M",
    "--mask-pii",   # <-- hyphen, not underscore
    ...
)

❌ Potential defect — find() doesn't guard operator == ">" when starting_id is None
sz_explorer:3903 sets starting_id = self.last_entity_id which is initialized to 0 at line 2997, so starting_id = 0 for a fresh session. The guard at line 3909 only checks not starting_id and operator == "<". For operator == ">", the SQL becomes RES_ENT_ID > 0 which works, but this relies on the initialization value of 0 being intentionally meaningful. The logic is technically correct but fragile — a comment would clarify the intent.

❌ Minor defect — mask_pii separator check uses string in instead of set membership
sz_explorer:92:

token not in ',;:|/\\-'

This works for single-char separators (substring check on a 7-char string), but is semantically misleading — it reads as substring containment, not set membership. Should use:

token not in {',', ';', ':', '|', '/', '\\', '-'}

✅ No logic errors in behavior_rank — The rank calculation correctly strips ES suffix before indexing into freq_rank, and checks for E/S in the stripped suffix portion.

✅ ROLE-based fix for columnar view (#341) — The switch from checking STEP_TYPE to checking entity_list[0]["ROLE"] is correct. The ROLE is assigned before sorting by MEMBER_COUNT, so it correctly preserves the original INBOUND/CANDIDATE assignment regardless of post-sort order.

✅ find() fix is sound — Querying RES_ENT for ordering and verifying via RES_ENT_OKEY correctly skips deleted/orphaned entity IDs. The early return on operator == "<" with no starting_id is correct since entity IDs start at 1.

✅ mask_json_pii mutates in-place before caching — Line 1270: masking happens before appending to last_call_cache, so cached data is consistently masked. No double-masking risk.

Testing

❌ No unit tests for new pure functions — mask_pii and mask_json_pii are stateless pure functions that would be straightforward to unit test (e.g., edge cases: empty string, single-char token, tokens with only separators, deeply nested JSON). No tests were added.

❌ No tests for behavior_rank / fmt_behavior_change — The ranking logic has several branches (F1, FF, FM, FVM, NAME, unknown, E/S modifiers) that would benefit from tests.

❌ No existing sz_explorer test coverage — No test files for sz_explorer exist in the repo. While this is pre-existing, the new features (PII masking, behavior indicators) add untested surface area.

Documentation

✅ CHANGELOG.md updated — All four fixes and two features are listed with issue references.

✅ Version bumped — pyproject.toml updated from 0.0.37 to 0.0.38.

✅ CLI help text provided — The --mask_pii argument includes a descriptive help= string.

✅ No inline comments needed — The PII masking functions have docstrings. behavior_rank logic is compact but readable given the domain context.

Security

✅ No hardcoded credentials

✅ No license files (.lic) checked in

✅ PII masking is additive/opt-in — The default is False; no data is masked unless explicitly requested.

⚠️ Note on PII masking scope — PII_KEYS at line 77–83 covers ENTITY_NAME, FEAT_DESC, INBOUND_FEAT_DESC, CANDIDATE_FEAT_DESC, FELEM_VALUE. It does not cover raw source record fields like NAME_LAST, NAME_FIRST, ADDR_LINE1, etc. This may be intentional (masking only resolved/scored output, not raw records), but worth confirming the scope is sufficient for the intended use case.

✅ No SQL injection risk — The find() method uses sz_dbo.sql_prep() (parameterized queries) for both SQL statements, with starting_id passed as a bound parameter, not interpolated.

Summary

Category	Result
Code Quality	Minor issues (argparse flag naming, separator check style)
Testing	No tests for new functions
Documentation	✅ Complete
Security	✅ No critical issues; PII scope worth confirming

Key items to address before merge:

--mask_pii → --mask-pii (CLI convention)
Consider adding unit tests for mask_pii/mask_json_pii/behavior_rank
token not in ',;:|/\\-' → use a set literal for clarity

Automated code review analyzing defects and coding standards

github-actions · 2026-04-07T21:08:15Z

🤖 Claude Code Review

Now I have enough context for a full review. Here's my analysis:

PR Code Review: v0.0.38

Code Quality

✅ No commented-out code — The diff also removes previously commented-out lines (e.g., data_source logic in find), which is a positive cleanup.

✅ Meaningful variable names — ftype_behavior, behavior_rank, fmt_behavior_change, mask_pii, visible are all clear and descriptive.

✅ DRY principle — mask_pii is reused correctly. fmt_behavior_change centralizes behavior arrow logic.

❌ Line length — sz_tools/sz_explorer:3917 slightly approaches the 120-char limit and reads poorly:

sql = sz_dbo.sql_prep(f"select RES_ENT_ID from RES_ENT where RES_ENT_ID {operator} ? order by RES_ENT_ID {order}")

Consider breaking the f-string onto its own line.

⚠️ mask_json_pii mutates in-place AND returns — sz_tools/sz_explorer:102-114. The function modifies data in place but also returns it. At every call site the return value is discarded. This is an inconsistent API — pick one convention: either return the modified value (and don't mutate) or mutate in-place and return None. As-is, it's misleading.

⚠️ find loop is unbounded — sz_tools/sz_explorer:3919-3923. The method fetches all entities greater/less than starting_id with no LIMIT, then verifies each against RES_ENT_OKEY in a separate query per row. On a large database with many entities that lack OKEY entries, this could be very slow. A LIMIT N with retry or a JOIN would be safer.

⚠️ behavior_rank locals not extracted — sz_tools/sz_explorer:1213-1221. freq_rank is rebuilt as a new dict on every call since it's defined inside the static method. Consider making it a module-level or class-level constant.

✅ ROLE fix is correct — The switch from step_type == "Create virtual entity" to step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] at sz_tools/sz_explorer:1550 is the right approach and aligns with the columnar view fix.

✅ Phantom feature ID guard — sz_tools/sz_explorer:1513-1515. The guard for missing lib_feat_id in entity_features is correct and the warning message is useful.

✅ No SQL injection risk — operator in find only ever comes from internal callers passing ">" or "<", and order is derived from that. Low risk, but worth documenting with a comment if this function is ever refactored.

Testing

❌ No tests for new functionality — tests/example_test.py is only a placeholder. There are no unit tests for:

mask_pii (various inputs: empty string, tokens shorter than visible, tokens at boundary, multi-token strings with delimiters)
mask_json_pii (recursive structure, non-PII keys untouched, list traversal)
behavior_rank (known frequency strings, with/without E/S suffixes, unknown strings)
fmt_behavior_change (upgrade vs. downgrade arrow, matching behavior returns "")
The find fix (next/previous with no prior get, no DB access case)

These are particularly important for mask_pii and behavior_rank since they contain non-trivial logic.

❌ Test coverage — Coverage cannot exceed the placeholder test. This should be addressed.

Documentation

✅ CHANGELOG.md updated — Version 0.0.38 entry is correctly placed with added features and fixes, dated 2026-04-07.

✅ pyproject.toml version bumped to 0.0.38.

✅ Inline docstrings present on mask_pii and mask_json_pii.

⚠️ CLI help text — sz_tools/sz_explorer:4358: "mask PII data, optionally specify number of visible characters (default: 2)" doesn't mention that -M alone uses the default of 2 visible characters. Consider: "mask PII data; use -M alone for 2 visible chars, or -M N for N visible chars".

Security

✅ No hardcoded credentials.

✅ No sensitive data in logs — The warning at sz_tools/sz_explorer:1514 prints DATA_SOURCE and RECORD_ID (not PII field values), which is appropriate.

✅ No license files (.lic) detected.

⚠️ PII masking applied after SDK call caching — sz_tools/sz_explorer:1266-1270. The call cache (self.last_call_cache) is populated with the raw response before PII masking is applied (at line 1269-1270 the masking runs on response_data, and the cache append is at 1272). Since response_data is a dict and mask_json_pii mutates it in-place, the cached version is masked. However, this is subtle — a reader could incorrectly believe the cache stores pre-masked data. A comment clarifying the mutation behavior would help. If the intent were ever to cache raw data for debugging, this would silently mask it.

Summary

Area	Status
Code style	✅ mostly clean, minor line-length and API-consistency issues
No commented-out code	✅
DRY / naming	✅
Defects	⚠️ unbounded DB loop in `find`, `freq_rank` rebuilt each call
Tests	❌ no tests for new/fixed logic
CHANGELOG / docs	✅
Security	⚠️ subtle mutation-before-cache ordering (actually safe but fragile)

Blocking items: The lack of unit tests for `mask_pii`, `behavior_rank`, and `fmt_behavior_change` is a significant gap given the logic complexity. The `find` loop's potential performance issue on large databases is worth addressing before merge.

Automated code review analyzing defects and coding standards

github-actions · 2026-04-07T21:58:32Z

🤖 Claude Code Review

Now I have enough context to provide a thorough review.

PR Code Review: v0.0.38

Code Quality

✅ No commented-out code — The PR actually removes previously commented-out dead code (# data_source = None, # else: # data_source = parm.upper()).

✅ Meaningful variable names — mask_pii, visible, ftype_behavior, behavior_rank, fmt_behavior_change are all clear and descriptive.

✅ DRY principle — PII masking is centralized in mask_pii() and mask_json_pii(), reused consistently across call sites.

❌ Line length violations — The project enforces 120-char lines via Black, but several new lines exceed this:

sz_explorer:1525 — print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning") (~146 chars)
sz_explorer:1559 — "ROLE": "INBOUND" if step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] else "CANDIDATE", (~131 chars)
sz_explorer:3923 — sql = sz_dbo.sql_prep(f"select RES_ENT_ID from RES_ENT where RES_ENT_ID {operator} ? order by RES_ENT_ID {order}") (~123 chars)
sz_explorer:4391 — sdk_wrapper = EdaSdkWrapper(engine_config, debug_trace=args.debug_trace, webapp_url=args.webapp_url, mask_pii=args.mask_pii) (~132 chars)

❌ Input validation missing for --mask_pii — The argument accepts any int, including negative values. Calling mask_pii("hello", -1) produces "hell******" (Python slice token[:-1] + len(token) - (-1) stars), which is silently wrong. Should add type=int + a choices range or explicit guard:

# In argparser:
type=int,
metavar="N",
# Or add a validation after parse_args:
if args.mask_pii is not None and args.mask_pii < 0:
    argParser.error("--mask_pii must be >= 0")

✅ No defects found in core logic — The find() fix is correct: last_entity_id initializes to 0 (line 3006), so not starting_id and operator == "<" properly guards the "previous with no prior get" case, while > 0 correctly returns the first entity for next.

✅ Behavior rank logic — behavior_rank() correctly handles the rstrip("ES") suffix extraction. Empty/unknown behavior gracefully returns rank 990.

ℹ️ Minor: close() swallows all exceptions broadly — sz_explorer:1212-1216. Silently ignoring any Exception is broad. If sz_factory was never assigned (some unforeseen init path), an AttributeError would be eaten. Consider at minimum catching SzError specifically or checking hasattr(self, 'sz_factory').

ℹ️ Minor: mask_json_pii mutates in place but also returns — The return value is unused at call sites (call_sdk, get_features). This is harmless but inconsistent; consider dropping the return data or using it at call sites for clarity.

Testing

❌ No unit tests for new functions — mask_pii(), mask_json_pii(), behavior_rank(), and fmt_behavior_change() are pure/static functions that are straightforward to unit test. The test suite currently contains only example_test.py with a trivial assertion. Key cases to cover:

mask_pii("John Smith", 2) → "Jo** Sm***"
mask_pii("", 2) → "" (empty passthrough)
mask_pii("Jo", 2) → "Jo" (token length == visible)
mask_json_pii with nested dicts, missing PII keys, non-string values
behavior_rank("F1E") vs behavior_rank("FF") ordering
fmt_behavior_change when score_behavior == expected (returns "")

❌ Test coverage — No coverage measurement can currently verify the >80% threshold with only the example test.

Documentation

✅ CHANGELOG.md updated — Correct Keep a Changelog format with [0.0.38] - 2026-04-07 section, properly listing added features and fixes.

✅ pyproject.toml version bumped — 0.0.37 → 0.0.38.

✅ Inline comments — New sections use the existing # --- comment block style consistently.

Security

✅ No hardcoded credentials

✅ No sensitive data in logs — The warning message at sz_explorer:1525 prints DATA_SOURCE and RECORD_ID but not PII feature values. Appropriate.

✅ No license files (.lic) or AQAAAD-prefixed strings detected

✅ SQL parameterization — The new SQL in find() (line 3923-3924) uses parameterized queries (?) for user-supplied values. The operator (> / <) and order (ASC / DESC) are controlled by internal code only, not user input.

✅ PII masking applied before caching — mask_json_pii runs before last_call_cache.append(...) in call_sdk, so the debug export cache also contains masked data when PII masking is enabled.

Summary

Category	Status
Code style (line length)	❌ 4 violations
Input validation (`--mask_pii` negative)	❌ Missing
Unit tests for new functions	❌ Missing
Commented-out code	✅
DRY / naming	✅
Logic correctness	✅
Security	✅
Documentation	✅

The functional changes (PII masking, behavior indicator, `next`/`previous` fixes, SDK factory cleanup) are correct and well-structured. The main gaps are missing unit tests for the new pure functions and the unvalidated negative-integer edge case for `--mask_pii`.

Automated code review analyzing defects and coding standards

github-actions · 2026-04-07T22:11:51Z

🤖 Claude Code Review

Now I have enough context. Here is the full review:

Code Review: `sz_explorer` v0.0.38

Code Quality

✅ No commented-out code — The PR actually removes previously commented-out code from find(). Good cleanup.

✅ Meaningful variable names — ftype_behavior, behavior_rank, mask_pii, score_display are all clear.

✅ DRY principle — PII masking is applied at appropriate call sites without unnecessary duplication.

❌ Bug: behavior_rank incorrectly handles "NAME" frequency

sz_tools/sz_explorer:1215

freq = behavior.rstrip("ES")

str.rstrip("ES") strips any combination of characters 'E' and 'S' from the right. The string "NAME" ends in 'E', so:

"NAME".rstrip("ES")  →  "NAM"
freq_rank.get("NAM", 99)  →  99   # should be 5

Any feature type with FTYPE_FREQ = "NAME" (with or without the E/S suffix) will receive the wrong rank (990+), causing the upgrade/downgrade arrow to point the wrong direction for name-type features.

Fix: parse the freq code as a known prefix instead of using rstrip:

@staticmethod
def behavior_rank(behavior):
    freq_rank = {"F1": 1, "FF": 2, "FM": 3, "FVM": 4, "NAME": 5}
    for freq, rank in sorted(freq_rank.items(), key=lambda x: -len(x[0])):
        if behavior.startswith(freq):
            suffix = behavior[len(freq):]
            r = rank * 10
            if "E" in suffix:
                r -= 2
            if "S" in suffix:
                r -= 1
            return r
    return 99 * 10

❌ Line length violations (project enforces 120 chars)

sz_tools/sz_explorer:1553 (~137 chars with indentation):

"ROLE": "INBOUND" if step[ve_num]["VIRTUAL_ENTITY_ID"] == step["INBOUND_VIRTUAL_ENTITY_ID"] else "CANDIDATE",

sz_tools/sz_explorer:1519 (~145 chars with indentation):

print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning")

⚠️ mask_json_pii mutates response_data before caching

sz_tools/sz_explorer:1269-1272

if self.mask_pii:
    mask_json_pii(response_data, self.mask_pii)

self.last_call_cache.append([caller, api_name, flags, parms, response_data])

mask_json_pii modifies the dict in-place, so the call cache (used for --debug/export) stores already-masked data. If the intent is to let developers export raw API responses for debugging while still masking display output, this is a defect. If masking the cache is acceptable, add a comment to make it explicit.

⚠️ -M 0 silently disables masking

sz_tools/sz_explorer:4350-4358 — With type=int and default=0, passing -M 0 sets self.mask_pii = 0, which is falsy, so masking is disabled rather than showing 0 visible characters. A user invoking -M 0 expecting fully-masked output (all asterisks) will be confused. Consider rejecting 0 or treating it as "mask all."

✅ find() refactor logic is correct — fetch_all_rows calls fetchall() internally (confirmed in _sz_database.py:446), so all rows are in memory before the inner fetch_row call runs — no cursor-in-cursor conflict.

✅ INBOUND/CANDIDATE role fix — Using ENTITY_LIST[0]["ROLE"] rather than STEP_TYPE is more robust and directly reflects what the SDK reports.

Testing

❌ No tests for new logic — mask_pii, mask_json_pii, and behavior_rank are pure functions that are easy to unit test. The project has only a placeholder test (tests/example_test.py). The behavior_rank bug above would have been caught by a simple test:

assert EdaSdkWrapper.behavior_rank("NAME") == 50  # fails with current code → 988

Documentation

✅ CHANGELOG.md updated — Entries for 0.0.38 are accurate and well-structured.

✅ Version bumped in pyproject.toml.

Security

✅ No hardcoded credentials.

✅ No license files checked in.

✅ SQL in find() — operator (> / <) and order (ASC / DESC) are both hardcoded internally, not user-supplied. No injection risk.

✅ PII masking feature itself is a positive security addition.

Summary

Severity	Issue
High	`behavior_rank` incorrectly handles `"NAME"` frequency — wrong arrows on name features
Medium	Line length violations on 2 lines
Medium	`mask_json_pii` mutates in-place before caching — debug cache contains masked data
Low	`-M 0` silently disables masking instead of masking all characters
Low	No unit tests for pure functions (`mask_pii`, `mask_json_pii`, `behavior_rank`)

Automated code review analyzing defects and coding standards

… -M3) #341 Fix how INBOUND/CANDIDATE role using INBOUND_VIRTUAL_ENTITY_ID from SDK Guard against phantom feature IDs in record FEATURE_IDS (SDK 4.2.0-4.2.1 bug)

…y on exit" This reverts commit a543325.

github-actions · 2026-04-08T00:57:02Z

🤖 Claude Code Review

Now I have enough context for a thorough review.

Code Review: PR to v0.0.38

Code Quality

✅ No commented-out code — The diff actually removes two blocks of commented-out code (good cleanup).

✅ Meaningful variable names — ftype_behavior, behavior_rank, mask_json_pii, starting_id, candidate_id are all clear.

✅ DRY principle — PII masking logic is centralized in mask_pii / mask_json_pii. The fmt_behavior_change method is reused across why and how display paths.

❌ Line length violation — sz_tools/sz_explorer:4385 is ~132 characters, exceeding the 120-char Black limit:

sdk_wrapper = EdaSdkWrapper(engine_config, debug_trace=args.debug_trace, webapp_url=args.webapp_url, mask_pii=args.mask_pii)

Should be wrapped (Black would split this across lines).

❌ Potential defect — MATCHED_FEAT_DESC not masked — PII_KEYS (sz_tools/sz_explorer:77-83) includes FEAT_DESC, INBOUND_FEAT_DESC, CANDIDATE_FEAT_DESC, but omits MATCHED_FEAT_DESC. This value is rendered directly into the why display at line 1767:

matched_desc = f"\u2514\u2500\u2500 {feature['MATCHED_FEAT_DESC']} ({score}){behavior_change}"

With --mask_pii enabled, this field will still show unmasked PII in the tree view.

❌ Edge case in behavior_rank with unknown feature type — fmt_behavior_change (sz_tools/sz_explorer:1223) calls self.ftype_behavior.get(ftype_code, ""), which returns "" for unknown feature types. behavior_rank("") then returns 990, causing almost any score_behavior to appear as an "upgrade" (↑) regardless of actual behavior. Should guard:

if not score_behavior or not expected or score_behavior == expected:
    return ""

❌ Performance concern in find — sz_tools/sz_explorer:3917-3923 scans RES_ENT row-by-row, verifying each against RES_ENT_OKEY in a loop. On large databases with many orphaned RES_ENT rows, this could be very slow. A single JOIN query would be more efficient:

SELECT r.RES_ENT_ID FROM RES_ENT r
JOIN RES_ENT_OKEY o ON r.RES_ENT_ID = o.RES_ENT_ID
WHERE r.RES_ENT_ID {operator} ? ORDER BY r.RES_ENT_ID {order} LIMIT 1

⚠️ Minor: -M 0 has no effect — sz_tools/sz_explorer:4355 uses default=0 and the check if self.mask_pii: (line 1269) treats 0 as falsy. A user explicitly passing -M 0 would expect 0 visible chars but masking would be silently skipped. Low impact since 0 visible chars is not a useful setting, but worth noting.

Testing

❌ No unit tests for new functions — mask_pii, mask_json_pii, behavior_rank, and fmt_behavior_change have no tests. These are good candidates for unit tests since they're pure/near-pure functions with clear inputs and outputs:

mask_pii("John Smith", 2) → "Jo** Sm***"
mask_pii("", 2) → ""
behavior_rank("F1E") → 8
fmt_behavior_change when behavior matches, is unknown, upgrades, downgrades

Current test coverage is essentially zero for this feature. Only tests/example_test.py exists.

Documentation

✅ CHANGELOG updated — Correctly documents what changed in 0.0.38.

✅ Version bumped — pyproject.toml updated from 0.0.37 → 0.0.38.

✅ Inline docstrings — mask_pii and mask_json_pii have docstrings.

Security

✅ No hardcoded credentials.

✅ SQL parameterization — starting_id (user input in find) is passed as a query parameter, not interpolated. The operator and order values in the f-string are not user-controlled (they're always >/< and ASC/DESC from hardcoded call sites).

✅ No sensitive data in logs — The print_message for phantom feature IDs at line 1514 logs only IDs (integers), not PII content.

✅ No license files (.lic) committed.

Summary

Severity	Issue
❌ Bug	`MATCHED_FEAT_DESC` not in `PII_KEYS` — PII leaks in why-tree view even with `--mask_pii`
❌ Bug	`behavior_rank("")` edge case shows spurious upgrade arrows for unknown feature types
❌ Style	Line 4385 exceeds 120 chars
❌ Tests	No tests for any of the 4 new pure functions
⚠️ Perf	`find` loop + N verify queries instead of 1 JOIN query
⚠️ Minor	`-M 0` silently no-ops

Automated code review analyzing defects and coding standards

github-actions · 2026-04-08T01:09:29Z

🤖 Claude Code Review

I'll proceed with the review based on what's visible in the diff and the project's CLAUDE.md.

Code Review: PR to v0.0.38

Code Quality

Code Style ✅
Follows the project's 120-char line length, Black formatting, and isort conventions. Module-level constant PII_KEYS is uppercase. Docstrings on new public functions.

No Commented-Out Code ✅
The PR actually removes commented-out dead code from find() (the data_source block). One pre-existing commented-out line (# self.ftype_code_lookup["AMBIGUOUS_ENTITY"]["SORT_ORDER"] = 1) near line ~1174 remains, but it predates this PR.

Meaningful Variable Names ✅
mask_pii, visible, ftype_behavior, behavior_rank, behavior_change, score_display are all clear.

DRY Principle ✅
No obvious duplication introduced.

Defects / Logic Issues ❌ (minor)

mask_pii — negative visible argument produces incorrect output (sz_tools/sz_explorer, mask_pii function ~line 87):
The argparse allows -M -1 (valid int). With visible=-1, token[:visible] becomes token[:-1] (all but last char), and len(token) - visible becomes len(token) + 1, producing extra *s and wrong output. A guard is needed:
```
visible = max(0, visible)
```
-M 0 disables masking instead of masking everything (sz_tools/sz_explorer ~line 4350):
default=0 combined with if self.mask_pii: means passing -M 0 is indistinguishable from not passing -M at all. Users expecting full masking with 0 visible chars would be surprised. Consider using None as the default and checking self.mask_pii is not None, or document that 0 is not a valid value.
fetch_all_rows in find() — potential performance concern (sz_tools/sz_explorer, find method):
The new implementation iterates all rows from RES_ENT ordered by ID, verifying each in RES_ENT_OKEY. In the normal (non-phantom) case this stops immediately. In a degraded database with many phantom IDs it could iterate many rows in memory. This is acceptable for a tool, just worth documenting.
separator check readability (sz_tools/sz_explorer, mask_pii ~line 89):
```
token not in ',;:|/\\-'
```
This works (substring check on single-char tokens), but intent is clearer as a set literal:
```
token not in {',', ';', ':', '|', '/', '\\', '-'}
```
fmt_behavior_change strip inconsistency: In the columnar view behavior_change.strip() is called (removes leading space), but in the why/how tree display behavior_change is appended directly with its leading space. This is fine functionally but is mildly asymmetric.

CLAUDE.md ✅
Well-structured and not environment-specific.

Testing ❌

No test files are added or modified in this PR. New logic introduced includes:

mask_pii() — string masking with token splitting
mask_json_pii() — recursive dict/list masking
behavior_rank() — ranking calculation with suffix parsing
fmt_behavior_change() — behavioral change detection and formatting
Reworked find() — entity navigation with phantom-entity skipping

These are all independently testable pure/near-pure functions. mask_pii in particular has non-obvious edge cases (empty string, all-separator input, visible=0, Unicode). Unit tests are missing.

Documentation

CHANGELOG.md ✅ — Updated with clear entries for all changes.

Version bump ✅ — pyproject.toml updated to 0.0.38.

README — Cannot verify from the diff if --mask_pii / -M is documented in the README or any usage docs. If the tool has a usage section in the README, it should be updated.

Inline comments ✅ — mask_pii and mask_json_pii have docstrings. The PII_KEYS frozenset is self-documenting.

Markdown formatting ✅ — CHANGELOG additions follow the existing format.

Security

No hardcoded credentials ✅

PII masking feature ✅ — Correctly applied at the API response level (_call_sdk) and at the direct DB access level (feat_cache). Both paths are covered.

Warning message exposes identifiers — The new warning at sz_tools/sz_explorer ~line 1516:

print_message(f"Feature ID {lib_feat_id} on record {record['DATA_SOURCE']}: {record['RECORD_ID']} not in entity features", "warning")

This logs DATA_SOURCE and RECORD_ID — these are identifiers, not PII values, so this is acceptable. Worth noting that this message is emitted even when --mask_pii is active, but record IDs and data source names are not PII.

No .lic files ✅

Input validation — See defect #1 above (negative visible).

Summary

Category	Status
Code style	✅
No commented-out code	✅
DRY / naming	✅
Defects	❌ Negative `visible` bug; `-M 0` disables rather than fully-masks
Unit tests	❌ None added for new functions
Documentation	✅ (README should be verified)
Security	✅

Must fix: The `visible < 0` bug in `mask_pii`. Recommend also addressing the `-M 0` ambiguity and adding at minimum a few unit tests for `mask_pii` and `behavior_rank`.

Automated code review analyzing defects and coding standards

jbutcher21 requested a review from a team as a code owner April 7, 2026 07:42

jbutcher21 requested review from antaenc and docktermj April 7, 2026 07:43

docktermj approved these changes Apr 7, 2026

View reviewed changes

jbutcher21 added 10 commits April 7, 2026 17:53

#363 Add --mask_pii flag for PII data masking in sz_explorer

45b8cf3

#340 Fix previous command going to beginning instead of previous entity

9a41d87

#340 Fix next/previous crash with no prior get and falsy-zero param bug

cc76698

#340 Guard next/previous when direct database access is not available

7dce232

#341 Fix how columnar view sometimes showing features reversed

9d8b455

#364 Show scoring behavior downgrade/upgrade indicator on why/how

66efb5a

Prepare release 0.0.38

802ae0f

#363 Make --mask_pii accept configurable visible char count (-M, -M1,…

3d88018

… -M3) #341 Fix how INBOUND/CANDIDATE role using INBOUND_VIRTUAL_ENTITY_ID from SDK Guard against phantom feature IDs in record FEATURE_IDS (SDK 4.2.0-4.2.1 bug)

#363 Add EdaSdkWrapper.close() to properly destroy SDK factory on exit

4cd25db

Revert "#363 Add EdaSdkWrapper.close() to properly destroy SDK factor…

3c8c8e8

…y on exit" This reverts commit a543325.

jbutcher21 force-pushed the 363-jb-mask-pii branch from 6d55ef3 to 3c8c8e8 Compare April 8, 2026 00:53

jbutcher21 enabled auto-merge (squash) April 8, 2026 00:53

This was referenced Apr 8, 2026

sz_explorer next/previous issue #340

Closed

sz_explorer: how sometimes has features reversed on decision tree #341

Closed

sz_explorer: add --mask_pii flag for PII data masking #363

Closed

sz_explorer: show behavior downgrade indicator on why/how #364

Closed

Add frozenset, isspace, rstrip to cspell dictionary

8bf831a

jbutcher21 merged commit a9b016c into main Apr 8, 2026
76 checks passed

jbutcher21 deleted the 363-jb-mask-pii branch April 8, 2026 01:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release 0.0.38: PII masking, next/previous fixes, how display fixes, behavior indicator#365

Release 0.0.38: PII masking, next/previous fixes, how display fixes, behavior indicator#365
jbutcher21 merged 11 commits intomainfrom
363-jb-mask-pii

jbutcher21 commented Apr 7, 2026 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jbutcher21 commented Apr 7, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

PR Code Review: v0.0.38

Code Quality

Testing

Documentation

Security

Summary

Uh oh!

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

PR Code Review: v0.0.38

Code Quality

Testing

Documentation

Security

Summary

Blocking items: The lack of unit tests for mask_pii, behavior_rank, and fmt_behavior_change is a significant gap given the logic complexity. The find loop's potential performance issue on large databases is worth addressing before merge.

Uh oh!

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

PR Code Review: v0.0.38

Code Quality

Testing

Documentation

Security

Summary

The functional changes (PII masking, behavior indicator, next/previous fixes, SDK factory cleanup) are correct and well-structured. The main gaps are missing unit tests for the new pure functions and the unvalidated negative-integer edge case for --mask_pii.

Uh oh!

github-actions bot commented Apr 7, 2026

🤖 Claude Code Review

Code Review: sz_explorer v0.0.38

Code Quality

Testing

Documentation

Security

Summary

Uh oh!

github-actions bot commented Apr 8, 2026

🤖 Claude Code Review

Code Review: PR to v0.0.38

Code Quality

Testing

Documentation

Security

Summary

Uh oh!

Uh oh!

github-actions bot commented Apr 8, 2026

🤖 Claude Code Review

Code Review: PR to v0.0.38

Code Quality

Testing ❌

Documentation

Security

Summary

Must fix: The visible < 0 bug in mask_pii. Recommend also addressing the -M 0 ambiguity and adding at minimum a few unit tests for mask_pii and behavior_rank.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jbutcher21 commented Apr 7, 2026 •

edited by github-actions bot

Loading

Blocking items: The lack of unit tests for `mask_pii`, `behavior_rank`, and `fmt_behavior_change` is a significant gap given the logic complexity. The `find` loop's potential performance issue on large databases is worth addressing before merge.

The functional changes (PII masking, behavior indicator, `next`/`previous` fixes, SDK factory cleanup) are correct and well-structured. The main gaps are missing unit tests for the new pure functions and the unvalidated negative-integer edge case for `--mask_pii`.

Code Review: `sz_explorer` v0.0.38

Must fix: The `visible < 0` bug in `mask_pii`. Recommend also addressing the `-M 0` ambiguity and adding at minimum a few unit tests for `mask_pii` and `behavior_rank`.