From dc4528d2bce821a677f142d04ca980d3a70c0b13 Mon Sep 17 00:00:00 2001
From: Richard Palethorpe
Date: Fri, 19 Jun 2026 10:20:16 +0100
Subject: [PATCH 1/2] fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follow-up to the NER tier engine (#10360), already on master. This carries
only the incremental review fixes and tests that postdate that merge — the
feature itself is not re-introduced.

Review fixes:
- openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText
  (the `elem < 0` guard above already returns).
- application.go: collapse ResolvePIIPolicy's inline re-implementation of
  PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the
  "explicit pii.enabled wins, else cloud-proxy default" rule) and return true
  past the !enabled guard where it is provable.
- pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in
  patternDetector.Detect into one local.
- grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat
  bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire —
  dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000]
  is rejected here with an actionable error; >1000 still fails closed via
  Parse. Specs pin the relationship so the guard can't silently revert.
- PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and
  force the DOM value back when clamping (min={0} only constrained the
  spinner, so a negative reached saved config and silently disabled the
  length filter).

Tests:
- piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead).
- model-config.spec.js: assert the min_len clamp, and that entity_actions
  collapses a duplicate group to a single row (map semantics; regression
  guard against emitting an array that drops a row on save).
- tests/e2e-backends: token_classify capability driving the TokenClassify
  gRPC RPC against the backend image, asserting byte-correct, UTF-8
  rune-aligned spans (entity.Text == text[start:end]) at threshold 0.
  Verified on CPU via `make test-extra-backend-privacy-filter` (3/3 specs).
- Makefile: test-extra-backend-privacy-filter wrapper.
- tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact
  (mask + block) through the full HTTP -> detector -> redactor path; gated on
  PII_NER_MODEL_GGUF so the default suite is unaffected.
- .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job
  running the container harness on CPU.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe
---
 .github/workflows/tests-pii-ner-e2e.yml      |  97 +++++++++
 Makefile                                     |  10 +
 core/application/application.go              |  10 +-
 core/http/react-ui/e2e/model-config.spec.js  |  48 +++++
 .../src/components/PatternListEditor.jsx     |  13 +-
 .../routing/piiadapter/openai_completion.go  |   2 +-
 core/services/routing/piidetector/pattern.go |   7 +-
 core/services/routing/piipattern/grammar.go  |  14 +-
 .../routing/piipattern/piipattern_test.go    |  40 ++++
 tests/e2e-backends/backend_test.go           |  81 ++++++--
 tests/e2e/e2e_pii_ner_test.go                | 186 ++++++++++++++++++
 tests/e2e/e2e_suite_test.go                  |  43 ++++
 12 files changed, 516 insertions(+), 35 deletions(-)
 create mode 100644 .github/workflows/tests-pii-ner-e2e.yml
 create mode 100644 tests/e2e/e2e_pii_ner_test.go

diff --git a/.github/workflows/tests-pii-ner-e2e.yml b/.github/workflows/tests-pii-ner-e2e.yml
new file mode 100644
index 000000000000..2f95f3f46690
--- /dev/null
+++ b/.github/workflows/tests-pii-ner-e2e.yml
@@ -0,0 +1,97 @@
+---
+name: 'PII NER tier E2E (live GGUF, CPU)'
+
+# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the
+# hermetic tests/e2e suite cannot cover (it only exercises the in-process
+# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB
+# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand.
+#
+# This drives the container-level harness (tests/e2e-backends) via
+# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image,
+# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned
+# TokenClassify spans. The complementary HTTP-path specs in tests/e2e
+# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired.
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: '0 3 * * *'
+  push:
+    branches:
+      - master
+    paths:
+      - 'backend/cpp/privacy-filter/**'
+      - 'backend/Dockerfile.privacy-filter'
+      - 'core/services/routing/pii/**'
+      - 'core/services/routing/piidetector/**'
+      - 'core/backend/token_classify.go'
+      - 'core/http/endpoints/localai/pii.go'
+      - 'core/schema/pii.go'
+      - 'tests/e2e-backends/**'
+      - 'tests/e2e/e2e_pii_ner_test.go'
+      - 'tests/e2e/e2e_suite_test.go'
+      - '.github/workflows/tests-pii-ner-e2e.yml'
+  pull_request:
+    paths:
+      - 'backend/cpp/privacy-filter/**'
+      - 'backend/Dockerfile.privacy-filter'
+      - 'core/services/routing/pii/**'
+      - 'core/services/routing/piidetector/**'
+      - 'core/backend/token_classify.go'
+      - 'core/http/endpoints/localai/pii.go'
+      - 'core/schema/pii.go'
+      - 'tests/e2e-backends/**'
+      - 'tests/e2e/e2e_pii_ner_test.go'
+      - 'tests/e2e/e2e_suite_test.go'
+      - '.github/workflows/tests-pii-ner-e2e.yml'
+
+concurrency:
+  group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+
+jobs:
+  tests-pii-ner-e2e:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        go-version: ['1.25.x']
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
+          sudo docker image prune --all --force || true
+          df -h
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Setup Go ${{ matrix.go-version }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ matrix.go-version }}
+          cache: false
+      - name: Proto Dependencies
+        run: |
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential
+      # Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on
+      # CPU and runs the token_classify capability spec (byte-offset contract).
+      - name: Run live PII NER backend E2E
+        run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
+      - name: Setup tmate session if tests fail
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3.23
+        with:
+          detached: true
+          connect-timeout-seconds: 180
+          limit-access-to-actor: true
diff --git a/Makefile b/Makefile
index 8da9aacee2dd..be0711b47baf 100644
--- a/Makefile
+++ b/Makefile
@@ -690,6 +690,16 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
 	BACKEND_TEST_CTX_SIZE=2048 \
 	$(MAKE) test-extra-backend
 
+## privacy-filter: the PII/NER token-classification backend. Exercises the
+## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets
+## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M
+## active params). This is the live-backend coverage for the PII NER tier.
+test-extra-backend-privacy-filter: docker-build-privacy-filter
+	BACKEND_IMAGE=local-ai-backend:privacy-filter \
+	BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \
+	BACKEND_TEST_CAPS=health,load,token_classify \
+	$(MAKE) test-extra-backend
+
 ## vllm is resolved from a HuggingFace model id (no file download) and
 ## exercises Predict + streaming + tool-call extraction via the hermes parser.
 ## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU
diff --git a/core/application/application.go b/core/application/application.go
index d5c286318dbd..9bbf26bb8bd7 100644
--- a/core/application/application.go
+++ b/core/application/application.go
@@ -341,11 +341,9 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	}
 	appCfg := a.ApplicationConfig()
 
-	if cfg.PII.Enabled != nil {
-		enabled = *cfg.PII.Enabled
-	} else {
-		enabled = cfg.PIIIsEnabled() // backend default (cloud-proxy)
-	}
+	// PIIIsEnabled already encodes "explicit pii.enabled wins, else backend
+	// default (cloud-proxy)" — the single source of that rule.
+	enabled = cfg.PIIIsEnabled()
 	if !enabled {
 		return false, nil
 	}
@@ -354,7 +352,7 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	if len(detectors) == 0 {
 		detectors = append([]string(nil), appCfg.PIIDefaultDetectors...)
 	}
-	return enabled, detectors
+	return true, detectors // enabled is necessarily true past the !enabled guard
 }
 
 // PIIPolicyResolver adapts ResolvePIIPolicy to pii.PolicyResolver for
diff --git a/core/http/react-ui/e2e/model-config.spec.js b/core/http/react-ui/e2e/model-config.spec.js
index 2d7f0f8bdcd3..96a73b5433a0 100644
--- a/core/http/react-ui/e2e/model-config.spec.js
+++ b/core/http/react-ui/e2e/model-config.spec.js
@@ -288,6 +288,21 @@ test.describe('Model Editor - Interactive Tab', () => {
     await expect(page.locator('input[placeholder^="match,"]')).toBeVisible()
   })
 
+  test('pattern min_len clamps a directly-typed negative to 0', async ({ page }) => {
+    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
+    await searchInput.fill('Custom Secret Patterns')
+    const dropdown = searchInput.locator('..').locator('..')
+    await dropdown.locator('div', { hasText: 'Custom Secret Patterns' }).first().click()
+
+    await page.locator('button', { hasText: 'Add pattern' }).click()
+    // The number input's min={0} only limits the spinner arrows, not keyboard
+    // entry; the editor must sanitise a typed negative so a meaningless
+    // negative length floor never reaches the saved config.
+    const minLen = page.locator('input[aria-label="Minimum length"]')
+    await minLen.fill('-5')
+    await expect(minLen).toHaveValue('0')
+  })
+
   // Regression: a map-typed field (entity_actions) present in the loaded YAML
   // must render WITH its values. flattenConfig used to recurse into the map,
   // scattering it across pii_detection.entity_actions. paths that match
@@ -329,4 +344,37 @@ test.describe('Model Editor - Interactive Tab', () => {
     await expect(page.getByText(/block —/i).first()).toBeVisible()
   })
 
+  // A map cannot hold two values for one key, so renaming a row to an existing
+  // group must collapse to a single row (Object.fromEntries, last write wins)
+  // rather than rendering two conflicting rows that silently lose one on save.
+  test('entity_actions collapses a duplicate group to a single row', async ({ page }) => {
+    await page.route('**/api/models/edit/ner-model', (route) => {
+      route.fulfill({
+        contentType: 'application/json',
+        body: JSON.stringify({
+          name: 'ner-model',
+          config: [
+            'name: ner-model',
+            'backend: llama-cpp',
+            'pii_detection:',
+            '  entity_actions:',
+            '    SSN: block',
+            '    EMAIL: mask',
+            '',
+          ].join('\n'),
+        }),
+      })
+    })
+
+    await page.goto('/app/model-editor/ner-model')
+
+    const groupInputs = page.locator('input[aria-label="Entity group"]')
+    await expect(groupInputs).toHaveCount(2)
+
+    // Rename the EMAIL row to duplicate SSN; the editor collapses to one SSN row.
+    await groupInputs.nth(1).fill('SSN')
+    await expect(groupInputs).toHaveCount(1)
+    await expect(groupInputs.nth(0)).toHaveValue('SSN')
+  })
+
 })
diff --git a/core/http/react-ui/src/components/PatternListEditor.jsx b/core/http/react-ui/src/components/PatternListEditor.jsx
index f5a82148a638..a8965246cd99 100644
--- a/core/http/react-ui/src/components/PatternListEditor.jsx
+++ b/core/http/react-ui/src/components/PatternListEditor.jsx
@@ -74,7 +74,18 @@ export default function PatternListEditor({ value, onChange }) {
             min={0}
             value={r.min_len || 0}
             title="Minimum match length (0 = no floor)"
-            onChange={e => update(i, { min_len: parseInt(e.target.value, 10) || 0 })}
+            // min={0} only constrains the spinner, not keyboard entry. Clamp a
+            // typed negative to 0 (a negative floor is meaningless and would
+            // disable the length filter). When we clamp, force the DOM value
+            // too: the resulting 0->0 state change is a no-op, so React's
+            // controlled input would otherwise keep displaying the rejected
+            // "-5" even though the saved value is 0.
+            onChange={e => {
+              const parsed = parseInt(e.target.value, 10)
+              const n = Math.max(0, parsed || 0)
+              if (parsed < 0) e.target.value = String(n)
+              update(i, { min_len: n })
+            }}
             style={{ width: 80, fontSize: '0.8125rem' }}
             aria-label="Minimum length"
           />
diff --git a/core/services/routing/piiadapter/openai_completion.go b/core/services/routing/piiadapter/openai_completion.go
index 53e158fd91fd..ee956829061a 100644
--- a/core/services/routing/piiadapter/openai_completion.go
+++ b/core/services/routing/piiadapter/openai_completion.go
@@ -44,7 +44,7 @@ func applyAnyText(v any, elem int, text string) any {
 	if elem < 0 {
 		return text
 	}
-	if arr, ok := v.([]any); ok && elem >= 0 && elem < len(arr) {
+	if arr, ok := v.([]any); ok && elem < len(arr) {
 		arr[elem] = text
 	}
 	return v
diff --git a/core/services/routing/piidetector/pattern.go b/core/services/routing/piidetector/pattern.go
index 1f4e01d1d929..347defb926f1 100644
--- a/core/services/routing/piidetector/pattern.go
+++ b/core/services/routing/piidetector/pattern.go
@@ -39,8 +39,9 @@ type patternDetector struct {
 // When tracing is enabled it records a pattern_pii BackendTrace so the matches
 // (group, byte range, text) show in the Traces UI alongside NER detections.
 func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntity, error) {
+	tracing := d.appConfig != nil && d.appConfig.EnableTracing
 	var start time.Time
-	if d.appConfig != nil && d.appConfig.EnableTracing {
+	if tracing {
 		trace.InitBackendTracingIfEnabled(d.appConfig.TracingMaxItems, d.appConfig.TracingMaxBodyBytes)
 		start = time.Now()
 	}
@@ -50,12 +51,12 @@ func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntit
 	var traceEnts []backend.TokenEntity
 	for _, mt := range matches {
 		out = append(out, pii.NEREntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
-		if d.appConfig != nil && d.appConfig.EnableTracing {
+		if tracing {
 			traceEnts = append(traceEnts, backend.TokenEntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
 		}
 	}
 
-	if d.appConfig != nil && d.appConfig.EnableTracing {
+	if tracing {
 		trace.RecordBackendTrace(patternPIITrace(d.modelName, text, traceEnts, start))
 	}
 	return out, nil
diff --git a/core/services/routing/piipattern/grammar.go b/core/services/routing/piipattern/grammar.go
index 93ca34c7279f..5171e533f39d 100644
--- a/core/services/routing/piipattern/grammar.go
+++ b/core/services/routing/piipattern/grammar.go
@@ -28,10 +28,16 @@ const (
 	// credential shape, small enough that the compiled program stays tiny.
 	MaxPatternLen = 256
 	// MaxQuantifier caps an explicit {n,m} upper bound. RE2 expands a bounded
-	// repeat into that many copies, so an uncapped {0,1000000} would blow up
-	// the compiled program's memory. Unbounded {n,} (no upper) is a loop, not
-	// an expansion, and is allowed.
-	MaxQuantifier = 4096
+	// repeat into that many copies, so a large bound inflates the compiled
+	// program. Go's regexp/syntax independently rejects any bound above 1000
+	// at Parse time, so this cap MUST stay strictly below 1000 to be a live
+	// guard rather than dead code shadowed by the parser: a bound in
+	// (MaxQuantifier, 1000] reaches walk and is rejected here with an
+	// actionable error, while >1000 is caught earlier by Parse. 512 is far
+	// larger than any real credential token yet keeps the guard meaningful and
+	// is defence in depth should the stdlib cap ever rise. Unbounded {n,} (no
+	// upper) is a loop, not an expansion, and is allowed.
+	MaxQuantifier = 512
 	// MaxAlternation caps the arms of a single `a|b|c` alternation.
 	MaxAlternation = 64
 	// MaxAST bounds recursion depth so a pathologically nested pattern can't
diff --git a/core/services/routing/piipattern/piipattern_test.go b/core/services/routing/piipattern/piipattern_test.go
index ef38a4992d06..590142c3d68c 100644
--- a/core/services/routing/piipattern/piipattern_test.go
+++ b/core/services/routing/piipattern/piipattern_test.go
@@ -1,6 +1,7 @@
 package piipattern
 
 import (
+	"fmt"
 	"strings"
 	"testing"
 
@@ -36,6 +37,45 @@ var _ = Describe("ValidatePattern", func() {
 	)
 })
 
+var _ = Describe("MaxQuantifier guard (must stay live, not dead code)", func() {
+	// Go's regexp/syntax hard-caps repeat bounds at 1000 and rejects anything
+	// larger at Parse time, before walk() runs. So the walk() {n,m} guard only
+	// fires for bounds in (MaxQuantifier, 1000]; if MaxQuantifier ever creeps
+	// to >= 1000 the guard becomes unreachable dead code. These specs pin the
+	// relationship and prove the guard is the binding constraint in that band.
+	const stdlibRepeatCap = 1000
+
+	It("is strictly below the stdlib repeat cap so the guard is reachable", func() {
+		Expect(MaxQuantifier).To(BeNumerically("<", stdlibRepeatCap),
+			"MaxQuantifier must be < %d or walk()'s {n,m} guard is dead code (Parse rejects larger bounds first)", stdlibRepeatCap)
+	})
+
+	It("accepts a bound at exactly MaxQuantifier", func() {
+		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier))).To(Succeed())
+	})
+
+	It("rejects a bound just above MaxQuantifier with our actionable error (proves the guard runs)", func() {
+		// MaxQuantifier+1 is still parseable (<= stdlib cap), so it reaches
+		// walk(), where our guard — not the parser — rejects it.
+		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier+1))
+		Expect(err).To(HaveOccurred())
+		Expect(err.Error()).To(ContainSubstring("bound is too large"),
+			"a bound in (MaxQuantifier, stdlib cap] must be rejected by walk(), not the parser")
+	})
+
+	It("rejects an unbounded {n,} whose lower bound exceeds MaxQuantifier", func() {
+		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d,}`, MaxQuantifier+1))
+		Expect(err).To(HaveOccurred())
+		Expect(err.Error()).To(ContainSubstring("bound is too large"))
+	})
+
+	It("still fails closed above the stdlib cap (Parse rejects before walk)", func() {
+		// >1000: caught by syntax.Parse; the message is the parser's, but it
+		// still fails closed — defence in depth.
+		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, stdlibRepeatCap+1))).NotTo(Succeed())
+	})
+})
+
 var _ = Describe("Compile", func() {
 	It("compiles a valid pattern with leftmost-longest semantics", func() {
 		re, err := Compile(`sk-ant-[A-Za-z0-9_-]{4,}`)
diff --git a/tests/e2e-backends/backend_test.go b/tests/e2e-backends/backend_test.go
index 6d0d27276851..c60077fd7293 100644
--- a/tests/e2e-backends/backend_test.go
+++ b/tests/e2e-backends/backend_test.go
@@ -11,6 +11,7 @@ import (
 	"path/filepath"
 	"strings"
 	"time"
+	"unicode/utf8"
 
 	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
 	. "github.com/onsi/ginkgo/v2"
@@ -85,27 +86,28 @@ import (
 // file path to LoadModel, so GGUF, ONNX, safetensors, .bin etc. all work so
 // long as the backend under test accepts that format.
 const (
-	capHealth        = "health"
-	capLoad          = "load"
-	capPredict       = "predict"
-	capStream        = "stream"
-	capEmbeddings    = "embeddings"
-	capTools         = "tools"
-	capTranscription = "transcription"
-	capTTS           = "tts"
-	capImage         = "image"
-	capFaceDetect    = "face_detect"
-	capFaceEmbed     = "face_embed"
-	capFaceVerify    = "face_verify"
-	capFaceAnalyze   = "face_analyze"
-	capFaceAntispoof = "face_antispoof"
-	capVoiceEmbed    = "voice_embed"
-	capVoiceVerify   = "voice_verify"
-	capVoiceAnalyze  = "voice_analyze"
+	capHealth         = "health"
+	capLoad           = "load"
+	capPredict        = "predict"
+	capStream         = "stream"
+	capEmbeddings     = "embeddings"
+	capTools          = "tools"
+	capTranscription  = "transcription"
+	capTTS            = "tts"
+	capImage          = "image"
+	capFaceDetect     = "face_detect"
+	capFaceEmbed      = "face_embed"
+	capFaceVerify     = "face_verify"
+	capFaceAnalyze    = "face_analyze"
+	capFaceAntispoof  = "face_antispoof"
+	capVoiceEmbed     = "voice_embed"
+	capVoiceVerify    = "voice_verify"
+	capVoiceAnalyze   = "voice_analyze"
 	capAudioTransform = "audio_transform"
-	capLogprobs      = "logprobs"
-	capLogitBias     = "logit_bias"
-	capTokenize      = "tokenize"
+	capLogprobs       = "logprobs"
+	capLogitBias      = "logit_bias"
+	capTokenize       = "tokenize"
+	capTokenClassify  = "token_classify"
 
 	defaultPrompt = "The capital of France is"
 	streamPrompt = "Once upon a time"
@@ -550,6 +552,45 @@ var _ = Describe("Backend container", Ordered, func() {
 		GinkgoWriter.Printf("Embedding: %d dims\n", len(res.GetEmbeddings()))
 	})
 
+	// TokenClassify is the PII-NER RPC (privacy-filter backend). The crown-jewel
+	// invariant is byte-offset correctness: Start/End are half-open BYTE offsets
+	// into the original UTF-8 text, and the backend's emitted text for a span must
+	// equal text[Start:End]. We run at Threshold 0 (raw, unfiltered) and assert
+	// every returned span is in range, rune-aligned, and self-consistent. The
+	// prompt carries multibyte runes BEFORE the PII so a rune/byte confusion in
+	// the engine would surface as a shifted slice here. Override the text with
+	// BACKEND_TEST_TOKEN_CLASSIFY_TEXT for a model that detects a different class.
+	It("classifies PII spans with byte-correct offsets via TokenClassify", func() {
+		if !caps[capTokenClassify] {
+			Skip("token_classify capability not enabled")
+		}
+		text := os.Getenv("BACKEND_TEST_TOKEN_CLASSIFY_TEXT")
+		if text == "" {
+			text = "Müller paid at café in Zürich; reach john.doe@example.com tomorrow."
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
+		defer cancel()
+		res, err := client.TokenClassify(ctx, &pb.TokenClassifyRequest{Text: text, Threshold: 0})
+		Expect(err).NotTo(HaveOccurred(), "TokenClassify RPC failed")
+		ents := res.GetEntities()
+		Expect(ents).NotTo(BeEmpty(), "TokenClassify returned no entities for an obvious-PII sentence")
+		for _, e := range ents {
+			start, end := int(e.GetStart()), int(e.GetEnd())
+			Expect(start).To(BeNumerically(">=", 0))
+			Expect(end).To(BeNumerically(">", start))
+			Expect(end).To(BeNumerically("<=", len(text)))
+			Expect(utf8.RuneStart(text[start])).To(BeTrue(), "start %d is mid-rune in %q", start, text)
+			if end < len(text) {
+				Expect(utf8.RuneStart(text[end])).To(BeTrue(), "end %d is mid-rune in %q", end, text)
+			}
+			slice := text[start:end]
+			Expect(utf8.ValidString(slice)).To(BeTrue(), "span %q is not valid UTF-8", slice)
+			Expect(e.GetText()).To(Equal(slice), "entity text must equal text[start:end]")
+			GinkgoWriter.Printf("TokenClassify: %q [%d:%d] %s score=%.3f\n",
+				slice, start, end, e.GetEntityGroup(), e.GetScore())
+		}
+	})
+
 	It("generates an image via GenerateImage", func() {
 		if !caps[capImage] {
 			Skip("image capability not enabled")
diff --git a/tests/e2e/e2e_pii_ner_test.go b/tests/e2e/e2e_pii_ner_test.go
new file mode 100644
index 000000000000..ec8c6954cdc8
--- /dev/null
+++ b/tests/e2e/e2e_pii_ner_test.go
@@ -0,0 +1,186 @@
+package e2e_test
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"io"
+	"net/http"
+	"unicode/utf8"
+
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/schema"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// Live PII NER tier e2e. These specs run the real privacy-filter GGUF on CPU
+// through the full TokenClassify path — the gap the hermetic suite cannot
+// cover (it only exercises the in-process pattern tier). They Skip unless
+// PII_NER_MODEL_GGUF is wired in BeforeSuite, so the default PR suite is
+// unaffected; the dedicated CI job sets it.
+//
+// The crown-jewel invariant is byte-offset correctness: entity Start/End are
+// half-open BYTE offsets into the original UTF-8 text, and the model's emitted
+// text for a span must equal the corresponding byte slice. We assert that two
+// ways — directly against ModelTokenClassify (raw, Threshold 0, no redactor
+// merge) and against the /api/pii/analyze HTTP contract (post-merge,
+// post-MinScore). The multibyte case proves offsets are bytes, not runes.
+var _ = Describe("PII NER tier (live privacy-filter GGUF)", func() {
+	const (
+		// Reliable, unambiguous PII the multilingual NER model detects.
+		emailText = "Please contact John Doe at john.doe@example.com about invoice 4421."
+		// Multibyte chars BEFORE the email push its byte offset past its rune
+		// offset, so a rune/byte confusion in the engine or the Go bridge would
+		// surface as a mismatched slice here but not in the ASCII case above.
+		multibyteText = "Müller paid at café in Zürich; reach john.doe@example.com tomorrow."
+	)
+
+	BeforeEach(func() {
+		if piiNERModel == "" {
+			Skip("live PII NER model not wired (set PII_NER_MODEL_GGUF + REALTIME_BACKENDS_PATH; see tests-pii-ner-e2e.yml)")
+		}
+	})
+
+	Context("raw TokenClassify (byte-offset contract)", func() {
+		It("returns byte-correct, rune-aligned spans for an ASCII email", func() {
+			ents := tokenClassify(emailText)
+			Expect(ents).NotTo(BeEmpty(), "model must detect at least one entity in an obvious-PII sentence")
+			for _, e := range ents {
+				assertByteCorrectSpan(emailText, e.Start, e.End, e.Text)
+			}
+			Expect(spanCoversSubstring(emailText, ents, "john.doe@example.com")).To(BeTrue(),
+				"some detected span must cover the email address")
+		})
+
+		It("keeps byte offsets correct when multibyte runes precede the PII", func() {
+			ents := tokenClassify(multibyteText)
+			Expect(ents).NotTo(BeEmpty())
+			for _, e := range ents {
+				// This is the assertion that fails if offsets were computed in
+				// runes rather than bytes: the slice would be shifted left.
+				assertByteCorrectSpan(multibyteText, e.Start, e.End, e.Text)
+			}
+			Expect(spanCoversSubstring(multibyteText, ents, "john.doe@example.com")).To(BeTrue())
+		})
+	})
+
+	Context("HTTP /api/pii/analyze", func() {
+		It("reports ner-source entities with byte-correct offsets", func() {
+			status, resp := analyze(schema.PIIAnalyzeRequest{
+				Text:      emailText,
+				Detectors: []string{piiNERModel},
+			})
+			Expect(status).To(Equal(http.StatusOK))
+			Expect(resp.Entities).NotTo(BeEmpty())
+			for _, e := range resp.Entities {
+				Expect(e.Source).To(Equal("ner"), "privacy-filter detections must be tagged source=ner")
+				Expect(e.Action).To(Equal("mask"), "default_action mask must propagate to each entity")
+				assertByteCorrectSpan(emailText, e.Start, e.End, emailText[e.Start:e.End])
+				Expect(e.Score).To(BeNumerically(">=", 0.5), "below-MinScore spans are dropped before the response")
+			}
+		})
+	})
+
+	Context("HTTP /api/pii/redact", func() {
+		It("masks detected PII out of the returned text", func() {
+			status, body := redact(schema.PIIAnalyzeRequest{
+				Text:      emailText,
+				Detectors: []string{piiNERModel},
+			})
+			Expect(status).To(Equal(http.StatusOK))
+			var resp schema.PIIRedactResponse
+			Expect(json.Unmarshal(body, &resp)).To(Succeed())
+			Expect(resp.Masked).To(BeTrue())
+			Expect(resp.RedactedText).NotTo(Equal(emailText))
+			Expect(resp.RedactedText).NotTo(ContainSubstring("john.doe@example.com"),
+				"the masked email must not survive in the redacted body")
+		})
+
+		It("rejects the request with pii_blocked when an entity action is block", func() {
+			status, body := redact(schema.PIIAnalyzeRequest{
+				Text:      emailText,
+				Detectors: []string{piiNERBlockModel},
+			})
+			Expect(status).To(Equal(http.StatusBadRequest))
+			Expect(string(body)).To(ContainSubstring("pii_blocked"))
+			Expect(string(body)).NotTo(ContainSubstring("john.doe@example.com"),
+				"a blocked response must never echo the raw secret")
+		})
+	})
+})
+
+// tokenClassify drives core/backend.ModelTokenClassify against the live model
+// with the loader/config the running server uses — the same path the NER
+// detector takes, but at Threshold 0 so we see the raw, unmerged spans.
+func tokenClassify(text string) []backend.TokenEntity {
+	GinkgoHelper()
+	cfg, ok := localAIApp.ModelConfigLoader().GetModelConfig(piiNERModel)
+	Expect(ok).To(BeTrue(), "model config %q must be loaded", piiNERModel)
+	fn, err := backend.ModelTokenClassify(text, backend.TokenClassifyOptions{},
+		localAIApp.ModelLoader(), cfg, localAIApp.ApplicationConfig())
+	Expect(err).NotTo(HaveOccurred())
+	ents, err := fn(context.TODO())
+	Expect(err).NotTo(HaveOccurred())
+	return ents
+}
+
+// assertByteCorrectSpan is the shared byte-offset invariant: a half-open byte
+// range within text, aligned to UTF-8 rune boundaries, whose slice equals the
+// entity's own reported text.
+func assertByteCorrectSpan(text string, start, end int, got string) {
+	GinkgoHelper()
+	Expect(start).To(BeNumerically(">=", 0))
+	Expect(end).To(BeNumerically(">", start))
+	Expect(end).To(BeNumerically("<=", len(text)))
+	Expect(utf8.RuneStart(text[start])).To(BeTrue(), "start %d is mid-rune in %q", start, text)
+	if end < len(text) {
+		Expect(utf8.RuneStart(text[end])).To(BeTrue(), "end %d is mid-rune in %q", end, text)
+	}
+	slice := text[start:end]
+	Expect(utf8.ValidString(slice)).To(BeTrue(), "span %q is not valid UTF-8", slice)
+	Expect(slice).To(Equal(got), "entity text must equal text[start:end]")
+}
+
+func spanCoversSubstring(text string, ents []backend.TokenEntity, sub string) bool {
+	lo := bytes.Index([]byte(text), []byte(sub))
+	if lo < 0 {
+		return false
+	}
+	hi := lo + len(sub)
+	for _, e := range ents {
+		// any overlap with [lo,hi)
+		if e.Start < hi && e.End > lo {
+			return true
+		}
+	}
+	return false
+}
+
+func analyze(req schema.PIIAnalyzeRequest) (int, schema.PIIAnalyzeResponse) {
+	GinkgoHelper()
+	status, body := postJSON("/api/pii/analyze", req)
+	var resp schema.PIIAnalyzeResponse
+	if status == http.StatusOK {
+		Expect(json.Unmarshal(body, &resp)).To(Succeed())
+	}
+	return status, resp
+}
+
+func redact(req schema.PIIAnalyzeRequest) (int, []byte) {
+	GinkgoHelper()
+	return postJSON("/api/pii/redact", req)
+}
+
+func postJSON(path string, payload any) (int, []byte) {
+	GinkgoHelper()
+	data, err := json.Marshal(payload)
+	Expect(err).NotTo(HaveOccurred())
+	httpResp, err := http.Post(anthropicBaseURL+path, "application/json", bytes.NewReader(data))
+	Expect(err).NotTo(HaveOccurred())
+	defer func() { _ = httpResp.Body.Close() }()
+	body, err := io.ReadAll(httpResp.Body)
+	Expect(err).NotTo(HaveOccurred())
+	return httpResp.StatusCode, body
+}
diff --git a/tests/e2e/e2e_suite_test.go b/tests/e2e/e2e_suite_test.go
index 5a257bdb023e..38e49f1cc9f3 100644
--- a/tests/e2e/e2e_suite_test.go
+++ b/tests/e2e/e2e_suite_test.go
@@ -47,6 +47,15 @@ var (
 	// cloud-proxy model YAMLs can point at their URLs at startup time.
 	cpOpenAIUpstream    *fakeOpenAIUpstreamServer
 	cpAnthropicUpstream *fakeAnthropicUpstreamServer
+
+	// Live PII NER tier. Set only when PII_NER_MODEL_GGUF points at a
+	// privacy-filter GGUF and the privacy-filter backend is discoverable
+	// (REALTIME_BACKENDS_PATH). Empty => the NER specs Skip, exactly like the
+	// cloud-proxy specs Skip without their binary. This is what the hermetic
+	// suite cannot do (e2e_suite_test.go comment at the cp-translate detector):
+	// run the real GGUF NER tier instead of only the in-process pattern tier.
+	piiNERModel      string
+	piiNERBlockModel string
 )
 
 var _ = BeforeSuite(func() {
@@ -535,6 +544,40 @@ var _ = BeforeSuite(func() {
 		}
 	}
 
+	// Live PII NER tier. When PII_NER_MODEL_GGUF points at a downloaded
+	// privacy-filter GGUF, register two detector models that drive the real
+	// gRPC TokenClassify path on the privacy-filter backend (discovered via
+	// REALTIME_BACKENDS_PATH). Two models so we can exercise both policy
+	// outcomes against the same weights: mask (redact) and block (reject).
+	// NOTE: no pii_detection.builtins/patterns here — that would flip the
+	// detector to the in-process regex tier instead of the GGUF NER tier.
+	if gguf := os.Getenv("PII_NER_MODEL_GGUF"); gguf != "" {
+		piiNERModel = "privacy-filter-ner"
+		piiNERBlockModel = "privacy-filter-ner-block"
+		nerModelConfig := func(name, defaultAction string) map[string]any {
+			return map[string]any{
+				"name":           name,
+				"backend":        "privacy-filter",
+				"embeddings":     true, // required: TOKEN_CLS pooling loads via the embeddings flag
+				"known_usecases": []string{"token_classify"},
+				"parameters":     map[string]any{"model": gguf},
+				"pii_detection": map[string]any{
+					"min_score":      0.5,
+					"default_action": defaultAction,
+				},
+			}
+		}
+		for _, cfg := range []map[string]any{
+			nerModelConfig(piiNERModel, "mask"),
+			nerModelConfig(piiNERBlockModel, "block"),
+		} {
+			data, err := yaml.Marshal(cfg)
+			Expect(err).ToNot(HaveOccurred())
+			Expect(os.WriteFile(filepath.Join(modelsPath, cfg["name"].(string)+".yaml"), data, 0644)).To(Succeed())
+		}
+		xlog.Info("wired live PII NER models", "gguf", gguf, "models", []string{piiNERModel, piiNERBlockModel})
+	}
+
 	systemState, err := system.GetSystemState(systemOpts...)
 	Expect(err).ToNot(HaveOccurred())
 
From 8d4e058cbab0337fac82443df408c3ec6fad0300 Mon Sep 17 00:00:00 2001
From: Richard Palethorpe
Date: Fri, 19 Jun 2026 11:39:50 +0100
Subject: [PATCH 2/2] feat(gallery): add privacy-filter-nemotron (f16 + q8)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained
English PII token-classifier (55 categories / 221 BIOES classes),
fine-tuned from openai/privacy-filter on NVIDIA's Nemotron-PII dataset.
Sibling to the existing privacy-filter-multilingual entry, trading
language breadth for category depth.

- privacy-filter-nemotron: F16 reference artifact (~2.8 GB).
- privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained /
  edge use; description notes the size/speed tradeoff and to validate on
  your own data (a single dropped span is a PII leak).

Both run on the privacy-filter backend with known_usecases
[token_classify] and a default mask policy (min_score 0.5); operators
add per-category entity_actions as needed. sha256s taken from the HF
repo's LFS object ids.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe
---
 gallery/index.yaml | 92 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/gallery/index.yaml b/gallery/index.yaml
index 710bc274042b..1fae903fcd4a 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1206,6 +1206,98 @@
     - filename: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
       sha256: 01b76572f80b7d2ebee80a27cb9c3699c26b04cae1c402eee7664fc17a4b5ce6
       uri: https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf
+- name: "privacy-filter-nemotron"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
+  urls:
+    - https://huggingface.co/OpenMed/privacy-filter-nemotron
+    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
+  description: |
+    A fine-grained English PII token-classification model: a fine-tune of
+    openai/privacy-filter by OpenMed on NVIDIA's Nemotron-PII dataset. It labels
+    every token with a BIOES tag over 55 PII categories (221 classes), trading
+    the multilingual sibling's language breadth for category depth - identity,
+    contact, address, dates, government IDs, financial, healthcare, enterprise,
+    vehicle and digital entities (including api_key, ipv4/ipv6 and mac_address).
+    For multilingual text prefer privacy-filter-multilingual instead.
+
+    In LocalAI this is a PII detector for the NER redactor tier: set
+    known_usecases to [token_classify] (as below), and any model opts into
+    redaction by listing this one under pii.detectors. The detection policy
+    (which categories to mask vs block, and the score threshold) lives on this
+    model's own pii_detection block - see the overrides below. It runs locally
+    with no Python, served by the standalone privacy-filter backend's
+    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
+    entity spans).
+
+    Architecture: gpt-oss-style sparse MoE (8 layers, d_model 640, 128 experts
+    top-4, ~1.5B total / ~50M active per token), bidirectional banded attention,
+    o200k tokenizer and a 221-way token-classification head; served via the
+    openai-privacy-filter architecture. F16, ~2.8 GB. (A smaller Q8_0 quant
+    exists on the GGUF repo for RAM-constrained use - validate it on your own
+    data, since for PII a single dropped span is a leak.)
+  license: apache-2.0
+  tags:
+    - token-classification
+    - ner
+    - pii
+    - privacy
+    - nemotron
+    - gguf
+  overrides:
+    backend: privacy-filter
+    embeddings: true
+    known_usecases:
+      - token_classify
+    parameters:
+      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
+    pii_detection:
+      min_score: 0.5
+      default_action: mask
+  files:
+    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
+      sha256: 70dfe91ff220ff04594168a83e296dcc2054449cde77f98d0e782edbb6a31f5a
+      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-f16.gguf
+- name: "privacy-filter-nemotron-q8"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
+  urls:
+    - https://huggingface.co/OpenMed/privacy-filter-nemotron
+    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
+  description: |
+    Q8_0 quant of privacy-filter-nemotron (~1.64 GB, vs ~2.8 GB for F16) for
+    RAM-constrained / edge use (e.g. a 4 GB Raspberry Pi 5). The MoE expert
+    weights are stored 8-bit; attention, embeddings and the classifier head
+    stay F16. Same model, policy and runtime as the F16 entry - see
+    privacy-filter-nemotron for the full description.
+
+    Prefer the F16 entry when you can afford it: it is the reference artifact.
+    On a mixed-PII document the publisher measured q8 matching F16 on 99.93% of
+    token labels with an identical span set at threshold 0.5 - but one token
+    flipped, and for PII a single dropped span is a leak. Treat q8 as a
+    deliberate size/speed tradeoff and validate it on your own data.
+  license: apache-2.0
+  tags:
+    - token-classification
+    - ner
+    - pii
+    - privacy
+    - nemotron
+    - gguf
+  overrides:
+    backend: privacy-filter
+    embeddings: true
+    known_usecases:
+      - token_classify
+    parameters:
+      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
+    pii_detection:
+      min_score: 0.5
+      default_action: mask
+  files:
+    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
+      sha256: 2ec11c154e572a2686f4d77e861b7f74e6917e09638fe9bd27156d48bd99e21a
+      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-q8.gguf
 - name: "secret-filter"
   url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
   description: |