daemon/rule: stop mutating regex Data field on Compile (#1587)#1606
Open
tredondo wants to merge 1 commit into
Open
daemon/rule: stop mutating regex Data field on Compile (#1587)#1606tredondo wants to merge 1 commit into
tredondo wants to merge 1 commit into
Conversation
Compile() for Regexp operators with Sensitive==false used to do
o.Data = strings.ToLower(o.Data)
before compiling the pattern. That works as an ASCII-only way to make
matching case-insensitive (pattern and subject both lowercased), but it
mangles the pattern string itself. A regex like [0-9A-Za-z]+ becomes
[0-9a-za-z]+: still a valid character class that matches the same set
of characters, but visibly wrong, and the mangled value then bleeds
into:
- the rule JSON written to disk by loader.Save (json.MarshalIndent
serializes the post-Compile o.Data)
- the GUI display of the operator data and the slugified rule name
This is most visible on the auto-generated rules for AppImages, whose
process.path lives under /tmp/.mount_<name>_<random>/, producing the
[0-9a-za-z]+ duplication the user can see in the popup and on disk.
Switch to Go regexp's native case-insensitivity instead: prepend the
(?i) flag to a local pattern string and compile that. o.Data is no
longer touched by Compile, so the original pattern survives both
serialization and the round-trip back to the UI. reCmp can drop its
matching strings.ToLower(data) call, since the compiled regex now
handles case folding itself.
Existing rules on disk that already have a lowercased pattern keep
working: (?i) over a [0-9a-z]+-style class is equivalent to the
previous lowercase-both-sides behavior, and case-sensitive rules are
unaffected (no (?i) is added).
A regression test (TestNewOperatorRegexpDataNotMutated) pins both the
pattern-preservation contract and case-insensitive matching against an
appimage-style path.
Fixes evilsocket#1587
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1587.
Compile() for Regexp operators with Sensitive==false used to do
before compiling the pattern. That works as an ASCII-only way to make matching case-insensitive (pattern and subject both lowercased), but it mangles the pattern string itself. A regex like [0-9A-Za-z]+ becomes [0-9a-za-z]+: still valid, but visibly wrong, and the mangled value then bleeds into:
This is most visible on the auto-generated rules for AppImages, whose process.path lives under /tmp/.mount__/, producing the [0-9a-za-z]+ duplication the user can see in the popup and on disk.
Switch to Go regexp's native case-insensitivity instead: prepend the (?i) flag to a local pattern string and compile that. o.Data is no longer touched by Compile, so the original pattern survives both serialization and the round-trip back to the UI. reCmp can drop its matching strings.ToLower(data) call, since the compiled regex now handles case folding itself.
Existing rules on disk that already have a lowercased pattern keep working: (?i) over a [0-9a-z]+-style class is equivalent to the previous lowercase-both-sides behavior, and case-sensitive rules are unaffected (no (?i) is added).
A regression test (TestNewOperatorRegexpDataNotMutated) pins both the pattern-preservation contract and case-insensitive matching against an appimage-style path.