Make HypGrep accept a `RegExp` as the query and surface a grep-style CLI, while keeping per-query transfer bytes bounded via index pruning.

Key changes
- `src/regex.js`: extracts mandatory literal substrings from a regex source in DNF form (handles escapes, character classes, groups, quantifiers, anchors, and top-level alternation).
- `queryIndex` accepts a `RegExp`: each top-level alternation branch becomes its own n-gram conjunction, and a block is a candidate if any branch is fully satisfied (Zoekt-style branch ORing).
- `parquetFind` / `parquetSearch` accept a `RegExp` directly: the default row predicate becomes `regex.test(value)`, and the scorer for `parquetSearch` uses the regex's own flags.
- … `/./`, `/foo|bar/` where a branch matches anything): correct grep semantics, no index acceleration.
- `hypgrep search` CLI subcommand: `npx hypgrep search file.parquet '<pattern>'`, with `--limit N`, `--index <path>`, `-c`/`--count`, and `-i`/`--ignore-case`. Patterns wrapped as `/regex/flags` are parsed as a regex; the exit code is 1 when nothing matches (grep convention).
- … `/foo|bar/` previously yielded `['foo']`).

Breaking changes
- `extractRegexLiterals` now returns `string[][]` (DNF) instead of `string[]`. This is internal only and not part of the public API.

Known limitations (documented in ARCH.md)
- … `/abc(foo|bar)def/` yields `['abc','def']`, not the tighter DNF). Group recursion is the next step.
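The Zoekt-style branch ORing listed under Key changes, and the group-alternation limitation above, can be illustrated with a small sketch. It is hypothetical and not the PR's `queryIndex` code. It assumes the DNF is a `string[][]` (as `extractRegexLiterals` now returns), that n = 3, and that each block's index entry is a `Set` of the n-grams occurring in that block. The helper names `ngrams` and `isCandidate` are illustrative only.

```javascript
// Hypothetical sketch of Zoekt-style branch ORing over a DNF of literals.
// Assumptions (not from the PR): n = 3, and each block's index entry is a
// Set of the n-grams that occur in that block.
const N = 3;

// Split a literal into overlapping n-grams. A literal shorter than N
// yields none, so it places no constraint on the block.
function ngrams(literal) {
  const grams = [];
  for (let i = 0; i + N <= literal.length; i++) {
    grams.push(literal.slice(i, i + N));
  }
  return grams;
}

// dnf: string[][], an OR of branches where each branch is an AND of literals.
// A block is a candidate if every n-gram of at least one branch is present.
// A branch with no usable n-grams (e.g. []) matches every block, which is
// what forces a full scan for patterns without mandatory literals.
function isCandidate(dnf, blockGrams) {
  if (dnf.length === 0) return true; // assumption: no branches means no constraint
  return dnf.some((branch) =>
    branch.flatMap(ngrams).every((gram) => blockGrams.has(gram))
  );
}

// Usage: one block containing "abcfoodef", another containing "abcbazdef".
const blockA = new Set(ngrams('xxabcfoodefxx'));
const blockB = new Set(ngrams('xxabcbazdefxx'));

console.log(isCandidate([['foo'], ['bar']], blockA)); // true: the 'foo' branch holds
console.log(isCandidate([['bar']], blockA)); // false
// The current extraction for /abc(foo|bar)def/ cannot prune blockB...
console.log(isCandidate([['abc', 'def']], blockB)); // true
// ...whereas the tighter DNF that group recursion would produce can.
console.log(isCandidate([['abc', 'foo', 'def'], ['abc', 'bar', 'def']], blockB)); // false
```

The last two calls show the cost of the limitation. `blockB` cannot match `/abc(foo|bar)def/`, but with only `['abc','def']` extracted, it is still fetched and scanned.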
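The CLI's handling of `/regex/flags` patterns and `-i`/`--ignore-case` could look roughly like the sketch below. It is hypothetical: `parsePattern` and `escapeRegExp` are illustrative names. The description states only that wrapped patterns are parsed as a regex, so treating an unwrapped pattern as literal text is an assumption. Dropping the `g` and `y` flags is also this sketch's own choice, not something the PR describes.

```javascript
// Hypothetical sketch of turning a CLI pattern argument into a RegExp.
// Assumption: an argument not wrapped as /source/flags is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function parsePattern(arg, { ignoreCase = false } = {}) {
  // Greedy (.+) keeps inner slashes: '/a/b/i' -> source 'a/b', flags 'i'.
  const match = /^\/(.+)\/([a-z]*)$/.exec(arg);
  const source = match ? match[1] : escapeRegExp(arg);
  let flags = match ? match[2] : '';
  // Design choice (assumption): drop g and y, because they make
  // RegExp.prototype.test stateful through lastIndex.
  flags = flags.replace(/[gy]/g, '');
  // -i / --ignore-case adds the i flag if it is not already present.
  if (ignoreCase && !flags.includes('i')) flags += 'i';
  // An invalid source or unknown/duplicate flag throws a SyntaxError here.
  return new RegExp(source, flags);
}
```

For example, `parsePattern('/foo|bar/')` yields a regex whose source is `foo|bar`, while `parsePattern('a.b')` matches only the literal text `a.b`. Stripping `g` and `y` matters if the row predicate is `regex.test(value)`. With either flag set, `test` advances `lastIndex` between calls, so the same value could match on one row and fail on the next.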