Skip to content

[k2] add compile-time regexp#1623

Open
denisichh wants to merge 3 commits intomasterfrom
dzubarev/k2/compile-time_regexp
Open

[k2] add compile-time regexp#1623
denisichh wants to merge 3 commits intomasterfrom
dzubarev/k2/compile-time_regexp

Conversation

@denisichh
Copy link
Copy Markdown
Contributor

No description provided.

@denisichh denisichh self-assigned this Apr 29, 2026
@denisichh denisichh added runtime Feature related to runtime compiler Feature related to compiler k2 Affects compiler or runtime in K2 mode labels Apr 29, 2026
kphp::pcre2::match_context match_context;
kphp::pcre2::match_data match_data;

RegexInstanceState()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing noexcept here

kphp::pcre2::match_data match_data;

RegexInstanceState() noexcept
RegexBaseState() noexcept
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hide it in kphp::regex::details namespace

}

template<typename... Args>
void pattern_compilation_warning(const char* function, const char* file, std::format_string<kphp::log::impl::wrapped_arg_t<Args>...> fmt,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need separate logging utility for regexp?

Args&&... args) const noexcept {
static constexpr size_t LOG_BUFFER_SIZE = 256UZ;
std::array<char, LOG_BUFFER_SIZE> log_buffer; // NOLINT
const size_t message_size{kphp::log::impl::format_log_message(log_buffer, fmt, std::forward<Args>(args)...)};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎🤏😳🕶🤏

namespace kphp::regex {
class regexp final {
private:
void compile_regex(RegexBaseState& regex_state, const string& pattern, const string& subject = {}, const char* function = nullptr,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Shouldn't we take pattern and subject by copy?
  2. Can we get rid of raw pointers?

}
}

std::optional<std::reference_wrapper<const kphp::pcre2::regex>> _re{};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Maybe rename to m_re to follow existing naming policies?
  • Redundant init for this member

_re = regex_state.add_compiled_regex(pattern, this->compile_options, std::move(re))->get().regex_code;
}

std::optional<bool> find_compiled_regex(const RegexBaseState& regex_state, const string& pattern) noexcept {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't bool as a return type enough here?

return true;
}

inline Optional<int64_t> preg_match_base(const regexp& regex, const string& subject,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common note: let's avoid generic names like *base*. Can we use some specific name that reflects what the function actually does instead?


static constexpr size_t MAX_SUBPATTERNS_COUNT{512};

kphp::stl::unordered_map<string, compiled_regex, kphp::memory::script_allocator, hasher_type> regex_pcre2_code_cache;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As now it's also used in *ImageState, we must ensure that all string objects in this map have their reference counter set to ExtraRefCnt::for_global_const

auto group_names{kphp::regex::details::collect_group_names(re)};
auto unsigned_limit{limit == kphp::regex::PREG_NOLIMIT ? std::numeric_limits<uint64_t>::max() : static_cast<uint64_t>(limit)};
regex_info.replace_count = 0;
int64_t replace_count = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use braced init here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

compiler Feature related to compiler k2 Affects compiler or runtime in K2 mode runtime Feature related to runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants