Conversation
5af4403 to
bb71ad8
Compare
|
@Taepper this currently in a bit of a cmake limbo :). |
Hehe :) it is quite the big change, I will also take a look tomorrow! |
WillAyd
left a comment
There was a problem hiding this comment.
Cool nice work. Some suggestions to improve the meson integration, which hopefully can help while the cmake issues get figured out (?)
| 'RapidJSON', | ||
| fallback: ['rapidjson', 'rapidjson_dep'], | ||
| ) | ||
| simdjson_dep = dependency('simdjson', fallback: ['simdjson', 'simdjson_dep']) |
There was a problem hiding this comment.
| simdjson_dep = dependency('simdjson', fallback: ['simdjson', 'simdjson_dep']) | |
| simdjson_dep = dependency('simdjson') |
The fallback as written is not correct (the variable name in the simdjson module is actually still just simdjson), but I also don't think its required in this case
| source_hash = b9290a9a6d444c8e049bd589ab804e0ccf2b05dc5984a19ed5ae75d090064806 | ||
| source_fallback_url = https://apache.jfrog.io/artifactory/arrow/thirdparty/7.0.0/rapidjson-232389d4f1012dddec4ef84861face2d2ba85709.tar.gz | ||
| patch_directory = rapidjson | ||
| directory = simdjson-3.12.3 |
There was a problem hiding this comment.
There's a wrap entry for 4.2.2 at this point - maybe worth using instead of 3.12? Or, if we wanted to stay in the 3 series, there's also 3.13
| # under the License. | ||
|
|
||
| project('rapidjson', 'cpp') | ||
| project('simdjson', 'cpp', version: '3.12.3', default_options: ['cpp_std=c++17']) |
There was a problem hiding this comment.
You should be able to remove this file altogether. subprojects/packagefiles contain user-specific overrides for packages (i.e. you want to create a meson.build file that is "better" than what you get upstream).
In the case of simdjson, we should probably trust the configuration provided by the wrapdb, unless we know better
Thanks @WillAyd ! I've not had a chance to properly look into meson config yet and this helps! |
Taepper
left a comment
There was a problem hiding this comment.
A few comments – impressive how fast you went through the code-base! I would have thought that it would first only replace a few rapidjson usages.
Generally, we can choose whether to use the simdjson::dom or simdjson::ondemand frontend for arrow.
simdjson::ondemand can be faster and less memory-intensive. [1]
Its api is a little more difficult to use, but might be fine for our use-case, as we do not intend to parse multiple values in the document in parallel.
I see that you are using ondemand frontend for json/parser.cc, and dom for all other use-cases. It seems sensible to me at first glance, but it might make sense to document the reasoning for this decision
| py3-numpy-dev \ | ||
| python3-dev \ | ||
| rapidjson-dev \ | ||
| simdjson-dev \ |
There was a problem hiding this comment.
Is this required when you bundle simdjson in all cases?
| add_arrow_example(rapidjson_row_converter EXTRA_LINK_LIBS RapidJSON) | ||
| add_arrow_example(from_json_string_example EXTRA_LINK_LIBS RapidJSON) | ||
| endif() | ||
| add_arrow_example(from_json_string_example) |
There was a problem hiding this comment.
simdjson is not always enabled. Should this stay conditional?
if(ARROW_JSON OR ARROW_FLIGHT_SQL_ODBC)
set(ARROW_WITH_SIMDJSON ON)
endif()
| ) | ||
|
|
||
| externalproject_add(rapidjson_ep | ||
| set(SIMDJSON_CMAKE_ARGS |
There was a problem hiding this comment.
We might want to set this option to OFF:
option(SIMDJSON_EXCEPTIONS "Enable simdjson's exception-throwing interface" ON)
https://github.com/simdjson/simdjson/blob/6fffc99dac4029a67dd5a794d2f59ec6ba84a1b1/cmake/exception-flags.cmake#L1
This disables any exception-throwing API-functions from simdjson
| python3-rados \ | ||
| rados-objclass-dev \ | ||
| rapidjson-dev \ | ||
| libsimdjson-dev \ |
| pkg-config \ | ||
| protobuf-compiler \ | ||
| rapidjson-dev \ | ||
| libsimdjson-dev \ |
|
Thanks for the review @Taepper ! As a general comment, I thought I would first create a PR here and work on a POC "in secret" to see if it's feasible and then split it up into manageable PR against main. |
This is a discussion PR. It will likely be broken up in multiple PRs.