Warning
The instructions below are for Apple Silicon (M1-M5). We've had success with an Apple M4 Pro running macOS Sequoia 15.6.1. Instructions for other systems will be added at a later date.
- Clang (LLVM Clang 18)
- CMake (>= 3.12)
- vcpkg
- A BLAS implementation
- OpenMP
Once you have these you can build the extension.
brew install llvm@18
Note
When building the extension (discussed below), explicitly pass the LLVM 18
compiler binaries to make sure LLVM@18 is used. For example:
CC=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang CXX=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang++ make
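Before passing these paths to make, it can help to confirm that the binaries actually exist. The following is a minimal sketch, assuming Homebrew's standard layout. The helper names `llvm18_bin` and `llvm18_available` are hypothetical and not part of the repository, and the fallback to /opt/homebrew assumes the default Homebrew prefix on Apple Silicon.

```shell
# Hypothetical helper: print the directory that should hold the LLVM 18 binaries.
# Falls back to /opt/homebrew (the default Homebrew prefix on Apple Silicon)
# when HOMEBREW_PREFIX is not set.
llvm18_bin() {
    printf '%s/opt/llvm@18/bin\n' "${HOMEBREW_PREFIX:-/opt/homebrew}"
}

# Hypothetical helper: succeed only if clang and clang++ are executable there.
llvm18_available() {
    [ -x "$(llvm18_bin)/clang" ] && [ -x "$(llvm18_bin)/clang++" ]
}

if llvm18_available; then
    echo "LLVM 18 found: CC=$(llvm18_bin)/clang CXX=$(llvm18_bin)/clang++"
else
    echo "LLVM 18 not found in $(llvm18_bin); run: brew install llvm@18" >&2
fi
```

If the check fails even though llvm@18 is installed, HOMEBREW_PREFIX is probably pointing at a different prefix than the one Homebrew uses on your machine.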
Versions >= 3.12 are supported.
brew install cmake
Or instead build and install from the source.
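To check an existing CMake installation against the 3.12 minimum, a sketch along these lines may be useful. The function name `cmake_version_ok` is hypothetical, and the parsing assumes that the first line of `cmake --version` has the form "cmake version X.Y.Z".

```shell
# Hypothetical helper: succeed if version $1 (MAJOR.MINOR[.PATCH]) is >= 3.12.
cmake_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text up to the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

if command -v cmake >/dev/null 2>&1; then
    # Assumes the first line looks like "cmake version 3.31.6".
    installed=$(cmake --version | awk 'NR == 1 { print $3 }')
    if cmake_version_ok "$installed"; then
        echo "CMake $installed is new enough"
    else
        echo "CMake $installed is older than 3.12" >&2
    fi
else
    echo "cmake not found; run: brew install cmake" >&2
fi
```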
Install vcpkg prerequisite: pkg-config.
brew install pkg-config
Install vcpkg using the instructions below, or check out the extension template's instructions for vcpkg.
Change to a directory where you want vcpkg to be installed.
cd ~
git clone https://github.com/Microsoft/vcpkg.git
Optionally check out the version we've used.
git -C vcpkg checkout 4334d8b4c8
./vcpkg/bootstrap-vcpkg.sh -disableMetrics
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake
On Apple Silicon (M1-M5) we rely on the Apple Accelerate framework. This means there is nothing to install.
brew install libomp
Note: You might have to set OpenMP_ROOT in your .zshrc file.
export OpenMP_ROOT=$(brew --prefix)/opt/libomp
- Clone the repo and ensure the submodules are also cloned:
  git clone --recurse-submodules https://github.com/Noorts/PDXearch.git
- Build the extension (the default is an optimized release build):
  cd PDXearch
  make
  Other build modes include debug and reldebug. Include DISABLE_SANITIZER=1 to avoid errors raised by the DuckDB core itself.
  DISABLE_SANITIZER=1 make debug
- [Recommended] For faster builds, install ccache and ninja, and then set the generator when you build:
  brew install ccache ninja
  GEN=ninja make
@Noorts personally uses:
GEN=ninja DISABLE_SANITIZER=1 CC=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang CXX=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang++ EXTRA_CMAKE_ARGS="-DCMAKE_EXPORT_COMPILE_COMMANDS=1" make
The built extension artifact can be found at PDXearch/build/release/extension/pdxearch/pdxearch.duckdb_extension.
For PDXearch extension usage, please see the README.md.
The extension's internals have been implemented in two variants. These variants both implement index creation, non-filtered search, and filtered search.
1. Row group: The first variant creates a separate internal IVF index for each row group. This allows the extension to parallelize across row groups, which speeds up both the creation of the index and using it for search.
2. Global: The second variant uses a single internal IVF index for all embeddings in the targeted table column.
From the user's perspective the creation of an index (CREATE INDEX) and using
it for search is the same. The user always interacts with a single database
index.
You decide which variant to use at compile time. Row group is the default. Build the extension with the following argument to use the global variant:
EXT_FLAGS="-DPDX_USE_ALTERNATIVE_GLOBAL_VERSION=1" make
For clangd support in VSCode-based editors, ensure you build the compilation database.
Include the following argument in your build command (each time you build):
EXTRA_CMAKE_ARGS="-DCMAKE_EXPORT_COMPILE_COMMANDS=1" make
Then in the PDXearch root directory, symlink the generated compilation database (once).
ln -s ./build/release/compile_commands.json ./
You might have to manually set the clangd executable path in the VSCode clangd extension configuration.
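If clangd still reports missing includes, one thing worth ruling out is a missing or dangling compilation database symlink. This is a minimal sketch of such a check. The function name `compile_db_linked` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: succeed if compile_commands.json in directory $1
# (default: current directory) is a symlink whose target exists.
compile_db_linked() {
    db="${1:-.}/compile_commands.json"
    # -L checks that the path is a symlink; -e follows it and checks the target.
    [ -L "$db" ] && [ -e "$db" ]
}

if compile_db_linked .; then
    echo "compile_commands.json is linked and its target exists"
else
    echo "compile_commands.json is missing or dangling; rebuild and re-link it" >&2
fi
```

Run it from the PDXearch root directory. A dangling link usually means the build directory was cleaned after the symlink was created.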
make clean
make format
Portable make format alternative:
uv run --with black --with clang_format==11.0.1 --with cmake-format duckdb/scripts/format.py --all --fix --noconfirm --directories src test
make tidy-check
Tidy-check alternative that explicitly uses LLVM 18:
make tidy-check TIDY_BINARY=/opt/homebrew/opt/llvm@18/bin/clang-tidy
After building the extension you can run the tests. See test/README.md for more details.
make test
make test_debug
Run one specific test:
./build/release/test/unittest "test/sql/search/index_scan_uncommon_dimensions.test"
- Q: I pulled the latest commits and now I run into compiler errors when I build the extension.
- A: Did the latest commits include a submodule bump (e.g., DuckDB was updated)? If so, git status will report "modified: duckdb (new commits)". In the root PDXearch directory, run git submodule update --init --recursive to ensure your local submodules match the committed versions.
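The submodule check from the answer above can also be scripted. This is a minimal sketch, not part of the repository: the function name `stale_submodules` is hypothetical. It relies on the prefix characters that `git submodule status` prints: `+` when the checked-out commit differs from the recorded one, and `-` when the submodule is not initialized.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and print
# the paths of submodules that are out of sync with the committed version.
# Lines starting with '+' (commit differs) or '-' (not initialized) are reported;
# in-sync lines start with a space and are skipped.
stale_submodules() {
    awk '/^[+-]/ { print $2 }'
}

if [ -n "$(git submodule status --recursive 2>/dev/null | stale_submodules)" ]; then
    echo "Submodules out of sync; run: git submodule update --init --recursive"
fi
```

Running this from the root PDXearch directory after each pull gives an early hint before a long build fails.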