Development

Warning

The instructions below are for Apple Silicon (M1-M5). We've had success with an Apple M4 Pro running macOS Sequoia 15.6.1. Instructions for other systems will be added at a later date.

Prerequisites

  • Clang (LLVM Clang 18)
  • CMake (>= 3.12)
  • vcpkg
  • A BLAS implementation
  • OpenMP

Once you have these you can build the extension.

Install Clang

brew install llvm@18

Note

When building the extension (discussed below), explicitly pass the LLVM 18 compiler binaries to make sure LLVM@18 is used. For example: CC=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang CXX=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang++ make.
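Before building, it can help to confirm that the LLVM 18 binaries are where the note above expects them. The following is a minimal sketch, not part of the repository: `llvm18_bin` is a hypothetical helper name, and the fallback prefix `/opt/homebrew` assumes a default Homebrew install on Apple Silicon.

```shell
# Sketch: resolve the LLVM 18 bin directory under a Homebrew prefix.
# llvm18_bin is a hypothetical helper, not something the repository provides.
llvm18_bin() {
    # Use the given prefix, else $HOMEBREW_PREFIX, else the Apple Silicon default.
    prefix="${1:-${HOMEBREW_PREFIX:-/opt/homebrew}}"
    printf '%s/opt/llvm@18/bin\n' "$prefix"
}

LLVM_BIN="$(llvm18_bin)"
if [ -x "$LLVM_BIN/clang" ] && [ -x "$LLVM_BIN/clang++" ]; then
    echo "Using compilers from $LLVM_BIN"
    "$LLVM_BIN/clang" --version | head -n 1
else
    echo "LLVM 18 not found under $LLVM_BIN; run: brew install llvm@18" >&2
fi
```

If both binaries are found, the same directory can be passed to the build as CC="$LLVM_BIN/clang" CXX="$LLVM_BIN/clang++" make.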

Install CMake

Versions >= 3.12 are supported.

brew install cmake

Alternatively, build and install CMake from source.

Install vcpkg

Install vcpkg prerequisite: pkg-config.

brew install pkg-config

Install vcpkg using the instructions below, or check out the extension template's instructions for vcpkg.

Change to a directory where you want vcpkg to be installed.

cd ~
git clone https://github.com/Microsoft/vcpkg.git

Optionally check out the version we've used.

git -C vcpkg checkout 4334d8b4c8

Then bootstrap vcpkg and export the toolchain path.

./vcpkg/bootstrap-vcpkg.sh -disableMetrics
export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake
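Because the export above depends on the directory it was run from, a quick check that the variable points at a real file can save a confusing build failure later. This is a sketch under that assumption; `check_toolchain` is a hypothetical helper name.

```shell
# Sketch: classify the state of a vcpkg toolchain path.
# check_toolchain is a hypothetical helper, not part of vcpkg or the repository.
check_toolchain() {
    if [ -z "${1:-}" ]; then
        echo "unset"      # variable was never exported in this shell
    elif [ -f "$1" ]; then
        echo "ok"         # path points at an existing file
    else
        echo "missing"    # exported, but the file is not there
    fi
}

check_toolchain "${VCPKG_TOOLCHAIN_PATH:-}"
```

A result of "missing" usually means the export was run from a directory other than the one that contains the vcpkg clone.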

Install BLAS

On Apple Silicon (M1-M5) we rely on the Apple Accelerate framework. This means there is nothing to install.

Install OpenMP

brew install libomp

Note: You might have to set OpenMP_ROOT in your .zshrc file.

export OpenMP_ROOT=$(brew --prefix)/opt/libomp
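If you do add the export to your .zshrc, one way to avoid duplicate entries on repeated setup runs is to append the line only when it is absent. The sketch below assumes nothing beyond standard shell tools; `append_once` is a hypothetical helper name.

```shell
# Sketch: append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper, not part of the repository.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $(brew --prefix) unexpanded, so it is evaluated at shell startup.
# Uncomment to apply it to your own shell configuration:
# append_once "$HOME/.zshrc" 'export OpenMP_ROOT=$(brew --prefix)/opt/libomp'
```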

Build

Building the Extension

  1. Clone the repo and ensure the submodules are also cloned:

    git clone --recurse-submodules https://github.com/Noorts/PDXearch.git
  2. Build the extension (the default is an optimized release build):

    cd PDXearch
    make

    Other build modes include debug and reldebug. Set DISABLE_SANITIZER=1 to avoid sanitizer errors raised by the DuckDB core itself.

    DISABLE_SANITIZER=1 make debug
  3. [Recommended] For faster builds, install ccache and ninja, and then set the generator when you build:

    brew install ccache ninja
    GEN=ninja make

@Noorts personally uses the following combined command:

GEN=ninja DISABLE_SANITIZER=1 CC=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang CXX=$HOMEBREW_PREFIX/opt/llvm@18/bin/clang++ EXTRA_CMAKE_ARGS="-DCMAKE_EXPORT_COMPILE_COMMANDS=1" make

The built extension artifact can be found at PDXearch/build/release/extension/pdxearch/pdxearch.duckdb_extension.
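To confirm that a build produced the artifact, a check along these lines can be run from the PDXearch root directory. It is a sketch: `artifact_path` is a hypothetical helper, and the debug and reldebug locations are assumed to mirror the release path given above.

```shell
# Sketch: print the expected artifact path for a build mode and check that it exists.
# artifact_path is a hypothetical helper; non-release paths are an assumption.
artifact_path() {
    mode="${1:-release}"
    printf 'build/%s/extension/pdxearch/pdxearch.duckdb_extension\n' "$mode"
}

ARTIFACT="$(artifact_path release)"
if [ -f "$ARTIFACT" ]; then
    ls -lh "$ARTIFACT"
else
    echo "No artifact at $ARTIFACT; has the build finished?" >&2
fi
```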

For PDXearch extension usage, see the README.md.

PDXearch Variants

The extension's internals have been implemented in two variants. These variants both implement index creation, non-filtered search, and filtered search.

1. Row group: The first variant creates a separate internal IVF index for each row group. This allows the extension to parallelize across row groups, which speeds up both index creation and search.

2. Global: The second variant uses a single internal IVF index for all embeddings in the targeted table column.

From the user's perspective, creating an index (CREATE INDEX) and using it for search work the same way in both variants. The user always interacts with a single database index.

You decide which variant to use at compile time. Row group is the default. Build the extension with the following argument to use the global variant:

EXT_FLAGS="-DPDX_USE_ALTERNATIVE_GLOBAL_VERSION=1" make
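When switching between variants in scripts, the choice can be mapped to the build flag in one place. This is a sketch only: `variant_flags` and the `VARIANT` variable are hypothetical names, and the only flag used is the one documented above.

```shell
# Sketch: map a variant name to the EXT_FLAGS value used at build time.
# variant_flags and VARIANT are hypothetical names, not part of the repository.
variant_flags() {
    case "$1" in
        rowgroup) printf '' ;;   # default variant, no extra flag needed
        global)   printf '%s' '-DPDX_USE_ALTERNATIVE_GLOBAL_VERSION=1' ;;
        *)        echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

VARIANT="${VARIANT:-rowgroup}"
echo "EXT_FLAGS for $VARIANT: '$(variant_flags "$VARIANT")'"
# To build: EXT_FLAGS="$(variant_flags "$VARIANT")" make
```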

Clangd Language Server Support

For clangd support in VSCode-based editors, generate the compilation database (compile_commands.json) as part of the build.

Include the following argument in your build command (each time you build):

EXTRA_CMAKE_ARGS="-DCMAKE_EXPORT_COMPILE_COMMANDS=1" make

Then in the PDXearch root directory, symlink the generated compilation database (once).

ln -s ./build/release/compile_commands.json ./
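The symlink step fails if the link already exists or if the build has not produced the database yet. A slightly more forgiving version could look like the sketch below, where `link_compile_db` is a hypothetical helper name and the default source path is the release path used above.

```shell
# Sketch: symlink the compilation database into the current directory if it exists.
# link_compile_db is a hypothetical helper, not part of the repository.
link_compile_db() {
    src="${1:-./build/release/compile_commands.json}"
    if [ -f "$src" ]; then
        # -f replaces an existing link, so re-running is harmless.
        ln -sf "$src" ./compile_commands.json
        echo "linked"
    else
        echo "missing"
    fi
}

link_compile_db
```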

You might have to manually set the clangd executable path in the VSCode clangd extension configuration.

Clean, Format, Tidy-Check

make clean
make format

Portable make format alternative:

uv run --with black --with clang_format==11.0.1 --with cmake-format duckdb/scripts/format.py --all --fix --noconfirm --directories src test

make tidy-check

Tidy-check alternative that explicitly uses LLVM 18:

make tidy-check TIDY_BINARY=/opt/homebrew/opt/llvm@18/bin/clang-tidy

Test

After building the extension you can run the tests. See test/README.md for more details.

make test
make test_debug

Run one specific test:

./build/release/test/unittest "test/sql/search/index_scan_uncommon_dimensions.test"

FAQ

  • Q: I pulled the latest commits and now I run into compiler errors when I build the extension.
    • A: Check whether the latest commits bumped a submodule (e.g., DuckDB was updated). If so, git status reports "modified: duckdb (new commits)". In the root PDXearch directory, run git submodule update --init --recursive so that your local submodules match the committed versions.
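The submodule check described in the FAQ can also be scripted. The sketch below relies on the prefixes that git submodule status prints: a leading + marks a submodule whose checked-out commit differs from the committed one, and a leading - marks one that has not been initialized. `submodules_out_of_sync` is a hypothetical helper name.

```shell
# Sketch: detect out-of-sync or uninitialized submodules.
# submodules_out_of_sync is a hypothetical helper; it reads
# `git submodule status` output on stdin.
submodules_out_of_sync() {
    grep -q '^[+-]'
}

if git submodule status --recursive 2>/dev/null | submodules_out_of_sync; then
    echo "Submodules differ from the committed versions."
    echo "Run: git submodule update --init --recursive"
fi
```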