A Support Vector Machine (SVM) is a supervised machine learning model. In its basic form, SVMs are used for binary classification tasks. Their fundamental idea is to learn the hyperplane that best separates the two classes, i.e., the one with the widest possible data-free margin around its decision boundary. This is also why SVMs are called "large margin classifiers". To predict the class of a new, unseen data point, the SVM simply has to determine on which side of the previously computed hyperplane the data point lies. This is very efficient since it only involves a single scalar product of the size corresponding to the number of features of the data set.
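As a minimal illustration (not PLSSVM's actual API), the prediction step for a linear SVM with hypothetical, already-learned hyperplane parameters `w` and `b` reduces to the sign of a single dot product:

```python
import numpy as np

# Hypothetical hyperplane parameters learned during training (illustrative values only).
w = np.array([0.5, -1.2, 0.8])  # normal vector of the hyperplane, one entry per feature
b = 0.25                        # bias/offset of the hyperplane

def predict(x):
    """Return +1 or -1 depending on which side of the hyperplane x lies on."""
    # A single scalar product of size num_features decides the class.
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 0.0, 1.0])))  # point on the positive side -> 1
print(predict(np.array([0.0, 2.0, 0.0])))  # point on the negative side -> -1
```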
However, normal SVMs suffer from limited parallelizability. Determining the hyperplane boils down to solving a convex quadratic problem. For this, most SVM implementations use Sequential Minimal Optimization (SMO), an inherently sequential algorithm. The basic idea of this algorithm is that it takes a pair of data points and calculates the hyperplane between them. Afterward, two new data points are selected and the existing hyperplane is adjusted accordingly. This procedure is repeated until a new adjustment would be smaller than some epsilon greater than zero.
Some SVM implementations try to harness some parallelization potential by drawing groups of points instead of point pairs. In this case, the hyperplane calculation inside such a group is parallelized. However, even then, modern highly parallel hardware cannot be utilized efficiently.
Therefore, we implemented a variant of the originally proposed SVM called Least Squares Support Vector Machine (LS-SVM). LS-SVMs reformulate the original problem such that it boils down to solving a system of linear equations, a kind of problem for which many highly parallel algorithms and implementations are known. We decided to use the Conjugate Gradient (CG) method to solve the system of linear equations.
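As a rough sketch (not PLSSVM's optimized implementation), the CG method solves a symmetric positive definite system Ax = b iteratively, where each iteration is dominated by a matrix-vector product that maps well onto highly parallel hardware:

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A via CG.

    Each iteration is dominated by the matrix-vector product A @ p,
    which parallelizes well on GPUs."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < eps:  # converged
            break
        p = r + (rs_new / rs) * p  # update search direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))  # True
```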
The main highlights of our SVM implementations are:
- Drop-in replacement for LIBSVM's `svm-train`, `svm-predict`, and `svm-scale` (some features currently not implemented).
- Support for multiple different programming frameworks for parallelization (also called backends in our PLSSVM implementation), which allows us to target GPUs and CPUs from different vendors like NVIDIA, AMD, or Intel:
- OpenMP
- HPX (tested with current master)
- C++17's standard parallelism `stdpar`:
  Note: due to the nature of the USM mechanics used in the `stdpar` implementations, the `stdpar` backend can't be enabled together with any other backend!
  Note: since every translation unit needs to be compiled with the same flag, we currently globally set `CMAKE_CXX_FLAGS` although it's discouraged.
  - `nvc++` from NVIDIA's HPC SDK (tested with version 25.3)
  - roc-stdpar merged into upstream LLVM starting with version 18 (tested with version 18)
  - `icpx` as Intel's oneAPI compiler (tested with version 2025.0.0)
  - AdaptiveCpp (tested with version v24.10.0)
  - GNU GCC using TBB (tested with version GCC 14.2.0)
- CUDA (tested with version 12.6.3)
- HIP (tested with version 6.3.3)
- OpenCL (tested with CUDA and ROCm provided OpenCL implementations as well as PoCL version v6.0)
- SYCL:
  - DPC++/icpx as Intel's oneAPI compiler (tested with version 2025.0.0)
  - AdaptiveCpp, formerly known as hipSYCL (tested with version v24.10.0)
- Kokkos (all execution spaces supported except `OpenMPTarget` and `OpenACC`) (tested with version 4.6.00)
- Six different kernel functions to be able to classify a large variety of different problems:
  - linear: $\vec{u}^T \cdot \vec{v}$
  - polynomial: $(\gamma \cdot \vec{u}^T \cdot \vec{v} + coef0)^{d}$
  - radial basis function (rbf): $\exp(-\gamma \cdot |\vec{u} - \vec{v}|_2^2)$
  - sigmoid: $\tanh(\gamma \cdot \vec{u}^T \cdot \vec{v} + coef0)$
  - laplacian: $\exp(-\gamma \cdot |\vec{u} - \vec{v}|_1)$
  - chi-squared (only well-defined for values > 0): $\exp(-\gamma \cdot \sum_i \frac{(x[i] - y[i])^2}{x[i] + y[i]})$
- Two different solver types for a trade-off between memory footprint and runtime:
  - `cg_explicit`: large memory overhead but very fast
  - `cg_implicit`: slower but requires drastically less memory
- Multi-class classification available via one vs. all (also one vs. rest or OAA) and one vs. one (also OAO):
- OAA: one huge classification task where our CG algorithm solves a system of linear equations with multiple right-hand sides. The resulting model file is not compatible with LIBSVM.
  - OAO: constructs many, but smaller, binary classification tasks. The resulting model file is fully compatible with LIBSVM.
- Support for regression tasks.
- Multi-GPU support for all kernel functions and GPU backends for `fit` as well as `predict`/`score` (note: no multi-GPU support for the stdpar backend even if run on a GPU!).
- Distributed memory support via MPI for all backends.
- Python bindings as drop-in replacement for `sklearn.SVC` and `sklearn.SVR` (some features currently not implemented).
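For reference, the six kernel functions listed above can be written in a few lines of NumPy. This is an illustrative sketch only; the parameter defaults here are arbitrary and do not reflect PLSSVM's defaults (e.g., PLSSVM defaults gamma to 1 / num_features):

```python
import numpy as np

# Sketches of the six kernel functions; gamma, coef0, and degree correspond
# to the -g, -r, and -d command line parameters of plssvm-train.
def linear(u, v):
    return u @ v

def polynomial(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * (u @ v) + coef0) ** degree

def rbf(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def sigmoid(u, v, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * (u @ v) + coef0)

def laplacian(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum(np.abs(u - v)))

def chi_squared(u, v, gamma=1.0):
    # Only well-defined for strictly positive feature values.
    return np.exp(-gamma * np.sum((u - v) ** 2 / (u + v)))

u, v = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(rbf(u, v))  # exp(-2) ≈ 0.1353 for gamma = 1
```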
To see the full power of Support Vector Machines, have a look at our live visualization examples in `examples/python/interactive`.
General dependencies:
- a C++17 capable compiler (e.g. `gcc` or `clang`)
- CMake 3.25 or newer
- cxxopts ≥ v3.2.0, fast_float ≥ v8.0.2, {fmt} ≥ v11.0.2, and igor (all four are automatically built during the CMake configuration if they couldn't be found using the respective `find_package` call)
- GoogleTest ≥ v1.16.0 if testing is enabled (automatically built during the CMake configuration if `find_package(GTest)` wasn't successful)
- doxygen if documentation generation is enabled
- Pybind11 ≥ v2.13.6 if Python bindings are enabled
- OpenMP 4.0 or newer (optional) to speed up library utilities (like file parsing)
- MPI if distributed memory systems should be supported; mpi4py to enable interoperability in our Python bindings
- Format.cmake if auto formatting via cmake-format and clang-format is enabled; also requires at least clang-format-18 and git; additionally, needs our custom cmake-format fork incorporating some patches
- multiple Python modules used in the utility scripts; to install all modules use `pip install --user -r install/python_requirements.txt`
Additional dependencies for the OpenMP backend:
- compiler with OpenMP support
Additional dependencies for the stdpar backend:
- compiler with stdpar support
Additional dependencies for the HPX backend:
- a working HPX installation
Additional dependencies for the CUDA backend:
- CUDA SDK
- either NVIDIA `nvcc`, `nvc++`, or `clang` with CUDA support enabled
Additional dependencies for the HIP backend:
- working ROCm and HIP installation
- clang with HIP support
Additional dependencies for the OpenCL backend:
- OpenCL runtime and header files
- e.g., the CUDA or ROCm provided OpenCL runtimes or PoCL
Additional dependencies for the SYCL backend:
- the code must be compiled with a SYCL capable compiler; currently supported are DPC++/icpx (via spack) and AdaptiveCpp
Additional dependencies for the Kokkos backend:
- a Kokkos installation with the respective execution spaces enabled; currently all execution spaces are supported except `OpenMPTarget` and `OpenACC`
Additional dependencies for the stdpar backend:
- the code must be compiled with a stdpar capable compiler; currently supported are `nvc++`, roc-stdpar (merged into upstream LLVM starting with version 18), `icpx`, AdaptiveCpp, and GNU GCC
- depending on the used stdpar implementation, additional dependencies are required:
  - `nvc++`: a CUDA SDK
  - roc-stdpar: a HIP installation; it may be necessary to set `export HSA_XNACK=1` (e.g., if the error `Memory access fault by GPU node-2` occurs)
  - `icpx`: Intel's oneDPL library
  - AdaptiveCpp: Intel's TBB library
  - GNU GCC: Boost ≥ 1.73.0 with the `atomic` library enabled and Intel's TBB library
Additional dependencies if PLSSVM_ENABLE_TESTING and PLSSVM_GENERATE_TEST_FILES are both set to ON:
Additional dependencies if PLSSVM_ENABLE_PERFORMANCE_TRACKING and PLSSVM_ENABLE_HARDWARE_SAMPLING are both set to ON:
- our hardware sampling library hws ≥ v1.0.3 (automatically built during the CMake configuration if it couldn't be found using the `find_package` call)
To download PLSSVM use:

```bash
git clone https://github.com/SC-SGS/PLSSVM.git
cd PLSSVM
```

We provide a Python3 requirements file to install all necessary Python3 dependencies:

```bash
pip install -r install/python_requirements.txt
```

Building the library can be done using the normal CMake approach:

```bash
mkdir build && cd build
cmake -DPLSSVM_TARGET_PLATFORMS="..." [optional_options] ..
cmake --build . -j
```

The CMake option `PLSSVM_TARGET_PLATFORMS` is used to determine for which targets the backends should be compiled.
Valid targets are:
- `cpu`: compile for the CPU; an optional architectural specification is allowed but only used when compiling with DPC++/icpx, e.g., `cpu:avx2`
- `nvidia`: compile for NVIDIA GPUs; at least one architectural specification is necessary, e.g., `nvidia:sm_86,sm_70`
- `amd`: compile for AMD GPUs; at least one architectural specification is necessary, e.g., `amd:gfx906`
- `intel`: compile for Intel GPUs; at least one architectural specification is necessary, e.g., `intel:skl`
At least one of the above targets must be present. If the option PLSSVM_TARGET_PLATFORMS is not present, the targets
are automatically determined using the Python3 utility_scripts/plssvm_target_platforms.py script.
Note that when using DPC++/icpx only a single architectural specification for cpu, nvidia or amd is allowed.
```bash
python3 utility_scripts/plssvm_target_platforms.py --help
```

```text
usage: plssvm_target_platforms.py [-h] [--quiet]

optional arguments:
  -h, --help   show this help message and exit
  --quiet      only output the final PLSSVM_TARGET_PLATFORMS string
  --gpus_only  only output gpu architectures to the final PLSSVM_TARGET_PLATFORMS string
```
Example invocation:
```bash
python3 utility_scripts/plssvm_target_platforms.py
```

```text
supported CPU SIMD flags: {'avx512': True, 'avx2': True, 'avx': True, 'sse4_2': True}
Found 1 NVIDIA GPU(s): [sm_86]

Possible -DPLSSVM_TARGET_PLATFORMS entries:
cpu:avx512;nvidia:sm_86
```

or with the `--quiet` flag provided:

```bash
python3 utility_scripts/plssvm_target_platforms.py --quiet
```

```text
cpu:avx512;nvidia:sm_86
```
The [optional_options] can be one or multiple of:
- `PLSSVM_ENABLE_OPENMP_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the OpenMP backend and fail if not available
  - `AUTO`: check for the OpenMP backend but do not fail if not available
  - `OFF`: do not check for the OpenMP backend
- `PLSSVM_ENABLE_HPX_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the HPX backend and fail if not available
  - `AUTO`: check for the HPX backend but do not fail if not available
  - `OFF`: do not check for the HPX backend
- `PLSSVM_ENABLE_STDPAR_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the stdpar backend and fail if not available
  - `AUTO`: check for the stdpar backend but do not fail if not available
  - `OFF`: do not check for the stdpar backend
- `PLSSVM_ENABLE_CUDA_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the CUDA backend and fail if not available
  - `AUTO`: check for the CUDA backend but do not fail if not available
  - `OFF`: do not check for the CUDA backend
- `PLSSVM_ENABLE_HIP_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the HIP backend and fail if not available
  - `AUTO`: check for the HIP backend but do not fail if not available
  - `OFF`: do not check for the HIP backend
- `PLSSVM_ENABLE_OPENCL_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the OpenCL backend and fail if not available
  - `AUTO`: check for the OpenCL backend but do not fail if not available
  - `OFF`: do not check for the OpenCL backend
- `PLSSVM_ENABLE_SYCL_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the SYCL backend and fail if not available
  - `AUTO`: check for the SYCL backend but do not fail if not available
  - `OFF`: do not check for the SYCL backend
- `PLSSVM_ENABLE_KOKKOS_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for the Kokkos backend and fail if not available
  - `AUTO`: check for the Kokkos backend but do not fail if not available
  - `OFF`: do not check for the Kokkos backend
Attention: at least one backend must be enabled and available!
- `PLSSVM_ENABLE_MPI=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for MPI and fail if not available
  - `AUTO`: check for MPI but do not fail if not available
  - `OFF`: do not check for MPI
- `PLSSVM_ENABLE_FAST_MATH=ON|OFF` (default depending on `CMAKE_BUILD_TYPE`: `ON` for Release or RelWithDebInfo, `OFF` otherwise): enable `fast-math` compiler flags for all backends
- `PLSSVM_ENABLE_ASSERTS=ON|OFF` (default: `OFF`): enables custom assertions
- `PLSSVM_USE_FLOAT_AS_REAL_TYPE=ON|OFF` (default: `OFF`): use `float` as `real_type` instead of `double`
- `PLSSVM_THREAD_BLOCK_SIZE` (default: `8`): set a specific thread block size used in the GPU kernels (for fine-tuning optimizations)
- `PLSSVM_INTERNAL_BLOCK_SIZE` (default: `4`): set a specific internal block size used in the GPU kernels (for fine-tuning optimizations)
- `PLSSVM_ENABLE_LTO=ON|OFF` (default: `OFF`): enable interprocedural optimization (IPO/LTO) if supported by the compiler
- `PLSSVM_ENFORCE_MAX_MEM_ALLOC_SIZE=ON|OFF` (default: `ON`): enforce the maximum (device) memory allocation size for the `plssvm::solver_type::automatic` solver
- `PLSSVM_ENABLE_DOCUMENTATION=ON|OFF` (default: `OFF`): enable the `doc` target using doxygen
- `PLSSVM_ENABLE_PERFORMANCE_TRACKING=ON|OFF` (default: `OFF`): enable gathering performance characteristics for the three executables using YAML files; example Python3 scripts to perform performance measurements and to process the resulting YAML files can be found in the `utility_scripts/` directory (requires the Python3 modules `wrapt-timeout-decorator`, `pyyaml`, and `pint`)
- `PLSSVM_ENABLE_TESTING=ON|OFF` (default: `ON`): enable testing using GoogleTest and ctest
- `PLSSVM_ENABLE_LANGUAGE_BINDINGS=ON|OFF` (default: `OFF`): enable language bindings
- `PLSSVM_STL_DEBUG_MODE_FLAGS=ON|OFF` (default: `OFF`): enable STL debug modes (note: changes the resulting library's ABI!)
- `PLSSVM_ENABLE_FORMATTING=ON|OFF` (default: `OFF`): enable automatic formatting using cmake-format and clang-format; adds the additional targets `check-cmake-format`, `cmake-format`, `fix-cmake-format`, `check-clang-format`, `clang-format`, and `fix-clang-format`
If PLSSVM_ENABLE_TESTING is set to ON, the following option can also be set:
- `PLSSVM_GENERATE_TEST_FILES=ON|OFF` (default: `ON`): automatically generate test files
If PLSSVM_GENERATE_TEST_FILES is set to ON, the following options can also be set:
- `PLSSVM_TEST_FILE_NUM_DATA_POINTS` (default: `5000`): the number of data points in the test file
- `PLSSVM_TEST_FILE_NUM_FEATURES` (default: `2000`): the number of features per data point in the test file
- `PLSSVM_TEST_FILE_NUM_CLASSES` (default: `4`): the number of classes in the test file
If PLSSVM_ENABLE_PERFORMANCE_TRACKING is set to ON, the following option can also be set:
- `PLSSVM_ENABLE_HARDWARE_SAMPLING=ON|OFF` (default: `OFF`): enable hardware sampling like current clock frequencies, memory usage, or power draw
If PLSSVM_ENABLE_HARDWARE_SAMPLING is set to ON, the following option can also be set:
- `PLSSVM_HARDWARE_SAMPLING_INTERVAL` (default: `100`): the sampling interval for the `plssvm-train`, `plssvm-predict`, and `plssvm-scale` executables in milliseconds
If PLSSVM_ENABLE_LANGUAGE_BINDINGS is set to ON, the following option can also be set:
- `PLSSVM_ENABLE_PYTHON_BINDINGS=ON|OFF` (default: `PLSSVM_ENABLE_LANGUAGE_BINDINGS`): enable Python bindings using Pybind11; note: `PLSSVM_ENABLE_LANGUAGE_BINDINGS` must be set for this option to have any effect
If the OpenCL backend is available and NVIDIA GPUs should be targeted, an additional option can be set.
- `PLSSVM_OPENCL_BACKEND_ENABLE_PTX_INLINE_ASSEMBLY=ON|OFF` (default: `ON`): enable PTX inline assembly to speed up the FP32/FP64 atomicAdd implementations on NVIDIA GPUs. Note: requires `sm_60` or newer!
If the SYCL backend is available, additional options can be set.
- `PLSSVM_ENABLE_SYCL_ADAPTIVECPP_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for AdaptiveCpp as implementation for the SYCL backend and fail if not available
  - `AUTO`: check for AdaptiveCpp as implementation for the SYCL backend but do not fail if not available
  - `OFF`: do not check for AdaptiveCpp as implementation for the SYCL backend
- `PLSSVM_ENABLE_SYCL_DPCPP_BACKEND=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: check for DPC++/icpx as implementation for the SYCL backend and fail if not available
  - `AUTO`: check for DPC++/icpx as implementation for the SYCL backend but do not fail if not available
  - `OFF`: do not check for DPC++/icpx as implementation for the SYCL backend
- `PLSSVM_ENABLE_SYCL_HIERARCHICAL_AND_SCOPED_KERNELS` (default: `ON`): enable SYCL's `hierarchical` and AdaptiveCpp's `scoped` kernel invocation types
To use DPC++/icpx for SYCL, simply set `CMAKE_CXX_COMPILER` to the respective DPC++/icpx clang executable during the CMake invocation.
If the SYCL implementation is DPC++/icpx the following additional options are available:
- `PLSSVM_SYCL_BACKEND_DPCPP_USE_LEVEL_ZERO` (default: `ON`): use DPC++/icpx's Level-Zero backend instead of its OpenCL backend (only available if a CPU or Intel GPU is targeted)
If the SYCL implementation is AdaptiveCpp the following additional option is available:
- `PLSSVM_SYCL_BACKEND_ADAPTIVECPP_USE_GENERIC_SSCP` (default: `ON`): use AdaptiveCpp's new SSCP compilation flow
If more than one SYCL implementation is available the environment variables PLSSVM_SYCL_ADAPTIVECPP_INCLUDE_DIR and PLSSVM_SYCL_DPCPP_INCLUDE_DIR
must be set to the respective SYCL include paths. Note that those paths must not be present in the CPLUS_INCLUDE_PATH environment variable or compilation will fail.
- `PLSSVM_SYCL_BACKEND_PREFERRED_IMPLEMENTATION` (`dpcpp`|`adaptivecpp`): specify the preferred SYCL implementation if the `sycl_implementation_type` option is set to `automatic`; additionally, the specified SYCL implementation is used in the `plssvm::sycl` namespace, while the other implementations are available in the `plssvm::dpcpp` and `plssvm::adaptivecpp` namespaces, respectively
If the Kokkos backend is available, an additional option can be set.
- `PLSSVM_KOKKOS_BACKEND_SYCL_ENABLE_MULTI_GPU` (default: `OFF`): enable multi-GPU support for the `Kokkos::SYCL` execution space; broken in Kokkos as of version 4.6.00!
If the stdpar backend is available, an additional option can be set.
- `PLSSVM_STDPAR_BACKEND_IMPLEMENTATION` (default: `AUTO`): explicitly specify the used stdpar implementation; must be one of: `AUTO`, `NVHPC`, `roc-stdpar`, `IntelLLVM`, `ACPP`, `GNU_TBB`
If the stdpar implementation is AdaptiveCpp, the following additional option is available:
- `PLSSVM_STDPAR_BACKEND_ACPP_USE_GENERIC_SSCP` (default: `ON`): use AdaptiveCpp's new SSCP compilation flow
If the stdpar implementation is roc-stdpar, the following additional option is available:
- `PLSSVM_STDPAR_BACKEND_ROCSTDPAR_USE_INTERPOSE_ALLOC=ON|OFF|AUTO` (default: `AUTO`):
  - `ON`: always set the `--hipstdpar-interpose-alloc` compiler flag
  - `AUTO`: only set the `--hipstdpar-interpose-alloc` compiler flag if the environment variable `HSA_XNACK` is not defined or set to `0`
  - `OFF`: never set the `--hipstdpar-interpose-alloc` compiler flag
We also provide a number of basic CMake presets. We currently have configure, build, test, and workflow presets.
As an example, to list the available configure presets, `cmake --list-presets` is used
(for more information regarding CMake presets see the CMake documentation):
Available configure presets:
"openmp" - OpenMP backend
"openmp_python" - OpenMP backend + Python bindings
"openmp_test" - OpenMP backend tests
"hpx" - HPX backend
"hpx_python" - HPX backend + Python bindings
"hpx_test" - HPX backend tests
"stdpar" - stdpar backend
"stdpar_python" - stdpar backend + Python bindings
"stdpar_test" - stdpar backend tests
"stdpar_gcc" - stdpar GCC + TBB backend
"stdpar_gcc_python" - stdpar GCC + TBB backend + Python bindings
"stdpar_gcc_test" - stdpar GCC + TBB backend tests
"stdpar_nvhpc" - stdpar NVHPC (nvc++) backend
"stdpar_nvhpc_python" - stdpar NVHPC (nvc++) backend + Python bindings
"stdpar_nvhpc_test" - stdpar NVHPC (nvc++) backend tests
"stdpar_rocstdpar" - stdpar rocstdpar backend
"stdpar_rocstdpar_python" - stdpar rocstdpar backend + Python bindings
"stdpar_rocstdpar_test" - stdpar rocstdpar backend tests
"stdpar_acpp" - stdpar AdaptiveCpp backend
"stdpar_acpp_python" - stdpar AdaptiveCpp backend + Python bindings
"stdpar_acpp_test" - stdpar AdaptiveCpp backend tests
"stdpar_intelllvm" - stdpar IntelLLVM (icpx) backend
"stdpar_intelllvm_python" - stdpar IntelLLVM (icpx) backend + Python bindings
"stdpar_intelllvm_test" - stdpar IntelLLVM (icpx) backend tests
"cuda" - CUDA backend
"cuda_python" - CUDA backend + Python bindings
"cuda_test" - CUDA backend tests
"hip" - HIP backend
"hip_python" - HIP backend + Python bindings
"hip_test" - HIP backend tests
"opencl" - OpenCL backend
"opencl_python" - OpenCL backend + Python bindings
"opencl_test" - OpenCL backend tests
"acpp" - AdaptiveCpp SYCL backend
"acpp_python" - AdaptiveCpp SYCL backend + Python bindings
"acpp_test" - AdaptiveCpp SYCL backend tests
"dpcpp" - DPC++ SYCL backend
"dpcpp_python" - DPC++ backend + Python bindings
"dpcpp_test" - DPC++ backend tests
"icpx" - icpx SYCL backend
"icpx_python" - icpx backend + Python bindings
"icpx_test" - icpx backend tests
"kokkos" - Kokkos backend
"kokkos_python" - Kokkos backend + Python bindings
"kokkos_test" - Kokkos backend tests
"all" - All available backends
"all_python" - All available backends + Python bindings
"all_test" - All available backends tests

With these presets, building and testing, e.g., our CUDA backend is as simple as typing (in the PLSSVM root directory):

```bash
cmake --workflow --preset cuda_test
```

Note: not all possible combinations of CMake presets are provided by us (e.g., performance tracking and hardware sampling) since that would result in way too many presets. However, these additional options can be enabled using normal CMake options.
Note: the `all` presets always exclude the stdpar backend since it currently cannot be enabled together with any other backend.
Note: the only difference between the `dpcpp` and `icpx` presets is the automatically set `CMAKE_CXX_COMPILER`. Internally, both presets use the same SYCL implementation.
To run the tests after building the library (with PLSSVM_ENABLE_TESTING set to ON) use:
```bash
ctest
```

Note: due to floating point inaccuracies, it is advisable to disable PLSSVM_ENABLE_FAST_MATH for testing.
Note: GoogleTest's death tests are currently not supported in conjunction with the stdpar backend. If you wish to run the tests with stdpar, you have to set PLSSVM_ENABLE_ASSERTS to OFF.
Note: If the used stdpar implementation is nvc++, PLSSVM_ENABLE_PERFORMANCE_TRACKING must be set to OFF in order to run the tests.
Note: the stdpar tests may fail if executed in parallel via `ctest -j $(nproc)`.
Note: our tests do not support the execution with more than one MPI process launched via mpirun.
To enable the generation of test coverage reports using lcov, the library must be compiled using the custom Coverage `CMAKE_BUILD_TYPE`.
Additionally, it's advisable to use smaller test files to shorten the ctest step.
```bash
cmake -DCMAKE_BUILD_TYPE=Coverage -DPLSSVM_TARGET_PLATFORMS="..." \
      -DPLSSVM_TEST_FILE_NUM_DATA_POINTS=100 \
      -DPLSSVM_TEST_FILE_NUM_FEATURES=50 ..
cmake --build . -- coverage
```

The resulting html coverage report is located in the coverage folder in the build directory.
To enable automatic formatting, PLSSVM_ENABLE_FORMATTING must be set to ON and clang-format and git executables must be available in the PATH (minimum clang-format version is 18).
Additionally, our custom cmake-format fork must be used since it has incorporated some necessary patches.
Our cmake-format can be installed via:

```bash
pip install "git+https://github.com/vancraar/cmake_format@master"
```

To check whether formatting changes must be applied use:

```bash
cmake --build . --target check-cmake-format
cmake --build . --target check-clang-format
```

To auto format all files use:

```bash
cmake --build . --target fix-cmake-format
cmake --build . --target fix-clang-format
```

If doxygen is installed and PLSSVM_ENABLE_DOCUMENTATION is set to ON, the documentation can be built using
```bash
cmake --build . -- doc
```

The documentation of the current state of the main branch can be found here.
The library supports the install target:
```bash
cmake --build . -- install
```

Afterward, the necessary exports should be performed:

```bash
export CMAKE_PREFIX_PATH=${CMAKE_INSTALL_PREFIX}/share/plssvm/cmake:${CMAKE_PREFIX_PATH}
export MANPATH=${CMAKE_INSTALL_PREFIX}/share/man:$MANPATH
export PATH=${CMAKE_INSTALL_PREFIX}/bin:${PATH}
export LD_LIBRARY_PATH=${CMAKE_INSTALL_PREFIX}/lib:${CMAKE_INSTALL_PREFIX}/lib64:${LD_LIBRARY_PATH}
export CPLUS_INCLUDE_PATH=${CMAKE_INSTALL_PREFIX}/include:${CPLUS_INCLUDE_PATH}
```

If our library was built with the Python bindings enabled, the PYTHONPATH must additionally be set:

```bash
export PYTHONPATH=${CMAKE_INSTALL_PREFIX}/lib:${CMAKE_INSTALL_PREFIX}/lib64:${PYTHONPATH}
```

We also support a pip package that can be used to install our library:

```bash
pip install plssvm
```

This pip install behaves as if the CMake all_python preset is used.
This means that the PLSSVM_TARGET_PLATFORMS are automatically determined and PLSSVM is built with all supported
backends that are available on the target machine at the time of the `pip install plssvm` invocation.
To check the installation, including, e.g., the installed backends, we provide the plssvm-install-check command after
PLSSVM has been installed via pip.
An example output of this command can look like:
```text
PLSSVM - Parallel Least Squares Support Vector Machine (3.0.0)
Copyright(C) 2018-today The PLSSVM project - All Rights Reserved
This is free software distributed under the MIT license.

Available target platforms: TargetPlatform.AUTOMATIC, TargetPlatform.GPU_NVIDIA, TargetPlatform.CPU
Default target platform: TargetPlatform.GPU_NVIDIA
Available backends: BackendType.AUTOMATIC, BackendType.OPENMP, BackendType.CUDA, BackendType.OPENCL, BackendType.SYCL
Default backend for target platform TargetPlatform.GPU_NVIDIA: BackendType.CUDA
Default backend for target platform TargetPlatform.CPU: BackendType.SYCL
Available SYCL implementations: ImplementationType.AUTOMATIC, ImplementationType.ADAPTIVECPP

Repository: https://github.com/SC-SGS/PLSSVM.git
Documentation: https://sc-sgs.github.io/PLSSVM
Issues: https://github.com/SC-SGS/PLSSVM/issues
```
PLSSVM provides three executables: `plssvm-train`, `plssvm-predict`, and `plssvm-scale`.
In addition, PLSSVM can also be used as a library in third-party code.
For more information, see the respective man pages, which are installed via `cmake --build . -- install`.
The repository comes with a Python3 script (in the `utility_scripts/` directory) to generate arbitrarily large classification and regression data sets.
In order to use all functionality, the following Python3 modules must be installed:
`argparse`, `timeit`, `numpy`, `pandas`, `sklearn`, `arff`, `matplotlib`, `mpl_toolkits`, and `humanize`.
```text
usage: generate_data.py [-h] [--output OUTPUT] [--format FORMAT] --samples SAMPLES [--test_samples TEST_SAMPLES] --features FEATURES [--scale SCALE SCALE] [--plot] {classification,regression} ...

positional arguments:
  {classification,regression}
    classification      create a classification data set
    regression          create a regression data set

optional arguments:
  -h, -?, --help        show this help message and exit
  --output OUTPUT       the output file to write the samples to (without extension)
  --format FORMAT       the file format; either arff, libsvm, or csv
  --samples SAMPLES     the number of training samples to generate
  --test_samples TEST_SAMPLES
                        the number of test samples to generate; default: 0
  --features FEATURES   the number of features per data point
  --scale SCALE SCALE   scale the features to the provided range
  --plot                plot training samples; only possible if 0 < samples <= 2000 and 1 < features <= 3
```
classification specific arguments:
```text
usage: generate_data.py classification [-h] [--problem {blobs,blobs_merged,planes,ball}] [--classes CLASSES]

optional arguments:
  -h, --help            show this help message and exit
  --problem {blobs,blobs_merged,planes,ball}
                        the problem to solve
  --classes CLASSES     the number of classes to generate; default: 2
```
regression specific arguments:
```text
usage: generate_data.py regression [-h] [--problem {linear,linear_noisy,friedman1}]

optional arguments:
  -h, --help            show this help message and exit
  --problem {linear,linear_noisy,friedman1}
                        the problem to solve
```
An example invocation generating a classification data set consisting of blobs with 1000 data points with 200 features each and 4 classes could look like:

```bash
python3 generate_data.py --output data_file --format libsvm --problem blobs --samples 1000 --features 200 classification --classes 4
```

An example invocation generating a linear regression data set consisting of 1000 data points with 200 features each could look like:
```bash
python3 generate_data.py --output data_file --format libsvm --problem linear --samples 1000 --features 200 regression
```

```bash
./plssvm-train --help
```

```text
LS-SVM with multiple (GPU-)backends
Usage:
  ./plssvm-train [OPTION...] training_set_file [model_file]

 -s, --svm_type arg            set type of SVM
                               0 -- C-SVC
                               1 -- C-SVR (default: 0)
 -t, --kernel_type arg         set type of kernel function.
                               0 -- linear: u'*v
                               1 -- polynomial: (gamma*u'*v+coef0)^degree
                               2 -- rbf: exp(-gamma*|u-v|^2)
                               3 -- sigmoid: tanh(gamma*u'*v+coef0)
                               4 -- laplacian: exp(-gamma*|u-v|_1)
                               5 -- chi_squared: exp(-gamma*sum_i((x[i]-y[i])^2/(x[i]+y[i]))) (default: 2)
 -d, --degree arg              set degree in kernel function (default: 3)
 -g, --gamma arg               set gamma in kernel function (default: "1 / num_features")
 -r, --coef0 arg               set coef0 in kernel function (default: 0)
 -c, --cost arg                set the parameter C (default: 1)
 -e, --epsilon arg             set the tolerance of termination criterion (default: 1e-10)
 -i, --max_iter arg            set the maximum number of CG iterations (default: num_features)
 -l, --solver arg              choose the solver: automatic|cg_explicit|cg_implicit (default: automatic)
 -a, --classification arg      the classification strategy to use for multi-class classification: oaa|oao (default: oaa)
 -b, --backend arg             choose the backend: automatic|openmp|hpx|cuda|hip|opencl|sycl|kokkos|stdpar (default: automatic)
 -p, --target_platform arg     choose the target platform: automatic|cpu|gpu_nvidia|gpu_amd|gpu_intel (default: automatic)
     --sycl_kernel_invocation_type arg
                               choose the kernel invocation type when using SYCL as backend: automatic|basic|work_group|hierarchical|scoped (default: automatic)
     --sycl_implementation_type arg
                               choose the SYCL implementation to be used in the SYCL backend: automatic|dpcpp|adaptivecpp (default: automatic)
     --kokkos_execution_space arg
                               choose the Kokkos execution space to be used in the Kokkos backend: automatic|Cuda|OpenMP|Serial (default: automatic)
     --performance_tracking arg
                               the output YAML file where the performance tracking results are written to; if not provided, the results are dumped to stderr
     --mpi_load_balancing_weights arg
                               can be used to load balance for MPI (must be integers); number of provided values must match the number of MPI ranks
     --use_strings_as_labels   use strings as labels instead of plain numbers
     --verbosity arg           choose the level of verbosity: full|timing|libsvm|quiet (default: full)
 -q, --quiet                   quiet mode (no outputs regardless the provided verbosity level!)
 -h, --help                    print this helper message
 -v, --version                 print version information
     --input training_set_file
     --model model_file
```
The help message only prints the options available based on the CMake invocation.
For example, if CUDA was not available during the build step, it will not show up as possible backend in the description of the --backend option.
The most minimal example invocation is:

```bash
./plssvm-train /path/to/data_file
```

An example invocation using the CUDA backend could look like:

```bash
./plssvm-train --backend cuda --input /path/to/data_file
```

Another example targeting NVIDIA GPUs using the SYCL backend looks like:

```bash
./plssvm-train --backend sycl --target_platform gpu_nvidia --input /path/to/data_file
```

The `--backend=automatic` option works as follows:
- if the `gpu_nvidia` target is available, check for existing backends in order `cuda` 🠦 `hip` 🠦 `opencl` 🠦 `sycl` 🠦 `kokkos` 🠦 `stdpar`
- otherwise, if the `gpu_amd` target is available, check for existing backends in order `hip` 🠦 `opencl` 🠦 `sycl` 🠦 `kokkos` 🠦 `stdpar`
- otherwise, if the `gpu_intel` target is available, check for existing backends in order `sycl` 🠦 `opencl` 🠦 `kokkos` 🠦 `stdpar`
- otherwise, if the `cpu` target is available, check for existing backends in order `sycl` 🠦 `kokkos` 🠦 `opencl` 🠦 `openmp` 🠦 `hpx` 🠦 `stdpar`
Note that during CMake configuration it is guaranteed that at least one of the above combinations does exist.
The --target_platform=automatic option works for the different backends as follows:
- OpenMP: always selects a CPU
- HPX: always selects a CPU
- CUDA: always selects an NVIDIA GPU (if no NVIDIA GPU is available, throws an exception)
- HIP: always selects an AMD GPU (if no AMD GPU is available, throws an exception)
- OpenCL: tries to find available devices in the following order: NVIDIA GPUs 🠦 AMD GPUs 🠦 Intel GPUs 🠦 CPU
- SYCL: tries to find available devices in the following order: NVIDIA GPUs 🠦 AMD GPUs 🠦 Intel GPUs 🠦 CPU
- Kokkos: checks which execution spaces are available and which target platforms they support, then tries to find available devices in the following order: NVIDIA GPUs 🠦 AMD GPUs 🠦 Intel GPUs 🠦 CPU
- stdpar: the target device must be selected at compile time (using PLSSVM_TARGET_PLATFORMS) or via environment variables at runtime
The --sycl_kernel_invocation_type and --sycl_implementation_type flags are only used if the --backend is sycl, otherwise a warning is emitted on stderr.
If the --sycl_kernel_invocation_type is automatic, the work_group invocation type is currently always used.
If the --sycl_implementation_type is automatic, the used SYCL implementation is determined by the PLSSVM_SYCL_BACKEND_PREFERRED_IMPLEMENTATION CMake flag.
If the --kokkos_execution_space is automatic, the best-fitting execution space is used based on the provided and/or available target platforms.
Our predict utility fully conforms to LIBSVM's model file format.
This means that our plssvm-predict can be used on model files learned with, e.g., LIBSVM's svm-train.
Note: this is not the case for the regression task since the svm_type field differs between LIBSVM (epsilon_svr)
and PLSSVM (c_svr). To automatically convert between the two, simply use the convert_model.py script
(in the utility_scripts/ directory), which replaces this field with the respective expected value
(note that for large model files doing this manually may be faster):
usage: convert_model.py [-h] [-o OUTPUT] [--to_plssvm] [--to_libsvm] model_file
positional arguments:
model_file the regression model file to convert
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
output the regression model to the new file, otherwise the regression model is updated in place
--to_plssvm convert the regression model to a PLSSVM-conformant model file
--to_libsvm convert the regression model to a LIBSVM-conformant model file
An example invocation could look like:
python3 convert_model.py --to_libsvm -o 5x4_libsvm.libsvm.model 5x4.libsvm.model
This converts a PLSSVM model file to a LIBSVM model file.
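Since the conversion only swaps the svm_type field, its core can be sketched in a few lines (a hedged illustration; this assumes the field appears as a "svm_type <value>" header line and is not the actual convert_model.py code):

```python
# Minimal sketch of the svm_type swap performed by the conversion script.
# Assumption: the model header contains a line of the form "svm_type <value>".
def convert_svm_type(lines, to_plssvm):
    # pick the replacement direction: LIBSVM uses epsilon_svr, PLSSVM uses c_svr
    src, dst = ("epsilon_svr", "c_svr") if to_plssvm else ("c_svr", "epsilon_svr")
    return [f"svm_type {dst}" if line.strip() == f"svm_type {src}" else line
            for line in lines]

model = ["svm_type epsilon_svr", "kernel_type rbf", "SV"]
print(convert_svm_type(model, to_plssvm=True)[0])  # svm_type c_svr
```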
After a correct model file exists, predict works as follows:
./plssvm-predict --help
LS-SVM with multiple (GPU-)backends
Usage:
./plssvm-predict [OPTION...] test_file model_file [output_file]
-b, --backend arg choose the backend: automatic|openmp|hpx|cuda|hip|opencl|sycl|kokkos|stdpar (default: automatic)
-p, --target_platform arg choose the target platform: automatic|cpu|gpu_nvidia|gpu_amd|gpu_intel (default: automatic)
--sycl_kernel_invocation_type arg
choose the kernel invocation type when using SYCL as backend: automatic|basic|work_group|hierarchical|scoped (default: automatic)
--sycl_implementation_type arg
choose the SYCL implementation to be used in the SYCL backend: automatic|dpcpp|adaptivecpp (default: automatic)
--kokkos_execution_space arg
choose the Kokkos execution space to be used in the Kokkos backend: automatic|Cuda|OpenMP|Serial (default: automatic)
--performance_tracking arg
the output YAML file where the performance tracking results are written to; if not provided, the results are dumped to stderr
--mpi_load_balancing_weights arg
can be used to load balance for MPI (must be integers); number of provided values must match the number of MPI ranks
--use_strings_as_labels use strings as labels instead of plain numbers
--verbosity choose the level of verbosity: full|timing|libsvm|quiet (default: full)
-q, --quiet quiet mode (no outputs regardless the provided verbosity level!)
-h, --help print this helper message
-v, --version print version information
--test test_file
--model model_file
--output output_file
An example invocation could look like:
./plssvm-predict --backend cuda --test /path/to/test_file --model /path/to/model_file
Another example targeting NVIDIA GPUs using the SYCL backend looks like:
./plssvm-predict --backend sycl --target_platform gpu_nvidia --test /path/to/test_file --model /path/to/model_file
The --target_platform=automatic and --sycl_implementation_type flags work like in the training (./plssvm-train) case.
./plssvm-scale --help
LS-SVM with multiple (GPU-)backends
Usage:
./plssvm-scale [OPTION...] input_file [scaled_file]
-l, --lower arg lower is the lowest (minimal) value allowed in each dimension (default: -1)
-u, --upper arg upper is the highest (maximal) value allowed in each dimension (default: 1)
-f, --format arg the file format to output the scaled data set to (default: libsvm)
-s, --save_filename arg the file to which the scaling factors should be saved
-r, --restore_filename arg the file from which previous scaling factors should be loaded
--performance_tracking arg
the output YAML file where the performance tracking results are written to; if not provided, the results are dumped to stderr
--use_strings_as_labels use strings as labels instead of plain numbers
--verbosity choose the level of verbosity: full|timing|libsvm|quiet (default: full)
-q, --quiet quiet mode (no outputs regardless the provided verbosity level!)
-h, --help print this helper message
-v, --version print version information
--input input_file
--scaled scaled_file
An example invocation could look like:
./plssvm-scale -l -0.5 -u 1.5 --input /path/to/input_file --scaled /path/to/scaled_file
An example invocation to scale a train and test file in the same way looks like:
./plssvm-scale -l -1.0 -u 1.0 -s scaling_parameter.txt train_file.libsvm train_file_scaled.libsvm
./plssvm-scale -r scaling_parameter.txt test_file.libsvm test_file_scaled.libsvm
We support distributed memory via MPI for plssvm-train and plssvm-predict while simultaneously allowing multiple devices per MPI rank.
In order to use it, MPI must be found during the CMake configuration step.
Note that if MPI could not be found, PLSSVM still works in shared-memory mode only and internally disables all MPI-related functionality.
For example, to run PLSSVM via MPI on four nodes simply use the normal mpirun command:
mpirun -N 4 ./plssvm-train --backend cuda --input /path/to/data_file
We also support rudimentary, manual load balancing:
mpirun -N 4 ./plssvm-train --mpi_load_balancing_weights=1,2,2,1 --backend cuda --input /path/to/data_file
The above command results in MPI ranks 1 and 2 each computing twice as many matrix elements as ranks 0 and 3. This can be used to load balance our computations in scenarios where heterogeneous hardware is used. Note that the number of provided load balancing weights must equal the number of MPI ranks used and is independent of the number of devices per MPI rank. If an MPI rank has more than one device, all devices on that rank compute the same number of matrix elements.
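How the integer weights translate into per-rank shares of the kernel matrix can be illustrated by a simple normalization (a sketch of the proportionality described above, not PLSSVM's actual partitioning code):

```python
# Hypothetical sketch: relative load balancing weights normalized to
# per-rank fractions of the kernel matrix work.
weights = [1, 2, 2, 1]  # one integer weight per MPI rank
total = sum(weights)
shares = [w / total for w in weights]
# ranks 1 and 2 each get twice the share of ranks 0 and 3
print(shares)
```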
Our MPI implementation, however, currently has some limitations:
- the training, test, and model data is fully read by every MPI rank
- the training, test, and model data is fully stored on each compute device on every MPI rank
- only the kernel matrix is really divided across all MPI ranks
- while the expensive BLAS level 3 operations in the CG algorithm are computed in a distributed way, everything else is computed on every MPI rank
- in the CG algorithm we communicate the whole matrix, although it would be sufficient to communicate only matrix parts
- only the main MPI rank (per default rank 0) writes the output files
- plssvm-scale does not support more than one MPI rank
A simple C++ program (main_classification.cpp) using PLSSVM as a library for classification could look like:
#include "plssvm/core.hpp"
#include <exception>
#include <iostream>
#include <vector>
int main() {
// correctly initialize and finalize environments
plssvm::environment::scope_guard environment_guard{};
try {
// create a new C-SVC parameter set, explicitly overriding the default kernel function
const plssvm::parameter params{ plssvm::kernel_type = plssvm::kernel_function_type::polynomial };
// create two data sets: one with the training data scaled to [-1, 1]
// and one with the test data scaled like the training data
const plssvm::classification_data_set train_data{ "train_file.libsvm", { -1.0, 1.0 } };
const plssvm::classification_data_set test_data{ "test_file.libsvm", train_data.scaling_factors()->get() };
// create C-SVC using the default backend and the previously defined parameter
const auto svc = plssvm::make_csvc(params);
// fit using the training data, (optionally) set the termination criterion
const plssvm::classification_model model = svc->fit(train_data, plssvm::epsilon = 1e-6);
// get accuracy of the trained model
const double model_accuracy = svc->score(model);
std::cout << "model accuracy: " << model_accuracy << std::endl;
// predict the labels
const std::vector<int> predicted_label = svc->predict(model, test_data);
// output a more complete classification report
const std::vector<int> &correct_label = test_data.labels().value();
std::cout << plssvm::classification_report{ correct_label, predicted_label } << std::endl;
// write model file to disk
model.save("model_file.libsvm");
} catch (const plssvm::exception &e) {
std::cerr << e.what_with_loc() << std::endl;
} catch (const std::exception &e) {
std::cerr << e.what() << std::endl;
}
return 0;
}
A simple C++ program (main_regression.cpp) using PLSSVM as a library for regression could look like:
#include "plssvm/core.hpp"
#include <exception>
#include <iostream>
#include <vector>
int main() {
// correctly initialize and finalize environments
plssvm::environment::scope_guard environment_guard{};
try {
// create a new C-SVR parameter set, explicitly overriding the default kernel function
const plssvm::parameter params{ plssvm::kernel_type = plssvm::kernel_function_type::polynomial };
// create two data sets: one with the training data scaled to [-1, 1]
// and one with the test data scaled like the training data
const plssvm::regression_data_set train_data{ "train_file.libsvm", { -1.0, 1.0 } };
const plssvm::regression_data_set test_data{ "test_file.libsvm", train_data.scaling_factors()->get() };
// create C-SVR using the default backend and the previously defined parameter
const auto svr = plssvm::make_csvr(params);
// fit using the training data, (optionally) set the termination criterion
const plssvm::regression_model model = svr->fit(train_data, plssvm::epsilon = 1e-6);
// get the score of the trained model
const double model_score = svr->score(model);
std::cout << "model score: " << model_score << std::endl;
// predict the values
const std::vector<plssvm::real_type> predicted_values = svr->predict(model, test_data);
// output a more complete regression report
const std::vector<plssvm::real_type> &correct_values = test_data.labels().value();
std::cout << plssvm::regression_report{ correct_values, predicted_values } << std::endl;
// write model file to disk
model.save("model_file.libsvm");
} catch (const plssvm::exception &e) {
std::cerr << e.what_with_loc() << std::endl;
} catch (const std::exception &e) {
std::cerr << e.what() << std::endl;
}
return 0;
}
The examples/cpp directory also contains the same examples using MPI to support distributed memory systems.
With a corresponding minimal CMake file:
cmake_minimum_required(VERSION 3.25)
project(LibraryUsageExample LANGUAGES CXX)
find_package(plssvm CONFIG REQUIRED)
# CMake's COMPONENTS mechanism can also be used if a specific library component is required, e.g.:
# find_package(plssvm REQUIRED COMPONENTS CUDA)
# classification executable example
add_executable(classification main_classification.cpp)
# classification executable example using MPI
add_executable(classification_mpi main_classification_mpi.cpp)
# regression executable example
add_executable(regression main_regression.cpp)
# regression executable example using MPI
add_executable(regression_mpi main_regression_mpi.cpp)
# link PLSSVM against executables
foreach (target classification classification_mpi regression regression_mpi)
target_compile_features(${target} PUBLIC cxx_std_17)
target_link_libraries(${target} PUBLIC plssvm::plssvm)
# can also only link against a single library component, e.g.:
# target_link_libraries(${target} PUBLIC plssvm::cuda)
endforeach ()
The examples/python directory contains the same examples using our PLSSVM Python bindings.
Additionally, it contains Python examples leveraging MPI to target distributed memory systems.
A classification example using PLSSVM's SVC Python binding and sklearn's breast cancer data set:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
########################################################################################################################
# Authors: Alexander Van Craen, Marcel Breyer #
# Copyright (C): 2018-today The PLSSVM project - All Rights Reserved #
# License: This file is part of the PLSSVM project which is released under the MIT license. #
# See the LICENSE.md file in the project root for full license information. #
########################################################################################################################
import matplotlib.pyplot as plt
import sklearn.datasets
import sklearn.metrics
import sklearn.inspection
import numpy as np
from plssvm.svm import SVC # identical to from sklearn.svm import SVC
# load the breast cancer datasets
cancer = sklearn.datasets.load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target
y_label = cancer.target_names
# build the SVC model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
# score the model
print(sklearn.metrics.classification_report(y, svm.predict(X)))
print("Score: {:.2f}%".format(svm.score(X, y) * 100))
# plot the decision boundary
sklearn.inspection.DecisionBoundaryDisplay.from_estimator(
svm,
X,
response_method="predict",
cmap=plt.cm.Spectral,
alpha=0.8,
xlabel=cancer.feature_names[0],
ylabel=cancer.feature_names[1],
)
# scatter plot the decision boundary
viridis = plt.cm.get_cmap('viridis', len(np.unique(y)))
plt.scatter(X[:, 0], X[:, 1],
cmap=viridis,
c=y,
s=20, edgecolors="k")
# generate legend handles and add handle
legend_handles = [plt.scatter([], [], color=viridis(color), label=f'{label}')
for label, color in zip(y_label, np.unique(y))]
plt.legend(handles=legend_handles)
plt.title("SVC classifier on breast cancer dataset")
plt.show()
with an example output:
precision recall f1-score support
0 0.91 0.85 0.88 212
1 0.91 0.95 0.93 357
accuracy 0.91 569
macro avg 0.91 0.90 0.91 569
weighted avg 0.91 0.91 0.91 569
Score: 91.39%
A regression example comparing PLSSVM's SVR Python binding and sklearn.SVR using a sine curve:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
########################################################################################################################
# Authors: Alexander Van Craen, Marcel Breyer #
# Copyright (C): 2018-today The PLSSVM project - All Rights Reserved #
# License: This file is part of the PLSSVM project which is released under the MIT license. #
# See the LICENSE.md file in the project root for full license information. #
########################################################################################################################
import numpy as np
import matplotlib.pyplot as plt
# generate sample data (sine curve with noise)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
# add noise to targets
y[::5] += 3 * (0.5 - np.random.rand(8))
plt.scatter(X, y, color='darkorange', label='data')
# fit the sklearn regression model
from sklearn.svm import SVR
sklearn_svr_lin = SVR(kernel='linear', C=100, epsilon=0.1)
y_lin_sklearn = sklearn_svr_lin.fit(X, y).predict(X)
plt.plot(X, y_lin_sklearn, lw=2, linestyle='dashed', label='Linear model sklearn')
sklearn_svr_poly = SVR(kernel='poly', C=100, degree=3, epsilon=0.1, coef0=1)
y_poly_sklearn = sklearn_svr_poly.fit(X, y).predict(X)
plt.plot(X, y_poly_sklearn, lw=2, linestyle='dashed', label='Polynomial model sklearn')
sklearn_svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
y_rbf_sklearn = sklearn_svr_rbf.fit(X, y).predict(X)
plt.plot(X, y_rbf_sklearn, lw=2, linestyle='dashed', label='RBF model sklearn')
# fit the PLSSVM regression model
from plssvm.svm import SVR
plssvm_svr_lin = SVR(kernel='linear', C=100)
y_lin_plssvm = plssvm_svr_lin.fit(X, y).predict(X)
plt.plot(X, y_lin_plssvm, lw=2, label='Linear model plssvm')
plssvm_svr_poly = SVR(kernel='poly', C=100, degree=3, coef0=1)
y_poly_plssvm = plssvm_svr_poly.fit(X, y).predict(X)
plt.plot(X, y_poly_plssvm, lw=2, label='Polynomial model plssvm')
plssvm_svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1)
y_rbf_plssvm = plssvm_svr_rbf.fit(X, y).predict(X)
plt.plot(X, y_rbf_plssvm, lw=2, label='RBF model plssvm')
# show the result plots
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
with an example output:
Note that not all sklearn SVC and SVR functionality has been implemented in PLSSVM yet.
The respective functions will throw a Python AttributeError if called.
For a detailed overview of the functions that are currently implemented, see our API documentation.
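For code that should run against both sklearn and the PLSSVM bindings, the AttributeError behavior can be handled with a small guard. The following is a generic sketch of that pattern with a stand-in class; it does not use the actual PLSSVM API:

```python
# Generic sketch: call an sklearn-style method, falling back gracefully when
# the binding raises AttributeError for not-yet-implemented functionality.
def call_if_implemented(estimator, method_name, *args, **kwargs):
    try:
        return getattr(estimator, method_name)(*args, **kwargs)
    except AttributeError:
        # not-yet-implemented functions raise AttributeError when called
        return None

class Dummy:
    """Stand-in for an SVC/SVR object (illustration only)."""
    def predict(self, X):
        return [0] * len(X)
    def decision_function(self, X):
        raise AttributeError("decision_function is not implemented")

print(call_if_implemented(Dummy(), "predict", [[1.0], [2.0]]))      # [0, 0]
print(call_if_implemented(Dummy(), "decision_function", [[1.0]]))   # None
```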
There are more examples located in the examples/python/sklearn directory that are copied from the sklearn repository and slightly changed for PLSSVM.
If you use PLSSVM in your research, we kindly request you to cite:
@inproceedings{9835379,
author={Van Craen, Alexander and Breyer, Marcel and Pfl\"{u}ger, Dirk},
booktitle={2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
title={PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine},
year={2022},
volume={},
number={},
pages={818-827},
doi={10.1109/IPDPSW55747.2022.00138}
}
For a full list of all publications involving PLSSVM see our Wiki Page.
The PLSSVM library is distributed under the MIT license.