Pybind11 bindings for whisper.cpp
Install with pip:
pip install whispercpp
To use the latest version, install from source:
pip install git+https://github.com/aarnphm/whispercpp.git
For local setup, initialize all submodules:
git submodule update --init --recursive
Build the wheel:
# Option 1: using pypa/build
python3 -m build -w
# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel
Install the wheel:
# Option 1: via pypa/build
pip install dist/*.whl
# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
The binding provides a Whisper class:
from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
Currently, the inference API is provided via transcribe:
w.transcribe(np.ones((1, 16000)))
You can use ffmpeg or librosa to load audio files into a NumPy array, then pass it to transcribe:
import ffmpeg
import numpy as np

sample_rate = 16000  # whisper.cpp expects 16 kHz mono audio

try:
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

# Convert signed 16-bit PCM to float32 in [-1.0, 1.0).
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
w.transcribe(arr)
The Pybind11 bindings support all of the features from whisper.cpp.
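For files that are already mono 16-bit PCM WAV, the same int16-to-float scaling can be sketched with only the standard library. This is a minimal illustration, not part of whispercpp: the helper name `load_wav_as_floats` is hypothetical, and it assumes the file needs no resampling or channel mixing.

```python
import array
import sys
import wave


def load_wav_as_floats(source):
    """Read a mono 16-bit PCM WAV and return (samples, sample_rate).

    `source` is a path string or a binary file object (hypothetical helper).
    Samples are Python floats scaled to [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [sample / 32768.0 for sample in pcm], rate
```

The returned list can then be converted with `np.asarray(samples, dtype=np.float32)` before it is passed to `w.transcribe`. If the returned rate is not 16000, resample first (for example with ffmpeg as shown above).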
The binding can also be used via api:
from whispercpp import api
ctx = api.Context.from_file("/path/to/saved_weight.bin")
params = api.Params()
ctx.full(arr, params)
See DEVELOPMENT.md
Whisper.from_pretrained(model_name: str) -> Whisper
Load a pre-trained model from the local cache, or download and cache it if needed.
w = Whisper.from_pretrained("tiny.en")
The model is saved to $XDG_DATA_HOME/whispercpp, or to ~/.local/share/whispercpp if that environment variable is not set.
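The lookup described above can be sketched as follows. This is an illustration of the XDG fallback rule, not the library's actual code: the function name `resolve_weights_dir` is hypothetical, and treating an empty `XDG_DATA_HOME` as unset is an assumption taken from the XDG Base Directory Specification.

```python
import os
from pathlib import Path


def resolve_weights_dir(environ=None):
    """Return the directory where weights would be cached (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    base = environ.get("XDG_DATA_HOME")
    if base:
        root = Path(base)
    else:
        # Fallback when XDG_DATA_HOME is unset (or empty, by assumption).
        root = Path.home() / ".local" / "share"
    return root / "whispercpp"
```

Calling `resolve_weights_dir()` with no arguments reads the current process environment, which is a quick way to check where downloaded weights should appear on a given machine.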
Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)
Runs transcription on a given NumPy array. This calls full from whisper.cpp. If num_proc is greater than 1, it uses full_parallel instead.
w.transcribe(np.ones((1, 16000)))
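The dispatch on num_proc can be pictured roughly as below. This is a sketch under stated assumptions, not the binding's source: `SupportsFull` is a hypothetical stand-in for the context, `dispatch_transcribe` is a made-up name, and the argument order of `full_parallel` is assumed rather than taken from the library.

```python
from typing import Any, Protocol


class SupportsFull(Protocol):
    """Hypothetical stand-in for a context exposing full and full_parallel."""

    def full(self, arr: Any, params: Any) -> Any: ...

    def full_parallel(self, arr: Any, params: Any, num_proc: int) -> Any: ...


def dispatch_transcribe(ctx: SupportsFull, params: Any, arr: Any, num_proc: int = 1) -> Any:
    """Choose the single or parallel code path based on num_proc."""
    if num_proc > 1:
        # Assumed signature: the real full_parallel may order arguments differently.
        return ctx.full_parallel(arr, params, num_proc)
    return ctx.full(arr, params)
```

Keeping the default at 1 means callers get the single-process path unless they opt in to parallelism explicitly.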
api is a direct binding to whisper.cpp, with an API similar to that of whisper-rs.
api.Context
This class is a wrapper around whisper_context.
from whispercpp import api
ctx = api.Context.from_file("/path/to/saved_weight.bin")
Note
The context can also be accessed from the Whisper class via w.context.
api.Params
This class is a wrapper around whisper_params.
from whispercpp import api
params = api.Params()
Note
The params can also be accessed from the Whisper class via w.params.
- whispercpp.py. There are a few key differences here:
  - It provides Cython bindings. From the UX standpoint, this achieves the same goal as whispercpp. The difference is that whispercpp uses Pybind11 instead. Feel free to use it if you prefer Cython over Pybind11. Note that whispercpp.py and whispercpp are mutually exclusive, as both use the whispercpp namespace.
  - whispercpp doesn't pollute your $HOME directory; instead, it follows the XDG Base Directory Specification for saved weights.
- Using cdll and ctypes and being done with it?
  - This is also valid, but it requires a lot of hacking and is pretty slow compared to Cython and Pybind11.