feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery#10352

Open
localai-bot wants to merge 3 commits into master from backend/depth-anything

@localai-bot localai-bot commented Jun 15, 2026

Collaborator

What

Adds a new depth-anything backend to LocalAI, mirroring the existing locate-anything-cpp backend. It wraps the Depth Anything 3 ggml port — monocular metric depth + camera pose estimation in C++/ggml, loaded via purego (cgo-less, no Python at inference).

Depth has no native OpenAI endpoint, so the model is exposed two ways:

  • GenerateImage(src, dst) — runs depth on the src image and writes a min-max-normalised grayscale depth PNG to dst.
  • Predict(images[0]) — runs depth+pose and returns a JSON blob with the depth dimensions, depth stats and the camera extrinsics (3x4) / intrinsics (3x3).
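
The min-max normalisation behind the GenerateImage path can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `depthToGray` and `isFinite` are hypothetical helpers, the depth buffer is assumed to be row-major float32, and non-finite samples are assumed to render as black.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png"
	"math"
)

// isFinite reports whether d is neither NaN nor +/-Inf.
func isFinite(d float32) bool {
	f := float64(d)
	return !math.IsNaN(f) && !math.IsInf(f, 0)
}

// depthToGray min-max normalises a row-major depth buffer into an 8-bit
// grayscale image. Hypothetical helper for illustration only.
func depthToGray(depth []float32, w, h int) (*image.Gray, error) {
	if w <= 0 || h <= 0 || len(depth) != w*h {
		return nil, fmt.Errorf("depth buffer has %d samples, want %dx%d", len(depth), w, h)
	}
	// Find the finite value range.
	lo, hi := float32(math.Inf(1)), float32(math.Inf(-1))
	for _, d := range depth {
		if !isFinite(d) {
			continue
		}
		if d < lo {
			lo = d
		}
		if d > hi {
			hi = d
		}
	}
	span := hi - lo
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			d := depth[y*w+x]
			var n float32 // stays 0 for flat maps and non-finite samples
			if span > 0 && isFinite(d) {
				n = (d - lo) / span
			}
			img.Pix[y*img.Stride+x] = uint8(n*255 + 0.5) // round to nearest
		}
	}
	return img, nil
}

func main() {
	img, err := depthToGray([]float32{1, 2, 3, 5}, 2, 2)
	if err != nil {
		panic(err)
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		panic(err)
	}
	fmt.Println(img.Pix, buf.Len() > 0)
}
```

Because the output is normalised per image, the PNG shows relative depth only; callers that need metric values would have to read them from the JSON or typed response instead.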

The C side shares a ggml graph allocator and is not reentrant, so the backend embeds base.SingleThread to serialize inference.
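
The idea of serialising calls into a non-reentrant native library can be illustrated with a plain mutex. This is only a sketch of the general pattern: `serialBackend`, `infer` and `runConcurrently` are hypothetical names, and the code is not LocalAI's base.SingleThread implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// serialBackend is a hypothetical stand-in for a backend whose native
// library is not reentrant: every inference call takes the same mutex.
type serialBackend struct {
	mu       sync.Mutex
	inFlight int // calls currently inside the critical section
	maxSeen  int // highest inFlight observed; stays 1 when serialised
}

// infer runs one inference while holding the lock.
func (b *serialBackend) infer(run func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.inFlight++
	if b.inFlight > b.maxSeen {
		b.maxSeen = b.inFlight
	}
	run() // the native call would happen here
	b.inFlight--
}

// runConcurrently fires several callers at once and reports the peak
// number of calls that were inside the critical section together.
func runConcurrently(b *serialBackend, calls int) int {
	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.infer(func() {})
		}()
	}
	wg.Wait()
	return b.maxSeen
}

func main() {
	fmt.Println(runConcurrently(&serialBackend{}, 16))
}
```

The trade-off is throughput: concurrent requests queue behind one another, which is acceptable when the shared allocator makes parallel graph execution unsafe anyway.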

Backend

backend/go/depth-anything-cpp/ mirrors locate-anything-cpp's structure (single Go package main: main.go dlopens the variant .so and registers the da_capi_* symbols; godepthanythingcpp.go implements Load/Predict/GenerateImage). The CMakeLists.txt + Makefile clone and build depth-anything.cpp pinned to commit 61ede2a6f1402aab3875729126830b61561db6ae, using -DDA_SHARED=ON -DDA_BUILD_CLI=OFF -DDA_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF to produce a self-contained libdepthanythingcpp-<variant>.so (ggml linked statically) per CPU variant (avx/avx2/avx512/fallback), exactly like locate-anything-cpp.

Models (gallery)

Eight Depth Anything 3 GGUFs published at huggingface.co/mudler/depth-anything.cpp-gguf, all backend: depth-anything:

  • depth-anything-3-base (q4_k default), -q8_0, -f16, -f32
  • depth-anything-3-small (vits), -large (vitl), -giant (vitg)
  • depth-anything-3-mono-large (monocular, depth + sky mask, no pose)

Registration

  • backend/index.yaml: depth-anything meta + all hardware-variant capability/image-tag entries (cpu, cuda12, cuda13, intel-sycl-f32/f16, vulkan, nvidia-l4t-arm64, cuda13-l4t), using the same quay.io/go-skynet/local-ai-backends:... + localai/localai-backends:... mirror scheme as locate-anything-cpp.
  • .github/backend-matrix.yml — one build entry per hardware variant for backend: "depth-anything-cpp".

Note / deviation

Faithfully mirrors locate-anything-cpp, which is part of LocalAI's root Go module (no per-backend go.mod/proto). The backend therefore uses pkg/grpc + pkg/grpc/proto + base.SingleThread rather than the standalone go.mod/proto from the depth-anything.cpp repo — required for compatibility with LocalAI's gRPC server contract.

API — typed Depth RPC + POST /v1/depth

This PR also adds a typed Depth gRPC RPC (mirroring the existing Detect
RPC end-to-end) and a REST endpoint that expose the full Depth Anything 3
output surface, not just the depth PNG / stats JSON.

Proto (backend/backend.proto):

rpc Depth(DepthRequest) returns (DepthResponse) {}

message DepthRequest {
  string src = 1;                  // image: filesystem path or base64 payload
  string dst = 2;                  // optional output dir for exports
  bool include_depth = 3;
  bool include_confidence = 4;
  bool include_pose = 5;
  bool include_sky = 6;
  bool include_points = 7;
  float points_conf_thresh = 8;
  repeated string exports = 9;     // "glb", "colmap"
}

message DepthResponse {
  int32 width = 1;  int32 height = 2;
  repeated float depth = 3;        // width*height row-major metric depth
  repeated float confidence = 4;   // DualDPT
  repeated float sky = 5;          // mono models
  repeated float extrinsics = 6;   // 12 (3x4)
  repeated float intrinsics = 7;   // 9 (3x3)
  int32 num_points = 8;
  repeated float points = 9;       // num_points*3 xyz (world space)
  bytes point_colors = 10;         // num_points*3 u8 rgb
  repeated string export_paths = 11;
  bool is_metric = 12;
}
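
To show how a client might consume these fields, here is a hedged sketch that indexes the row-major depth buffer and back-projects one pixel into camera space with a standard pinhole model. The `depthMap` type and `unproject` method are hypothetical, and the intrinsics layout (row-major `[fx 0 cx; 0 fy cy; 0 0 1]`) is an assumption the PR does not state. The extrinsics are left out because their direction (world-to-camera or camera-to-world) is not specified here.

```go
package main

import (
	"errors"
	"fmt"
)

// depthMap mirrors the DepthResponse fields used below; hypothetical type.
type depthMap struct {
	Width, Height int
	Depth         []float32 // width*height, row-major
	Intrinsics    []float32 // 9 values, ASSUMED row-major 3x3
}

// unproject returns the camera-space point for pixel (u, v) using the
// pinhole model: x = (u-cx)/fx*d, y = (v-cy)/fy*d, z = d.
func (m depthMap) unproject(u, v int) ([3]float32, error) {
	if len(m.Depth) != m.Width*m.Height || len(m.Intrinsics) != 9 {
		return [3]float32{}, errors.New("malformed depth map")
	}
	if u < 0 || u >= m.Width || v < 0 || v >= m.Height {
		return [3]float32{}, errors.New("pixel out of bounds")
	}
	fx, cx := m.Intrinsics[0], m.Intrinsics[2]
	fy, cy := m.Intrinsics[4], m.Intrinsics[5]
	if fx == 0 || fy == 0 {
		return [3]float32{}, errors.New("zero focal length")
	}
	d := m.Depth[v*m.Width+u] // row-major lookup
	return [3]float32{
		(float32(u) - cx) / fx * d,
		(float32(v) - cy) / fy * d,
		d,
	}, nil
}

func main() {
	m := depthMap{
		Width:      4,
		Height:     2,
		Depth:      []float32{4, 4, 4, 4, 4, 4, 4, 4},
		Intrinsics: []float32{2, 0, 2, 0, 2, 1, 0, 0, 1},
	}
	p, err := m.unproject(3, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```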

REST: POST /v1/depth (request schema.DepthRequest, response schema.DepthResponse).
Accepts the image as a file path, base64, or URL (like the other vision
endpoints); point_colors is base64-encoded in the JSON response. When no
include_* flag is set the endpoint returns everything the model produces
(depth + confidence + pose + sky). Routed via a new depth usecase
(FLAG_DEPTH / MethodDepth) wired into backend_capabilities.go for the
depth-anything backend.
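
A client reading the point cloud from the JSON response might look roughly like this. It is a sketch under assumptions: the JSON keys are taken to match the proto field names (`num_points`, `points`, `point_colors`), the colors are taken to be standard base64, and `depthPoints`, `coloredPoint` and `parsePoints` are hypothetical names. Go's encoding/json decodes a base64 string into a `[]byte` field automatically, so no explicit decoding step is needed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// depthPoints holds the point-cloud part of the response.
// The JSON keys are ASSUMED to match the proto field names.
type depthPoints struct {
	NumPoints   int       `json:"num_points"`
	Points      []float32 `json:"points"`       // num_points*3 xyz
	PointColors []byte    `json:"point_colors"` // base64 in JSON, num_points*3 rgb
}

// coloredPoint pairs one xyz position with its rgb color.
type coloredPoint struct {
	X, Y, Z float32
	R, G, B uint8
}

// parsePoints decodes the response body and zips positions with colors.
func parsePoints(body []byte) ([]coloredPoint, error) {
	var r depthPoints
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.NumPoints < 0 || len(r.Points) != r.NumPoints*3 || len(r.PointColors) != r.NumPoints*3 {
		return nil, fmt.Errorf("inconsistent point cloud: %d points, %d coords, %d color bytes",
			r.NumPoints, len(r.Points), len(r.PointColors))
	}
	out := make([]coloredPoint, r.NumPoints)
	for i := range out {
		p := r.Points[i*3 : i*3+3]
		c := r.PointColors[i*3 : i*3+3]
		out[i] = coloredPoint{X: p[0], Y: p[1], Z: p[2], R: c[0], G: c[1], B: c[2]}
	}
	return out, nil
}

func main() {
	// Two points: one red, one green.
	body := []byte(`{"num_points":2,"points":[0,0,1,1,2,3],"point_colors":"/wAAAP8A"}`)
	pts, err := parsePoints(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", pts)
}
```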

Full data surface: per-pixel metric depth + confidence (DualDPT) or
depth + sky (mono), camera pose (extrinsics 3x4 / intrinsics 3x3), a
back-projected 3D point cloud (xyz + rgb, conf-thresholded), and glb /
COLMAP exports — backed by the native ABI-3 C-API
(da_capi_depth_dense, da_capi_points, da_capi_export_glb,
da_capi_export_colmap). The existing Predict (stats JSON) and
GenerateImage (depth PNG) paths are kept.

Wiring mirrors Detect across every layer: backend.proto, the regenerated
pkg/grpc/proto (gitignored, built via make protogen-go), pkg/grpc
(interface.go/backend.go/client.go/base/embed/server),
pkg/model + core/services/nodes client wrappers, core/backend/depth.go,
and core/http/endpoints/localai/depth.go + route registration.

Test Plan

  • cd backend/go/depth-anything-cpp && make depth-anything-cpp (clones + builds depth-anything.cpp variants + Go binary)
  • make test (downloads depth-anything-small-f32.gguf + a test image, runs the gRPC Load/Predict smoke test in main_test.go)
  • make package produces a self-contained package dir (run.sh + variant .so files + bundled libc/libstdc++/libgomp)
  • backend-matrix CI builds the depth-anything-cpp images across hardware variants
  • Gallery: install depth-anything-3-base, call GenerateImage (depth PNG) and Predict (JSON depth stats + pose)

🤖 Generated with Claude Code

… + gallery

Mirrors the locate-anything-cpp backend to register a new depth-anything
backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via
purego (cgo-less, no Python at inference).

- backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage),
  purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts
  building depth-anything.cpp's DA_SHARED static .so per CPU variant.
- backend/index.yaml: depth-anything backend meta + all hardware-variant
  capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t).
- gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32,
  small, large, giant, mono-large).
- .github/backend-matrix.yml: one build entry per hardware variant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
img.Pix[y*img.Stride+x] = uint8(n * 255)
}
}
f, err := os.Create(dst)
h, w = int(ch), int(cw)
n := h * w
if n > 0 {
src := unsafe.Slice(ptr, n)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
return nil, fmt.Errorf("depth-anything-cpp: mkdir export dir: %w", err)
}
dstDir = tmp
} else if err := os.MkdirAll(dstDir, 0o755); err != nil {
paths = append(paths, out)
case "colmap":
out := filepath.Join(dstDir, "colmap")
if err := os.MkdirAll(out, 0o755); err != nil {
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3);
pin the native build to the commit that exports them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>