Image→3D: TripoSG backend — 1.5B rectified-flow geometry (MIT), selectable next to TripoSR#794
fernandotonon wants to merge 22 commits into
… (MIT/MIT)
Second image-to-3D backend, selectable everywhere TripoSR is: TripoSG
(VAST-AI, SIGGRAPH 2025 — MIT code AND weights; geometry quality reported at
commercial Tripo 2.0 level, Normal-FID 5.81 vs ~20 for TripoSR-class LRMs).
- TripoSGPredictor (7th ONNX consumer): DINOv2-224 image encoder (mean/std
baked into the exported graph; CFG uncond = zeros) -> hand-rolled C++
rectified-flow Euler loop over the DiT step graph (sigma_i = 1 - i/N,
timestep = 1000*sigma, update x += (sigma_i - sigma_{i+1})*v — the sign is
the OPPOSITE of stock diffusers FlowMatchEuler; CFG via two B=1 calls,
guidance 7.0, user steps knob default 25) -> VAE latent kv-cache graph run
ONCE -> per-point field decoder tiled in chunks (inside-positive, iso 0,
bounds ±1.005) -> the existing native MarchingCubes + Taubin/reproject
polish. Geometry-only: texture bake/PBR/upscale stages stay TripoSR-only;
TripoSG's background removal composites over WHITE per its reference
pipeline (vs TripoSR's gray-128).
- Deterministic seeding (same image + params -> same mesh); fp32 DiT ships
as .onnx + .onnx.data (>2GB external weights) with an int8 single-file
tier mapped from Quality::Int8; models download on first use with a clean
'not hosted yet' error until the export is run + hosted (verified).
- scripts/export-triposg-onnx.py (offline dev tool, not shipped) + measured
architecture contract in docs/TRIPOSG_EXPORT_NOTES.md (upstream source +
HF configs; license audit incl. the briaai/RMBG NON-COMMERCIAL trap —
runtime substitutes our Apache-2.0 U²-Net).
- Surfaces: CLI --backend triposr|triposg --flow-steps N; MCP backend +
flow_steps args + schema; GUI Backend dropdown (texture checkboxes
auto-disable for TripoSG) + a 'Denoise (flow steps)' row in the per-step
progress list via the new Stage::Denoise.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
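The rectified-flow Euler loop in the description can be sketched as below. This is a minimal, library-free illustration and not the PR's `TripoSGPredictor` code: `VelocityFn` and `runFlowLoop` are hypothetical names, the DiT is replaced by a callback, and the CFG blend `v_uncond + g * (v_cond - v_uncond)` is the usual formula, assumed here because the description only says "two B=1 calls".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one DiT step graph call:
// (latents, timestep, conditioned?) -> predicted velocity.
using VelocityFn = std::function<std::vector<double>(
    const std::vector<double>& x, double timestep, bool conditioned)>;

// Euler loop with the schedule stated in the PR description:
//   sigma_i = 1 - i/N, timestep = 1000 * sigma_i,
//   x += (sigma_i - sigma_{i+1}) * v
// The CFG blend below is an assumption (standard classifier-free guidance).
std::vector<double> runFlowLoop(std::vector<double> x, int steps, double guidance,
                                const VelocityFn& dit) {
    assert(steps >= 1);
    for (int i = 0; i < steps; ++i) {
        const double sigma     = 1.0 - static_cast<double>(i) / steps;
        const double sigmaNext = 1.0 - static_cast<double>(i + 1) / steps;
        const double timestep  = 1000.0 * sigma;

        // Two batch-size-1 calls: conditioned and unconditioned.
        const std::vector<double> vCond   = dit(x, timestep, true);
        const std::vector<double> vUncond = dit(x, timestep, false);

        for (std::size_t k = 0; k < x.size(); ++k) {
            const double v = vUncond[k] + guidance * (vCond[k] - vUncond[k]);
            x[k] += (sigma - sigmaNext) * v;  // positive step, per the description
        }
    }
    return x;
}
```

Because the step sizes `sigma_i - sigma_{i+1}` sum to 1 over the whole schedule, a constant velocity field moves each latent by exactly that velocity, which makes a convenient sanity check for the sign convention.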
📝 Walkthrough
Adds a selectable TripoSG image-to-3D backend, with backend-aware contract, CLI/UI/MCP wiring, a new ONNX predictor and export toolchain, and updated documentation.
Estimated code review effort: 4 (Complex) | ~75 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf1ffb5456
    auto open = [&](const QString& p) {
        const std::string s = p.toStdString();
        return Ort::Session(env, s.c_str(), so);
Use wide ONNX model paths on Windows
When this backend is built on Windows, ONNX Runtime's Ort::Session constructor expects an ORTCHAR_T* model path, which is wchar_t* on Windows; the existing TripoSR/background-removal open paths in this repo use toStdWString() under _WIN32 for that reason. This helper always passes std::string::c_str(), so Windows ONNX builds of the new TripoSG backend will fail to build or be unable to open the model files.
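A Qt-free sketch of the platform split the comment asks for follows. Everything named here is a stand-in: `ModelPathChar` mimics ONNX Runtime's `ORTCHAR_T`, `toModelPath` is a hypothetical helper, and `std::filesystem::path` plays the role of `QString`. In the Qt code the same idea would be a `toStdWString()` call under `_WIN32` and `toStdString()` elsewhere, as the comment says the existing open paths already do.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Stand-in for ORTCHAR_T: wchar_t on Windows, char elsewhere (assumption
// mirroring the review comment, not taken from the ONNX Runtime headers).
#ifdef _WIN32
using ModelPathChar = wchar_t;
#else
using ModelPathChar = char;
#endif
using ModelPathString = std::basic_string<ModelPathChar>;

// Hypothetical helper: return the path in the platform's native character
// type so that .c_str() matches what the session constructor expects.
ModelPathString toModelPath(const std::filesystem::path& p) {
    return p.native();  // std::wstring on Windows, std::string on POSIX
}
```

The returned string must outlive the call that consumes its `c_str()`, so in practice it would be bound to a local before the session is constructed.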
    QImage resized = image.convertToFormat(QImage::Format_RGB888)
                         .scaled(imgSize, imgSize, Qt::IgnoreAspectRatio,
                                 Qt::SmoothTransformation);
Preserve aspect ratio before TripoSG encoding
For TripoSG calls where remove_bg is false or U²-Net falls back to the original image, non-square photos are stretched directly to the DINO input size here. The export notes for this same backend specify the upstream preprocessing as aspect-preserving resize/center-crop after foreground framing, so stretching portrait or landscape inputs changes the conditioning image geometry and can produce incorrect meshes; use an aspect-preserving crop/pad path before toNCHW.
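The geometry of an aspect-preserving resize followed by a centre crop can be sketched as below. The 8/7 ratio and the 224 crop come from a later commit message in this PR ("resize shortest edge to 8/7·crop then centre-crop 224"); the struct, the function name `planResizeCrop`, and the round-to-nearest rule are assumptions for illustration and may differ from the reference preprocessor.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plan for the two preprocessing steps.
struct ResizeCrop {
    int resizedW, resizedH;  // size after the aspect-preserving resize
    int cropX, cropY;        // top-left corner of the centred crop
    int cropSize;            // crop edge length
};

ResizeCrop planResizeCrop(int srcW, int srcH, int cropSize = 224) {
    assert(srcW > 0 && srcH > 0 && cropSize > 0);
    // Shortest edge target: 8/7 of the crop, rounded to nearest (256 for 224).
    const int shortEdge = (cropSize * 8 + 3) / 7;
    const int srcShort  = std::min(srcW, srcH);
    // Scale both edges by shortEdge / srcShort, rounding to nearest (assumption).
    auto scale = [&](int v) {
        return static_cast<int>(
            (static_cast<long long>(v) * shortEdge + srcShort / 2) / srcShort);
    };
    ResizeCrop plan{};
    plan.resizedW = scale(srcW);
    plan.resizedH = scale(srcH);
    plan.cropX    = (plan.resizedW - cropSize) / 2;
    plan.cropY    = (plan.resizedH - cropSize) / 2;
    plan.cropSize = cropSize;
    return plan;
}
```

For a 640×480 input this yields a 341×256 intermediate and a 224×224 crop starting at (58, 16), so the subject keeps its proportions instead of being squashed.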
Actionable comments posted: 4
🧹 Nitpick comments (6)
docs/TRIPOSG_EXPORT_NOTES.md (1)
25-49: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add a language tag to the ASCII-diagram fenced block.
Static analysis flags this fenced block (MD040) for missing a language identifier.
📝 Proposed fix: open the diagram fence with a `text` tag instead of a bare fence; the diagram content (starting at "input image") stays unchanged.
Source: Linters/SAST tools
scripts/export-triposg-onnx.py (1)
440-451: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
Broaden the large-model warning.
`quantize_dynamic` can still fail on this 5.7 GB DiT because it runs shape inference/optimization internally and can hit the protobuf size ceiling even when the input uses external data. Use `extra_options={"DisableShapeInference": True}` or `quant_pre_process` ahead of time; keep the ORT version note only for external-data loading.
src/MCPServer.cpp (2)
2385-2417: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Sentry breadcrumb doesn't record the selected backend or `flow_steps`.
CLIPipeline.cpp's equivalent breadcrumb was updated in this same PR to include `backend=%4`, but this MCP breadcrumb still only logs the filename and resolution even though it sits right after backend/flow_steps parsing. Since backend selection changes which models are downloaded and which code path runs, including it here would make production traces easier to correlate with failures, consistent with the convention of using breadcrumbs for significant user/tool-call actions.
📋 Proposed fix
     SentryReporter::addBreadcrumb(QStringLiteral("ai.tool_call"),
    -    QStringLiteral("generate_mesh_from_image %1 res=%2")
    -        .arg(QFileInfo(imagePath).fileName()).arg(opts.sdfResolution));
    +    QStringLiteral("generate_mesh_from_image %1 res=%2 backend=%3")
    +        .arg(QFileInfo(imagePath).fileName()).arg(opts.sdfResolution)
    +        .arg(opts.backend == MeshGenPredictor::Backend::TripoSG
    +             ? QStringLiteral("triposg") : QStringLiteral("triposr")));
Based on learnings, "Emit `SentryReporter::addBreadcrumb(category, message)` for all user-facing actions and significant operations, using the established categories such as `ui.action`, `ai.tool_call`, `file.import`, and `file.export`."
Source: Learnings
7276-7277: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
`resolution`/`quality` schema descriptions become backend-specific and inaccurate now that `backend` exists.
The pre-existing `resolution` description says "the encoder input is fixed at 512^2, so detail gains taper off above 512" and `quality`'s description quotes TripoSR-specific sizes (~1.7GB/~430MB). Per the PR, TripoSG uses a different encoder (DINOv2-224) and a different model set, so neither claim holds when `backend: "triposg"` is selected. Since this is an MCP tool schema that may be consumed by an LLM agent to pick parameter values, a stale "fixed at 512^2" claim could lead an agent to choose a suboptimal `resolution` for TripoSG.
qml/PropertiesPanel.qml (2)
1756-1775: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
Model-tier size hints are TripoSR-specific and become misleading once TripoSG is selectable.
`mgQualityCombo`'s labels ("fp32 (best, ~1.7GB)", "int8 (smaller, ~430MB)") describe TripoSR's encoder weights and don't change when "TripoSG" is picked in the new Backend combo, so a user selecting TripoSG + fp32/int8 sees a download-size estimate for the wrong model set.
1756-1775: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
No GUI control for `flow_steps` despite CLI/MCP exposing it.
`MeshGenController::generateSelected`'s options map (line 1852-1858) never includes `"flow_steps"`, so the GUI always uses the hardcoded default of 25 (`MeshGenController.cpp` line 225-226) regardless of backend, unlike `--flow-steps` (CLI) and `flow_steps` (MCP). Consider adding a numeric field near the Backend picker so GUI users can trade off speed vs. quality for TripoSG generation like CLI/MCP users can.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@qml/PropertiesPanel.qml`:
- Around line 1825-1844: The progress-step list in PropertiesPanel.qml is adding
the "background" step for every run even though TripoSG never emits a matching
event. Update the step construction logic around the steps array so "Remove
background" is only inserted when the backend path can actually post a
"background" progress event (the non-SG flow used by
MeshGenController::generate()), keeping it aligned with the stages reported by
MeshGenPredictor and the active-step index.
In `@scripts/export-triposg-onnx.py`:
- Around line 460-489: The kv-cache split path in export_triposg-onnx.py only
logs a warning when the split-vs-upstream check fails, but it still leaves
already-written ONNX artifacts on disk and reports export success. Update the
logic around the VaeQueryWrapper export and the split_ok check so that a failed
split either raises/exits nonzero or quarantines/removes
triposg_vae_latents.onnx and triposg_vae_decoder.onnx, and make the final
“export complete” message conditional on split_ok or args.monolithic so a bad
split cannot be reported as successful.
In `@src/CLIPipeline.cpp`:
- Around line 9004-9027: Add generate3d CLI test coverage in the CLIPipeline
cmdgenerate3d coverage test to exercise the new argument parsing branches in
CLIPipeline’s generate3d path. Update the existing coverage test file to include
failure cases for --backend with an invalid value, --flow-steps with values
below and above the allowed range, and a successful --backend triposg
invocation, mirroring the existing --quality and --resolution checks. Use the
generate3d CLI entrypoint and the argument parsing logic around --backend and
--flow-steps as the target for the assertions.
In `@src/ImageTo3D/MeshGenController.cpp`:
- Around line 235-262: The worker path in MeshGenController::predict is calling
BackgroundRemover::ensureModelBlocking() again, which can trigger the nested
download/event-loop path on the worker thread via ModelDownloader::startDownload
and the shared QNetworkAccessManager. Update the TripoSG flow so the model is
resolved only on the main thread in MeshGenController before the worker starts,
then either pass the resolved background-removal path into
MeshGenPredictor::predict or skip the second ensureModelBlocking call and rely
on the fallback behavior.
---
Nitpick comments:
In `@docs/TRIPOSG_EXPORT_NOTES.md`:
- Around line 25-49: Add a language identifier to the fenced ASCII diagram block
in TRIPOSG_EXPORT_NOTES so it is no longer an unlabeled fence; update the
existing diagram fence to use a text/plain-style tag (for example, the diagram
block near the image_embeds and marching_cubes description) while keeping the
content unchanged, so MD040 is satisfied.
In `@qml/PropertiesPanel.qml`:
- Around line 1756-1775: The size hints in mgQualityCombo are hardcoded for
TripoSR and become misleading when mgBackendCombo allows TripoSG selection.
Update the quality labels and/or the logic that builds them so they reflect the
currently selected backend from MeshGenController, using the existing
mgBackendCombo and mgQualityCombo symbols to switch between TripoSR and
TripoSG-specific model size estimates.
- Around line 1756-1775: The PropertiesPanel GUI is missing a control for the
TripoSG flow_steps option, so it always falls back to the default value instead
of matching CLI/MCP behavior. Add a numeric input near the existing
mgBackendCombo Backend picker in PropertiesPanel.qml, and wire its value into
MeshGenController::generateSelected so the options map includes "flow_steps" for
TripoSG generation. Make sure the control is only relevant for the backend that
uses flow_steps and respects the current MeshGenController busy state.
In `@scripts/export-triposg-onnx.py`:
- Around line 440-451: The int8 export fallback in the `quantize_dynamic` block
under `if not args.no_quant:` is too narrow because it only mentions
external-data loading, but `quantize_dynamic` can also fail during internal
shape inference/optimization on large DiT models. Update the warning/logging
around `quantize_dynamic` to account for this case by disabling shape inference
via `extra_options={"DisableShapeInference": True}` or by pre-processing with
`quant_pre_process` before quantization, and keep the `onnxruntime>=1.17` note
only for the external-data model loading scenario.
In `@src/MCPServer.cpp`:
- Around line 2385-2417: The ai.tool_call breadcrumb in
MCPServer::generate_mesh_from_image only logs the filename and resolution, so it
misses the parsed backend and flow_steps values. Update the
SentryReporter::addBreadcrumb call near the backend/model selection logic to
include the selected backend and flow_steps alongside the existing
image/resolution fields, matching the breadcrumb style used elsewhere in the PR
such as CLIPipeline.cpp.
- Around line 7276-7277: Update the MCP tool schema descriptions for the
existing resolution and quality fields in MCPServer so they are no longer
globally TripoSR-specific now that backend is selectable. Adjust the text near
the backend/flow_steps property definitions to either make resolution/quality
descriptions backend-aware or remove the fixed 512^2 and TripoSR model-size
claims, since they are inaccurate for triposg and can mislead an agent choosing
values.
📒 Files selected for processing (14)
- .gitignore
- CLAUDE.md
- docs/TRIPOSG_EXPORT_NOTES.md
- qml/PropertiesPanel.qml
- scripts/export-triposg-onnx.py
- src/CLIPipeline.cpp
- src/CMakeLists.txt
- src/ImageTo3D/MeshGenController.cpp
- src/ImageTo3D/MeshGenPredictor.cpp
- src/ImageTo3D/MeshGenPredictor.h
- src/ImageTo3D/TripoSGPredictor.cpp
- src/ImageTo3D/TripoSGPredictor.h
- src/MCPServer.cpp
- tests/CMakeLists.txt
    if (arg == "--backend") {
        if (i + 1 >= argc) {
            err() << "Error: --backend requires triposr or triposg." << Qt::endl;
            return 2;
        }
        const QString b = QString::fromLocal8Bit(argv[++i]).toLower();
        if (b == "triposr") backend = MeshGenPredictor::Backend::TripoSR;
        else if (b == "triposg") backend = MeshGenPredictor::Backend::TripoSG;
        else { err() << "Error: --backend must be triposr or triposg." << Qt::endl; return 2; }
        continue;
    }
    if (arg == "--flow-steps") {
        if (i + 1 >= argc) {
            err() << "Error: --flow-steps requires a value (e.g. 10, 25, 50)." << Qt::endl;
            return 2;
        }
        bool okNum = false;
        flowSteps = QString::fromLocal8Bit(argv[++i]).toInt(&okNum);
        if (!okNum || flowSteps < 1 || flowSteps > 200) {
            err() << "Error: --flow-steps must be an integer in [1..200]." << Qt::endl;
            return 2;
        }
        continue;
    }
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
fd -i cli_test.cpp -g '*CLIPipeline_test*'
rg -n -C2 '\-\-backend|\-\-flow-steps|triposg' src/CLIPipeline_test.cpp 2>/dev/null
Repository: fernandotonon/QtMeshEditor
Length of output: 270
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== test files =="
fd -a -I 'CLIPipeline_test.cpp' .
fd -a -I 'clpipeline_test.cpp' .
fd -a -I 'CLIPipeline*test*' src test tests . || true
echo "== option string search =="
rg -n --hidden --glob '!build' --glob '!dist' --glob '!node_modules' --glob '*test*' --glob '*Test*' --glob '*_test.cpp' --glob '*.cpp' --glob '*.h' --glob '*.hpp' --glob '*.ts' --glob '*.js' --glob '*.qml' --glob '*.txt' --glob '*.md' --glob '*.cmake' --glob '*.pro' --glob '*.pri' --glob '*.json' --glob '*.yaml' --glob '*.yml' --glob '*.sh' --glob '*.py' -- '--backend|--flow-steps|triposg|--quality|--resolution' .
Repository: fernandotonon/QtMeshEditor
Length of output: 27124
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== outline: src/CLIPipeline_test.cpp =="
ast-grep outline src/CLIPipeline_test.cpp --view expanded | sed -n '1,220p'
echo "== outline: src/ImageTo3D/CLIPipeline_cmdgenerate3d_coverage_test.cpp =="
ast-grep outline src/ImageTo3D/CLIPipeline_cmdgenerate3d_coverage_test.cpp --view expanded | sed -n '1,220p'
echo "== generate3d-related tests in src/CLIPipeline_test.cpp =="
rg -n -C3 'generate3d|--backend|--flow-steps|--quality|--resolution' src/CLIPipeline_test.cpp
echo "== generate3d-related tests in src/ImageTo3D/CLIPipeline_cmdgenerate3d_coverage_test.cpp =="
rg -n -C3 'generate3d|--backend|--flow-steps|--quality|--resolution|triposg' src/ImageTo3D/CLIPipeline_cmdgenerate3d_coverage_test.cpp
Repository: fernandotonon/QtMeshEditor
Length of output: 3790
Add generate3d CLI coverage for backend/flow-step parsing
`src/ImageTo3D/CLIPipeline_cmdgenerate3d_coverage_test.cpp` should cover `--backend foo`, `--flow-steps 0`, `--flow-steps 500`, and a happy-path `--backend triposg` case to match the existing `--quality`/`--resolution` coverage.
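The boundary cases named above can be illustrated without Qt. The sketch below is not the repository's parser or its test harness: `parseFlowSteps` is a hypothetical helper that mirrors the `[1..200]` range check from the diff using `std::from_chars`, so the accept/reject cases can be exercised in isolation.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical, Qt-free mirror of the --flow-steps validation:
// accept only a complete integer in [1..200], reject everything else.
std::optional<int> parseFlowSteps(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last  = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // not a full integer
    if (value < 1 || value > 200) return std::nullopt;          // out of range
    return value;
}
```

A real coverage test would drive the CLI entrypoint and check the exit code (2 in the diff) and the error text; this helper only shows which inputs belong on each side of the boundary.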
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reusing the TripoSR export venv broke both stacks: TripoSG needs transformers>=4.45 while TripoSR pins 4.35, and the unpinned install dragged huggingface_hub to 1.x past transformers' <1.0 gate. The driver now builds its own venv with a coherent pin set (diffusers 0.30.3 / transformers 4.46.3 / hub 0.26.5 / tokenizers 0.20.x). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (2)
scripts/run-triposg-export.sh (2)
26-35: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Torch left unpinned while every other dependency is pinned.
Lines 31-34 carefully pin `diffusers`, `transformers`, `huggingface_hub`, and `tokenizers` to exact versions specifically to avoid dependency drift (per the commit message rationale), but `torch` on line 27 has no version pin. Since this export won't be re-run for a while (models aren't hosted yet), a future torch release could subtly change op behavior and silently invalidate the `--verify` round-trip checks documented in `export-triposg-onnx.py`.
♻️ Proposed fix
    -"$VENV/bin/pip" install -q torch --index-url https://download.pytorch.org/whl/cpu
    +"$VENV/bin/pip" install -q "torch==2.4.1" --index-url https://download.pytorch.org/whl/cpu
37-40: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Unpinned upstream checkout undermines reproducibility.
`git clone --depth 1` pulls whatever is on `main` at clone time, with no tag/commit pin. Upstream TripoSG's own `requirements.txt` also leaves `diffusers`/`transformers` unpinned, so the exporter's assumptions about model APIs (that this script's pinned stack is meant to satisfy) could silently break if upstream changes before this script is next run. Pinning to a known-good commit/tag would make the export reproducible and let the pinned Python deps in lines 30-35 actually guarantee a working combination.
♻️ Proposed fix
     if [ ! -d TripoSG ]; then
       echo "[clone] VAST-AI-Research/TripoSG"
    -  git clone --depth 1 https://github.com/VAST-AI-Research/TripoSG
    +  git clone https://github.com/VAST-AI-Research/TripoSG
    +  git -C TripoSG checkout <known-good-commit-sha>
     fi
Based on the upstream repo's `requirements.txt` leaving `diffusers`/`transformers` unpinned and having no dedicated release tags referenced in its README, floating on `main` risks silent drift for this offline, infrequently-run export.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@scripts/run-triposg-export.sh`:
- Around line 26-35: The dependency setup in run-triposg-export.sh leaves torch
unpinned while the rest of the export stack is fixed, which can allow future
behavior drift. Update the pip install step in the script to pin torch to a
specific compatible version alongside diffusers, transformers, huggingface_hub,
and tokenizers, using the existing install block near the torch and dependency
installation commands.
- Around line 37-40: The TripoSG checkout in the export script is floating on
the upstream default branch, which makes the workflow non-reproducible; update
the cloning logic in the TripoSG install block to use a known-good tag or commit
SHA instead of an unpinned `git clone --depth 1`. Keep the existing `TripoSG`
directory guard, but adjust the `git clone`/checkout steps so the script always
lands on the same upstream revision and remains compatible with the pinned
Python stack already set up earlier in the script.
📒 Files selected for processing (1)
scripts/run-triposg-export.sh
… deps triposg.inference_utils does 'from diso import DiffDMC' at module scope; diso is a CUDA-only differentiable-MC package that doesn't install on macOS/CPU and is unused by the export (we only touch the pipeline's models; the app has its own native marching cubes). Stub it in sys.modules — the exact torchmcubes trick the TripoSR exporter uses. Driver also installs the remaining package import-time deps (trimesh/scipy/scikit-image/omegaconf/typeguard/tqdm/pillow). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HF's xet CDN read-times-out on the multi-GB safetensors; disable xet, bump the read timeout, and fetch the snapshot in a resume-retry loop BEFORE the export so the export itself runs once against local weights (--weights). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cludes the .onnx.data sidecar) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GetInputTypeInfo() returns an OWNING TypeInfo; GetTensorTypeAndShapeInfo() is an unowned view into it. Chaining off the temporary dangled and GetShape() segfaulted inside OrtApis::GetDimensions (memmove from null) before any output. Bind the TypeInfo to a local in both shape probes (image size, latent shape). First live end-to-end TripoSG run passes: hosted-model download -> encode -> 4-step CFG flow loop -> kv-cache -> grid decode -> marching cubes -> polish -> glb (4.5k verts). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
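The ownership split behind that crash can be shown with stand-in types. The sketch below does not use ONNX Runtime: `OwningTypeInfo`, `ShapeView`, `makeTypeInfo`, and the shape values are hypothetical and only mimic "an owning object that hands out an unowned view". It shows the fix described above, binding the owner to a local before using the view.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins; NOT the ONNX Runtime classes.
struct ShapeView {  // unowned view into the owner's storage
    const std::vector<std::int64_t>* dims;
    std::vector<std::int64_t> shape() const { return *dims; }
};

struct OwningTypeInfo {  // owns the storage the view points into
    std::unique_ptr<std::vector<std::int64_t>> dims;
    ShapeView view() const { return ShapeView{dims.get()}; }
};

OwningTypeInfo makeTypeInfo() {
    // Illustrative shape only.
    return OwningTypeInfo{std::make_unique<std::vector<std::int64_t>>(
        std::vector<std::int64_t>{1, 3, 224, 224})};
}

std::vector<std::int64_t> probeShape() {
    // Unsafe pattern (left as a comment on purpose): the owner is a temporary
    // that is destroyed at the end of the first statement, so the view dangles.
    //   ShapeView view = makeTypeInfo().view();
    //   return view.shape();  // reads freed memory
    const OwningTypeInfo info = makeTypeInfo();  // keep the owner alive
    const ShapeView view = info.view();
    return view.shape();
}
```

The same shape of fix applies to any API that returns a non-owning handle from an owning temporary: name the owner, then take the view.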
Holding all four graphs (encoder ~1.1GB + DiT 1.3-5.4GB + vae_latents ~769MB + decoder) alive for the whole predict() made the 25-step run's working set large enough for macOS to SIGTERM the process under memory pressure. Sessions now open one at a time and are destroyed the moment their stage completes, so peak RSS is the largest single stage (the DiT) instead of the sum. Conditioning + latent buffers are also freed once consumed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
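A toy model of why scoping each session to its stage lowers peak memory follows. `MemoryTracker` and `ScopedSession` are hypothetical stand-ins (not `Ort::Session`), and the sizes in MB are rough figures taken from the commit messages in this PR (encoder ~1.1 GB, DiT up to 5.4 GB, vae_latents ~769 MB, point decoder ~48 MB); they are used only to make the arithmetic concrete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for "bytes held by live sessions".
struct MemoryTracker {
    std::size_t liveMB = 0;
    std::size_t peakMB = 0;
};

// RAII stand-in for a loaded model: counts its size while alive.
class ScopedSession {
public:
    ScopedSession(MemoryTracker& t, std::size_t sizeMB) : tracker_(t), sizeMB_(sizeMB) {
        tracker_.liveMB += sizeMB_;
        tracker_.peakMB = std::max(tracker_.peakMB, tracker_.liveMB);
    }
    ~ScopedSession() { tracker_.liveMB -= sizeMB_; }
    ScopedSession(const ScopedSession&) = delete;
    ScopedSession& operator=(const ScopedSession&) = delete;

private:
    MemoryTracker& tracker_;
    std::size_t sizeMB_;
};

// One session per stage, destroyed when its block ends.
std::size_t peakWithStagedSessions() {
    MemoryTracker t;
    { ScopedSession encoder(t, 1100);   /* encode image */ }
    { ScopedSession dit(t, 5400);       /* flow loop */ }
    { ScopedSession vaeLatents(t, 769); /* kv-cache */ }
    { ScopedSession decoder(t, 48);     /* grid decode */ }
    return t.peakMB;
}

// All sessions alive for the whole run (the behaviour the commit removes).
std::size_t peakWithAllSessionsHeld() {
    MemoryTracker t;
    ScopedSession encoder(t, 1100);
    ScopedSession dit(t, 5400);
    ScopedSession vaeLatents(t, 769);
    ScopedSession decoder(t, 48);
    return t.peakMB;
}
```

With these figures the staged version peaks at the DiT alone (5400), while holding everything peaks at the sum (7317), which matches the "largest single stage instead of the sum" claim.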
    // Deterministic gaussian init (same image + seed → same mesh).
    latents.resize(latCount);
    {
        std::mt19937 rng(opts.seed);
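A self-contained sketch of seeded Gaussian initialisation in the spirit of the excerpt above. `initLatents` is a hypothetical name and the use of `std::normal_distribution` is an assumption; the excerpt only shows the `std::mt19937` seeding. Note that `std::mt19937` output is fully specified by the standard, but the algorithm inside `std::normal_distribution` is implementation-defined, so this form is repeatable on one toolchain and not necessarily identical across standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill a latent buffer with unit Gaussian noise from a fixed seed.
// Same seed + same standard library -> same values on every run.
std::vector<float> initLatents(std::size_t count, std::uint32_t seed) {
    std::vector<float> latents(count);
    std::mt19937 rng(seed);                             // deterministic engine
    std::normal_distribution<float> gauss(0.0f, 1.0f);  // mean 0, stddev 1
    for (float& v : latents) v = gauss(rng);
    return latents;
}
```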
TripoSG's VAE decoder CROSS-ATTENDS every query point to the 2048 kv tokens, so per-Run activation memory is linear in P with a huge constant. Reusing TripoSR's 262144-point chunk (fine for its per-point MLP decoder) transiently materialised ~90 GB of attention logits at res 256 and macOS killed the process. predict() now hard-clamps the chunk to 8192 regardless of caller input; each decoder Run's transient stays in the hundreds of MB. Verified live: full-quality run holds ~1.1 GB RSS where the previous build was OOM-killed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
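The clamp-then-tile pattern described above can be sketched as follows. `planDecoderChunks` and `kMaxDecoderChunk` are hypothetical names; the 8192 limit is the value stated in the commit message, and treating a resolution-256 grid as 256³ query points is an assumption made only for the worked number.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kMaxDecoderChunk = 8192;  // hard clamp from the commit message

// Split [0, totalPoints) into [begin, end) ranges no larger than the clamp,
// regardless of what chunk size the caller asked for.
std::vector<std::pair<std::size_t, std::size_t>>
planDecoderChunks(std::size_t totalPoints, std::size_t requestedChunk) {
    const std::size_t chunk =
        std::clamp<std::size_t>(requestedChunk, 1, kMaxDecoderChunk);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < totalPoints; begin += chunk) {
        ranges.emplace_back(begin, std::min(begin + chunk, totalPoints));
    }
    return ranges;
}
```

Under the 256³ assumption, a caller passing TripoSR's 262144-point chunk still ends up with 2048 decoder runs of 8192 points each, since per-run attention memory grows with the number of query points in the chunk.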
…knob, quant recipe
Live end-to-end findings (fp32 verified coherent; details in docs/TRIPOSG_EXPORT_NOTES.md 'Live-test findings'):
- Skip the TripoSR -90X/+90Y frame bake for TripoSG results (Result::bakeTripoSROrientation): TripoSG's field is already +Y-up and the bake laid the model on its back.
- Match BitImageProcessor preprocessing: resize shortest edge to 8/7·crop then centre-crop 224 (was a squash-resize).
- CLI: new --guidance knob (0 disables CFG) for the TripoSG backend.
- export-triposg-onnx.py: quantize per_channel+reduce_range. The shipped per-tensor int8 DiT compounds error over the flow loop and degenerates to noise (fp32 with the identical loop is coherent); the hosted int8 file needs re-export + re-upload.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… int8 caveat Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Negate the decoder field into our MarchingCubes' inside-positive convention: the exported graph's own negation lands OUTSIDE-positive in practice, so every face came out inside-out (live GUI report). The refine Newton step is sign-agnostic; this is the only site the sign matters.
- Geometry-only results (TripoSG) now get a shared neutral lit clay material instead of the default flat-white one, so surface relief actually shades in the viewport.
- CoreML/MLProgram per-graph gating: the ~48 MB point decoder (called ~2000x/run) gets the GPU session; the 1.5-5.4 GB DiT stays on CPU. Measured: MLProgram compile of the DiT burned minutes + ~8 GB RSS per generation for no net win (QTMESH_TRIPOSG_COREML_DIT=1 opts in).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…int8 DiT tier

TripoSG is geometry-only (no colour decoder). Its bake stage now queries TripoSR's image-conditioned colour field on the same input image. TripoSR predicts colour for ANY 3D point (including occluded ones, consistently with the photo), so it serves as the colour oracle: mesh → TripoSR-native frame (inverse of the builder fix-up) → per-axis affine fit onto TripoSR's occupied bounds (coarse density probe) → per-texel colour queries through the existing xatlas baker. The full texture chain lights up for TripoSG (diffuse → PBR maps → optional 2× upscale); GUI checkboxes re-enabled. Best-effort: missing TripoSR models or any failure keeps the clay look with a warning. Verified: TripoSG mage + baked 1204px diffuse, colours land on the right body regions.

int8 DiT tier DROPPED for TripoSG: even per-channel re-quant degrades the 25-step CFG flow loop to blocky blobs (user-verified in GUI), and dynamic-int8 MatMuls are no faster than fp32 on ARM. All surfaces force the fp32 DiT (CLI prints a note); the quality tier still selects the TripoSR tier used for the colour bake.

Decode is also CPU-only by default now (QTMESH_TRIPOSG_COREML_DECODER=1 opts into the CoreML decoder): per-call kv-cache re-upload made GPU decode slower than CPU.

Next speed win: hierarchical extraction (coarse grid → refine near surface), upstream's approach.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tier picker

The field-only colour bake left grey patches + dark seams on the back (the TripoSR and TripoSG reconstructions differ in scale, so TripoSG surface points landed outside TripoSR's occupied volume → background grey). Switch to render-and-project:
- PRIMARY: project the actual input photo onto the front of the mesh. MeshGenBuilder orients TripoSG output so the camera-facing side is the image plane; screen (u,v) comes from the mesh AABB in its upright frame, a per-triangle depth buffer rejects occluded texels, and background (transparent) photo pixels are skipped. Pixel-accurate on everything the photo shows, with no model-alignment error.
- FALLBACK: occluded / back texels still use TripoSR's colour field (mapped + affine-fitted as before).

QML: the Model tier picker is now backend-aware. Selecting TripoSG snaps it to fp32 and collapses the list to 'fp32 (only option for TripoSG)', locked; TripoSR keeps fp32/int8.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The photo-projection depth test had nearest/farthest inverted — the turntable camera's frame-0 sits at +Z, so the FRONT-most surface has the LARGEST z, not the smallest. With the sign flipped, photo pixels sprayed onto back faces and the whole bake read as colour speckle. Depth buffer now inits to -inf, keeps the max z per cell, and rejects points behind the near-most surface. Result: the input photo projects onto the correct (front) side — coherent textured figure instead of salt-and-pepper. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
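The corrected depth test (camera at +Z, so the largest z is the front) can be sketched as below. This is a simplified Python illustration over sample points rather than the per-triangle C++ buffer; `build_front_depth`, `is_front`, the grid size, and the `eps` tolerance are hypothetical.

```python
import math

def _cell(u, v, grid):
    """Map normalised screen coordinates in [0, 1] to a (row, col) buffer cell."""
    col = min(int(u * grid), grid - 1)
    row = min(int(v * grid), grid - 1)
    return row, col

def build_front_depth(points, grid=4):
    """points: iterable of (u, v, z). The buffer starts at -inf and keeps the
    MAX z per cell, because larger z is nearer a camera placed at +Z."""
    depth = [[-math.inf] * grid for _ in range(grid)]
    for u, v, z in points:
        row, col = _cell(u, v, grid)
        if z > depth[row][col]:
            depth[row][col] = z
    return depth

def is_front(depth, u, v, z, eps=1e-3):
    """True when the point lies on (or within eps of) the near-most surface."""
    row, col = _cell(u, v, len(depth))
    return z >= depth[row][col] - eps
```

Initialising to +inf and keeping the minimum would reproduce the inverted behaviour the commit describes, with photo pixels landing on back faces.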
Replace the hard front/back depth cut with a depth-band crossfade: the photo weight ramps from 1 at the near-most surface to 0 zBand behind it, so the projected photo and the TripoSR field meet without a seam. Silhouette-edge / background photo pixels defer to the field. Front view is now a coherent textured figure with smooth transitions; remaining grey is genuinely-occluded back geometry (field-fallback scale limits), not a bake artifact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
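A minimal Python sketch of the depth-band crossfade follows. The commit only says the photo weight "ramps" from 1 to 0 across zBand; the linear ramp, `photo_weight`, and `blend` are assumptions made for illustration.

```python
def photo_weight(z, z_front, z_band):
    """Photo weight: 1 at the near-most surface, falling to 0 at z_band behind it.
    Larger z is nearer the camera, so 'behind' means a smaller z."""
    if z_band <= 0:
        return 1.0 if z >= z_front else 0.0   # degenerate band: hard cut
    behind = z_front - z
    return max(0.0, min(1.0, 1.0 - behind / z_band))

def blend(photo_rgb, field_rgb, w):
    """Mix the projected photo colour with the TripoSR field colour by weight w."""
    return tuple(w * p + (1.0 - w) * f for p, f in zip(photo_rgb, field_rgb))
```

With `z_band` set to 0 the function falls back to the hard front/back cut that the crossfade replaces.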
…re bake Adds an optional frontPhotoPath to generateMeshTextureMultiView: when set, view 0 is filled from the ACTUAL input image (fit to the front depth-render footprint) instead of an SD generation, and only the remaining views (back/sides) are SD-generated with depth-ControlNet. The existing MultiViewTextureBaker facing-weight blend then makes the photo win the front and the generations win the back, feathered at the seam. This is the Metal-safe substitute for img2img (which is disabled on Metal) for giving TripoSG's geometry-only meshes a photo-accurate front + plausible, shape-conditioned back. Prompt is optional when a photo is pinned (neutral fallback for the generated views). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iew bake - MeshGenController.completed now carries the built entity's name. - New Object-mode 'Generate texture (AI, front photo + generated back)' checkbox (shown when the build has Stable Diffusion; warns if no SD model is loaded). On completion it selects the fresh entity and runs the multi-view bake with the input photo pinned as the front view and the back/sides SD-generated (depth-ControlNet). Most useful for the geometry-only TripoSG backend. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-drop Adds a 'TripoSG post-integration updates' note to CLAUDE.md capturing: int8 tier dropped (fp32-only), colour via photo-front-projection + TripoSR-field back with clay fallback, the GUI multi-view AI-texture option (photo pinned front + SD-generated back, ENABLE_STABLE_DIFFUSION), orientation/field-sign, and the memory/GPU/guidance behaviour. Notes SF3D/Hunyuan3D license rejections and MV-Adapter as the tracked upgrade. Documents TripoSG's no-native-colour as a model limitation, as asked. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ress steps Move the 'Generate texture (AI)' checkbox above 'Upscale texture 2x' so the upscale reads as the final texture-chain step. Add two progress rows (AI texture: generate back view / project + bake) driven by the SD generation-progress and texture-generated signals, so the AI pass shows distinct steps instead of reusing the earlier bar. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, export gate, progress, breadcrumb, tests

- TripoSGPredictor: open ONNX sessions with wide (wchar_t) paths on Windows (Ort::Session needs ORTCHAR_T*; narrow std::string failed to open). Matches the existing TripoSR/BackgroundRemover path. [codex]
- Mark the deterministic latent-init mt19937 NOSONAR + comment: it's diffusion noise, not a security context; reproducibility requires a fixed non-crypto PRNG. Unblocks the SonarQube 'Security Rating on New Code' gate (cpp:S2245 false positive). [CodeQL]
- export-triposg-onnx.py: on kv-split mismatch, DELETE the mismatched VAE graphs and sys.exit(1) (unless --monolithic) instead of warning and still printing 'export complete', so the script can no longer ship wrong VAE graphs. [coderabbit]
- PropertiesPanel.qml: only add the 'background' progress step on the TripoSR path (TripoSG removes bg inside predict() with no discrete progress event, so the row never resolved). [coderabbit]
- MCPServer: include backend in the generate_mesh_from_image breadcrumb (CLI already did). [coderabbit]
- Add CLIPipeline_cmdgenerate3d_coverage_test.cpp covering the new --backend/--flow-steps/--guidance parsing + range validators. [coderabbit]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rn >65k-vert meshes

Two independent 65535-vertex bugs tore large meshes apart on import/export (reproduced with an 80228-vert TripoSR mesh; the same class of bug the user hit on TripoSG res-256 output):

1. IMPORT (root cause): Assimp/MeshProcessor always created a 16-bit Ogre index buffer and wrote indices as unsigned short. For a submesh with >65535 verts, every index past 0xFFFF wrapped, collapsing the mesh onto its first 65536 vertices. Now selects IT_32BIT and writes uint32 when vertices.size() > 65535 (matches the exporter's own use32 logic). Fixes ANY imported mesh over the limit, not just generated ones.
2. EXPORT (defense-in-depth): Assimp's glTF2/OBJ/FBX exporters truncate the VERTEX accessors at 65536 while keeping the 32-bit index buffer, dangling every index >=65536. splitLargeMeshesForExport splits any >65535-vertex plain mesh into <=65535-vert chunks (per-face packing, local reindex) before Export(), fixing scene->mMeshes + node index arrays. Skinned/morph meshes are left intact (heavier remap, follow-up).

Verified: an 80228-vert / 160456-tri mesh now round-trips at full count through glb/gltf/obj/fbx/mesh (was collapsing to 65536).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
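Both fixes can be illustrated with a short Python sketch. It is not the Assimp/Ogre C++ code: `index_type` and `split_mesh` are hypothetical stand-ins, and the greedy per-face packing is only one plausible reading of "per-face packing, local reindex".

```python
MAX_16BIT_VERTS = 65535   # threshold quoted in the commit message

def index_type(vertex_count):
    """Pick the index width: 32-bit once the vertex count exceeds the 16-bit limit."""
    return "uint32" if vertex_count > MAX_16BIT_VERTS else "uint16"

def split_mesh(vertices, faces, limit=MAX_16BIT_VERTS):
    """Greedy per-face packing. Returns a list of (vertices, faces) chunks, each
    holding at most `limit` vertices, with faces reindexed locally per chunk."""
    chunks = []
    remap, verts, tris = {}, [], []
    for face in faces:
        new = [i for i in set(face) if i not in remap]
        if tris and len(verts) + len(new) > limit:
            chunks.append((verts, tris))          # current chunk is full: flush it
            remap, verts, tris = {}, [], []
        for i in face:
            if i not in remap:                    # first use of this vertex in the chunk
                remap[i] = len(verts)
                verts.append(vertices[i])
        tris.append(tuple(remap[i] for i in face))
    if tris:
        chunks.append((verts, tris))
    return chunks
```

Vertices shared by faces that end up in different chunks are duplicated, which is the usual cost of splitting a mesh this way.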




What
Adds TripoSG (VAST-AI-Research, SIGGRAPH 2025; MIT code + MIT weights, verified) as a second image→3D backend, selectable next to TripoSR on every surface. TripoSG is a 1.5B rectified-flow DiT over an SDF VAE with reported geometry quality ≈ commercial Tripo 2.0 (Normal-FID 5.81 vs ~20 for TripoSR-class LRMs): the "best geometry" tier from the docs/IMAGE_TO_3D_QUALITY.md roadmap.

How it runs (no torch at runtime; 7th ONNX consumer)
Four exported graphs (scripts/export-triposg-onnx.py, offline dev tool; measured contract in docs/TRIPOSG_EXPORT_NOTES.md, extracted from upstream source + HF configs):
- … zeros_like).
- … x += (σᵢ − σᵢ₊₁)·v; note this sign is the opposite of stock diffusers FlowMatchEuler (TripoSG's custom scheduler). CFG as two B=1 calls, guidance 7.0, user-tunable steps (default 25; 50 = reference, 10 = preview). fp32 ships as .onnx + .onnx.data (>2 GB); --quality int8 maps to a single-file int8 DiT tier.

TripoSG is geometry-only (no colour decoder): the texture bake/PBR/upscale stages stay TripoSR-only and their GUI checkboxes auto-disable; background removal composites over white per TripoSG's reference pipeline (vs TripoSR's gray-128). Deterministic seed → same image + params = same mesh.
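The Euler loop described above (σᵢ = 1 − i/N, update x += (σᵢ − σᵢ₊₁)·v, CFG as two calls) can be sketched in Python as below. This is an illustration, not the shipped C++ loop: `velocity` is a hypothetical stand-in for the DiT step graph, the timestep scaling of 1000·σ is taken from the commit description, and the guidance blend `uncond + g·(cond − uncond)` is the common CFG formula, assumed here rather than confirmed by the PR text.

```python
def flow_sigmas(steps):
    """sigma_i = 1 - i/N for i = 0..N: sigma_0 = 1 (pure noise), sigma_N = 0."""
    return [1.0 - i / steps for i in range(steps + 1)]

def euler_sample(x, velocity, steps=25, guidance=7.0):
    """velocity(x, timestep, conditioned) -> list of floats (hypothetical model call)."""
    sigmas = flow_sigmas(steps)
    for i in range(steps):
        t = 1000.0 * sigmas[i]
        v_cond = velocity(x, t, True)
        if guidance > 0:
            v_uncond = velocity(x, t, False)   # second batch-1 call for CFG
            v = [u + guidance * (c - u) for c, u in zip(v_cond, v_uncond)]
        else:
            v = v_cond                         # guidance 0: CFG disabled
        dt = sigmas[i] - sigmas[i + 1]         # positive step; sign as stated in the PR
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Since the step sizes sum to 1, a constant velocity pointing from the initial noise to a target carries x exactly onto that target, which makes a convenient sanity check for the sign convention.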
Surfaces
- CLI: qtmesh generate3d img.png --backend triposg --flow-steps 25 [-o out.glb]
- MCP generate_mesh_from_image: backend + flow_steps args (+ schema)
- … Stage::Denoise)

Model hosting status
The graphs are not yet exported/hosted: every surface reports a clean "not hosted yet" error (verified: --backend triposg fails gracefully with the models path + instructions). Running scripts/export-triposg-onnx.py offline (torch + ~6 GB download) and uploading to fernandotonon/QtMeshEditor-models/triposg/ lights the backend up with no code change. The export script self-verifies with an ORT round-trip (--verify), checks the kv-cache split against the monolithic VAE, and emits the int8 DiT variant by default.

License audit
TripoSG code + weights MIT; DINOv2-large Apache-2.0 (redistributed inside the MIT HF repo; Meta AI credited). ⚠️ Upstream demos use briaai/RMBG-1.4, which is NON-COMMERCIAL; it is quarantined, and the runtime substitutes our existing Apache-2.0 U²-Net BackgroundRemover.

🤖 Generated with Claude Code
Summary by CodeRabbit