MLLM-V2 V2.0.0 Release · UbiquitousLearning/mllm

New Features

Pythonic eager execution – Rapid model development
Unified hardware support – Arm CPU, OpenCL GPU, QNN NPU
Advanced optimizations – Quantization, pruning, speculative execution
NPU-ready IR – Seamless integration with NPU frameworks
Deployment toolkit – SDK + CLI inference tool
mllm JIT Kernel – Initial mllm-kernel implementation with CPU and JIT utilities
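The quantization support listed above includes 4-bit block formats such as q4_0, which the Arm CPU GEMM/GEMV kernels target (#104). Below is a minimal Python sketch of ggml-style q4_0 block quantization. It assumes the ggml layout (blocks of 32 weights, one scale per block, two 4-bit codes packed per byte); mllm's actual structs and kernels may differ, the scale is kept as a Python float rather than fp16, and the function names are hypothetical.

```python
QK4_0 = 32  # block size, assumed to match the ggml-style q4_0 layout


def quantize_q4_0_block(xs):
    """Quantize 32 floats into (scale, 16 packed bytes), ggml-style q4_0 (sketch)."""
    assert len(xs) == QK4_0
    # Pick the value with the largest magnitude, keeping its sign.
    vmax = max(xs, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0 else 0.0
    qs = bytearray(QK4_0 // 2)
    for j in range(QK4_0 // 2):
        # Shift codes into [0, 15]; the first half of the block goes to the
        # low nibble, the second half to the high nibble.
        q0 = min(15, int(xs[j] * inv_d + 8.5))
        q1 = min(15, int(xs[j + QK4_0 // 2] * inv_d + 8.5))
        qs[j] = q0 | (q1 << 4)
    return d, bytes(qs)


def dequantize_q4_0_block(d, qs):
    """Reconstruct 32 floats as (code - 8) * d."""
    out = [0.0] * QK4_0
    for j in range(QK4_0 // 2):
        out[j] = ((qs[j] & 0x0F) - 8) * d
        out[j + QK4_0 // 2] = ((qs[j] >> 4) - 8) * d
    return out
```

Each block costs one scale plus 16 bytes of codes, and every reconstructed weight lies within one scale step `d` of the original.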

News

[2026 Feb 03] 🔥🔥🔥 MLLM QNN AOT support for full-graph execution on NPU! Quick Start, Technical Report
[2025 Nov 27] Android Demo Update: Enabled stable Qwen3 and DeepSeek-OCR streaming on Android via a novel In-App Go Server Architecture.
[2025 Nov 23] MLLM v2 released!

What's Changed

Develop qnn zh by @liang1232018 in #42
Develop qnn zh by @liang1232018 in #43
Develop qnn zh by @liang1232018 in #44
fix: qnn rope file name by @liang1232018 in #46
Develop qnn zh by @liang1232018 in #47
Develop qnn zh by @liang1232018 in #48
chore: qnn arm build config by @liang1232018 in #49
Develop qnn zh by @liang1232018 in #50
Develop qnn zh by @liang1232018 in #51
Develop qnn zh by @liang1232018 in #52
fix: qnn linear quantize tensor duplicate by @liang1232018 in #53
Feat: Add new FrontEnd and model demos. by @yirongjie in #68
feat: Add OPT Tokenizer. by @lx200916 in #66
Feat: Optimize the operation process by @yirongjie in #69
Fix: Tensor::mm(): reference not passed in as input by @yirongjie in #70
Feat: Fill in input Tensor by @yirongjie in #72
Single precision inference support for the gemma-2B model by @chenghuaWang in #75
Update README.md by @yirongjie in #76
Support for the QWen1.5-0.5B model by @chenghuaWang in #79
feat: mistral v0.2 7B support by @chenghuaWang in #83
Update requirements.txt by @lx200916 in #87
doc: Update README.md by @xumengwei in #89
feat: Add Multi-Head Latent Attention(MLA) support. by @yirongjie in #90
feat: add sparse inference like powerinfer by @XieWeikai in #86
feat: Yi-1.5-6B support by @chenghuaWang in #88
feat: Inference speed(tokens/s) profiling by @yirongjie in #91
feat: Add new demo: demo_imagebind_1mod by @yirongjie in #92
feat: Stablelm 2 1.6b support by @emt0re0 in #94
doc: Update README.md by @yirongjie in #95
feat: add elastic llama by @yirongjie in #98
feat:Add OPT support by @yirongjie in #99
feat: add Qwen 1.8B demo by @yirongjie in #100
perf: Use vector<shared_ptr<Tensor>> Tensor::graphs by @yirongjie in #101
perf: add AArch64 GEMM/GEMV for q4_0. by @yirongjie in #104
feat: add DEBUGSAVETENSOR & DEBUGOPTIME by @yirongjie in #106
feat: topk/topp sampling by @chenghuaWang in #105
fix: Qwen v1.5 Tokenizer bug by @chenghuaWang in #107
feat: add clear_kvcache && fix: BUG in quantize. by @yirongjie in #108
feat: GEMV + Bias mixed precision support for ARM Devices by @chenghuaWang in #109
feat: llamafile_sgemm bias support by @chenghuaWang in #111
chore: Disable OpenMP for Mac. by @lx200916 in #110
feat: Preliminary implementation on Qualcomm NPU (QNN) backend. by @liang1232018 in #112
doc: Update README.md by @xumengwei in #113
refactor: Layer::run & Tensor::getStaticFunc by @yirongjie in #120
feat: add Phi-3-mini model by @WhiteNight123 in #119
refactor: Tensor::run & Layer::getFunc: Tensor& -> Tensor by @yirongjie in #121
perf: CPU Function: +-*/ by @yirongjie in #122
fix: +-*/ for old front end by @yirongjie in #129
refactor: Tensor::run &Layer::getFunc by @yirongjie in #130
fix: repair the Windows environment by @WhiteNight123 in #127
feat: add MiniCPM 2B demo by @yirongjie in #132
refactor: remove Layer Class Split, replace it with Tensor::split by @yirongjie in #136
fix: python bindings, clang-tidy, set line width to 100 by @chenghuaWang in #142
fix: Memory Alignment Error by @chenghuaWang in #143
fix: calculate bugs, cmakelist and clang-tidy by @yirongjie in #144
fix: bug fix for windows compilation by @chenghuaWang in #145
fix: windows compile bug by @chenghuaWang in #147
feat: cross compile arm on windows(x86) by @chenghuaWang in #148
fix: Memory Alloc bug in CPU Backend by @chenghuaWang in #149
Fix: QNN Cmakelists Config by @oreomaker in #150
Xnnpack backend support by @chenghuaWang in #152
Fixed typos. by @hustc12 in #155
fix: SmolLM name by @chenghuaWang in #157
feat: Support QWen2.5-1.5B, OpenELM-1.1B, DCLM-1B by @yirongjie in #160
feat: add_profilling_activation by @chunfenri in #154
fix: CMakeLists.txt in example by @yirongjie in #161
refactor: add TransformerConfig by @yirongjie in #162
fix: mv Tensor::graph to Module.activation_tensors; by @yirongjie in #164
feat: add PhoneLM by @yirongjie in #165
QNN Module API(new frontend) Preliminary Support by @oreomaker in #158
fix: rope_theta is set incorrectly by @yirongjie in #169
feat:QNN New Frontend End to End Inference by @oreomaker in #170
feat: Add modeling bert support by @XieWeikai in #166
fix: commen used in uni by @yirongjie in #171
fix: BerTokenizer::tokenizes by @yirongjie in #172
Add Bert for JNI. by @lx200916 in #173
Xnnpack backend support by @chenghuaWang in #159
feat: Boost xnnpack backend inference speed by freeze tensor weight. by @chenghuaWang in #174
fix: CPUTensorFunction.hpp by @UbiquitousLearning in #176
feat: drop xnn wrapper and move xnnwrapper to new front-end by @chenghuaWang in #177
feat: QNN New Frontend Phonelm Support and Refactors by @oreomaker in #179
fix: smollm tokenizer regex pattern by @chenghuaWang in #180
refactor: change tokenize method parameter from std::string& to const by @lx200916 in #181
fix: NPU affects CPU by @yirongjie in #182
1. refactor: add MLLM_LOG for Android support. by @lx200916 in #184
fix: remove unused "fmt" files by @yirongjie in #185
feat: PhoneLM Instruct Android Demo. by @lx200916 in #188
Support PhoneLM decoding configuration by @liang1232018 in #190
feat: QNN Multi Chunk Execution in New Frontend by @oreomaker in #191
scripts: update build scripts by @yirongjie in #192
feat: support PhoneLM-1.5B-Call demo in Android Demo by @yirongjie in #193
fix: Android LFS. by @lx200916 in #194
feat: add phi3v model by @k0zhevnikov in #186
feat: add Qwen1.5 1.8B Chat Android Demo. by @yirongjie in #195
perf: i8 * i8 -> fp32 GEMM Boost by @chenghuaWang in #187
files: Move src/backend/cpu/CPUXXX to src/backend/cpu/op/CPUXXX. by @yirongjie in #197
fix: Fix the IROPE bug caused by #197 by @yirongjie in #198
feat: Android App Add Profile. by @lx200916 in #199
Update README.md by @xumengwei in #200
feat: add dealloc for activation_tensors for the CPU Backend only by @yirongjie in #201
fix: BUG in Tensor::checkDim; by @yirongjie in #202
fix: free to Tensor in Matmul files. by @yirongjie in #210
feat: Add MiniCPM MoE 8x2B. by @liang1232018 in #217
doc: Update README.md by @yirongjie in #218
feat: QNN New Frontend Pipeline by @oreomaker in #219
feat: Support Gemma 2 by @oreomaker in #223
feat: Add Llama3.2 by @XieWeikai in #221
fix: in libHelper qnn setSeqLength->setCurSequenceLength by @oreomaker in #227
Fix QNN Op and LibHelper bugs by @oreomaker in #229
Update README.md by @xumengwei in #231
Chore: remove redundant QNN SDK include files in project, improve qnn building by @oreomaker in #233
feat: MiniCPM 3 4B by @yirongjie in #235
feat: Add Qwen2-VL-2B by @yirongjie in #240
fix: qwen25 setup issues in lib helper by @chenghuaWang in #237
feat: deepseek distilled qwen2 1.5B by @chenghuaWang in #242
refactor: QNN Refactor by @oreomaker in #250
fix: qnn profile quant bugs by @oreomaker in #256
doc: Update README.md by @xumengwei in #257
feat: set Eager Execution for CPU Backend; by @yirongjie in #287
feat: add qwen3 by @HanoFleet in #280
feat: Add retrieval-based speculative decoding support to Qwen 1.5 for CPU backend by @csAugust in #254
feat: Power Counter by @chenghuaWang in #293
refactor: A journey of a thousand miles begins with a single step. by @chenghuaWang in #302
feat(compile): add symbolic expression parser and evaluator by @chenghuaWang in #303
examples(algorithms): add fancy_algorithm case by @chenghuaWang in #304
feat(utils): implement argparse and logging utilities by @chenghuaWang in #305
feat(core): Device Types, Data Types, Tensor Memory Types. by @chenghuaWang in #306
feat(utils): improve Dbg macro and add Windows support by @chenghuaWang in #307
feat(core): Tensor by @chenghuaWang in #308
feat(core, x86, compile): X86 Backend Allocator, Compiler IR README. by @chenghuaWang in #310
chore: add & rewrite yaml task schema by @oreomaker in #311
feat(core): ParameterFile & Linear Op and Layers & Module by @chenghuaWang in #312
test(core): Test of ParamFile<CPU, V1> by @chenghuaWang in #313
fix(core): ModelFileV1 bugs by @chenghuaWang in #314
feat(tools): add mllm-params-inspector tool by @chenghuaWang in #315
feat: init QNN runtime and utils by @oreomaker in #317
build(deps): add stdexec dependency for MLLM project by @chenghuaWang in #318
feat(x86): add X86 backend support for Linear operation by @chenghuaWang in #320
feat(core): implement basic task dispatching and memory management by @chenghuaWang in #321
refactor(backends): rename x86 backend to cpu backend by @chenghuaWang in #322
fix(cpu): adjust memory alignment for different CPU architectures by @chenghuaWang in #323
refactor(cpu): remove x86-specific memory operations and simplify all… by @oreomaker in #324
feat(core): implement FillOp and optimize CPU allocation by @chenghuaWang in #325
feat(compile): implement core IR node classes and utilities by @chenghuaWang in #326
feat(compile): add graph IR and update related components by @chenghuaWang in #327
feat(compile): add pass manager and pass infrastructure by @chenghuaWang in #328
fix(IR): tensor.register build error. by @chenghuaWang in #329
feat(compile): add IR trace functionality by @chenghuaWang in #330
feat(x86 backend): implement arange and random fill operations for x8… by @chenghuaWang in #331
feat(preprocessor): add Image class for visual data preprocessing by @chenghuaWang in #332
feat(preprocessor): add tokenizers for multilingual support by @chenghuaWang in #333
feat(compile): implement program IR op and fragment op by @chenghuaWang in #334
feat(compile): implement program lowering pipeline by @chenghuaWang in #335
feat(tool): per-tools and per-viewer. by @chenghuaWang in #337
feat(cpu arm backend): add apple silicon support by @chenghuaWang in #338
feat(cpu): add ARM backend fill kernel support by @chenghuaWang in #339
docs(README): add features, usage examples, and installation instruct… by @chenghuaWang in #340
feat(cpu): add support for element-wise operations by @chenghuaWang in #342
refactor(arm): remove unused code and add CPU architecture checks by @chenghuaWang in #343
feat(cpu arm backend): add kleidiai submodule for CPU backend by @chenghuaWang in #344
feat(cpu): add support for transpose and permute operations by @chenghuaWang in #345
feat(core): add new ops and remove D2H/H2D ops by @chenghuaWang in #346
feat(core): implement tensor operations and add functional API by @chenghuaWang in #347
feat(cpu): add layer normalization and optimize bit-packing by @chenghuaWang in #348
feat(core): all close, tests kernels by @chenghuaWang in #349
feat(arm cpu): move ggml things from v1 to v2 by @chenghuaWang in #350
feat(core, lmcache): Dynamic Cache. by @chenghuaWang in #351
feat(engine): add support for async execution of Modules by @chenghuaWang in #352
feat(async): add support for multiple concurrent tasks and improve lo… by @chenghuaWang in #353
feat(backends): add OpenCL backend support by @chenghuaWang in #354
feat(auto_tune): implement auto tuning functionality for CPU operations by @chenghuaWang in #356
feat(cpu): add flash attention 2 implementation for CPU by @chenghuaWang in #357
feat(quantizer): add quantization tool for MLLM parameters by @chenghuaWang in #358
feat(pymllm): implement Python bindings for MLLM core and engine by @chenghuaWang in #359
refactor(pymllm): update API and add new classes by @chenghuaWang in #360
feat(cpu): implement FlashAttention2 kernel for CPU by @chenghuaWang in #361
feat(cpu): add flash attention2 operator support by @chenghuaWang in #362
refactor(cpu): optimize FlashAttention2 implementation and update doc… by @chenghuaWang in #363
feat(cpu): add BLAS support and ARM optimization by @chenghuaWang in #364
feat(arm): add HPC SGEMV kernel support for MLLM by @chenghuaWang in #365
feat(scripts): add CUDA core dump setup script by @chenghuaWang in #366
refactor(cpu): replace hpc_sgemm with mllm_blas_sgemm by @oreomaker in #367
feat(core): add support for SliceOp and implement StaticCache by @chenghuaWang in #368
feat(vision): add VisionRoPE (Rotary Position Embedding) support by @chenghuaWang in #369
docs: update Arm kernel support and add MLLM BLAS operations by @chenghuaWang in #370
feat(nn): implement Conv3D layer and improve Module by @chenghuaWang in #371
feat(backend/cpu): add new operations and update documentation by @chenghuaWang in #372
feat(qwen2vl): implement KV cache and reshape op by @chenghuaWang in #373
feat(models): add Qwen2VL tokenizer support by @chenghuaWang in #374
feat(core): add support for BLAS and optimize linear operations by @chenghuaWang in #375
feat(qwen2vl): add text generation and optimize conv3d operation by @chenghuaWang in #376
feat(cpu/arm): Add LlamaFile SGEMM kernel and corresponding unit tests by @oreomaker in #377
fix(cpu/arm): some llama file bugs. by @chenghuaWang in #378
feat(arm): add KLEIDIAI support for ARM backend by @chenghuaWang in #379
feat(qwen2_vl): add model support and quantization for Qwen-2VL by @chenghuaWang in #380
refactor(qwen2vl): integrate VisionRoPE operations into model layers by @chenghuaWang in #381
feat(perf): integrate Perfetto for performance tracing by @chenghuaWang in #382
cpu stft op basic version by @oreomaker in #383
fix(NDK): compile error. Add mllm_blas_sgemm by @chenghuaWang in #384
test(cpu): add GELU kernel test and update related files by @chenghuaWang in #385
fix(cpu): fix GELUOp dtype support and memory management by @chenghuaWang in #386
feat(unsafe macros, qa): add Unsafe Macros and add FAQ section on pre… by @chenghuaWang in #387
feat(audio): implement Bluestein's algorithm for non-power-of-2 FFT (… by @oreomaker in #389
perf(qwen2_vl): update quantization config and improve ARGeneration by @chenghuaWang in #390
feat(core): add support for complex indexing in Tensor by @chenghuaWang in #391
V2 by @oreomaker in #392
feat(core): add signal handling and context management by @chenghuaWang in #393
feat(sdk): enable C SDK binding and refactor CLI build process by @chenghuaWang in #394
feat(qwen2_5vl): add model support and update related files by @chenghuaWang in #396
V2 by @oreomaker in #397
fix(mllm): update OpenMP settings and optimize CPU operations by @chenghuaWang in #398
test(cpu): add transpose and permute operation tests by @chenghuaWang in #399
V2 by @oreomaker in #400
feat(bugs, log, reduce): reduce ops bugs by @chenghuaWang in #401
feat(algorithms): add lazy_vlm algorithm and update related files by @chenghuaWang in #402
feat(examples): add tracers for Qwen-2 models by @chenghuaWang in #403
refactor(compiler): remove debug information and refactor source by @chenghuaWang in #404
docs(conf): make Doxygen optional for documentation build by @chenghuaWang in #406
ci(docs): add GitHub Actions workflow for docs deployment by @chenghuaWang in #407
feat(qwen2vl_tracer): add canonicalization pass to model by @chenghuaWang in #409
docs(README): update documentation and add new features by @chenghuaWang in #410
test(docs): add static html talks slides. by @chenghuaWang in #411
V2 by @oreomaker in #412
ci(build): add GitHub Actions workflow for macOS Apple Silicon build by @chenghuaWang in #414
feat(compile): implement IR serialization and refactor IR persistence by @chenghuaWang in #415
feat(compile): Add KernelSymbolOp and ValueSymbolOp to the IR by @chenghuaWang in #416
feat(compile): add IR serialization and interpretation functionality by @chenghuaWang in #417
feat(compile, ir interpreter): add compilation, ir interpreter and lowering of LLM model by @chenghuaWang in #419
feat(lazy_vlm, nn::sequential): add lazy VLM support for Qwen-2.5VL. Add 🍬🍬🍬 nn::sequential by @chenghuaWang in #420
feat(python): implement C++ backend for layers and add ParameterFile support in python🐍 by @chenghuaWang in #421
feat(pymllm): add support for accessing Linear layer weights and biases by @chenghuaWang in #422
feat(lazy_vlm): implement pruning for Qwen-2.5VL decoder layers by @chenghuaWang in #423
feat(lazy_vlm): implement normal prefill and decode logic by @chenghuaWang in #424
docs(contribute): update roadmap with new contributions and initiatives by @chenghuaWang in #425
feat(Vocos): add decode method and modify constructor by @oreomaker in #426
refactor(lazy_vlm): optimize KV cache management and update process by @chenghuaWang in #427
perf(lazy_vlm): optimize model performance and debugging by @chenghuaWang in #428
docs(roadmap): update roadmap with new features and improvements by @chenghuaWang in #429
feat(models): add ARGeneration chat functionality by @chenghuaWang in #430
feat(arm): add NT-T matmul kernel and optimize lazy VLM performance by @chenghuaWang in #431
V2 by @oreomaker in #432
refactor(nn): improve module naming and registration by @chenghuaWang in #433
feat(models): add LLaMA model support by @oreomaker in #434
feat(ffi): Roadmap FFI by @chenghuaWang in #435
feat(cpu): add mllm_blas_linear kernel for ARM CPUs by @chenghuaWang in #436
feat(cpu): u1-u7 bitspack packing by @chenghuaWang in #437
refactor(mllm_blas): implement I8GEMM using bitspack by @chenghuaWang in #438
V2 by @oreomaker in #439
feat(core): add PagedAttnOp implementation by @chenghuaWang in #440
fix(lazy_vlm): update HKVCacheFast and Qwen2.5VL model implementations by @chenghuaWang in #441
feat(build): Move FFI and auto tune to experiments. Add mllm-arm-pmu tool by @chenghuaWang in #442
build(mllm): conditionally compile FFI extension by @chenghuaWang in #443
feat(ffi): start building FFI extension for MLLM by @chenghuaWang in #444
build(pymllm): update CMakeLists and packaging configuration by @chenghuaWang in #446
feat(models): add basic ChatTTS support for MiniCPM-O2-6 model by @oreomaker in #445
fix(bugs): In PR445 by @chenghuaWang in #447
feat(mllm/ffi/Extension.cc, pymllm): Add tensor operations and bindings by @chenghuaWang in #451
Some sugar(not too sweet) by @oreomaker in #450
feat(qwen3): add Qwen3 model support and example by @chenghuaWang in #453
feat(xxHash): integrate xxHash library for hashing functionality by @chenghuaWang in #457
fix(mllm-cli): correct return value check in isOk function by @chenghuaWang in #458
feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Coworks with special paged_attn kernel by @chenghuaWang in #459
feat(service): implement interactive chat loop in qwen3 service by @chenghuaWang in #462
QNN Graph Building Core Features by @oreomaker in #463
fix(cpu): add missing task type kExecuteModule in CPUDispatcher by @chenghuaWang in #464
feat(service): implement OpenAI-compatible chat completion API by @chenghuaWang in #466
ci(macOS): update nightly workflow to use python-build and improve ve… by @chenghuaWang in #467
ci(workflows): simplify workflow trigger syntax by @chenghuaWang in #468
feat(workflow): trigger macOS nightly build on PR and push to v2 branch by @chenghuaWang in #469
ci(pymllm-macos-nightly): restrict publish job to merged pull requests by @chenghuaWang in #470
ci(pymllm-macos-nightly): simplify workflow trigger conditions by @chenghuaWang in #471
build(pymllm): update dependencies and version for nightly build by @chenghuaWang in #472
refactor(convertor): lazy import torch and numpy based on availability by @chenghuaWang in #473
build(workflows): compute and bump to next beta version in CI by @chenghuaWang in #474
build(workflows): simplify nightly version reading in macOS workflow by @chenghuaWang in #475
build(workflows): update macOS nightly workflow sed command by @chenghuaWang in #476
fix(cpu): remove hardcoded paths and unused arm_neon include by @chenghuaWang in #477
feat(cuda): add CUDA backend initialization and device info test by @chenghuaWang in #478
feat(plugin): implement plugin system for dynamic op loading by @chenghuaWang in #479
feat(scripts): add MLIR installation script by @chenghuaWang in #480
feat(cli): add mllm-llm-benchmark tool for performance testing by @chenghuaWang in #481
feat(qwen3): add config and quantization files for 0.6B model by @chenghuaWang in #482
feat(cpu): add inplace rmsnorm implementations for fp32 and fp16 by @chenghuaWang in #483
feat(cpu-kernels): add SIMD-based vector operations for flash attention by @chenghuaWang in #484
feat(qnn): Basic QNN Prefill on v2 by @oreomaker in #485
feat: Add benchmark for Qwen3 and update readme about benchmark by @jialilve in #487
doc(Qnn): Qnn Documents by @oreomaker in #488
feat(deepseek-ocr): deepseek-ocr support(On working) by @chenghuaWang in #486
feat(deepseek-ocr): update inference image and add perf by @chenghuaWang in #494
feat: Add support for MiniCPM-o-2_6 model by @KKkai0315 in #495
refactor(minicpm_o): reformat code and remove unused includes by @oreomaker in #496
feat(engine): add Tracy profiler support and CPU memory-disk async I/O by @chenghuaWang in #497
feat(tracy): integrate Tracy performance profiling support by @chenghuaWang in #498
refactor(minicpm_o2_6): optimize tensor operations and memory access by @oreomaker in #499
fix ggml op by @oreomaker in #502
feat(cpu, smollm3-tokenizer): add KAI SGEMM NEON implementation for ARM by @chenghuaWang in #503
feat: add GGUF quantization support by @KKkai0315 in #501
feat: Add mllm-cli support for Qwen3 and update docs by @yuerqiqi in #493
feat(mllm-chat): add mllm-chat submodule by @chenghuaWang in #504
fix minicpm image preprocessing by @oreomaker in #505
Add SmolLM3-3B model support by @nuozhihan in #509
fix: QNN Execute Return Order - handle output reordering by @jialilve in #510
feat(compile-stack, cpu): MLIR, flash attention's swa support. by @chenghuaWang in #512
fix(cpu ops extension): flash attention swa with sink bugs for GPT-OSS model. by @chenghuaWang in #514
feat(cpu): implement RadixAttnSwaSink with sliding window attention support by @chenghuaWang in #516
feat(examples): add paged attention hybrid example with sliding window support by @chenghuaWang in #517
feat(cpu): add partial_dim support in RoPE operation by @chenghuaWang in #518
feat(build): install mllm-ext-opset headers and libraries by @chenghuaWang in #519
feat(mllm-ext-opset): add radix attention relax implementation for flexible tensor dims by @chenghuaWang in #520
fix(radix-attn): correct shape indexing for K and V tensors in forward method by @chenghuaWang in #521
fix: resolve type mismatch in Smollm3Attention KV cache update by @Shimmer22 in #523
feat(radix-attn): implement pattern-based forwarding for Radix Attent… by @chenghuaWang in #524
Minicpm o2.6 Basic Support by @oreomaker in #530
feat(thread-pool): implement HpcThreadPool for efficient CPU task management and update build configurations by @chenghuaWang in #531
refactor(hpc-thread-pool): remove unused NUMA affinity functions and related includes by @chenghuaWang in #532
feat(cpu-backend): add support for SME2 and SVE2 in ARM backend configurations by @chenghuaWang in #533
refactor: update ARM backend compile options and disable SME2 support for OSX by default by @chenghuaWang in #536
feat: Implement Qwen NPU Decoding Support with Memory Management Fixes by @jialilve in #537
QNN Op Package Migrate to v2 by @oreomaker in #539
feat: add DeepSeek-OCR support, C++ API updates, and dual-model loadi… by @yuerqiqi in #534
test: fix CausalMaskOp CPU coverage by @jialilve in #538
update docs by @oreomaker in #541
feat(build): update threading options for Apple GCD support in build configurations by @chenghuaWang in #540
fix(docs): update links for Qwen2 and Qwen2.5 models in README by @chenghuaWang in #542
feat(docs): add mllm-params-inspector tool usage instructions to README by @chenghuaWang in #543
docs(readme): add OrangePi AI Pro and Studio build status by @chenghuaWang in #544
feat(docs): enhance README with MLLM's role and workflow diagrams by @chenghuaWang in #547
fix(assets): update mllm_role image to reflect recent changes by @chenghuaWang in #548
feat(MiniCPM4, MiniCPM-o, AvgPool1dOp): Add support for MiniCPM4 model, MiniCPM-o's audio modality inference capability, and AvgPool1dOp by @KKkai0315 in #526
fix(minicpmo): fix minicpmo tokenization logic & streaming generation by @KKkai0315 in #549
feat(cpu): add support for new attention ops and improve parallel scheduling by @chenghuaWang in #553
Enhance README with Android Demo & Architecture details by @yuerqiqi in #554
Merge pull request #526 from KKkai0315/v2 minicpm v4 & minicpm o audi… by @oreomaker in #551
feat: qwen2 cpu model and connection with npu prefill by @Sp0tless in #555
chore: update submodule to main by @yuerqiqi in #557
OpenCL Backend Init by @oreomaker in #558
feat(qnn): add Qualcomm QNN AOT support on x86 platforms by @chenghuaWang in #562
feat(qwen3, cpu): add support for Qwen3 model on x86 architecture by @HayzelHan in #561
feat(qnn): add QcomTargetMachine and related enums for AOT environment by @chenghuaWang in #563
feat(Qnn AOT): AOT and AOT Runtime. Qwen3 AOT Mode. by @chenghuaWang in #567
feat(Qnn AOT): Refactor code structure for improved readability and maintainability by @chenghuaWang in #568
feat(ascend): initial Ascend backend and add elementwise add op by @lywbarca in #564
feat(Qnn AOT): Add MarkTensorIO pass and related changes for QNN AOT pipeline by @chenghuaWang in #569
feat(Qnn AOT): Implement LLMQuantRecipePass and associated patterns for quantization by @chenghuaWang in #572
feat: add LLM2QnnLoweringPass and update graph splitting logic by @chenghuaWang in #577
fix: Qualcomm QNN AOT Pass by @chenghuaWang in #579
fix(qualcomm): Qnn AOT refactor by @oreomaker in #578
feat(qualcomm): Qnn AOT Lowering pass by @oreomaker in #580
fix(qualcomm): use unsigned int for qualcomm model quantization by @chenghuaWang in #581
fix(qualcomm): QNNParamScalarWrapper type error. by @chenghuaWang in #582
fix: reshape tensor weight to 2d by @chenghuaWang in #583
feat(qualcomm): Qnn AOT Lowering Passes by @oreomaker in #584
feat(qualcomm): AOTPipeline update by @chenghuaWang in #585
feat: add readme_zh by @PiaoAdmin in #588
fix(compile): using ssa ViewOp and SliceOp in MLLM-IR by @chenghuaWang in #589
fix: update README-ZH by @PiaoAdmin in #592
feat(qualcomm): PTQPass add constant ptq impl. by @chenghuaWang in #593
feat: PTQPass will modify the tensor ir constant! by @chenghuaWang in #594
fix: make LPBQ return shape follow the QNN spec by @chenghuaWang in #595
feat(qualcomm): Qnn AOT Lowering passes by @oreomaker in #596
fix(Qualcomm): Replace linear op with conv2d in Qualcomm backend by @chenghuaWang in #600
fix(qualcomm): LM Head Merge pass by @chenghuaWang in #601
feat: update mllm-cli and android build tasks by @yuerqiqi in #605
fix(pymllm): improvements to qnn_aot_env.py for x86 by @Lucyliu1234 in #604
feat(qualcomm): Qnn aot runner by @oreomaker in #603
feat(qualcomm): Qnn AOT Runtime by @oreomaker in #606
fix(qualcomm): Enhance quantization modules. by @chenghuaWang in #607
feat(qnn): Enhance QNNBackend initialization with improved logging and error handling; update default log level to verbose. Add QEmbedding class for quantized embedding operations in PyTorch. Introduce build tasks for Android and x86 QNN AOT SDKs. by @chenghuaWang in #609
feat(qwen3): Add configuration files and enhance Qwen3 model with layer indexing and quantization improvements. by @chenghuaWang in #611
feat(qnn): Update quantization handling in LLM passes and improve logging in QNN runtime. by @chenghuaWang in #613
refactor(qwen3): Simplify input handling in Qwen3 model by @chenghuaWang in #614
feat(qwen3): Introduce Single Head Attention (SHA) optimization for Qualcomm qwen model by @chenghuaWang in #616
feat(core): Introduce kBool data type for Qnn ElewiseEqual Op by @chenghuaWang in #618
feat(Ascend): Add some new Ascend Ops by @lywbarca in #621
milestone(qnn): Qnn AOT by @chenghuaWang in #624
doc(qualcomm): Qnn aot by @oreomaker in #625
docs(qnn_backend): update AOT execution flow documentation by @oreomaker in #628
docs(qnn_backend): enhance AOT execution documentation with installation by @chenghuaWang in #630
docs(Qnn): correct symbolic link path in AOT execution documentation by @chenghuaWang in #631
docs: update latest news section in README files for qnn aot by @chenghuaWang in #633
[Ascend] Implement Concat and Slice operators by @yuerqiqi in #629
feat(mllm_kernel): add initial implementation of mllm-kernel with CPU and JIT utilities by @chenghuaWang in #634
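Top-k/top-p sampling was added in #105. The sketch below shows the common way these two filters are combined over a logits vector: keep the k highest-scoring tokens, then keep the smallest prefix whose probability mass reaches p, renormalize, and sample. It is a generic Python illustration, not mllm's implementation; `sample_top_k_top_p` and its parameters are hypothetical.

```python
import math
import random


def sample_top_k_top_p(logits, top_k=0, top_p=1.0, temperature=1.0, rng=random):
    """Sample a token index after top-k, then top-p (nucleus), filtering."""
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    # Token ids sorted by descending logit.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving candidates (subtract max for stability).
    m = scaled[order[0]]
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: smallest prefix whose cumulative probability reaches top_p.
    kept, cum = 0, 0.0
    for p in probs:
        kept += 1
        cum += p
        if cum >= top_p:
            break
    order, probs = order[:kept], probs[:kept]
    # random.choices accepts unnormalized weights, so renormalizing is implicit.
    return rng.choices(order, weights=probs, k=1)[0]
```

Applying top-k first bounds the sort and softmax work to k candidates, while top-p then adapts the candidate count to how peaked the distribution is.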

New Contributors

@liang1232018 made their first contribution in #42
@xumengwei made their first contribution in #89
@XieWeikai made their first contribution in #86
@emt0re0 made their first contribution in #94
@WhiteNight123 made their first contribution in #119
@hustc12 made their first contribution in #155
@chunfenri made their first contribution in #154
@k0zhevnikov made their first contribution in #186
@HanoFleet made their first contribution in #280
@csAugust made their first contribution in #254
@jialilve made their first contribution in #487
@KKkai0315 made their first contribution in #495
@yuerqiqi made their first contribution in #493
@nuozhihan made their first contribution in #509
@Shimmer22 made their first contribution in #523
@Sp0tless made their first contribution in #555
@HayzelHan made their first contribution in #561
@lywbarca made their first contribution in #564
@PiaoAdmin made their first contribution in #588
@Lucyliu1234 made their first contribution in #604

Full Changelog: 1.0.0...2.0.0
