Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces the Ming SDK, a Python API that streamlines multimodal interaction with the Ming-Flash-2.0 model. It unifies text, speech, and image generation and understanding, and offers both synchronous and asynchronous operation with streaming support. The changes also bring significant performance improvements for image generation through optimized transformer architectures and a new HTTP service for scalable image processing, alongside comprehensive documentation and monitoring tools.
Code Review
This pull request introduces the Ming-SDK, a comprehensive toolkit for interacting with the Ming-Flash-2.0 model, supporting multimodal functionalities like text, speech, and image generation. The changes include extensive documentation in the README, the full SDK source code under the ming_sdk/ directory, and modifications to existing model and utility files to support new features like Taylor cache acceleration. My review has identified a critical security vulnerability in the new image generation server due to the use of pickle for deserialization. Additionally, I've found several medium to high severity issues, including bugs in documentation examples, potential regressions in utility functions, and maintainability concerns like hardcoded values and commented-out code. Addressing these points will significantly improve the quality, security, and usability of the new SDK.
```python
# with open(filename, "a", encoding="utf-8") as f:
#     f.write(f"{gen_param_b64}\n")

gen_param = base64_to_dict(gen_param_b64)
```
Using pickle.loads to deserialize data received from a network request is a critical security vulnerability that can lead to Remote Code Execution (RCE). An attacker could craft a malicious payload that executes arbitrary code on the server when unpickled. Please use a safe serialization format like JSON for data exchange over the network.
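A minimal sketch of this suggestion, swapping `pickle` for JSON; the helper names (`base64_to_dict`, `gen_param_b64`) mirror the snippet above, and the payload contents are made up for illustration:

```python
import base64
import json

def dict_to_base64(params: dict) -> str:
    """Serialize a parameter dict to a base64-encoded JSON string."""
    return base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")

def base64_to_dict(gen_param_b64: str) -> dict:
    """Safely decode a base64 JSON payload; json.loads cannot execute code,
    unlike pickle.loads on attacker-controlled bytes."""
    return json.loads(base64.b64decode(gen_param_b64))

gen_param = base64_to_dict(dict_to_base64({"steps": 30, "cfg": 7.5}))
```

Malformed or malicious input then fails with a `JSONDecodeError` instead of executing arbitrary code during deserialization.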
```diff
 max_frames = max(
     1,
     floor_by_factor(
         ele.get("max_frames", min(FPS_MAX_FRAMES, total_frames)), FRAME_FACTOR
     ),
 )
 if "nframes" in ele:
     nframes = min(total_frames, round_by_factor(ele["nframes"], FRAME_FACTOR), max_frames)
 else:
     fps = ele.get("max_video_fps", FPS)
-    nframes = total_frames / video_fps * fps
+    nframes = max(1, total_frames / video_fps * fps)
     if nframes > total_frames:
         logger.warning(f"smart_nframes: nframes[{nframes}] > total_frames[{total_frames}]")
-nframes = min(min(max(nframes, min_frames), max_frames), total_frames)
-nframes = floor_by_factor(nframes, FRAME_FACTOR)
-if not (FRAME_FACTOR <= nframes <= total_frames):
-    raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
-return nframes
+nframes = min(min(nframes, max_frames), total_frames)
+return int(nframes)
```
The refactoring of v1_smart_nframes has removed the validation that ensured the number of frames (nframes) is at least FRAME_FACTOR (which is 2). The new logic only guarantees nframes >= 1. This could introduce a regression if downstream code expects at least 2 frames. Additionally, the ValueError that was previously raised for out-of-bounds nframes has been removed, which might hide potential configuration issues from the user.
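If the original guarantees are worth keeping, the removed clamp and check could be restored along these lines (a sketch only; `floor_by_factor` is re-implemented here to make the snippet self-contained, and the parameter names follow the code above):

```python
FRAME_FACTOR = 2  # per the review comment, the module's frame-count factor

def floor_by_factor(n, factor):
    """Round n down to the nearest multiple of factor."""
    return int(n // factor) * factor

def clamp_nframes(nframes, min_frames, max_frames, total_frames):
    """Clamp to [min_frames, max_frames], align to FRAME_FACTOR, and fail
    loudly on out-of-range results instead of silently returning them."""
    nframes = min(min(max(nframes, min_frames), max_frames), total_frames)
    nframes = floor_by_factor(nframes, FRAME_FACTOR)
    if not (FRAME_FACTOR <= nframes <= total_frames):
        raise ValueError(
            f"nframes should be in [{FRAME_FACTOR}, {total_frames}], got {nframes}."
        )
    return int(nframes)
```

This preserves the `nframes >= FRAME_FACTOR` invariant that downstream code may rely on, and surfaces misconfiguration as an error rather than a warning.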
```python
if ref_hidden_states_input is not None:
    ref_hidden_states_input = ref_hidden_states_input.to(latent_model_input.dtype)
else:
    latent_model_input = latents.to(self.transformer.dtype)
    prompt_embeds_model_input = prompt_embeds
    timestep_model_input = timestep
    ref_hidden_states_input = ref_hidden_states * 1.0 if ref_hidden_states is not None else None
    if ref_hidden_states_input is not None:
        ref_hidden_states_input = ref_hidden_states_input.to(latent_model_input.dtype)
```
The ref_hidden_states_input tensor is cast to latent_model_input.dtype in two separate places. To avoid code duplication and improve readability, you could perform this type casting once after the if/else block where ref_hidden_states_input is defined.
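The hoisting pattern being suggested, illustrated with plain arithmetic stand-ins (in the real code the duplicated step is the `.to(latent_model_input.dtype)` cast):

```python
def before(x, flag):
    if flag:
        y = x + 1
        y = y * 2  # duplicated post-processing in each branch
    else:
        y = x - 1
        y = y * 2  # duplicated post-processing in each branch
    return y

def after(x, flag):
    # Branch-specific work stays in the if/else...
    y = x + 1 if flag else x - 1
    # ...and the shared step (the dtype cast in the real code) runs once.
    y = y * 2
    return y
```

Both functions compute the same result; the second has a single site to update if the shared step ever changes.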
```python
# Streaming speech QA with interruption
all_wavs = []
all_text = ""
request_id = ""
output_audio_path = "test_stream.wav"
for data_type, data_content in ming.generate_stream(
    text="介绍一下杭州", output_type="speech", max_new_tokens=128
):
    if data_type == "text_data":
        text, usage = data_content
    elif data_type == "text_audio_data":
        tts_speech, text, meta_info, session_id, usage = data_content
        all_text += text
        all_wavs.append(tts_speech)
    if len(all_text) > 20:
        ming.generate_interrupt(request_id)
waveform = torch.cat(all_wavs, dim=-1)
sr = 44100
torchaudio.save(output_audio_path, waveform, sr)
print(f"request_id:{request_id},audio:{output_audio_path},text={all_text}")
assert os.path.exists(output_audio_path)
```
The example for streaming speech QA with interruption has a bug. The request_id variable is initialized as an empty string and is never updated within the loop. However, it's passed to ming.generate_interrupt(request_id). This will not work as intended.
To fix this, you should pass a unique msg_request_id to the generate_stream call and use that same ID for the interruption, as shown in the 'Request Interruption' example later in the README. The session_id returned by the stream should also be used correctly if it's intended for this purpose.
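A sketch of the fix, assuming `generate_stream` accepts a `msg_request_id` keyword as the 'Request Interruption' example suggests; a fake client stands in for the SDK here so the control flow can be exercised without the model:

```python
import uuid

class _FakeMing:
    """Minimal stand-in for the Ming SDK client (illustration only)."""
    def __init__(self):
        self.interrupted = None

    def generate_stream(self, text, output_type, msg_request_id, max_new_tokens):
        # Yields typed tuples the way the real stream does.
        for chunk in ("Hangzhou ", "is a city ", "in Zhejiang, ", "China."):
            yield "text_data", (chunk, {"tokens": 1})

    def generate_interrupt(self, request_id):
        self.interrupted = request_id

ming = _FakeMing()  # the real example would construct the Ming SDK client

# Key fix: create the request ID up front and pass it to generate_stream,
# so the very same ID is valid when calling generate_interrupt.
request_id = str(uuid.uuid4())
all_text = ""
for data_type, data_content in ming.generate_stream(
    text="介绍一下杭州", output_type="speech",
    msg_request_id=request_id, max_new_tokens=128,
):
    if data_type == "text_data":
        text, usage = data_content
        all_text += text
    if len(all_text) > 20:
        ming.generate_interrupt(request_id)  # interrupt with the known ID
        break
```

The parameter name `msg_request_id` is taken from the review's reference to the later README example; the README example should use whichever keyword the SDK actually defines.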
```python
self.saved_features = []

# [0, 1, 2, 3, 4, 5, 7, 9, 11, 14, 17, 20, 22, 24, 26, 27, 28, 29]:
if step in [0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 18, 21, 23, 25, 26, 27, 28, 29]:
```
```python
except Exception as e:
    logger.error(f"Error in text producer: {e}")
    talker_input_queue.put(None, None)
```
Catching a generic Exception is too broad and can mask underlying issues, making debugging more difficult. It's better to catch more specific exceptions that you expect might occur in the producer thread. If a generic catch is necessary, consider logging the full traceback and re-raising the exception to avoid silently swallowing errors.
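One way to follow this advice (a sketch: the exception types to expect and the producer's real signature are assumptions, and the snippet also uses a single `put(None)` sentinel, since `put(None, None)` passes `None` as the `block` argument):

```python
import logging
import queue
import traceback

logger = logging.getLogger(__name__)
talker_input_queue: queue.Queue = queue.Queue()

def text_producer(generate):
    """Producer loop with narrower, louder error handling."""
    try:
        for token in generate():
            talker_input_queue.put(token)
    except (ValueError, RuntimeError) as e:  # errors we expect from generation
        logger.error("Error in text producer: %s\n%s", e, traceback.format_exc())
        talker_input_queue.put(None)  # one sentinel, not put(None, None)
        raise                         # re-raise instead of swallowing the failure
```

Unexpected exception types then propagate with a full traceback instead of being silently absorbed by the thread.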
```python
def logging_tasks(self, request_id, future):
    time.sleep(0.01)
    running_keys = []
    pendding_keys = []
```
```python
from setuptools import setup, find_packages

__version__ = "1.0.0"  #
requirement = open("ming_sdk/requirements.txt").readlines()
```
```python
package_data={"ming_sdk": fetch_installed_data("")},
description="Ming Multimodal sdk",
keywords="Ming Multimodal sdk",
packages=["ming_sdk"],
```
The packages argument is set to ["ming_sdk"], which is correct. However, using find_packages() from setuptools is a more robust and conventional way to automatically discover all packages within the project. This avoids manual updates if you add sub-packages in the future.
```diff
-    packages=["ming_sdk"],
+    packages=find_packages(),
```
```python
return_dict=return_dict,
return_assistant_tokens_mask=return_assistant_tokens_mask,
tokenizer_kwargs=tokenizer_kwargs,
**kwargs,
```
While adding **kwargs provides flexibility, it can also hide which specific parameters are being used by the underlying apply_chat_template call. It would be more explicit and maintainable to list the expected keyword arguments (like enable_thinking from the new template) in the function signature with default values.
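A sketch of the more explicit signature (the surrounding processor class is omitted, and this toy body only records its inputs; the real behavior of `enable_thinking` is defined by the new chat template):

```python
def apply_chat_template(
    conversation,
    return_dict=False,
    return_assistant_tokens_mask=False,
    tokenizer_kwargs=None,
    enable_thinking=False,  # previously hidden inside **kwargs
    **kwargs,               # still accepted for forward compatibility
):
    """Toy stand-in showing the signature shape, not the real processor."""
    options = {"enable_thinking": enable_thinking, **kwargs}
    return {"conversation": conversation, "return_dict": return_dict,
            "options": options}
```

Named parameters make the supported options discoverable in the signature and in editor tooltips, while the trailing `**kwargs` still forwards anything new.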
Add Ming-SDK, a tool that facilitates multimodal interaction with the Ming-Flash-2.0 model. For detailed usage instructions of Ming-SDK, please refer to the updated README.