Ming_Flash_2.0_v0318_update#66

Open
dongxingning wants to merge 1 commit into main from Ming_Flash_2.0_update_v0318

Conversation

@dongxingning
Collaborator

Add Ming-SDK, a tool that facilitates multimodal interaction with the Ming-Flash-2.0 model. For detailed usage instructions of Ming-SDK, please refer to the updated README.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the Ming SDK, a powerful Python API that streamlines multimodal interactions with the Ming-Flash-2.0 model. It unifies capabilities across text, speech, and image generation and understanding, offering both synchronous and asynchronous operations with streaming support. The changes also incorporate significant performance enhancements for image generation through optimized transformer architectures and a new HTTP service for scalable image processing, alongside comprehensive documentation and monitoring tools.

Highlights

  • Ming SDK Introduction: Introduced the Ming SDK, a new Python API for multimodal interaction with the Ming-Flash-2.0 model, designed for ease of integration and comprehensive capabilities.
  • Comprehensive Multimodal Features: The SDK provides a unified API supporting text generation, speech synthesis, image generation/editing, streaming output, flexible device configuration, and detailed usage statistics.
  • Updated Documentation: The README.md file was significantly expanded to include detailed installation instructions and usage examples for all new Ming SDK functionalities, including various generation and understanding tasks.
  • Image Generation Optimizations: Implemented a 'taylor_cache' mechanism in the image generation pipeline and transformer, along with a new xFuser-optimized transformer wrapper for enhanced performance and sequence parallelism.
  • New HTTP Service for Image Generation: Added an HTTP service for image generation using Ray and FastAPI, leveraging xFuser for distributed processing and supporting base64 encoded parameters.
  • Enhanced Image and Video Preprocessing: Updated image processing utilities to include PyTorch-based video batching, resizing, rescaling, and normalization functions, improving handling of multimodal inputs.
  • Modular SDK Components: New Python files were added to modularize the SDK into core components like Ming (orchestrator), MingImg (image), MingMOE (LLM), MingMOEAsync (async LLM), MingTalker (TTS), and MingUtils (utilities).
  • Request Metrics Monitoring: Introduced a request metrics monitoring system for text, speech, and image services, logging performance indicators like TTFT, TPOT, and end-to-end latency.


Changelog
  • README.md
    • Added extensive documentation for the new Ming SDK, covering installation, features, and usage examples for various multimodal tasks.
  • bailingmm_utils_video.py
    • Modified video frame sampling logic in 'v1_smart_nframes' and updated 'v1_fetch_video' to use 'partial' for 'v1_sample_video'.
  • configuration_bailing_moe_v2.py
    • Added 'use_qk_norm' parameter to the 'BailingMoeConfig' initialization.
  • configuration_bailingmm.py
    • Added a new configuration file for 'BailingMMConfig', integrating LLM, audio tokenizer, and other model configurations.
  • diffusion/pipeline_z_image.py
    • Updated comments, added 'taylor_cache' attribute, and modified the 'call' method to conditionally use 'step' for the transformer and added a 'set_taylor_cache' method.
  • diffusion/transformer_z_image.py
    • Updated comments, added 'step' parameter to the 'forward' method, and implemented a Taylor series approximation logic for feature prediction based on 'step'.
  • diffusion/transformer_z_image_xfuser.py
    • Added a new file defining 'xFuserZSingleStreamAttnProcessor' and 'xFuserZImageTransformer2DWrapper' for xFuser-optimized Z-Image transformer with sequence parallelism.
  • image_gen_dit_server.py
    • Added a new file implementing an HTTP service for image generation using Ray, FastAPI, and xFuser, including base64 serialization for parameters.
  • image_processing_bailingmm2.py
    • Updated imports to include 'torchvision.transforms' and 'PIL.Image', added 'make_batched_videos_torch', 'resize_torchvision', 'rescale_torchvision', 'normalize_torchvision' functions, and introduced '_preprocess_torch' for PyTorch-based image/video preprocessing.
  • ming_sdk/__init__.py
    • Added new imports for 'Ming', 'MingImg', 'MingMOE', 'MingTalker', and 'MingMOEAsync' to expose the SDK components.
  • ming_sdk/ming.py
    • Added a new file defining the core 'Ming' class, which orchestrates text, speech, and image generation, including prompt rewriting, aspect ratio extraction, and request interruption.
  • ming_sdk/ming_img.py
    • Added a new file defining 'MingImg' for image generation and editing, including utilities for auto-balancing saturation/exposure, Gaussian filtering, mask generation, and prompt rewriting.
  • ming_sdk/ming_moe.py
    • Added a new file defining 'MingMOE' for synchronous LLM interaction using vLLM, supporting sampling parameters and request interruption.
  • ming_sdk/ming_moe_async.py
    • Added a new file defining 'MingMOEAsync' for asynchronous LLM interaction using vLLM, with a dedicated event loop and streaming support.
  • ming_sdk/ming_talker.py
    • Added a new file defining 'MingTalker' for text-to-speech generation with multi-GPU process pooling, streaming, and request cancellation.
  • ming_sdk/ming_test.py
    • Added a new file with comprehensive test examples for the Ming SDK, covering various generation and understanding tasks.
  • ming_sdk/ming_utils.py
    • Added a new file with utility functions for the Ming SDK, including multimedia processing, prompt building, tokenization, file downloading, and caching mechanisms.
  • ming_sdk/monitoring/README.md
    • Added a new README file detailing the request metrics monitoring system for text, speech, and image services.
  • ming_sdk/monitoring/request_metrics.py
    • Added a new file implementing request metrics logging for tracking performance and status of different services.
  • ming_sdk/requirements.txt
    • Added a new file listing Python dependencies for the Ming SDK.
  • ming_sdk/setup.py
    • Added a new setup file for packaging the Ming SDK.
  • modeling_bailing_moe.py
    • Added a new file defining the 'BailingMoe' model architecture, including attention mechanisms (FlashAttention2, SDPA), MoE layers, and rotary embeddings.
  • test_infer_imagegen_taylor.py
    • Added a new test script for image generation with Taylor cache.
  • tokenization_bailing.py
    • Modified 'apply_chat_template' to pass additional 'kwargs'.
  • tokenizer_config.json
    • Updated the 'chat_template' to support multimodal inputs and system prompts more flexibly.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the Ming-SDK, a comprehensive toolkit for interacting with the Ming-Flash-2.0 model, supporting multimodal functionalities like text, speech, and image generation. The changes include extensive documentation in the README, the full SDK source code under the ming_sdk/ directory, and modifications to existing model and utility files to support new features like Taylor cache acceleration. My review has identified a critical security vulnerability in the new image generation server due to the use of pickle for deserialization. Additionally, I've found several medium to high severity issues, including bugs in documentation examples, potential regressions in utility functions, and maintainability concerns like hardcoded values and commented-out code. Addressing these points will significantly improve the quality, security, and usability of the new SDK.

```python
# with open(filename, "a", encoding="utf-8") as f:
#     f.write(f"{gen_param_b64}\n")

gen_param = base64_to_dict(gen_param_b64)
```


critical

Using pickle.loads to deserialize data received from a network request is a critical security vulnerability that can lead to Remote Code Execution (RCE). An attacker could craft a malicious payload that executes arbitrary code on the server when unpickled. Please use a safe serialization format like JSON for data exchange over the network.
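A minimal sketch of the JSON-based alternative. The `dict_to_base64`/`base64_to_dict` names mirror the helper visible in the diff, but their real signatures are assumed; the point is that JSON can only decode plain data types, so a crafted payload cannot execute code the way a malicious pickle can:

```python
import base64
import json

def dict_to_base64(params: dict) -> str:
    # JSON serialization instead of pickle: no code objects can ride along.
    return base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")

def base64_to_dict(payload_b64: str) -> dict:
    # Worst case for hostile input is a json.JSONDecodeError, not RCE.
    return json.loads(base64.b64decode(payload_b64).decode("utf-8"))

gen_param = {"prompt": "a cat wearing a hat", "steps": 30, "cfg_scale": 4.0}
assert base64_to_dict(dict_to_base64(gen_param)) == gen_param
```

If some parameters are not JSON-serializable (e.g. tensors), encode those fields separately (lists, or base64 of raw bytes) rather than falling back to pickle.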

Comment on lines +289 to +303
```diff
 max_frames = max(
     1,
     floor_by_factor(
         ele.get("max_frames", min(FPS_MAX_FRAMES, total_frames)), FRAME_FACTOR
     ),
 )
 if "nframes" in ele:
     nframes = min(total_frames, round_by_factor(ele["nframes"], FRAME_FACTOR), max_frames)
 else:
     fps = ele.get("max_video_fps", FPS)
-    nframes = total_frames / video_fps * fps
+    nframes = max(1, total_frames / video_fps * fps)
     if nframes > total_frames:
         logger.warning(f"smart_nframes: nframes[{nframes}] > total_frames[{total_frames}]")
-nframes = min(min(max(nframes, min_frames), max_frames), total_frames)
-nframes = floor_by_factor(nframes, FRAME_FACTOR)
-if not (FRAME_FACTOR <= nframes <= total_frames):
-    raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
-return nframes
+nframes = min(min(nframes, max_frames), total_frames)
+return int(nframes)
```


high

The refactoring of v1_smart_nframes has removed the validation that ensured the number of frames (nframes) is at least FRAME_FACTOR (which is 2). The new logic only guarantees nframes >= 1. This could introduce a regression if downstream code expects at least 2 frames. Additionally, the ValueError that was previously raised for out-of-bounds nframes has been removed, which might hide potential configuration issues from the user.
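A sketch of how the original guarantees could be restored, assuming FRAME_FACTOR = 2 as the review states; `floor_by_factor` and `clamp_nframes` are reimplemented here for illustration and are not the PR's exact helpers:

```python
FRAME_FACTOR = 2  # minimum frame count and alignment factor, per the review

def floor_by_factor(n, factor):
    # Round down to the nearest multiple of `factor`.
    return int(n // factor) * factor

def clamp_nframes(nframes, min_frames, max_frames, total_frames):
    # Clamp into [min_frames, max_frames] (never above total_frames),
    # align to FRAME_FACTOR, and fail loudly instead of silently
    # returning fewer than FRAME_FACTOR frames.
    nframes = min(max(nframes, min_frames), max_frames, total_frames)
    nframes = floor_by_factor(nframes, FRAME_FACTOR)
    if not (FRAME_FACTOR <= nframes <= total_frames):
        raise ValueError(
            f"nframes should be in [{FRAME_FACTOR}, {total_frames}], got {nframes}"
        )
    return nframes
```

Keeping the ValueError surfaces misconfiguration (e.g. a one-frame clip passed to a video path) at the call site instead of producing a silently degenerate batch downstream.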

Comment on lines +558 to +566
```python
    if ref_hidden_states_input is not None:
        ref_hidden_states_input = ref_hidden_states_input.to(latent_model_input.dtype)
else:
    latent_model_input = latents.to(self.transformer.dtype)
    prompt_embeds_model_input = prompt_embeds
    timestep_model_input = timestep
    ref_hidden_states_input = ref_hidden_states*1.0 if ref_hidden_states is not None else None
    if ref_hidden_states_input is not None:
        ref_hidden_states_input = ref_hidden_states_input.to(latent_model_input.dtype)
```


medium

The ref_hidden_states_input tensor is cast to latent_model_input.dtype in two separate places. To avoid code duplication and improve readability, you could perform this type casting once after the if/else block where ref_hidden_states_input is defined.

Suggested change

```python
                else:
                    latent_model_input = latents.to(self.transformer.dtype)
                    prompt_embeds_model_input = prompt_embeds
                    timestep_model_input = timestep
                    ref_hidden_states_input = ref_hidden_states*1.0 if ref_hidden_states is not None else None

                if ref_hidden_states_input is not None:
                    ref_hidden_states_input = ref_hidden_states_input.to(latent_model_input.dtype)
```

Comment on lines +350 to +372
```python
# Streaming speech QA with interruption
all_wavs = []
all_text = ""
request_id = ""
output_audio_path = "test_stream.wav"
for data_type, data_content in ming.generate_stream(
    text="介绍一下杭州", output_type="speech", max_new_tokens=128
):
    if data_type == "text_data":
        text, usage = data_content
    elif data_type == "text_audio_data":
        tts_speech, text, meta_info, session_id, usage = data_content
        all_text += text
        all_wavs.append(tts_speech)
    if len(all_text) > 20:
        ming.generate_interrupt(request_id)
waveform = torch.cat(all_wavs, dim=-1)
sr = 44100
torchaudio.save(output_audio_path, waveform, sr)
print(f"request_id:{request_id},audio:{output_audio_path},text={all_text}")
assert os.path.exists(output_audio_path)
```


medium

The example for streaming speech QA with interruption has a bug. The request_id variable is initialized as an empty string and is never updated within the loop. However, it's passed to ming.generate_interrupt(request_id). This will not work as intended.

To fix this, you should pass a unique msg_request_id to the generate_stream call and use that same ID for the interruption, as shown in the 'Request Interruption' example later in the README. The session_id returned by the stream should also be used correctly if it's intended for this purpose.
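A sketch of the corrected pattern: generate the id up front so the stream call and the interrupt call refer to the same request. `msg_request_id` is the parameter name the README's 'Request Interruption' example reportedly uses; the SDK calls themselves are shown as comments since they require a running model:

```python
import uuid

# One id shared between the stream and the interrupt.
request_id = str(uuid.uuid4())

# Hypothetical usage, mirroring the README example cited in the review:
# for data_type, data_content in ming.generate_stream(
#     text="介绍一下杭州", output_type="speech",
#     max_new_tokens=128, msg_request_id=request_id,
# ):
#     ...
#     if len(all_text) > 20:
#         ming.generate_interrupt(request_id)  # same id, so it actually cancels
```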

```python
self.saved_features = []

# [0, 1, 2, 3, 4, 5, 7, 9, 11, 14, 17, 20, 22, 24, 26, 27, 28, 29]:
if step in [0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 18, 21, 23, 25, 26, 27, 28, 29]:
```


medium

The list of steps [0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 18, 21, 23, 25, 26, 27, 28, 29] is hardcoded. This makes the Taylor cache logic inflexible and difficult to modify. Consider moving this list to a configurable parameter or a well-named constant at the top of the file to improve maintainability.
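One way to lift the list out of the hot path is a small schedule object with the current list as its default. The class and constant names below are illustrative, not the PR's:

```python
# Steps that run the full transformer; all other steps reuse the
# Taylor-approximated features. Defaults to the list in the PR.
DEFAULT_FULL_COMPUTE_STEPS = frozenset(
    {0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 18, 21, 23, 25, 26, 27, 28, 29}
)

class TaylorCacheSchedule:
    def __init__(self, full_compute_steps=DEFAULT_FULL_COMPUTE_STEPS):
        self.full_compute_steps = frozenset(full_compute_steps)

    def is_full_compute(self, step: int) -> bool:
        # frozenset membership is O(1) and the schedule is now overridable
        # per pipeline instance instead of baked into the forward pass.
        return step in self.full_compute_steps
```

Callers who tune the quality/speed trade-off can then pass their own step list (e.g. via the pipeline's `set_taylor_cache` method) without editing the transformer source.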

Comment on lines +606 to +608
```python
except Exception as e:
    logger.error(f"Error in text producer: {e}")
    talker_input_queue.put(None, None)
```


medium

Catching a generic Exception is too broad and can mask underlying issues, making debugging more difficult. It's better to catch more specific exceptions that you expect might occur in the producer thread. If a generic catch is necessary, consider logging the full traceback and re-raising the exception to avoid silently swallowing errors.
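A sketch of the narrower pattern, with `run_producer` as a hypothetical stand-in for the producer thread's body. Note also that `queue.Queue.put()` takes one item plus optional `block`/`timeout` arguments, so the original `put(None, None)` passes `None` as `block` by accident:

```python
import logging
import queue

logger = logging.getLogger(__name__)
talker_input_queue: queue.Queue = queue.Queue()

def run_producer(produce):
    # Catch only the failures the producer can plausibly hit, log the full
    # traceback, and re-raise so errors are not silently swallowed; the
    # finally block still unblocks the consumer with a sentinel.
    try:
        produce()
    except (ValueError, RuntimeError, OSError):
        logger.exception("Error in text producer")  # includes the traceback
        raise
    finally:
        talker_input_queue.put(None)  # single sentinel item, not put(None, None)
```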

```python
def logging_tasks(self, request_id, future):
    time.sleep(0.01)
    running_keys = []
    pendding_keys = []
```


medium

The variable pendding_keys has a typo. It should be pending_keys.

Suggested change

```diff
-pendding_keys = []
+pending_keys = []
```

```python
from setuptools import setup, find_packages

__version__ = "1.0.0"  #
requirement = open("ming_sdk/requirements.txt").readlines()
```


medium

The requirement variable is defined but never used. This seems to be leftover code and should be removed.
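Alternatively, if `requirements.txt` was meant to drive `install_requires`, the variable can be put to use instead of removed. A sketch (the path and helper name are illustrative):

```python
# Parse a pip-style requirements file into a list suitable for
# setup(install_requires=...), skipping blanks and comment lines.
def load_requirements(path="ming_sdk/requirements.txt"):
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]

# setup(..., install_requires=load_requirements())
```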

```python
package_data={"ming_sdk": fetch_installed_data("")},
description="Ming Multimodal sdk",
keywords="Ming Multimodal sdk",
packages=["ming_sdk"],
```


medium

The packages argument is set to ["ming_sdk"], which is correct. However, using find_packages() from setuptools is a more robust and conventional way to automatically discover all packages within the project. This avoids manual updates if you add sub-packages in the future.

Suggested change

```diff
-packages=["ming_sdk"],
+packages=find_packages(),
```

```python
return_dict=return_dict,
return_assistant_tokens_mask=return_assistant_tokens_mask,
tokenizer_kwargs=tokenizer_kwargs,
**kwargs,
```


medium

While adding **kwargs provides flexibility, it can also hide which specific parameters are being used by the underlying apply_chat_template call. It would be more explicit and maintainable to list the expected keyword arguments (like enable_thinking from the new template) in the function signature with default values.
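A sketch of the more explicit signature, with `enable_thinking` as the template variable named above; the surrounding method is reduced to a free function here purely for illustration:

```python
# Surface known template variables as named parameters so they are
# documented, type-checked, and discoverable; keep **kwargs only for
# truly open-ended options.
def apply_chat_template(conversation, *, enable_thinking: bool = False, **kwargs):
    template_kwargs = {"enable_thinking": enable_thinking, **kwargs}
    return template_kwargs
```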
