
openai api add AudioURL and support chat_template.json #1225

Open · WANDY666 wants to merge 1 commit into main from multimodal_openai_api
Conversation

@WANDY666 (Contributor)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the LightLLM server's multimodal capabilities by integrating audio input processing into its OpenAI API endpoint. It also refines the tokenizer setup process, making it more robust by allowing chat templates to be loaded from a default configuration file within the model's directory. These changes collectively enable the server to handle a broader range of multimodal interactions and simplify model deployment.

Highlights

  • Audio Input Support: Introduced support for audio inputs in the OpenAI API compatible chat completions, allowing users to send audio URLs or base64 encoded audio.
  • Flexible Chat Template Loading: Enhanced the tokenizer initialization to automatically load chat templates from a chat_template.json file located in the model directory, providing more flexibility for model configuration.
  • Tokenizer Attribute Handling: Implemented a __setattr__ method in MultimodalTokenizer to correctly propagate chat_template and init_kwargs attributes to the underlying tokenizer instance.
  • Multimodal Parameter Expansion: Updated the MultimodalParams structure to include audio data, ensuring proper handling and forwarding of multimodal requests that contain audio.
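
As a usage illustration of the first highlight, here is a sketch of what a chat-completion payload carrying the new audio_url content type might look like. The model name and prompt text are placeholders, and the exact field layout should be verified against the AudioURL and MessageContent models in lightllm/server/api_models.py:

```python
import json

def build_audio_message(audio: str) -> dict:
    """Wrap an audio reference (http(s) URL or base64 data) as a user chat message.

    The "audio_url" content type and nested {"url": ...} shape mirror the
    models described in this PR; treat them as assumptions, not a spec.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this clip."},  # placeholder prompt
            {"type": "audio_url", "audio_url": {"url": audio}},
        ],
    }

# Hypothetical request body for /v1/chat/completions.
payload = {
    "model": "qwen2-audio",  # placeholder model name
    "messages": [build_audio_message("https://example.com/clip.wav")],
}
print(json.dumps(payload, indent=2))
```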


Changelog
  • lightllm/common/basemodel/multimodal_tokenizer.py
    • Added __setattr__ method to intercept and set chat_template and init_kwargs on the internal tokenizer.
  • lightllm/server/api_models.py
    • Introduced AudioURL BaseModel for audio input URLs.
    • Added audio_url field to MessageContent to support audio messages.
  • lightllm/server/api_openai.py
    • Initialized multimodal_params_dict to include an audios list.
    • Implemented logic to parse audio_url content from chat messages, supporting HTTP URLs and base64 data.
  • lightllm/server/build_prompt.py
    • Modified init_tokenizer to attempt loading chat_template.json from the model directory if no explicit chat template path is provided.
  • lightllm/server/multimodal_params.py
    • Added to_origin_dict method to MultimodalItem for request forwarding.
    • Updated MultimodalParams.to_origin_dict to include audio items.
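
The attribute-forwarding idea from the first changelog entry can be sketched as follows. This is a minimal illustration, not the actual patch: the wrapped-tokenizer attribute name (self.tokenizer) and the forwarded set are assumptions made for the example.

```python
# Attributes mirrored onto the wrapped tokenizer, per the changelog entry.
_FORWARDED = {"chat_template", "init_kwargs"}

class MultimodalTokenizer:
    """Sketch of a wrapper that propagates selected attributes inward."""

    def __init__(self, tokenizer):
        # Use object.__setattr__ so __init__ does not recurse into our
        # custom __setattr__ before self.tokenizer exists.
        object.__setattr__(self, "tokenizer", tokenizer)

    def __setattr__(self, name, value):
        # Mirror chat_template/init_kwargs onto the inner tokenizer so its
        # own chat-template logic sees the update, then set it locally too.
        if name in _FORWARDED:
            setattr(self.tokenizer, name, value)
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Delegate anything not defined on the wrapper to the inner tokenizer.
        return getattr(self.tokenizer, name)
```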
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

@gemini-code-assist (bot) left a comment:


Code Review

This pull request introduces support for audio inputs via AudioURL in the OpenAI API compatibility layer and adds a fallback mechanism to load chat_template.json from the model directory. A critical security concern has been identified: a high-severity Server-Side Request Forgery (SSRF) vulnerability exists in the handling of audio_url, as the server fetches user-provided URLs without proper validation. Addressing this vulnerability is paramount. Additionally, I have provided specific suggestions to enhance the robustness and clarity of the audio processing error messages and the exception handling for loading the chat template.

Comment on lines +183 to +186:

    elif content.type == "audio_url" and content.audio_url is not None:
        audio = content.audio_url.url
        if audio.startswith("http://") or audio.startswith("https://"):
            multimodal_params_dict["audios"].append({"type": "url", "data": audio})

Severity: high (security)

The application introduces a new feature to handle audio_url in chat completions. When an audio_url starting with http:// or https:// is provided, the server subsequently fetches the resource using the fetch_resource utility function without any validation of the target host. This allows an attacker to perform Server-Side Request Forgery (SSRF) attacks, potentially accessing internal network resources such as cloud metadata services (e.g., http://169.254.169.254/latest/meta-data/) or internal APIs.

To remediate this, implement strict validation for the audio_url. This should include maintaining an allow-list of trusted domains and ensuring that the resolved IP address is not a private or reserved IP address.
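
A guard along those lines could be sketched as below. The allow-list contents and the exact policy are placeholders for illustration, not part of the actual patch:

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical allow-list of trusted audio hosts; a real deployment would
# load this from configuration.
ALLOWED_HOSTS = {"audio.example.com"}

def is_safe_audio_url(url: str) -> bool:
    """Reject URLs outside the allow-list or resolving to private/reserved IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname not in ALLOWED_HOSTS:
        return False
    try:
        # Resolve and check every address, so DNS tricks cannot route the
        # fetch to e.g. the 169.254.169.254 cloud metadata endpoint.
        for info in socket.getaddrinfo(parsed.hostname, None):
            ip = ipaddress.ip_address(info[4][0])
            if ip.is_private or ip.is_reserved or ip.is_loopback or ip.is_link_local:
                return False
    except socket.gaierror:
        return False
    return True
```

Note that resolving once for validation and again for the actual fetch still leaves a DNS-rebinding window; pinning the resolved IP for the fetch closes it.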

        else:
            raise ValueError("Unrecognized audio input.")
    else:
        raise ValueError("Unrecognized audio input. Supports local path, http url, base64.")

Severity: medium

The error message "Unrecognized audio input. Supports local path, http url, base64." is misleading because the current implementation does not handle local file paths for audio inputs. It only supports URLs and base64-encoded data. To avoid confusion, the error message should be updated to reflect the actual supported formats.

Suggested change:

    - raise ValueError("Unrecognized audio input. Supports local path, http url, base64.")
    + raise ValueError("Unrecognized audio input. Supports http url and base64.")

Comment on lines +30 to +31:

    except Exception as e:
        logger.warning(f"Failed to load chat_template from {default_chat_template_path}: {e}")

Severity: medium

It's generally better to catch more specific exceptions rather than a broad Exception. This prevents catching unexpected errors like KeyboardInterrupt or SystemExit. For loading and parsing the chat template, catching IOError and json.JSONDecodeError would be more appropriate.

Suggested change:

    - except Exception as e:
    -     logger.warning(f"Failed to load chat_template from {default_chat_template_path}: {e}")
    + except (IOError, json.JSONDecodeError) as e:
    +     logger.warning(f"Failed to load chat_template from {default_chat_template_path}: {e}")
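
Putting the fallback and the narrower exception handling together, the loader in build_prompt.py might look like this sketch. The file name chat_template.json comes from the PR; the JSON key "chat_template" and the helper's name are assumptions for illustration:

```python
import json
import logging
import os

logger = logging.getLogger(__name__)

def load_default_chat_template(model_dir: str):
    """Return the chat template from <model_dir>/chat_template.json, or None.

    Catches only I/O and JSON-decoding errors, as suggested in the review,
    so signals like KeyboardInterrupt still propagate.
    """
    default_chat_template_path = os.path.join(model_dir, "chat_template.json")
    try:
        with open(default_chat_template_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return data.get("chat_template")
    except (IOError, json.JSONDecodeError) as e:
        logger.warning(f"Failed to load chat_template from {default_chat_template_path}: {e}")
        return None
```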
