Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly refactors the multimodal processing architecture by introducing the capability to run the Vision Transformer (ViT) as a remote, decoupled service. This change enables greater flexibility in deploying and scaling multimodal models by separating out the visual inference workload. Complementing this, a Redis-backed caching system is integrated to manage image embeddings efficiently across the distributed environment. Additionally, attention mechanisms are optimized through the adoption of a more concise Flash Attention function, and multimodal parameter handling is refined to support the new distributed and caching paradigms.
Code Review
This pull request introduces significant changes to support a separated Vision Transformer (ViT) service, including remote ViT inference, a Redis-backed embedding cache, and new server run modes. The changes are extensive and touch many parts of the server infrastructure. While the overall architecture seems well thought out, I've identified several critical issues, including fragile hash-string construction that only parses because int() silently drops underscores between digits, insecure use of pickle for network data, and overly permissive file/directory permissions. I've also found some medium-severity issues such as typos, redundant code, and leftover debug statements. Please address the critical and high-severity issues before merging.
```python
md5sum = "{}_{}".format(hashlib.md5(data).hexdigest(), img.patch_num)
uuid = int(md5sum, 16)
```
The md5sum string is constructed by concatenating a hex digest with an underscore and a number (e.g., '..._6'). On Python 3.6+ this does not actually raise: PEP 515 lets int() accept a single underscore between digits as a separator, so int(md5sum, 16) silently drops the underscore and parses the patch number as extra hex digits appended to the digest. That is fragile and almost certainly unintended. You should create a single valid hash from all components. For example, you could hash the concatenation of the data and the patch number.
Suggested change:

```python
md5sum = hashlib.md5(data + str(img.patch_num).encode()).hexdigest()
uuid = int(md5sum, 16)
```
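A standalone check of this parsing behaviour (the payload and patch number below are made up for illustration):

```python
import hashlib

data = b"example image bytes"  # hypothetical payload
patch_num = 6

digest = hashlib.md5(data).hexdigest()
md5sum = "{}_{}".format(digest, patch_num)

# PEP 515 (Python 3.6+): a single underscore between digits is a legal
# separator, so int() does not raise here -- it silently parses the
# patch number as extra hex digits appended to the digest.
assert int(md5sum, 16) == int(digest + str(patch_num), 16)

# Hashing all components together yields an unambiguous 128-bit id.
fixed = hashlib.md5(data + str(patch_num).encode()).hexdigest()
uuid = int(fixed, 16)
assert 0 <= uuid < 2 ** 128
```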
```python
md5sum = "{}_{}".format(
    hashlib.md5(data).hexdigest(),
    hashlib.md5(pickle.dumps(audio.extra_params, protocol=4)).hexdigest(),
)
uuid = int(md5sum, 16)
```
The md5sum string is constructed by concatenating two hex digests with an underscore. As with the image case, int(md5sum, 16) accepts this on Python 3.6+, because the single underscore between hex digits is treated as a digit separator and silently dropped; the result is simply the two digests parsed as one 256-bit hex number, which is confusing and fragile. You should create a single valid hash from all components. For example, you could hash the concatenation of the data and the serialized extra parameters.
Suggested change:

```python
md5sum = hashlib.md5(data + pickle.dumps(audio.extra_params, protocol=4)).hexdigest()
uuid = int(md5sum, 16)
```

```python
"""
Asynchronously fetch VIT instance information.
"""
uri = f"ws://{self.args.config_server_host}:{self.args.config_server_port}/registered_visual_objects"
```
The URI for an HTTP GET request should start with http://, not ws://. Using ws:// with an HTTP client will likely fail.
Suggested change:

```python
uri = f"http://{self.args.config_server_host}:{self.args.config_server_port}/registered_visual_objects"
```
```python
os.makedirs(image_embed_dir, mode=0o777, exist_ok=True)
os.chmod(image_embed_dir, 0o777)
```
Using 0o777 permissions is highly permissive and can be a security risk, as it gives read, write, and execute permissions to everyone on the system. Consider using more restrictive permissions, such as 0o755 for directories, depending on the access requirements.
Suggested change:

```python
os.makedirs(image_embed_dir, mode=0o755, exist_ok=True)
os.chmod(image_embed_dir, 0o755)
```
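As an aside on why the explicit chmod after makedirs is useful: on POSIX systems the mode argument to os.makedirs is masked by the process umask at creation time, so only the follow-up chmod pins the final permissions. A small demonstration with throwaway paths:

```python
import os
import stat
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "image_embed")  # illustrative directory name

old_umask = os.umask(0o077)  # a restrictive umask, set for demonstration
try:
    # The mode argument is masked: 0o755 & ~0o077 leaves only 0o700 here...
    os.makedirs(target, mode=0o755, exist_ok=True)
    created = stat.S_IMODE(os.stat(target).st_mode)
    assert created == 0o700
    # ...so the explicit chmod is what actually sets the intended bits.
    os.chmod(target, 0o755)
finally:
    os.umask(old_umask)

assert stat.S_IMODE(os.stat(target).st_mode) == 0o755
```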
```python
await websocket.accept()
client_ip, client_port = websocket.client
logger.info(f"ws connected from IP: {client_ip}, Port: {client_port}")
registered_visual_server_obj: VIT_Obj = pickle.loads(await websocket.receive_bytes())
```
Deserializing data with pickle from a network source is a security risk and can lead to arbitrary code execution. Even for internal services, it's safer to use a more secure serialization format like JSON if possible. If you must use pickle, ensure the communication channel is secure and authenticated.
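One way to avoid pickle here, sketched under the assumption that the registration payload is a simple record (the class and field names below are hypothetical stand-ins for VIT_Obj, not the actual type):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class VitRegistration:
    # Hypothetical fields a ViT worker might announce at registration time.
    host: str
    port: int
    device_id: int

def encode_registration(obj: VitRegistration) -> bytes:
    # JSON carries only data, so a malicious peer cannot trigger the
    # arbitrary code execution that pickle.loads on untrusted bytes can.
    return json.dumps(asdict(obj)).encode("utf-8")

def decode_registration(raw: bytes) -> VitRegistration:
    return VitRegistration(**json.loads(raw.decode("utf-8")))

roundtrip = decode_registration(encode_registration(VitRegistration("10.0.0.1", 8017, 0)))
assert roundtrip == VitRegistration("10.0.0.1", 8017, 0)
```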
```python
with open(tmp_path, "wb") as f:
    torch.save(tensor.detach().cpu(), f, _use_new_zipfile_serialization=False, pickle_protocol=4)
os.replace(tmp_path, target_path)
os.chmod(target_path, 0o777)
```
Using 0o777 permissions is highly permissive and can be a security risk, as it gives read, write, and execute permissions to everyone on the system. Consider using more restrictive permissions, such as 0o664 for files, depending on the access requirements.
Suggested change:

```python
os.chmod(target_path, 0o664)
```
```python
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, target_path)
os.chmod(target_path, 0o777)
```
Using 0o777 permissions is highly permissive and can be a security risk, as it gives read, write, and execute permissions to everyone on the system. Consider using more restrictive permissions, such as 0o664 for files, depending on the access requirements.
Suggested change:

```python
os.chmod(target_path, 0o664)
```
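The write-to-temp, fsync, then rename pattern quoted above can be exercised in isolation. A minimal sketch with throwaway paths and restrictive permissions (os.replace is atomic on POSIX when source and target are on the same filesystem, which is why the temp file is created next to the target):

```python
import os
import tempfile

def atomic_write(target_path: str, payload: bytes) -> None:
    # Create the temp file in the target's directory so the final
    # rename never crosses a filesystem boundary.
    dir_name = os.path.dirname(target_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push data to disk before the rename
        os.replace(tmp_path, target_path)  # atomic swap into place
        os.chmod(target_path, 0o664)  # restrictive, group-writable bits
    finally:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # clean up if something failed mid-way

base = tempfile.mkdtemp()
path = os.path.join(base, "embed.bin")  # illustrative file name
atomic_write(path, b"tensor bytes")
with open(path, "rb") as f:
    assert f.read() == b"tensor bytes"
```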
```python
for _, p in enumerate(infer_state.multimodal_params):
    for img in p["images"] + p["audios"]:
        release_ids.append(img["uuid"])
```
This loop over infer_state.multimodal_params is redundant: release_ids can be collected in the first loop over the same data structure (lines 59-67), which avoids iterating over the data twice and improves clarity. Add release_ids.append(img["uuid"]) to that first loop when self.args.enable_remote_vit is enabled.
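With hypothetical data of the same shape, folding the collection into a single pass looks like this (field names mirror the snippet above; the data itself is invented):

```python
# Invented stand-in for infer_state.multimodal_params.
multimodal_params = [
    {"images": [{"uuid": 1}, {"uuid": 2}], "audios": [{"uuid": 3}]},
    {"images": [{"uuid": 4}], "audios": []},
]

release_ids = []
for p in multimodal_params:
    for item in p["images"] + p["audios"]:
        # ... existing per-item work would happen here ...
        # Collect the uuid in the same pass instead of re-iterating
        # multimodal_params a second time just for release_ids.
        release_ids.append(item["uuid"])

assert release_ids == [1, 2, 3, 4]
```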
```python
raise e
return
```

```python
async def get_image_embeding(
```

```python
req.multimodal_params.free()
```

```python
try:
    print(instance, flush=True)
```