Skip to content

[Feature]: Extend the APIs of other model providers to support image and video #1767

@Vasilije1990

Description

@Vasilije1990

Problem Statement

Currently Gemini and other providers can't process audio, image and video

Proposed Solution

Check OpenAI adapter and replicate logic to transcribe audio and images to text

Alternatives Considered

No response

Use Case

Send audio, video and image data to cognee

Implementation Ideas

Copy implementation from OpenAI adapter and improve if possible

Additional Context

No response

Pre-submission Checklist

  • I have searched existing issues to ensure this feature hasn't been requested already
  • I have provided a clear problem statement and proposed solution
  • I have described my specific use case

Metadata

Metadata

Labels

enhancementNew feature or requesthelp wantedExtra attention is needed

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions