A curated list of different papers and datasets in various areas of audio-visual processing
Updated Jan 30, 2024
Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion (CVPR 2022, Oral)
Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features
PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
[Paper][IJCNN2023] Modality-Aware Negative Sampling for Multi-modal Knowledge Graph Embedding
Tri-AI is a multi-AI calling system that runs within Claude Code. It combines the architecture and coding of Claude, the alternative thinking and first-time execution of Codex, and the vision and research capabilities of Kimi, allowing multi-purpose, single-window orchestration and execution using subscriptions or APIs.
Multi-modal AI agent that extracts information from PDFs, images, and documents to answer questions. Combines vision models with RAG architecture for intelligent document understanding. Upload any file and chat with your documents. Built with LangChain, vision APIs, and vector embeddings.
A high-performance, full-stack RAG engine featuring image support, FlashRank reranking, and a modern Next.js streaming chat interface
This repo reproduces key findings from Masked Autoencoders Are Scalable Vision Learners (MAE) on CIFAR-10: self-supervised pretraining improves downstream classification compared with training from scratch. It also studies how decoder depth and decoder width affect MAE pretraining and downstream results.
A multi-language invoice data extraction tool built with Google Gemini Pro and Streamlit, using prompt engineering.
Reliability-Aware Early Radiographic Progression Modeling #ResearchProject