EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Paper: arXiv:2505.04623
This repository contains the EchoInk-R1-7B model, as presented in the paper "EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning".
For training and inference instructions, see the code repository: https://github.com/HarryHsing/EchoInk
Base model: Qwen/Qwen2.5-Omni-7B