Multimodal Image Classification
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models (arXiv:2405.15668)
On Large Multimodal Models as Open-World Image Classifiers (arXiv:2503.21851)
Benchmarking Large Language Models for Image Classification of Marine Mammals (arXiv:2410.19848)
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding (arXiv:2501.07783)
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models (arXiv:2408.12808)
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers (arXiv:2412.00142)
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks (arXiv:2410.18387)
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks (arXiv:2507.01955)
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models (arXiv:2505.19415)
MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis (arXiv:2509.06617)
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis (arXiv:2502.09598)
DuPLUS: Dual-Prompt Vision-Language Framework for Universal Medical Image Segmentation and Prognosis (arXiv:2510.03483)
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion (arXiv:2502.04263)
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing (arXiv:2501.06828)
A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level (arXiv:2507.06972)
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model (arXiv:2504.04988)
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models (arXiv:2504.14245)
MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation (arXiv:2503.01298)
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation (arXiv:2505.18956)
MMGR: Multi-Modal Generative Reasoning (arXiv:2512.14691)
CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification (arXiv:2509.00677)
Head Pursuit: Probing Attention Specialization in Multimodal Transformers (arXiv:2510.21518)
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation (arXiv:2505.21549)
MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis (arXiv:2506.08900)