Localized Visual Understanding
• GLaMM: Pixel Grounding Large Multimodal Model (arXiv:2311.03356)
• SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models (arXiv:2311.07575)
• CoVLM: Composing Visual Entities and Relationships in Large Language Models via Communicative Decoding (arXiv:2311.03354)
• Language-Informed Visual Concept Learning (arXiv:2312.03587)
• Denoising Vision Transformers (arXiv:2401.02957)
• Learning Anatomically Consistent Embedding for Chest Radiography (arXiv:2312.00335)
• Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision (arXiv:2404.15672)
• OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (arXiv:2406.19389)
• MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (arXiv:2406.17770)
• INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model (arXiv:2407.16198)
• Contrastive Localized Language-Image Pre-Training (arXiv:2410.02746)