Phi-Ground Tech Report: Advancing Perception in GUI Grounding Paper • 2507.23779 • Published Jul 31, 2025 • 44
view article Article Introducing smolagents: simple agents that write actions in code. +1 Dec 31, 2024 • 1.16k
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 46
Consolidating Attention Features for Multi-view Image Editing Paper • 2402.14792 • Published Feb 22, 2024 • 8
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans Paper • 2308.08545 • Published Aug 16, 2023 • 34
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023 • 57