Amélie Dubois

page-watcher

21 26

AI & ML interests

None yet

Recent Activity

liked a dataset 1 day ago

WINGNUS/ACL-OCL

liked a Space 6 days ago

AimeeBingmouQu/ProtectBirds

liked a model 15 days ago

thienbaotvb/thienbaotvb

View all activity

Organizations

None yet

upvoted 4 papers about 1 month ago

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Paper • 2605.21467 • Published May 20 • 207

upvoted 2 papers about 2 months ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

Leveraging Verifier-Based Reinforcement Learning in Image Editing

Paper • 2604.27505 • Published Apr 30 • 59

upvoted 2 papers 2 months ago

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Paper • 2604.20398 • Published Apr 22 • 3

DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Paper • 2604.13902 • Published Apr 15 • 62

upvoted 4 papers 3 months ago

Small Vision-Language Models are Smart Compressors for Long Video Understanding

Paper • 2604.08120 • Published Apr 9 • 21

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 265

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Paper • 2603.26164 • Published Mar 27 • 365

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 353

upvoted 5 papers 4 months ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 373

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 211

Believe Your Model: Distribution-Guided Confidence Calibration

Paper • 2603.03872 • Published Mar 4 • 40

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 526

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

upvoted 3 papers 5 months ago

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 246

Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

Paper • 2602.10388 • Published Feb 11 • 246

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

Paper • 2602.07274 • Published Feb 6 • 211

Amélie Dubois

AI & ML interests

Recent Activity

Organizations

page-watcher's activity