TokenButler -- Predict token importance for all heads across the transformer in the first layer itself. Enable fine-grained token sparsity!
YASH AKHAURI
akhauriyash
Recent Activity
published a model about 8 hours ago: akhauriyash/DDR1_Q1.5B-GRPO-DACD
updated a model about 9 hours ago: akhauriyash/DDR1_Q1.5B-DAPO
published a model about 16 hours ago: akhauriyash/DDR1_Q1.5B-DAPO