jaygala24's Collections
RL post-training
Updated 3 days ago
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning • Text Generation • 4B • Updated 12 days ago • 1.12k
jaygala24/Qwen3-4B-GRPO-math-reasoning • Text Generation • 4B • Updated 12 days ago • 936
jaygala24/Qwen3-4B-ReMax-math-reasoning • Text Generation • 4B • Updated 12 days ago • 877
jaygala24/Qwen3-4B-RLOO-math-reasoning • Text Generation • 4B • Updated 6 days ago • 336
jaygala24/Qwen3-4B-DAPO-math-reasoning • Text Generation • 4B • Updated 3 days ago • 572
jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 12 days ago • 909
jaygala24/Qwen3-1.7B-GRPO-math-reasoning • Text Generation • 2B • Updated 12 days ago • 924
jaygala24/Qwen3-1.7B-ReMax-math-reasoning • Text Generation • 2B • Updated 12 days ago • 966
jaygala24/Qwen3-1.7B-RLOO-math-reasoning • Text Generation • 2B • Updated 7 days ago • 845
jaygala24/Qwen3-1.7B-DAPO-math-reasoning • Text Generation • 2B • Updated 7 days ago • 726
jaygala24/Qwen2.5-3B-GRPO-KL-math-reasoning • Text Generation • 3B • Updated 12 days ago • 845
jaygala24/Qwen2.5-3B-GRPO-math-reasoning • Text Generation • 3B • Updated 12 days ago • 859
jaygala24/Qwen2.5-3B-ReMax-math-reasoning • Text Generation • 3B • Updated 12 days ago • 515
jaygala24/Qwen2.5-3B-RLOO-math-reasoning • Text Generation • 3B • Updated 7 days ago • 777
jaygala24/Qwen2.5-3B-DAPO-math-reasoning • Text Generation • 3B • Updated 7 days ago • 691
jaygala24/Qwen2.5-1.5B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 12 days ago • 574
jaygala24/Qwen2.5-1.5B-GRPO-math-reasoning • Text Generation • 2B • Updated 12 days ago • 617
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning • Text Generation • 2B • Updated 12 days ago • 496
jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning • Text Generation • 2B • Updated 7 days ago • 715
jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning • Text Generation • 2B • Updated 7 days ago • 852
jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning • Text Generation • 0.5B • Updated 12 days ago • 583
jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning • Text Generation • 0.5B • Updated 12 days ago • 616
jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning • Text Generation • 0.5B • Updated 12 days ago • 481
jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning • Text Generation • 0.5B • Updated 7 days ago • 675
jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning • Text Generation • 0.5B • Updated 7 days ago • 660