Sangsang/qwen3-8B-thinksafe-Qwen-8B-unfiltered-raw-32-pm-3ep • Text Generation • Updated about 11 hours ago
Sangsang/R1-8B-thinksafe-DeepSeek-8B-unfiltered-raw-32-pm-3ep • Text Generation • Updated about 13 hours ago
Sangsang/feedback_disallowed_ema_DeepSeek-R1-Distill-Llama-8B_reverse_kl_ema0p999_ep30 • Text Generation • Updated 1 day ago • 20
Sangsang/grpo_DeepSeek-R1-Distill-Llama-8B_bs8_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 • Text Generation • Updated 1 day ago • 12
Sangsang/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p6_fw0p4_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 24
Sangsang/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p0_fw1p0_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 32
Sangsang/ci-feedback_both_ema_plus_interp_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_stw0p3_ep30 • Text Generation • 8B • Updated 9 days ago • 25
Sangsang/ci-feedback_both_interp_Qwen2.5-7B-Instruct_from_Qwen2.5-7B-Instruct_jsd_b0p8_stw0p3_ep30 • Text Generation • 8B • Updated 9 days ago • 33
Sangsang/ci-feedback_both_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 37
Sangsang/ci-feedback_both_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 44
Sangsang/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 31
Sangsang/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 • Text Generation • 8B • Updated 9 days ago • 43
Sangsang/ci_feedback_both_feedback_jsd_b0p8_ema0p999 • Text Generation • 8B • Updated 12 days ago • 435