A high-quality Vietnamese pretraining dataset for LLMs
Models: 0 (none public yet)
Datasets: 17
group2sealion/vnu-hard-clean
29.8k rows • 44 downloads
group2sealion/web_science_extract
11.6k rows • 35 downloads
group2sealion/qwen-gen-vnu
856 rows • 61 downloads
group2sealion/vnu_crawl
42.2k rows • 13 downloads
group2sealion/15mil_milestone
2.43M rows • 27 downloads
group2sealion/sft_eval
223 rows • 4 downloads
group2sealion/4mil_milestone
2.53M rows • 30 downloads
group2sealion/11mil_last
1.85M rows • 29 downloads
group2sealion/8mil_last
1.85M rows • 17 downloads
group2sealion/last_result
1.82M rows • 4 downloads