vukrosic/essential-web-16k-tokenizer
License: mit
Branch: main, 1.59 MB, 1 contributor, History: 6 commits
Latest commit: 4ca13e5 (verified) by vukrosic, 8 months ago: Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web
File | Scan status | Size | Storage | Last commit | Updated
.gitattributes | Safe | 1.52 kB | - | initial commit | 8 months ago
README.md | Safe | 4.6 kB | - | Add tokenizer usage documentation | 8 months ago
bpe_tokenizer_16k_n1000000.pkl | pickle, no problematic imports detected | 194 kB | xet | Upload bpe_tokenizer_16k_n1000000.pkl | 8 months ago
essential_web_500k_text.txt | Safe | 504 kB | - | Add essential_web_500k_text.txt - 500K characters of raw text from Essential-Web dataset | 8 months ago
essential_web_500k_tokens.pkl | pickle, no problematic imports detected | 350 kB | xet | Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web | 8 months ago
essential_web_500k_tokens.txt | Safe | 538 kB | - | Add essential_web_500k_tokens.txt - Tokenized version (one token per line) of 500K chars from Essential-Web | 8 months ago
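The listing describes two on-disk formats: a text file with one token per line and pickle files. A minimal sketch of reading each format from a local copy might look like the following. The helper names `read_tokens_per_line` and `load_pickle` are hypothetical, and the type of object stored inside the `.pkl` files is not stated in this listing, so the pickle helper simply returns whatever was serialized. The repository README is the place to look for the intended usage.

```python
import pickle
from pathlib import Path
from typing import Any, List


def read_tokens_per_line(path: Path) -> List[str]:
    """Read a text file that stores one token per line (assumed layout)."""
    with open(path, "r", encoding="utf-8") as f:
        # Strip only the trailing newline so tokens keep any leading spaces.
        return [line.rstrip("\n") for line in f]


def load_pickle(path: Path) -> Any:
    """Unpickle a file and return the stored object, whatever its type.

    Unpickling can execute arbitrary code, so only call this on files
    you trust, even when a scanner reports no problematic imports.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

To obtain local copies, `hf_hub_download` from the `huggingface_hub` package takes a `repo_id` and a `filename` and returns the path of the downloaded file, which can then be passed to either helper. If a token itself contains a newline character, the one-token-per-line reader would split it, so the pickle file is likely the safer source in that case.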