All Dataset engineering Listings
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Making data higher-quality, juicier, and more digestible for foundation models!
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Crawl a site to generate knowledge files to create your own custom GPT from a URL
TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
800,000 step-level correctness labels on LLM solutions to MATH problems
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
A system for quickly generating training data with weak supervision