All Dataset engineering Listings
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
A system for quickly generating training data with weak supervision
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Crawl a site to generate knowledge files to create your own custom GPT from a URL
The open-source platform for training advanced AI models and image diffusion.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
800,000 step-level correctness labels on LLM solutions to MATH problems
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷