All Dataset engineering Listings
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
A system for quickly generating training data with weak supervision
The open-source platform for training advanced AI models and image diffusion.
Crawl a site to generate knowledge files to create your own custom GPT from a URL
800,000 step-level correctness labels on LLM solutions to MATH problems
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Documentation: https://textattack.readthedocs.io/en/master/