All Dataset engineering Listings
A system for quickly generating training data with weak supervision
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. https://textattack.readthedocs.io/en/master/
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Crawl a site to generate knowledge files to create your own custom GPT from a URL
The open-source platform for training advanced AI models and image diffusion.
800,000 step-level correctness labels on LLM solutions to MATH problems
tiktoken is a fast BPE tokeniser for use with OpenAI's models.