+ Submit a Tool

Back to the homepage

Inference Optimization

Filter

Category

Search with AI

All Inference optimization Listings

Loading

...

Loading

...

Loading

...

Loading

...

Loading

...

Loading

...

Footer

The ultimate resource directory for free AI tools and resources.

Resources

All Listings
Blog
Newsletter

Legal

Terms
Privacy
Cookies

© 2023 - 2025 AI Something. All rights reserved.

Cover Image

Internal link to /explore/torch-pruning

Model Development 📊🧠

Internal link to /explore/torch-pruning

Torch-Pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning

Open Source Inference optimization

Cover Image

Internal link to /explore/llama.cpp

Model Development 📊🧠

Internal link to /explore/llama.cpp

llama.cpp

LLM inference in C/C++

Open Source Inference optimization

Cover Image

Internal link to /explore/mlc-llm

Model Development 📊🧠

Internal link to /explore/mlc-llm

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Open Source Inference optimization

Cover Image

Internal link to /explore/tensorrt-llm

Model Development 📊🧠

Internal link to /explore/tensorrt-llm

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Open Source Inference optimization

Cover Image

Internal link to /explore/bitnet

Model Development 📊🧠

Internal link to /explore/bitnet

BitNet

Official inference framework for 1-bit LLMs

Open Source Framework Inference optimization

Cover Image

Internal link to /explore/vector-quantize-pytorch

Model Development 📊🧠

Internal link to /explore/vector-quantize-pytorch

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Open Source Inference optimization

Cover Image

Internal link to /explore/flash-attention

Model Development 📊🧠

Internal link to /explore/flash-attention

flash-attention

Fast and memory-efficient exact attention

Open Source Inference optimization

Cover Image

Internal link to /explore/ggml

Model Development 📊🧠

Internal link to /explore/ggml

ggml

Tensor library for machine learning

Open Source Inference optimization

Cover Image

Internal link to /explore/llama2.c

Model Development 📊🧠

Internal link to /explore/llama2.c

llama2.c

Inference Llama 2 in one file of pure C

Open Source Inference optimization

Cover Image

Internal link to /explore/llm.c

Model Development 📊🧠

Internal link to /explore/llm.c

llm.c

LLM training in simple, raw C/CUDA

Open Source Inference optimization

Cover Image

Internal link to /explore/whisper.cpp

Model Development 📊🧠

Internal link to /explore/whisper.cpp

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Open Source Artificial Intelligence Inference optimization

Cover Image

Internal link to /explore/unsloth

Model Development 📊🧠

Internal link to /explore/unsloth

unsloth

Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory

Open Source Inference optimization

Listings per page

Showing1 - 12of16listings

Page 1 of 2