All Serving Listings
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
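As a rough illustration of what a serving engine like vLLM exposes, here is a minimal sketch of its offline batch API; the model name is a placeholder, and real deployments would more typically run vLLM's OpenAI-compatible server instead.

```python
from vllm import LLM, SamplingParams

# Minimal offline-inference sketch; "facebook/opt-125m" is just a small
# placeholder checkpoint, not something this listing prescribes.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```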
Portkey AI Gateway: A blazing-fast AI gateway with integrated guardrails. Routes to 200+ LLMs and 50+ AI guardrails through one fast, friendly API.
Text Embeddings Inference: A blazing-fast inference solution for text embedding models.
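Since Text Embeddings Inference is consumed over HTTP, a sketch like the following shows the typical client side; the host and port are assumptions about the deployment, while `/embed` is TEI's documented embedding route.

```python
import requests

# Query a running TEI server; localhost:8080 is an assumed deployment.
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "Deep learning is"},
)
resp.raise_for_status()
embedding = resp.json()[0]  # one vector per input string
print(f"{len(embedding)}-dimensional embedding")
```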
OpenLLM: Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
Text Generation Inference (TGI): Hugging Face's server for large language model text generation.
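TGI is likewise queried over HTTP. The sketch below assumes a server is already listening on localhost:8080 and uses TGI's documented `/generate` route.

```python
import requests

# Assumes a TGI container is already serving a model on localhost:8080.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 32},
    },
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```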
Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference lets you run inference with any open-source language, speech recognition, or multimodal model, whether in the cloud, on-premises, or on your laptop.
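The "single line" in Xinference's pitch is the client's base URL: servers like Xinference, OpenLLM, and vLLM's API server all speak the OpenAI protocol, so the stock openai client can be pointed at them. The port and model name below are placeholder assumptions (9997 is Xinference's usual default).

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # the single changed line
    api_key="not-needed-locally",         # local servers ignore the key
)
resp = client.chat.completions.create(
    model="my-deployed-model",  # placeholder: whatever model you launched
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```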
LMDeploy: A toolkit for compressing, deploying, and serving LLMs.
OpenVINO™: An open-source toolkit for optimizing and deploying AI inference.
GPTCache: A semantic cache for LLMs, fully integrated with LangChain and llama_index.
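As a sketch of how a semantic cache sits in front of an LLM call, GPTCache's quickstart wraps the legacy 0.x-style OpenAI interface with a drop-in adapter. Note that a bare `cache.init()` gives exact-match caching; semantic matching requires configuring an embedding model and similarity evaluation in `init`.

```python
from gptcache import cache
from gptcache.adapter import openai  # drop-in adapter for the legacy API

cache.init()             # default: exact-match cache; pass embedding +
                         # similarity settings here for semantic matching
cache.set_openai_key()   # reads OPENAI_API_KEY from the environment

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what is GitHub?"}],
)
print(answer["choices"][0]["message"]["content"])
```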