All Serving Listings
A Blazing Fast AI Gateway with integrated Guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
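The semantic-cache idea above can be sketched with the standard library alone: instead of exact string matching, prompts are compared in embedding space so a near-duplicate question reuses a stored answer. This is an illustrative sketch, not GPTCache's actual API; the hashing-trick embedding is a toy stand-in for a real sentence encoder, and the class names and threshold are assumptions.

```python
import hashlib
import math

DIM = 64  # toy embedding dimension

def embed(text):
    """Toy hashing-trick embedding: hash each token into a fixed-size
    vector, then L2-normalize. A real cache would call a sentence encoder."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-length, so cosine similarity is a dot product.
    return sum(x * y for x, y in zip(a, b))

class SemanticCache:
    """Return a cached answer when a new prompt is close enough in
    embedding space; otherwise report a miss so the caller hits the LLM."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, prompt):
        emb = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best is not None and cosine(emb, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss

    def put(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

cache = SemanticCache()
cache.put("what is the capital of France?", "Paris")
print(cache.get("what is the capital of France?"))  # prints Paris
```

A production cache stores embeddings in a vector index rather than a list, which is what the LangChain and llama_index integrations provide.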
A high-throughput and memory-efficient inference and serving engine for LLMs
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A blazing fast inference solution for text embedding models
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
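Several of the servers above advertise OpenAI-compatible endpoints: the standard chat-completions request shape works against a local deployment by swapping the base URL. A minimal stdlib sketch of building such a request (the port, route prefix, and model name are assumptions about a particular deployment, not fixed by any of these projects):

```python
import json
import urllib.request

# Hypothetical local deployment exposing the OpenAI-style /v1 routes.
BASE_URL = "http://localhost:3000/v1"

# Standard chat-completions payload; the model name depends on what
# the server was launched with (assumed here for illustration).
payload = {
    "model": "llama-3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment against a running server
```

Because the request shape matches, the official OpenAI client libraries also work against such a server by pointing their base URL at the local deployment.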