Hero image

State-of-the-artai models

Start in minutes with low-latency, pay-as-you-go serverless inference.

Hero image
Hero image

Why Run Inference on Hyperbolic?

Serverless API

Run models via REST API with Python, TypeScript, and cURL support, no infrastructure setup required, and already integrated with OpenAI? Keep your code flow and swap in your base URL and API key to start calling Hyperbolic's model catalog.

Privacy-first design

Zero data retention means requests are processed in real time and don't get logged, stored, or shared. Your prompts and responses disappear when the request finishes.

Scalable inference capacity

Built for high-demand AI workloads with flexible GPU availability and the ability to handle production traffic without the "we hit quota" wall.

Affordable pricing

Pay-as-you-go pricing with clear, displayed rates and no hidden fees or long-term commitments. If you need guaranteed capacity, you can also move to dedicated hosting with hourly pricing.

Low-latency global infrastructure

Optimized for fast response times across regions, so your users aren't waiting on a faraway cluster.

On-demand model hosting

Run open-source models in seconds, no setup, no DevOps. Hyperbolic's on-demand hosting gives you high-performance GPUs and private API access, great for rapid prototyping, internal tools, and early-stage launches. You'll find popular open models across modalities, including Llama 3.1, Qwen 2.5, DeepSeek V2.5, SDXL, and Flux, with new models rolling in regularly.

Text-TextText-Text
Text-ImageText-Image
VLMsVLMs
Text-AudioText-Audio

Pricing

Our transparent, usage-based pricing model keeps things predictable. Rates are displayed upfront, billing is straightforward, and you can choose the setup that matches how you run inference today.

Qwen2-VL-72B-Instruct

Qwen2.5-Coder-32B

Llama-3.2-3B

Qwen2.5-72B

DeepSeek-V2.5

Llama-3-70B

Hermes-3-70B

Llama-3.1-405B

Llama-3.1-70B

Llama-3.1-8B

Llama 3.1 8B (BF16) - Base

Where Inference Happens

Dedicated Model Hosting for AI Teams

For teams that need guaranteed availability, custom configurations, or always-on capacity, dedicated hosting provides single-tenant GPU instances with private endpoints. Bring your own weights, dial in serving parameters, and monitor usage in one place. If you're doing higher-throughput inference or running a production system with strict requirements, this is the "no surprises" option.

Dedicated, single-tenant GPU instances with private endpoints

Dedicated, single-tenant GPU instances with private endpoints

Supports VLMs, LLMs, image/audio/video generation, quantization, batching, and speculative decoding

Supports VLMs, LLMs, image/audio/video generation, quantization, batching, and speculative decoding

Bring your own weights, tune settings, and monitor usage

Bring your own weights, tune settings, and monitor usage

Pay hourly with unlimited requests, scale up or down anytime

Pay hourly with unlimited requests, scale up or down anytime

Priority support with direct access to the team when it matters

Priority support with direct access to the team when it matters

Inference at a
Fraction of the Cost

Access powerful inference engines without torching your budget. Hyperbolic's optimized model serving and efficient infrastructure translate into real savings, commonly three to ten times lower cost compared to traditional providers, while still delivering the performance you need.

Basic TierPro TierEnterprise Tier
60RPM600RPMUnlimited
100/min100/minUnlimited
Full precision (BF16) SOTA open-source modelsFull precision (BF16) SOTA open-source modelsFull precision (BF16) SOTA open-source models + custom models
Pay-as-you-goPay-as-you-goCustom Hourly pricing billed by GPU Type
Available Upon Request
Get StartedGet Started
Upgrade NowUpgrade Now
Contact UsContact Us

Made for Making

  • Dedicated Hosting

  • Accelerating Developer Access to Open-Source AI

Finding a host for the particular model we've been looking to use wasn't easy — Hyperbolic was the only platform that had it ready to go. Not only has the performance been outstanding, but their pricing absolutely crushes the major competitors. On top of that, the Hyperbolic founders provide the best customer support we’ve experienced, always going above and beyond to solve our needs. Partnering with them has been a huge win for us.

Taesung Park

Taesung Park

Co-Founder of Reve AI