Performance and Limits
Understand rate limits, service tiers, and infrastructure capabilities for Hyperbolic’s Serverless Inference API.
Rate Limits
Standard Limits
| Tier | Requests/Minute | Requirements |
|---|---|---|
| Basic | 60 | Free account |
| Pro | 600 | $5+ deposit |
| Enterprise | Unlimited | Contact sales |
All tiers have a per-IP limit of 600 requests/minute for DDoS protection.
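One way to stay under a per-minute limit is to throttle on the client side. The following is a minimal sketch of a sliding-window limiter, not part of any Hyperbolic SDK; the class name `SlidingWindowThrottle` and its parameters are hypothetical, and the default of 60 requests per 60 seconds simply mirrors the Basic tier in the table above.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_requests` calls within any `window` seconds.

    Hypothetical helper for illustration; `clock` and `sleep` are
    injectable so the behaviour can be tested without real waiting.
    """

    def __init__(self, max_requests=60, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self.window - (now - self._sent[0]))
```

Call `throttle.acquire()` immediately before each API request. This only smooths your own traffic; the server-side limits still apply, so a 429 handler remains necessary.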
Model-Specific Limits
Some resource-intensive models have special rate limits:

| Model | Basic | Pro |
|---|---|---|
| Llama 3.1 405B | 5/min | 120/min |
| Llama 3.1 405B-Instruct | 5/min | 120/min |
| FLUX.1-dev ⚠️ Sunset | 1/5 min | 50/min |
Upgrading to Pro
Get 10x higher rate limits by upgrading to Pro:
Log into Dashboard
Go to app.hyperbolic.ai and sign in.
Service Tiers
| Feature | Basic | Pro | Enterprise |
|---|---|---|---|
| Rate Limit | 60/min | 600/min | Unlimited |
| Cost | Free | $5+ deposit | Custom |
| Support | Community Discord | 24/7 dedicated | |
| Priority Queue | - | Yes | Yes |
| Dedicated Instances | - | - | Yes |
| Custom SLAs | - | - | Yes |
| Fine-tuning | - | - | Yes |
Basic tier includes $1 promotional credit when you verify your phone number.
Pricing Summary
Hyperbolic uses pay-as-you-go pricing with no monthly quotas or commitments.
Text Generation
| Model Category | Price |
|---|---|
| Small models (3B-8B) | From $0.10 per 1M tokens |
| Medium models (32B-72B) | $0.20 - $0.40 per 1M tokens |
| Large models (120B-480B) | $0.30 - $4.00 per 1M tokens |
Image Generation
Base rate: $0.01 per image (1024x1024, 25 steps)
Formula: $0.01 × (width/1024) × (height/1024) × (steps/25)
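The image pricing formula can be checked with a few lines of Python. This is a sketch for estimating cost only; the function name `image_cost` is hypothetical, and actual billing may round differently.

```python
def image_cost(width, height, steps, base_rate=0.01):
    """Estimated price in USD for one image, scaled from the
    1024x1024, 25-step base rate quoted above."""
    return base_rate * (width / 1024) * (height / 1024) * (steps / 25)


# The base configuration costs the base rate.
print(round(image_cost(1024, 1024, 25), 4))  # 0.01
# Half the width and height, double the steps: 0.01 * 0.5 * 0.5 * 2
print(round(image_cost(512, 512, 50), 4))    # 0.005
```

Cost scales linearly with each of width, height, and step count, so doubling the resolution in both dimensions quadruples the price.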
Audio Generation
Rate: $5.00 per 1M characters
See Text APIs, Image APIs, and Audio APIs for complete pricing by model.
Infrastructure
Security
| Feature | Description |
|---|---|
| Zero Data Retention | Your prompts and responses are never stored |
| Encryption | TLS 1.3 for all API connections |
| Compliance | SOC2 compliance (Enterprise tier) |
Error Handling
Rate Limit Errors
When you exceed rate limits, you’ll receive a 429 Too Many Requests response.
Best Practices
- Implement exponential backoff for automatic retries
- Monitor usage via the dashboard to stay within limits
- Cache responses when appropriate to reduce API calls
- Use streaming for long responses to improve perceived latency
Retry Example
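A minimal sketch of exponential backoff on 429 responses. It assumes a hypothetical zero-argument callable `send` that performs one API request with whatever HTTP client you use and returns a `(status_code, body)` tuple. The function name, the default delays, and the 10% jitter are illustrative choices, not values documented by Hyperbolic.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or retries run out.

    `send` is a hypothetical callable returning (status_code, body).
    The delay doubles after each 429: base_delay, 2x, 4x, ... capped
    at max_delay, plus up to 10% random jitter.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # no retries left, do not sleep again
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

The jitter spreads out retries from clients that were throttled at the same moment, so they do not all return together. If the 429 response carries a `Retry-After` header, preferring that value over the computed delay is usually the better choice.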
Monitoring Usage
Track your API usage in the Hyperbolic Dashboard:
- Requests per minute/hour/day
- Token consumption by model
- Cost breakdown and billing history
- Real-time usage graphs
Next Steps
- Text APIs: Models and pricing details
- Image APIs: Image generation pricing
- Audio APIs: Text-to-speech pricing

