Documentation Index
Fetch the complete documentation index at: https://docs.hyperbolic.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Cluster Configuration
Configure your dedicated cluster with the exact specifications your workloads require. Our team works with you to design the optimal setup for your use case.
GPU Selection
Choose from the latest NVIDIA hardware:
| GPU Model | Memory | Best For |
| --- | --- | --- |
| Blackwell (B200) | 192GB HBM3e | Cutting-edge training and inference |
| H200 | 141GB HBM3e | Next-gen LLM training and inference |
| H100 | 80GB HBM3 | Industry standard for large-scale training |

Custom Mix: You can combine different GPU types in a single cluster for specialized workloads.
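Since GPU types can be mixed in one cluster, aggregate memory is worth a quick sanity check during capacity planning. A minimal sketch using the per-GPU figures from the table above (the function name and cluster shapes are illustrative):

```python
# Per-GPU memory in GB, taken from the table above.
GPU_MEMORY_GB = {
    "B200": 192,  # Blackwell, HBM3e
    "H200": 141,  # HBM3e
    "H100": 80,   # HBM3
}

def total_hbm_gb(cluster: dict[str, int]) -> int:
    """Total GPU memory for a (possibly mixed) cluster, e.g. {"H100": 64, "H200": 8}."""
    return sum(GPU_MEMORY_GB[model] * count for model, count in cluster.items())

# A mixed cluster: 32x H100 for training plus 8x H200 for long-context inference.
print(total_hbm_gb({"H100": 32, "H200": 8}))  # 32*80 + 8*141 = 3688 GB
```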
Networking Options
High-speed interconnects are critical for distributed training. Choose the right networking for your needs:
InfiniBand (400Gb/s)
Ultra-low latency networking for distributed training at scale.
Best for: Multi-node training with 32+ GPUs
Latency: Sub-microsecond
Topology: Fat-tree or custom
RoCE (200Gb/s)
High-speed Ethernet alternative with RDMA support.
Best for: Mixed training/inference workloads
Latency: Low-microsecond range
Easier integration with existing infrastructure
NVLink
Direct GPU-to-GPU communication within a node.
Best for: Intra-node communication
Bandwidth: Up to 900GB/s
Included with multi-GPU nodes
Custom Topology
Need something specific? We can design custom network architectures for your requirements.
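The recommendations above can be condensed into a rule of thumb. A sketch only; the function and thresholds are ours, mirroring this section's guidance (NVLink handles intra-node traffic in every case, so this only picks the inter-node fabric):

```python
def recommend_interconnect(gpus: int, gpus_per_node: int = 8,
                           mixed_workload: bool = False) -> str:
    """Map this section's networking guidance onto cluster shape."""
    if gpus <= gpus_per_node:
        return "NVLink only (single node)"
    if mixed_workload:
        return "RoCE 200Gb/s"        # easier integration; training + inference
    if gpus >= 32:
        return "InfiniBand 400Gb/s"  # multi-node training at scale
    return "RoCE 200Gb/s"

print(recommend_interconnect(64))                       # InfiniBand 400Gb/s
print(recommend_interconnect(16, mixed_workload=True))  # RoCE 200Gb/s
```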
Compute Specifications
Configure the CPU and memory for your nodes:
| Component | Options |
| --- | --- |
| CPU | Up to 256 cores per node |
| RAM | Up to 2TB per node |
| Internet Bandwidth | 10-100 Gbps connectivity |
Storage Solutions
Local Storage
Fast NVMe storage attached directly to your nodes.
Capacity: Up to 30TB per node
Performance: Up to 7GB/s read/write
Best for: Training checkpoints, scratch space
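To see whether the 30TB local cap covers your checkpoint retention policy, a rough sizing sketch (the 16 bytes/parameter figure is our assumption for bf16 weights plus fp32 Adam optimizer state; weights-only bf16 checkpoints are closer to 2 bytes/parameter):

```python
def checkpoint_tb(params_b: float, bytes_per_param: int = 16) -> float:
    """Approximate checkpoint size in TB for params_b billion parameters.

    16 bytes/param assumes bf16 weights plus fp32 Adam state; adjust for your setup.
    """
    return params_b * 1e9 * bytes_per_param / 1e12

def fits_local_nvme(params_b: float, keep: int, cap_tb: float = 30.0) -> bool:
    """Do `keep` retained checkpoints fit within the local NVMe capacity?"""
    return checkpoint_tb(params_b) * keep <= cap_tb

print(round(checkpoint_tb(70), 2))   # ~1.12 TB per full 70B-parameter checkpoint
print(fits_local_nvme(70, keep=20))  # True: ~22.4 TB, under the 30 TB cap
```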
Shared Storage
Network-attached storage accessible from all nodes.
| Type | Capacity | Best For |
| --- | --- | --- |
| Lustre | Up to 1PB | High-performance parallel I/O |
| NFS | Up to 1PB | General-purpose shared storage |
Object Storage
S3-compatible storage for datasets and artifacts.
Capacity: Unlimited
Best for: Large datasets, model artifacts, backups
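Large artifacts go into S3-compatible stores via multipart upload, which caps an upload at 10,000 parts with part sizes between 5 MiB and 5 GiB (and a single object at 5 TiB). A sketch of picking a valid part size for a big checkpoint:

```python
import math

MIN_PART = 5 * 1024**2    # 5 MiB: S3 multipart minimum part size (last part exempt)
MAX_PART = 5 * 1024**3    # 5 GiB: maximum part size
MAX_PARTS = 10_000        # maximum number of parts per upload
MAX_OBJECT = 5 * 1024**4  # 5 TiB: maximum single-object size

def plan_multipart(size_bytes: int, target_part: int = 64 * 1024**2) -> tuple[int, int]:
    """Return a (part_size, part_count) pair that respects the S3 multipart limits."""
    if size_bytes > MAX_OBJECT:
        raise ValueError("exceeds the 5 TiB S3 single-object limit")
    part = min(MAX_PART, max(target_part, MIN_PART, math.ceil(size_bytes / MAX_PARTS)))
    return part, math.ceil(size_bytes / part)

# A 140 GB checkpoint with the default 64 MiB target part size:
part, count = plan_multipart(140 * 10**9)
print(part // 1024**2, "MiB parts,", count, "parts")  # 64 MiB parts, 2087 parts
```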
Backup Options
Automated snapshots
Point-in-time recovery
Cross-region replication (optional)
Example Configurations
LLM Training Cluster
Optimized for training large language models:
| Component | Specification |
| --- | --- |
| GPUs | 64x H100 80GB |
| Networking | InfiniBand 400Gb/s |
| CPU | 128 cores per node |
| RAM | 1TB per node |
| Storage | 15TB NVMe local + 500TB Lustre shared |
Contact sales for pricing based on your specific configuration and commitment terms.
Inference Cluster
Optimized for high-throughput inference:
| Component | Specification |
| --- | --- |
| GPUs | 16x H100 80GB |
| Networking | RoCE 200Gb/s |
| CPU | 64 cores per node |
| RAM | 512GB per node |
| Storage | 8TB NVMe local |
Research Cluster
Flexible configuration for R&D teams:
| Component | Specification |
| --- | --- |
| GPUs | 32x H200 141GB |
| Networking | NVLink + RoCE |
| CPU | 128 cores per node |
| RAM | 1TB per node |
| Storage | 10TB NVMe local + 200TB NFS shared |
Security Options
VPN Access: Secure connectivity to your cluster
Private Networking: Isolated network environment
Custom Firewall Rules: Control inbound/outbound traffic
Compliance Configurations: SOC2, HIPAA-ready setups available
Next Steps
Get Started: Begin the reservation process with our sales team
Cluster Management: Learn about deployment and monitoring options