Yu Sun, in collaboration with researchers from Stanford, UC Berkeley, UCSD, UT Austin, and other institutions, has done pioneering work in generative AI and sequence modeling using Hyperbolic Labs' GPU infrastructure. These projects have redefined what's possible in video generation and recurrent neural networks.
Test-Time Training (TTT): A Novel Approach
Across multiple projects, Sun and colleagues developed Test-Time Training (TTT) layers: RNN layers whose hidden state is itself a small neural network, updated by gradient steps on a self-supervised loss while the model runs. This approach enabled significant breakthroughs in two key areas, described below.
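Conceptually, the hidden state of a TTT layer is the weight matrix of a small inner model, and each incoming token triggers one gradient step on a self-supervised loss before an output is produced. A minimal PyTorch sketch of that idea (the denoising loss, learning rate, and shapes here are illustrative simplifications, not the papers' exact formulation):

```python
import torch

def ttt_layer(tokens, W, lr=0.1):
    """Toy TTT layer: the hidden state W is itself a linear model,
    trained by one self-supervised gradient step per incoming token."""
    outputs = []
    for x in tokens:                               # x: (d,), one token at a time
        W = W.detach().requires_grad_(True)
        corrupted = x + 0.1 * torch.randn_like(x)  # denoising as a toy self-supervised task
        loss = ((W @ corrupted - x) ** 2).sum()
        (grad,) = torch.autograd.grad(loss, W)
        W = W - lr * grad                          # the state "learns" at test time
        outputs.append(W @ x)                      # output uses the freshly updated state
    return torch.stack(outputs), W

ys, W_final = ttt_layer(torch.randn(16, 8), torch.zeros(8, 8))
```

Because the state is updated by learning rather than by a fixed recurrence, it can keep absorbing information as the sequence grows instead of saturating.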
Minute-Long Video Generation
The team tackled a persistent challenge in generative video: creating coherent, minute-long videos from a single prompt. Conventional models like Sora and Veo max out at around 20 seconds because the cost of self-attention grows quadratically with context length; Sun's team built the first autoregressive model capable of generating full one-minute videos without post-editing.
Using a 256× NVIDIA H100 cluster from Hyperbolic, they integrated TTT layers into the pre-trained CogVideoX 5B model (one plausible integration pattern is sketched after the results table below) and fine-tuned it on storyboarded cartoon scenes. In human evaluations, the model scored +34 Elo over the next-best baseline, Mamba 2, with DeltaNet and sliding-window attention also compared.
| Metric | Value |
| --- | --- |
| GPUs Used | 256× NVIDIA H100 |
| Training Runtime | ~50 hours on the 256-GPU cluster |
| Input Context Length | ~300,000 tokens |
| Evaluation Improvement | +34 Elo vs. Mamba 2 baseline |
| Dataset | 7 hours of storyboarded cartoons |
Paper: One-Minute Video Generation with Test-Time Training
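The write-up doesn't spell out how the TTT layers were wired into CogVideoX. One common recipe for adding a new sequence layer to a pretrained transformer, and a plausible reading of "integrated TTT layers into a pre-trained model", is to place the new layer behind a learned gate initialized at zero, so fine-tuning starts from the unchanged pretrained behavior. A hypothetical sketch (`GatedTTTBlock` and its wiring are assumptions for illustration, not the paper's confirmed design):

```python
import torch
import torch.nn as nn

class GatedTTTBlock(nn.Module):
    """Hypothetical wrapper: a TTT layer added to a pretrained block
    behind a zero-initialized gate, so the model initially behaves
    exactly like the original and gradually learns to use the new layer."""
    def __init__(self, pretrained_block: nn.Module, ttt_layer: nn.Module, dim: int):
        super().__init__()
        self.block = pretrained_block   # original attention/MLP block
        self.ttt = ttt_layer            # new sequence layer (e.g., TTT-MLP)
        self.gate = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        x = self.block(x)
        return x + torch.tanh(self.gate) * self.ttt(x)
```

Because tanh(0) = 0, the wrapped block reproduces the pretrained output at initialization, which tends to keep early fine-tuning stable.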
RNNs with Expressive Hidden States
In a parallel effort, Sun's team addressed a fundamental limitation of modern RNNs: because their hidden state has a fixed size, performance degrades on contexts beyond roughly 16k tokens. They introduced TTT-Linear and TTT-MLP, novel RNN layers whose hidden states are themselves learnable models (a linear map and a two-layer MLP, respectively) that adapt at test time through gradient-based self-supervision.
Leveraging Hyperbolic's NVIDIA H100 SXM GPUs, the team scaled models from 125M to 1.3B parameters and evaluated context lengths of up to 32,000 tokens. The TTT-enhanced RNNs matched or exceeded Transformer performance at those lengths while keeping compute linear in sequence length and memory usage constant.
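The 5× speedup cited in the table below comes from the dual form: instead of looping over tokens and updating the state one step at a time, each mini-batch of tokens takes all of its inner-loop gradients at the mini-batch's initial state, which lets the updates and outputs collapse into a few large matmuls. A simplified sketch for a TTT-Linear-style layer (the paper uses learned projections for separate training, label, and test views; here all three are collapsed to the raw token):

```python
import torch

def ttt_linear_primal(X, W0, eta):
    """Per-token updates; gradients are taken at the mini-batch's initial
    state W0 (the approximation that makes the dual form possible)."""
    W, outs = W0.clone(), []
    for t in range(X.shape[0]):
        x = X[t:t+1]                       # (1, d) row vector
        grad = 2 * x.T @ (x @ W0 - x)      # d/dW of ||x W - x||^2, evaluated at W0
        W = W - eta * grad
        outs.append(x @ W)
    return torch.cat(outs)

def ttt_linear_dual(X, W0, eta):
    """Same outputs with no loop over the state: two matmuls and a mask."""
    E = X @ W0 - X                         # (b, d) reconstruction errors at W0
    A = torch.tril(X @ X.T)                # (b, b) causal token inner products, s <= t
    return X @ W0 - 2 * eta * A @ E

X, W0 = torch.randn(32, 16), 0.1 * torch.randn(16, 16)
assert torch.allclose(ttt_linear_primal(X, W0, 0.01),
                      ttt_linear_dual(X, W0, 0.01), atol=1e-4)
```

The dual form trades a sequential scan for hardware-friendly matrix multiplications, which is where the wall-clock gain on H100s comes from.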
| Metric | Value |
| --- | --- |
| Model Sizes | 125M to 1.3B parameters |
| Max Context Window | 32,000 tokens |
| Runtime Improvement | 5× speedup via optimized dual-form computation |
| Comparisons | Transformer, Mamba, DeltaNet |
| Code Availability | Open-sourced (JAX + PyTorch) |
Paper: Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Impact of Hyperbolic Labs' Infrastructure
Both projects required exceptional computing power for extensive fine-tuning, large-scale token processing, and complex optimization procedures. Hyperbolic's infrastructure provided stable, high-performance clusters of NVIDIA H100 GPUs with the memory capacity and bandwidth to process sequences of 300k+ video-text tokens.
The platform also offered persistent environments for nested inner-loop/outer-loop optimization and scalable resources for FLOP-matched training experiments, ensuring researchers could focus on innovation rather than infrastructure limitations.
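For context, "nested inner-loop/outer-loop optimization" refers to how TTT layers are trained: the layer's own parameters (for example, its initial hidden state or projection weights) are learned in an outer loop by backpropagating through the inner-loop test-time updates. A toy sketch of that structure, reusing the simplified layer from above (the outer task and hyperparameters are illustrative):

```python
import torch

eta = 0.1
W0 = torch.zeros(8, 8, requires_grad=True)  # outer-loop parameter: initial state
opt = torch.optim.Adam([W0], lr=1e-3)

def inner_loop(tokens, W):
    """Inner loop: test-time updates kept on the autograd graph so the
    outer loop can learn a good initial state W0."""
    outs = []
    for x in tokens:
        x = x.unsqueeze(0)                  # (1, d)
        grad = 2 * x.T @ (x @ W - x)        # same toy reconstruction loss
        W = W - eta * grad                  # functional update, differentiable
        outs.append(x @ W)
    return torch.cat(outs)

tokens, targets = torch.randn(16, 8), torch.randn(16, 8)
outer_loss = ((inner_loop(tokens, W0) - targets) ** 2).mean()
opt.zero_grad()
outer_loss.backward()                       # gradients flow through every inner step
opt.step()
```

Keeping long-running jobs like this alive is why persistent environments mattered: the outer loop must differentiate through many inner steps per batch, over many batches.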
Looking Ahead
The success of these projects opens new possibilities in generative AI and sequence modeling. TTT layers have proven effective for video generation and RNN enhancement, suggesting broader applications across various domains. With Hyperbolic's continued support, researchers like Yu Sun can push the boundaries of what's possible in AI.
"Hyperbolic’s H100 GPUs and services provided the reliability that enabled us to prototype our research in test-time training. Their infrastructure made it easier to scale our models to generate one-minute videos from text storyboards. We were able to focus on research rather than dealing with infrastructure issues.” — Yu Sun

About Hyperbolic
Hyperbolic is the on-demand AI cloud made for developers. We provide fast, affordable access to compute, inference, and AI services. Over 195,000 developers use Hyperbolic to train, fine-tune, and deploy models at scale.
Our platform has quickly become a favorite among AI researchers, including Andrej Karpathy. We collaborate with teams at Hugging Face, Vercel, Quora, Chatbot Arena, LMSYS, OpenRouter, Black Forest Labs, Stanford, Berkeley, and beyond.
Founded by AI researchers from UC Berkeley and the University of Washington, Hyperbolic is built for the next wave of AI innovation—open, accessible, and developer-first.
Website | X | Discord | LinkedIn | YouTube | GitHub | Documentation