Yu Sun, in collaboration with researchers from Stanford, UC Berkeley, UCSD, UT Austin, and other institutions, has done pioneering work in generative AI and sequence modeling using Hyperbolic Labs' GPU infrastructure. These projects have redefined what's possible in video generation and recurrent neural networks.
Test-Time Training (TTT): A Novel Approach
Across multiple projects, Sun and colleagues developed Test-Time Training (TTT) layers: RNN layers whose hidden state is itself a small neural network, updated by gradient steps on a self-supervised loss while the model runs. This approach enabled significant breakthroughs in two key areas, described below.
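Conceptually, the hidden state of a TTT layer is the weight matrix of a small inner model, and each incoming token triggers one gradient step on a self-supervised loss before an output is produced. A minimal PyTorch sketch of that idea (the denoising loss, learning rate, and shapes here are illustrative simplifications, not the papers' exact formulation):

```python
import torch

def ttt_layer(tokens, W, lr=0.1):
    """Toy TTT layer: the hidden state W is itself a linear model,
    trained by one self-supervised gradient step per incoming token."""
    outputs = []
    for x in tokens:                               # x: (d,), one token at a time
        W = W.detach().requires_grad_(True)
        corrupted = x + 0.1 * torch.randn_like(x)  # denoising as a toy self-supervised task
        loss = ((W @ corrupted - x) ** 2).sum()
        (grad,) = torch.autograd.grad(loss, W)
        W = W - lr * grad                          # the state "learns" at test time
        outputs.append(W @ x)                      # output uses the freshly updated state
    return torch.stack(outputs), W

ys, W_final = ttt_layer(torch.randn(16, 8), torch.zeros(8, 8))
```

Because the state is updated by learning rather than by a fixed recurrence, it can keep absorbing information as the sequence grows instead of saturating.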
Minute-Long Video Generation
The team tackled a persistent challenge in generative video: creating coherent, minute-long videos from a single prompt. Conventional models like Sora and Veo max out at around 20 seconds because the cost of self-attention grows quadratically with context length; Sun's team built the first autoregressive model capable of generating full one-minute videos without post-editing.
Using a 256× NVIDIA H100 cluster from Hyperbolic, they integrated TTT layers into the pre-trained CogVideoX 5B model (one plausible integration pattern is sketched after the results table below) and fine-tuned it on storyboarded cartoon scenes. In human evaluations, the model scored +34 Elo over the next-best baseline, Mamba 2, with DeltaNet and sliding-window attention also compared.
| Metric | Value |
| --- | --- |
| GPUs Used | 256× NVIDIA H100 |
| Training Runtime | ~50 hours on the 256-GPU cluster |
| Input Context Length | ~300,000 tokens |
| Evaluation Improvement | +34 Elo vs. Mamba 2 baseline |
| Dataset | 7 hours of storyboarded cartoons |
Paper: One-Minute Video Generation with Test-Time Training
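The write-up doesn't spell out how the TTT layers were wired into CogVideoX. One common recipe for adding a new sequence layer to a pretrained transformer, and a plausible reading of "integrated TTT layers into a pre-trained model", is to place the new layer behind a learned gate initialized at zero, so fine-tuning starts from the unchanged pretrained behavior. A hypothetical sketch (`GatedTTTBlock` and its wiring are assumptions for illustration, not the paper's confirmed design):

```python
import torch
import torch.nn as nn

class GatedTTTBlock(nn.Module):
    """Hypothetical wrapper: a TTT layer added to a pretrained block
    behind a zero-initialized gate, so the model initially behaves
    exactly like the original and gradually learns to use the new layer."""
    def __init__(self, pretrained_block: nn.Module, ttt_layer: nn.Module, dim: int):
        super().__init__()
        self.block = pretrained_block   # original attention/MLP block
        self.ttt = ttt_layer            # new sequence layer (e.g., TTT-MLP)
        self.gate = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        x = self.block(x)
        return x + torch.tanh(self.gate) * self.ttt(x)
```

Because tanh(0) = 0, the wrapped block reproduces the pretrained output at initialization, which tends to keep early fine-tuning stable.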
RNNs with Expressive Hidden States
In a parallel effort, Sun's team addressed a fundamental limitation of modern RNNs: because their hidden state has a fixed size, performance degrades on contexts beyond roughly 16k tokens. They introduced TTT-Linear and TTT-MLP, novel RNN layers whose hidden states are themselves learnable models (a linear map and a two-layer MLP, respectively) that adapt at test time through gradient-based self-supervision.
Leveraging Hyperbolic's NVIDIA H100 SXM GPUs, the team scaled models from 125M to 1.3B parameters and evaluated context lengths of up to 32,000 tokens. The TTT-enhanced RNNs matched or exceeded Transformer performance at those lengths while keeping compute linear in sequence length and memory usage constant.
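The 5× speedup cited in the table below comes from the dual form: instead of looping over tokens and updating the state one step at a time, each mini-batch of tokens takes all of its inner-loop gradients at the mini-batch's initial state, which lets the updates and outputs collapse into a few large matmuls. A simplified sketch for a TTT-Linear-style layer (the paper uses learned projections for separate training, label, and test views; here all three are collapsed to the raw token):

```python
import torch

def ttt_linear_primal(X, W0, eta):
    """Per-token updates; gradients are taken at the mini-batch's initial
    state W0 (the approximation that makes the dual form possible)."""
    W, outs = W0.clone(), []
    for t in range(X.shape[0]):
        x = X[t:t+1]                       # (1, d) row vector
        grad = 2 * x.T @ (x @ W0 - x)      # d/dW of ||x W - x||^2, evaluated at W0
        W = W - eta * grad
        outs.append(x @ W)
    return torch.cat(outs)

def ttt_linear_dual(X, W0, eta):
    """Same outputs with no loop over the state: two matmuls and a mask."""
    E = X @ W0 - X                         # (b, d) reconstruction errors at W0
    A = torch.tril(X @ X.T)                # (b, b) causal token inner products, s <= t
    return X @ W0 - 2 * eta * A @ E

X, W0 = torch.randn(32, 16), 0.1 * torch.randn(16, 16)
assert torch.allclose(ttt_linear_primal(X, W0, 0.01),
                      ttt_linear_dual(X, W0, 0.01), atol=1e-4)
```

The dual form trades a sequential scan for hardware-friendly matrix multiplications, which is where the wall-clock gain on H100s comes from.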
| Metric | Value |
| --- | --- |
| Model Sizes | 125M to 1.3B parameters |
| Max Context Window | 32,000 tokens |
| Runtime Improvement | 5× speedup via optimized dual-form computation |
| Comparisons | Transformer, Mamba, DeltaNet |
| Code Availability | Open-sourced (JAX + PyTorch) |
Paper: Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Impact of Hyperbolic Labs' Infrastructure
Both projects required exceptional computing power for extensive fine-tuning, large-scale token processing, and complex optimization procedures. Hyperbolic's infrastructure provided stable, high-performance clusters of NVIDIA H100 GPUs with the memory capacity and bandwidth to process sequences of 300k+ video-text tokens.
The platform also offered persistent environments for nested inner-loop/outer-loop optimization and scalable resources for FLOP-matched training experiments, ensuring researchers could focus on innovation rather than infrastructure limitations.
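For context, "nested inner-loop/outer-loop optimization" refers to how TTT layers are trained: the layer's own parameters (for example, its initial hidden state or projection weights) are learned in an outer loop by backpropagating through the inner-loop test-time updates. A toy sketch of that structure, reusing the simplified layer from above (the outer task and hyperparameters are illustrative):

```python
import torch

eta = 0.1
W0 = torch.zeros(8, 8, requires_grad=True)  # outer-loop parameter: initial state
opt = torch.optim.Adam([W0], lr=1e-3)

def inner_loop(tokens, W):
    """Inner loop: test-time updates kept on the autograd graph so the
    outer loop can learn a good initial state W0."""
    outs = []
    for x in tokens:
        x = x.unsqueeze(0)                  # (1, d)
        grad = 2 * x.T @ (x @ W - x)        # same toy reconstruction loss
        W = W - eta * grad                  # functional update, differentiable
        outs.append(x @ W)
    return torch.cat(outs)

tokens, targets = torch.randn(16, 8), torch.randn(16, 8)
outer_loss = ((inner_loop(tokens, W0) - targets) ** 2).mean()
opt.zero_grad()
outer_loss.backward()                       # gradients flow through every inner step
opt.step()
```

Keeping long-running jobs like this alive is why persistent environments mattered: the outer loop must differentiate through many inner steps per batch, over many batches.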
Looking Ahead
The success of these projects opens new possibilities in generative AI and sequence modeling. TTT layers have proven effective for video generation and RNN enhancement, suggesting broader applications across various domains. With Hyperbolic's continued support, researchers like Yu Sun can push the boundaries of what's possible in AI.
"Hyperbolic’s H100 GPUs and services provided the reliability that enabled us to prototype our research in test-time training. Their infrastructure made it easier to scale our models to generate one-minute videos from text storyboards. We were able to focus on research rather than dealing with infrastructure issues.” — Yu Sun

About Hyperbolic
Hyperbolic is the on-demand AI cloud made for developers. We provide fast, affordable access to compute, inference, and AI services. Over 195,000 developers use Hyperbolic to train, fine-tune, and deploy models at scale.
Our platform has quickly become a favorite among AI researchers, including Andrej Karpathy. We collaborate with teams at Hugging Face, Vercel, Quora, Chatbot Arena, LMSYS, OpenRouter, Black Forest Labs, Stanford, Berkeley, and beyond.
Founded by AI researchers from UC Berkeley and the University of Washington, Hyperbolic is built for the next wave of AI innovation—open, accessible, and developer-first.
Website | X | Discord | LinkedIn | YouTube | GitHub | Documentation