Large language models (LLMs) are evolving fast—and retraining offers a powerful path to new architectural capabilities, lower cost, and better performance. Recent research on the Brumby-14B-Base model shows how attention-free retention layers, efficient retraining from pretrained weights, and long-context processing can reshape what’s possible in generative AI. This session dives deep into these techniques and how you can apply them in your own work.
In this code-along, Jacob Buckman, CEO of Manifest AI, guides you through cutting-edge methods for LLM design, retraining, and deployment. You’ll explore how alternative attention mechanisms like power retention change model behaviour, how to repurpose pretrained models for new architectures, and how to engineer large-scale training pipelines. Whether you’re building foundation models or pushing the boundaries of generative AI research, this session gives you the tools and insights to lead the next wave of LLM development.
Key Takeaways:
- Understand how different attention (and retention) mechanisms affect LLM behaviour, performance, and cost (a minimal sketch follows this list).
- Learn the process of retraining large language models by repurposing pretrained weights and exploring new architectures.
- Discover the latest research techniques—including long-context inference, hardware-efficient kernels, and scalable LLM families—and how to apply them in your projects.
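To make the cost point in the first takeaway concrete, here is a minimal, generic sketch contrasting a standard attention step, whose per-token cost grows with context length, with a linear-recurrence "retention"-style step that folds history into a fixed-size state. This is an illustrative toy, not Manifest AI's power retention implementation or code from the session; the function names and the `decay` parameter are assumptions made for the example.

```python
# Toy contrast of per-token cost: standard attention reads the full history
# (cost grows with context length T), while a retention-style recurrence keeps
# a fixed-size (d, d) state, so each step costs O(d^2) regardless of T.
import numpy as np

def attention_step(q_t, K, V):
    # Causal attention at step t: scores over all t+1 cached keys.
    scores = K @ q_t / np.sqrt(q_t.shape[0])      # shape (t+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                            # shape (d,)

def retention_step(q_t, k_t, v_t, state, decay=0.99):
    # Recurrent update: fold the new key/value outer product into the state.
    state = decay * state + np.outer(k_t, v_t)    # O(d^2), independent of T
    return q_t @ state, state                     # output shape (d,)

d, T = 64, 512
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, T, d))

state = np.zeros((d, d))
for t in range(T):
    out_attn = attention_step(Q[t], K[: t + 1], V[: t + 1])   # uses full history
    out_ret, state = retention_step(Q[t], K[t], V[t], state)  # uses fixed state
```

The recurrent variant trades attention's exact softmax weighting over the full history for a constant-size state, which is the basic reason retention-style layers can keep long-context inference cheap.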
Session resources: GitHub link and Power Attention (Manifest AI)
