Slim Attention
Slim Attention: cut context memory use by up to 8x and speed up inference, enabling real-time, on-device language models with no loss of accuracy.
Flash Normalization
FlashNorm is an exact but faster implementation of RMSNorm, LayerNorm, and Dynamic Tanh (DyT). RMSNorm is used by virtually all modern LLMs, including Llama, Gemma, Mistral, and OLMo 2.
Papers
Unlock Lightning-Fast AI with Slim Attention
Discover Slim Attention, a breakthrough in transformer efficiency. This method cuts context memory use by up to 8x and can speed up inference by up to 2x for long contexts, all without sacrificing accuracy. Because the math is exact, it is a simple, drop-in upgrade for existing large language and vision-language models: no retraining needed, as the sketch below shows.
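The key observation is that in standard multi-head attention the key projection W_K is square and, in practice, invertible, so V can be reconstructed from the cached K as V = K(W_K⁻¹W_V) and never needs to be stored. Below is a minimal sketch of that identity, assuming PyTorch; the variable names and shapes are illustrative, not this repo's actual API.

```python
import torch

torch.manual_seed(0)

d_model, seq_len = 64, 10

# Per-layer projection matrices (full d_model x d_model, as in standard
# MHA before the heads are split off); W_K is square and invertible.
W_K = torch.randn(d_model, d_model) / d_model**0.5
W_V = torch.randn(d_model, d_model) / d_model**0.5

x = torch.randn(seq_len, d_model)      # token activations
K = x @ W_K                            # slim attention caches only K
V_ref = x @ W_V                        # baseline: V computed and cached too

# Precompute W_KV = W_K^-1 @ W_V once per layer, then reconstruct V
# from the cached K instead of storing it.
W_KV = torch.linalg.solve(W_K, W_V)    # solves W_K @ W_KV = W_V
V_rec = K @ W_KV

print(torch.allclose(V_rec, V_ref, atol=1e-4))  # True: V is recovered exactly
```

Since W_K⁻¹W_V is computed once per layer offline, the only runtime cost is an extra matrix multiply, traded for storing half the KV cache in MHA models.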
Introducing FlashNorm: Streamlining AI Performance
FlashNorm is an exact reformulation of normalization in large language models: it merges the normalization weights into the linear layer that follows, delivering faster processing and a simpler architecture without compromising accuracy, as the sketch below shows. This makes inference more efficient and supports more scalable, cost-effective deployment.
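The trick is that the elementwise normalization weights can be folded offline into whichever linear layer follows the normalization, leaving only the division by the RMS on the inference path. Below is a minimal sketch of that folding for RMSNorm, assuming PyTorch and illustrative shapes; see the paper for the LayerNorm and DyT cases.

```python
import torch

torch.manual_seed(0)

d_in, d_out, eps = 64, 128, 1e-6

g = torch.randn(d_in)                      # RMSNorm scale weights
W = torch.randn(d_in, d_out) / d_in**0.5   # the linear layer that follows
x = torch.randn(d_in)

def rms(v):
    return torch.sqrt(v.pow(2).mean() + eps)

# Baseline: RMSNorm (normalize, then scale by g), then the linear layer.
y_ref = ((x / rms(x)) * g) @ W

# FlashNorm: fold g into the linear layer once, offline ...
W_star = g[:, None] * W                    # W* = diag(g) @ W
# ... so at inference time only the division by rms(x) remains.
y_fast = (x / rms(x)) @ W_star

print(torch.allclose(y_ref, y_fast, atol=1e-5))  # True: outputs match
```

Because W* is precomputed, the result is mathematically identical while removing one elementwise multiply per normalization from the inference path.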