Slim Attention
Slim Attention: cut context memory use by up to 8x and speed up inference, enabling real-time, on-device language models with no loss of accuracy.
Flash Normalization
FlashNorm is an exact but faster implementation of RMSNorm, LayerNorm, and Dynamic Tanh (DyT). RMSNorm is used by virtually all modern LLMs, including Llama, Gemma, Mistral, and OLMo 2.
Papers
Unlock Lightning-Fast AI with Slim Attention
Discover Slim Attention, a breakthrough in transformer efficiency. This method cuts context memory use by up to 8x and can speed up inference by up to 2x for long contexts, all without sacrificing accuracy. Because the math is exact, it is a simple, drop-in upgrade for existing large language and vision-language models: no retraining needed, as the sketch below shows.
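The key observation is that in standard multi-head attention the key projection W_K is square and, in practice, invertible, so V can be reconstructed from the cached K as V = K(W_K⁻¹W_V) and never needs to be stored. Below is a minimal sketch of that identity, assuming PyTorch; the variable names and shapes are illustrative, not this repo's actual API.

```python
import torch

torch.manual_seed(0)

d_model, seq_len = 64, 10

# Per-layer projection matrices (full d_model x d_model, as in standard
# MHA before the heads are split off); W_K is square and invertible.
W_K = torch.randn(d_model, d_model) / d_model**0.5
W_V = torch.randn(d_model, d_model) / d_model**0.5

x = torch.randn(seq_len, d_model)      # token activations
K = x @ W_K                            # slim attention caches only K
V_ref = x @ W_V                        # baseline: V computed and cached too

# Precompute W_KV = W_K^-1 @ W_V once per layer, then reconstruct V
# from the cached K instead of storing it.
W_KV = torch.linalg.solve(W_K, W_V)    # solves W_K @ W_KV = W_V
V_rec = K @ W_KV

print(torch.allclose(V_rec, V_ref, atol=1e-4))  # True: V is recovered exactly
```

Since W_K⁻¹W_V is computed once per layer offline, the only runtime cost is an extra matrix multiply, traded for storing half the KV cache in MHA models.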
Introducing FlashNorm: Streamlining AI Performance
FlashNorm is an exact reformulation of normalization in large language models: it merges the normalization weights into the linear layer that follows, delivering faster processing and a simpler architecture without compromising accuracy, as the sketch below shows. This makes inference more efficient and supports more scalable, cost-effective deployment.
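The trick is that the elementwise normalization weights can be folded offline into whichever linear layer follows the normalization, leaving only the division by the RMS on the inference path. Below is a minimal sketch of that folding for RMSNorm, assuming PyTorch and illustrative shapes; see the paper for the LayerNorm and DyT cases.

```python
import torch

torch.manual_seed(0)

d_in, d_out, eps = 64, 128, 1e-6

g = torch.randn(d_in)                      # RMSNorm scale weights
W = torch.randn(d_in, d_out) / d_in**0.5   # the linear layer that follows
x = torch.randn(d_in)

def rms(v):
    return torch.sqrt(v.pow(2).mean() + eps)

# Baseline: RMSNorm (normalize, then scale by g), then the linear layer.
y_ref = ((x / rms(x)) * g) @ W

# FlashNorm: fold g into the linear layer once, offline ...
W_star = g[:, None] * W                    # W* = diag(g) @ W
# ... so at inference time only the division by rms(x) remains.
y_fast = (x / rms(x)) @ W_star

print(torch.allclose(y_ref, y_fast, atol=1e-5))  # True: outputs match
```

Because W* is precomputed, the result is mathematically identical while removing one elementwise multiply per normalization from the inference path.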