May 11, 2024
Expanding the context window of Gemma 2B to 10 million tokens
Using a technique called Infini-Attention, the context window of Gemma 2B is expanded to 10 million tokens.
The resulting Gemma-10M model keeps memory and computational costs low.
- Supports sequences up to 10 million tokens in length.
- Uses no more than 32 GB of memory.
- O(1) memory and O(n) time complexity, so it can still run on consumer hardware.
Its main approach preserves long-range dependencies by combining recurrent local attention with a compressive memory, as sketched below.
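As a rough illustration of how a compressive memory keeps per-segment cost constant, here is a minimal single-head PyTorch sketch of one Infini-Attention-style step. The ELU+1 feature map, the fixed 0.5 gate, and all names and dimensions are assumptions for illustration only; this is not the actual Gemma-10M implementation.

```python
# Minimal sketch of an Infini-Attention-style step (single head).
# Assumptions: ELU+1 feature map for the compressive memory, fixed 0.5 gate.
import torch
import torch.nn.functional as F

def feature_map(x):
    # Non-negative feature map used by the linear (compressive) memory path.
    return F.elu(x) + 1.0

def infini_attention_step(q, k, v, memory, z):
    """Process one local segment.

    q, k, v : (seg_len, d) projections for the current segment
    memory  : (d, d) compressive memory accumulated from past segments
    z       : (d,) normalization accumulator for the memory
    Returns the segment output and the updated (memory, z).
    """
    d = q.size(-1)

    # 1) Standard causal attention within the local segment.
    scores = (q @ k.T) / d**0.5
    causal_mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    local_out = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v

    # 2) Retrieve long-range context from the compressive memory.
    #    Cost is O(1) in sequence length: memory is a fixed (d, d) matrix.
    sigma_q = feature_map(q)
    mem_out = (sigma_q @ memory) / (sigma_q @ z + 1e-6).unsqueeze(-1)

    # 3) Mix local and memory paths (a learned gate in practice; fixed here).
    out = 0.5 * local_out + 0.5 * mem_out

    # 4) Fold the current segment's keys/values into the memory so later
    #    segments can use them without keeping the full KV cache.
    sigma_k = feature_map(k)
    memory = memory + sigma_k.T @ v
    z = z + sigma_k.sum(dim=0)
    return out, memory, z

# Usage: stream a long sequence segment by segment with constant memory.
d, seg_len = 64, 512
memory, z = torch.zeros(d, d), torch.zeros(d)
for _ in range(4):  # stand-ins for consecutive 512-token segments
    q, k, v = (torch.randn(seg_len, d) for _ in range(3))
    out, memory, z = infini_attention_step(q, k, v, memory, z)
```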
Model Download: https://t.co/3Auv2AkQCK