StableLM is a series of open-source language models from Stability AI aimed at advancing natural language understanding and generation. Within this series, StableLM-3B-4E1T is a 3-billion-parameter model that was deliberately pre-trained for multiple epochs to study how repeating tokens affects downstream performance, drawing on prior research into data-constrained training.
StableLM-3B-4E1T uses a decoder-only transformer architecture similar to LLaMA, with Rotary Position Embeddings applied to a portion of the head dimensions for improved throughput and LayerNorm with learned bias terms. It was pre-trained for four epochs on roughly 1 trillion tokens drawn from filtered datasets such as the Falcon RefinedWeb extract and RedPajama-Data, with the goal of delivering robust performance across a range of NLP tasks.
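For readers who want to try the model directly, the sketch below shows one plausible way to load it and generate text with the Hugging Face transformers library. The repository id stabilityai/stablelm-3b-4e1t, the dtype choice, and the sampling settings are assumptions for illustration rather than details stated in this text.

```python
# Minimal sketch: loading StableLM-3B-4E1T via Hugging Face transformers.
# The repo id, dtype, and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-3b-4e1t"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 3B model's memory footprint modest
)
model.eval()

prompt = "Open-source language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Depending on the installed transformers version, loading may additionally require trust_remote_code=True; newer releases include native support for the StableLM architecture.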
Stability AI continues to expand the StableLM series, with future releases expected to bring further fine-tuning capabilities and versatility. These models, including 7B variants such as StableLM-Tuned-Alpha-7B, are intended as viable open alternatives for natural language processing, inviting exploration and application in diverse contexts.