Mistral released two 7B models: Codestral Mamba 7B and Mathstral 7B

Brain Titan
4 min read · Jul 20, 2024


Mistral has trained a 7B code model on the Mamba 2 architecture, Codestral Mamba, and has also launched Mathstral 7B, a new model for mathematical reasoning and scientific discovery that uses the same architecture as Mistral 7B.

Codestral Mamba surpasses rival open models with fewer than 10B parameters, including DeepSeek v1.5 7B on key code benchmarks, and is competitive with Codestral 22B while supporting contexts of up to 256K tokens.

Unlike traditional Transformer models, whose attention cost grows quadratically with sequence length, the Mamba architecture is more efficient at inference time and can in principle handle input sequences of unbounded length. Users can freely use, modify and distribute the model, and it is suited to a wide range of code-related applications.

Codestral Mamba has the following features:

  1. Linear-time inference: the Mamba architecture scales linearly with sequence length, so it stays efficient on large inputs.
  2. Infinite-length sequence modeling: in theory it can model sequences of unbounded length, which helps when processing long texts or large codebases.
  3. Advanced code and reasoning capabilities: the model is trained specifically for code productivity, giving it strong code understanding and reasoning and solid performance on code-related tasks.
  4. Efficient context retrieval: in context-retrieval tests the model handles contexts of up to 256k tokens, which suits applications that need to pull in large amounts of context.
  5. Multi-platform deployment (a minimal loading sketch follows this list):
  • It can be deployed via the mistral-inference SDK, which relies on the reference implementation in Mamba’s GitHub repository.
  • It can also be deployed via TensorRT-LLM, with local inference support planned for llama.cpp.
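For a concrete sense of local use, here is a minimal generation sketch. It uses the Hugging Face transformers API rather than the mistral-inference SDK or TensorRT-LLM routes named above, and it assumes a transformers release with Mamba-2 support and that the checkpoint linked below loads through AutoModelForCausalLM (the exact repo id casing on the Hub may differ).

```python
# Minimal sketch: load Codestral Mamba and generate code locally.
# Assumes a recent transformers release with Mamba-2 support; the repo id
# mirrors the download link below and its exact casing may differ on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/mamba-codestral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```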

Compared with other open source models, Codestral Mamba’s performance is as follows:

  • CodeGemma 1.1 7B: Codestral Mamba performs better in most tests, especially on HumanEval and HumanEval C++.
  • CodeLlama 7B: Codestral Mamba significantly outperforms CodeLlama 7B, especially on the HumanEval and MBPP benchmarks.
  • DeepSeek v1.5 7B: although DeepSeek leads on some benchmarks, Codestral Mamba comes out ahead overall on HumanEval and HumanEval C++.
  • Codestral 22B: against the much larger Codestral 22B, Codestral Mamba falls slightly behind on some tests but still performs well on HumanEval and HumanEval Bash.
  • CodeLlama 34B: Codestral Mamba outperforms CodeLlama 34B in most tests.

Official introduction: https://mistral.ai/news/codestral-mamba/

Model download: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

Features of Mathstral:

Efficient Mathematical Reasoning:

Designed for advanced mathematical problems that require complex, multi-step logical reasoning, it excels in mathematics and science, handling tasks such as mathematical proofs and intricate scientific calculations.

Large context window:

With a 32k-token context window, it can take in and reason over far more input at once, which is valuable for complex problems and long-text reasoning.

Advanced Performance:

  • It performs well on industry-standard benchmarks, scoring 56.6% on MATH and 63.47% on MMLU.
  • With majority voting over 64 sampled candidates, Mathstral 7B’s MATH score rises to 68.37%, and to 74.59% when a strong reward model selects among those 64 candidates (a minimal sketch of the voting step follows this list).
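To make the majority-voting figure concrete, here is a minimal sketch of the idea: sample many candidate answers and keep the most common one. The sample_answer callable is a hypothetical stand-in for a temperature-sampled query to the model; it is not part of any Mistral API.

```python
from collections import Counter
from typing import Callable


def majority_vote(question: str, sample_answer: Callable[[str], str], n: int = 64) -> str:
    """Sample n candidate answers and return the most frequent one.

    sample_answer is a hypothetical stand-in: it should query the model with
    non-zero temperature and return just the final answer as a string.
    """
    candidates = [sample_answer(question) for _ in range(n)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer
```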

Model architecture:

Mathstral 7B is built on Mistral 7B, inheriting its strong base capabilities and architectural advantages. The model has 7B parameters and the 32k context window described above.

Customization and fine-tuning capabilities:

  • Users can deploy and fine-tune the model to meet specific needs with the mistral-inference and mistral-finetune tools.
  • This flexible fine-tuning support lets users adapt the model to their own application scenarios (see the sketch after this list).
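As an illustration of what such fine-tuning can look like, here is a minimal LoRA sketch using Hugging Face transformers and peft rather than the mistral-finetune tool itself. The model id mirrors the download link below (its exact casing on the Hub may differ), and the math_sft.jsonl file with a "text" field is a hypothetical dataset.

```python
# Minimal LoRA fine-tuning sketch with transformers + peft (not mistral-finetune).
# The dataset file and its "text" field are hypothetical placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "mistralai/mathstral-7B-v0.1"  # mirrors the download link; casing may differ on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the base model with small trainable LoRA adapters on the attention projections.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
))

dataset = load_dataset("json", data_files="math_sft.jsonl")["train"]  # hypothetical SFT data
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mathstral-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=1e-4, logging_steps=10, bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```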

Official introduction: https://mistral.ai/news/mathstral/

Model download: https://huggingface.co/mistralai/mathstral-7B-v0.1

More about AI: https://kcgod.com

