
Commit 7f85255
add wechat blog url
ymcui committed Mar 6, 2024
1 parent a0e4b8c commit 7f85255
Showing 2 changed files with 2 additions and 2 deletions.
README.md (2 changes: 1 addition & 1 deletion)
@@ -14,7 +14,7 @@

This project is developed on top of the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which uses a Sparse Mixture of Experts (Sparse MoE) architecture. The project performs incremental Chinese pre-training on large-scale unannotated Chinese data to obtain the **Chinese Mixtral** base model, and further instruction fine-tuning to obtain the **Chinese Mixtral-Instruct** instruction model. The model natively supports a **32K context (tested up to 128K)**, handles long texts effectively, and achieves significant performance gains in areas such as mathematical reasoning and code generation. When using llama.cpp for quantized inference, as little as 16GB of memory (or VRAM) is required.

- **Technical Report**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
+ **Technical Report**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851) [[Paper walkthrough (in Chinese)]](https://mp.weixin.qq.com/s?__biz=MzU2NDQ3MTQ0MA==&mid=2247489255&idx=1&sn=432c1c0e40d5e4cfc2a7d4d96ac8e3fc&chksm=fc4b2518cb3cac0e63816d65be9b9e11fb201e3a089550e3cb2cb4d1e8eb532c6d7301688926&token=1150909890&lang=zh_CN#rd)

#### Main Contents of This Project

README_EN.md (2 changes: 1 addition & 1 deletion)
@@ -14,7 +14,7 @@

This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which uses a Sparse Mixture of Experts (MoE) architecture. The project uses large-scale unannotated Chinese data for incremental Chinese pre-training, producing the **Chinese Mixtral** base model; further instruction fine-tuning yields the **Chinese Mixtral-Instruct** instruction model. The model natively supports a **32K context (tested up to 128K)**, processes long texts effectively, and shows significant performance improvements in areas such as mathematical reasoning and code generation. When using llama.cpp for quantized inference, a minimum of only 16GB of memory (or VRAM) is required.

- **Paper**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
+ **Paper**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851) [[Blog (in Chinese)]](https://mp.weixin.qq.com/s?__biz=MzU2NDQ3MTQ0MA==&mid=2247489255&idx=1&sn=432c1c0e40d5e4cfc2a7d4d96ac8e3fc&chksm=fc4b2518cb3cac0e63816d65be9b9e11fb201e3a089550e3cb2cb4d1e8eb532c6d7301688926&token=1150909890&lang=zh_CN#rd)

#### Main Contents of This Project

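For reference, both READMEs above mention running quantized inference with llama.cpp on as little as 16GB of memory (or VRAM). Below is a minimal sketch using the llama-cpp-python bindings; the GGUF file name, prompt, and generation parameters are illustrative assumptions, not artifacts of this repository:

```python
# Minimal sketch: quantized inference via the llama-cpp-python bindings.
# Assumptions (not from this repo): the GGUF file name, the prompt, and
# the generation parameters below.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-mixtral-instruct.Q4_K_M.gguf",  # hypothetical quantized model file
    n_ctx=32768,     # the model natively supports a 32K context
    n_gpu_layers=0,  # CPU-only; increase to offload layers to the GPU
)

output = llm(
    "请简要介绍稀疏混合专家模型。",  # "Briefly introduce sparse MoE models."
    max_tokens=256,
)
print(output["choices"][0]["text"])
```

Raising `n_gpu_layers` offloads transformer layers to the GPU when VRAM is available; otherwise inference runs entirely in system memory.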
