ML | DeepSeek

DeepSeek (深度求索), full name Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., is a Chinese artificial intelligence and large language model company. Headquartered in Hangzhou, Zhejiang, it was founded by the Chinese hedge fund High-Flyer (幻方量化); its founder and CEO is Liang Wenfeng. DeepSeek has drawn wide attention because its performance on reasoning tasks rivals OpenAI's ChatGPT while its development cost and resource consumption are only a fraction as large.

DeepSeek Model

DeepSeek-R1 and DeepSeek-V3 are advanced language models developed by DeepSeek-AI, leveraging a Mixture of Experts (MoE) architecture to optimize performance and resource efficiency.

| Model            | #Total Params | #Activated Params | Context Length | Download                                            | VRAM (FP8)      |
|------------------|---------------|-------------------|----------------|-----------------------------------------------------|-----------------|
| DeepSeek-R1-Zero | 671B          | 37B               | 128K           | https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero | at least 800 GB |
| DeepSeek-R1      | 671B          | 37B               | 128K           | https://huggingface.co/deepseek-ai/DeepSeek-R1      | at least 800 GB |
| DeepSeek-V3      | 671B          | -                 | -              | -                                                   | at least 800 GB |

DeepSeek-R1-Zero and DeepSeek-R1 are trained on top of DeepSeek-V3-Base. For details of the model architecture, please refer to the DeepSeek-V3 repository. (DeepSeek-V3 has 671 billion parameters and needs roughly 1,543 GB of VRAM at FP16, or roughly 386 GB with 4-bit quantization.)

  • FP16 Precision: Higher VRAM GPUs or multiple GPUs are required due to the larger memory footprint.
  • 4-bit Quantization: Lower VRAM GPUs can handle larger models more efficiently, reducing the need for extensive multi-GPU setups.
  • Lower-spec GPUs: Models can still run on GPUs below the recommended specifications, as long as the total GPU memory meets or exceeds the VRAM requirement. Such a setup is not optimal, however, and usually needs some tuning, such as adjusting batch sizes and processing settings.
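The VRAM figures above can be sanity-checked with simple arithmetic (parameter count times bytes per parameter). A minimal sketch; the 671B count comes from the table, the per-precision byte sizes are standard, and the gap versus the quoted ~1,543 GB / ~386 GB reflects runtime overhead (activations, KV cache) beyond raw weights:

```shell
# Back-of-envelope weight-memory estimate: params (billions) x bytes/param.
# These are illustrative lower bounds, not measured figures.
params_b=671   # DeepSeek-V3 / R1 total parameters, in billions

echo "FP16 : ~$((params_b * 2)) GB"   # 2 bytes per parameter
echo "FP8  : ~$((params_b * 1)) GB"   # 1 byte per parameter
echo "4-bit: ~$((params_b / 2)) GB"   # 0.5 byte per parameter
```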

Local Deployment

Ollama

We can install it via Ollama.

  • Ollama 官方版:https://ollama.com/
    • curl -fsSL https://ollama.com/install.sh | sh
    • Ollama is open-source software that lets users run, create, and share large language model services on their own hardware. It suits users who want to run models locally: it preserves privacy, and it makes setup and interaction easy through a command-line interface. Ollama supports many models, including Llama 2 and Mistral, and offers flexible customization options such as importing models from other formats and setting runtime parameters.
  • Web UI front end: Page Assist - A Web UI for Local AI Models | Chrome Extension
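After installing, a quick sanity check; a sketch assuming Ollama's default local API port 11434, and guarded so it is a no-op on machines without Ollama:

```shell
# Verify the CLI and the local HTTP API (skipped entirely if ollama is absent).
if command -v ollama >/dev/null 2>&1; then
  ollama --version                         # confirm the CLI is on PATH
  curl -s http://localhost:11434/api/tags  # list locally available models (JSON)
fi
```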

Where Ollama stores models

  • macOS: ~/.ollama/models
  • Linux: /usr/share/ollama/.ollama/models
  • Windows: C:\Users\<username>\.ollama\models
$ ls ~/.ollama
history id_ed25519 id_ed25519.pub logs models

[~/.ollama/models]$ du -sh ./*
39G ./blobs
24K ./manifests
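The blobs directory holds the downloaded model weights. Rather than deleting files by hand, disk space is better reclaimed through the CLI (the model tag below is an example; the block is a guarded no-op without Ollama):

```shell
# List and remove downloaded models via the ollama CLI (no-op if absent).
if command -v ollama >/dev/null 2>&1; then
  ollama list               # downloaded models with their sizes
  ollama rm deepseek-r1:7b  # remove a model and free its blobs
fi
```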

Examples

  • The Llama 3.2 1B and 3B models support context length of 128K tokens and are state-of-the-art in their class for on-device use cases like summarization, instruction following, and rewriting tasks running locally at the edge. These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.
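For instance, these Llama 3.2 variants can also be pulled through Ollama (tags as published in the Ollama model library; guarded no-op without Ollama):

```shell
# Run the small on-device Llama 3.2 models locally (skipped if ollama is absent).
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.2:1b   # 1B model
  ollama run llama3.2:3b   # 3B model
fi
```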

Distilled models

The DeepSeek team has demonstrated that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL directly on small models.

Below are models created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. The evaluation results show that these distilled smaller dense models perform exceptionally well on benchmarks.

# 1.5B Qwen DeepSeek R1
ollama run deepseek-r1:1.5b

# 7B Qwen DeepSeek R1
ollama run deepseek-r1:7b

# 8B Llama DeepSeek R1
ollama run deepseek-r1:8b

# 14B Qwen DeepSeek R1
ollama run deepseek-r1:14b

# 32B Qwen DeepSeek R1
ollama run deepseek-r1:32b

# 70B Llama DeepSeek R1
ollama run deepseek-r1:70b
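Beyond the interactive sessions above, these models can be called programmatically through Ollama's HTTP API. A minimal sketch using the /api/generate endpoint with streaming disabled so the reply arrives as a single JSON object (guarded no-op without a running Ollama server):

```shell
# One-shot generation via the local Ollama API (skipped if ollama is absent).
if command -v ollama >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate -d '{
    "model": "deepseek-r1:1.5b",
    "prompt": "What is 12 * 13?",
    "stream": false
  }'
fi
```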


https://waipangsze.github.io/2025/01/30/ML-DeepSeek/
Author: wpsze
Posted on January 30, 2025; updated on February 3, 2025