ML | Ollama
Ollama
Ollama is an open-source tool designed to run large language models (LLMs) directly on your local machine, improving user privacy and performance. It lets developers, researchers, and businesses use powerful AI models without relying on cloud-based services, keeping full control over their data and reducing the potential security risks associated with external servers.
Key Features of Ollama
- Local execution: Ollama makes it possible to run LLMs entirely on the local machine, which eases privacy concerns and strengthens data security. Users do not need to upload sensitive information to the cloud; all processing happens on their own device.
- Extensive model library: The platform supports a wide range of pre-trained models, including popular ones such as LLaMA 2 and Code Llama. Users can easily pick the model that fits a specific task, keeping their AI applications flexible.
- Customization and fine-tuning: Ollama lets users customize and fine-tune language models for their needs, including prompt engineering and few-shot learning, so the output better matches their goals.
- Seamless integration: The tool integrates well with a variety of programming languages and frameworks, making it easy for developers to add LLMs to their projects (see the local-API sketch after this list).
- No usage limits: Unlike many online AI services that impose usage caps, Ollama does not limit how much text you can generate, giving users much more flexibility.
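As a minimal sketch of what that integration looks like: a running Ollama instance exposes an HTTP API on port 11434 (the same port used in the serve/tunnel notes further down), and any language that can make HTTP requests can call it. The model tag llama3.2:1b is only an illustrative choice; use whatever model you have pulled.
# assumes Ollama is already running locally and llama3.2:1b has been pulled
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Summarize what Ollama does in one sentence.",
  "stream": false
}'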
Applications of Ollama
Ollama can be used in many areas, including:
- Chatbots and virtual assistants: Improve the customer-service experience with intelligent automated replies.
- Code generation and assistance: Streamline development workflows by generating code snippets or helping with debugging.
- Natural language processing: Support tasks such as translation, summarization, and content generation.
- Research and knowledge discovery: Analyze large datasets to extract insights or generate hypotheses.
In short, Ollama is a powerful tool for running LLMs locally, with clear advantages in privacy, performance, and customization. Because it works without any cloud infrastructure, it is especially attractive to anyone who cares about data security while using advanced AI.
Ollama is a popular open-source command-line tool and engine that allows you to download quantized versions of the most popular LLM chat models.
Ollama is a separate application that you need to download and connect to first. It supports running LLMs on both CPU and GPU.
Local deployment
Ollama
We can install it as follows:
- Official Ollama download: https://ollama.com/
- Mac
- Apple Silicon supported
- On a Mac, download and unpack it, then follow the prompts to move it into Applications. Although it installs as an application, you actually run models from the command line.
- Linux
curl -fsSL https://ollama.com/install.sh | sh
# Needs sudo privilege
- To install without root privilege:
- Download a release tarball from https://github.com/ollama/ollama/releases , e.g. `wget https://github.com/ollama/ollama/releases/download/v0.5.7/ollama-linux-amd64.tgz`
- Extract it: `tar -xzf ollama-linux-amd64.tgz`
./ollama serve &
source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
./ollama run llama2
- The models are stored under $HOME/.ollama
- Ollama is open-source software that lets users run, create, and share large language model services on their own hardware. The platform suits anyone who wants to run models locally: it protects privacy while letting you set up and interact with models easily from the command line. Ollama supports many models, including Llama 2 and Mistral, and offers flexible customization options such as importing models from other formats and setting runtime parameters.






Where models are stored
- macOS: ~/.ollama/models
- Linux: /usr/share/ollama/.ollama/models
- Windows: C:\Users\<username>\.ollama\models
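To check how much disk space the downloaded models are using, run du against the directory for your platform (paths from the list above; the first one also covers a manual tarball install under $HOME):
du -sh ~/.ollama/models                  # macOS (and tarball installs under $HOME)
du -sh /usr/share/ollama/.ollama/models  # Linux system install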
WebUI
- Web UI front-end: Page Assist - A Web UI for Local AI Models | Chrome Extension
- Open WebUI
micromamba install python=3.11
pip install open-webui
open-webui serve
# you can access at http://localhost:8080
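If this pip-installed Open WebUI has to reach an Ollama server that is not on localhost:11434, my understanding is that the same OLLAMA_BASE_URL setting used in the Docker commands further down can be exported before starting; treat that as an assumption to verify against the Open WebUI docs. The address below is just the example server from the tunnel notes.
export OLLAMA_BASE_URL=http://10.4.7.1:11434  # assumed env var, mirrors -e OLLAMA_BASE_URL in the Docker commands
open-webui serve                              # then open http://localhost:8080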
ollama serve
# on the server (default port 11434); the log shows:
# source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
- Open an SSH tunnel to the server (e.g. ssh -N -L 11434:localhost:11434 10.4.7.1)
- http://localhost:11434/ --> shows "Ollama is running"
- http://localhost:8080/ opens the Open-WebUI page (port 8080)
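To sanity-check the tunnel from the local machine (the /api/tags endpoint returns the models available on the server as JSON):
curl http://localhost:11434/           # plain text: "Ollama is running"
curl http://localhost:11434/api/tags   # JSON list of models pulled on the server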




Installation with Default Configuration
If Ollama is on your computer, use this command:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
If Ollama is on a Different Server, use this command:
To connect to Ollama on another server, set OLLAMA_BASE_URL to that server's URL:
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
To run Open WebUI with Nvidia GPU support, use this command:
docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
Installation for OpenAI API Usage Only
If you're only using the OpenAI API, use this command:
docker run -d -p 3000:8080 -e OPENAI_API_KEY=your_secret_key -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Installing Open WebUI with Bundled Ollama Support
This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup:
With GPU Support: Utilize GPU resources by running the following command:
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
For CPU Only: If you're not using a GPU, use this command instead:
docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
Both commands facilitate a built-in, hassle-free installation of both Open WebUI and Ollama, ensuring that you can get everything up and running swiftly.
After installation, you can access Open WebUI at http://localhost:3000.
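A few quick checks after starting any of the containers above (container name and ports as used in those commands):
docker ps --filter name=open-webui   # the container should show as "Up"
docker logs -f open-webui            # follow the startup logs
curl -I http://localhost:3000        # Open WebUI should respond here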
Run
- ollama pull <name> downloads a model without running it
- ollama list shows which models have been downloaded locally
- ollama rm <name> deletes a downloaded model
- Models are stored in ~/.ollama/models/ ; the server log is in ~/.ollama/logs/server.log
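A minimal end-to-end example of these commands; llama3.2:1b is just one small model tag from the Ollama library, substitute whatever you actually want to run:
ollama pull llama3.2:1b   # download only
ollama list               # confirm it appears
ollama run llama3.2:1b    # chat with it interactively (Ctrl-D to leave)
ollama rm llama3.2:1b     # delete it again when done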
Stop Ollama
"What kind of silly question is that? Isn't Ctrl-C or Ctrl-D enough?"
- Actually, after ollama run "finishes", the service is still running and bound to its port, which you can confirm with ps. The model is unloaded from memory after it has been idle for a while, but if you like things tidy, click the Ollama icon at the top right of the Mac menu bar and choose Quit.
- Linux
- Identify the process: ps aux | grep ollama
- Kill it: pkill ollama or kill -9 <PID>
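If Ollama was installed as a system service (as the official Linux install script does), stopping it through systemd is tidier than killing the process; this matches the systemctl line in the removal steps below:
sudo systemctl stop ollama.service     # stop the running service
sudo systemctl disable ollama.service  # optional: don't start it again at boot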
Remove Ollama
- Stop the Ollama service
- Remove Ollama:
systemctl stop ollama.service # Stop the service
sudo apt remove ollama # Remove (Debian/Ubuntu)
sudo dnf remove ollama # Remove (Fedora/RHEL)
sudo snap remove ollama # Remove (Snap)
brew uninstall ollama # Remove (Homebrew)
rm -rf ~/.ollama # Remove configuration files (optional)
- Or, for a manual (binary) install:
sudo rm /usr/local/bin/ollama # Adjust the path as necessary
rm -rf ~/.ollama # Remove configuration files
Ollama-library
The Llama 3.2 1B and 3B models support context length of 128K tokens and are state-of-the-art in their class for on-device use cases like summarization, instruction following, and rewriting tasks running locally at the edge. These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.
Model examples
LLaMA (Large Language Model Meta AI)
LLaMA (Large Language Model Meta AI) is a family of large language models developed by Meta AI, with its initial release in February 2023. The latest version, LLaMA 3.3, was launched in December 2024. (The largest model, LLaMA 3.1 with 405 billion parameters, requires approximately 854 GB of memory without quantization. With techniques like 8-bit quantization, this can be reduced to around 427 GB, though it still demands substantial computational resources.)
Llama 3.2 1B and 3B models:
Model | Total Parameters | Context Length | Memory Requirements
---|---|---|---
Llama 3.2 1B | 1 billion | 128,000 tokens | BF16/FP16: ~2.5 GB; FP8: ~1.25 GB; INT4: ~0.75 GB
Llama 3.2 3B | 3 billion | 128,000 tokens | BF16/FP16: ~6.5 GB; FP8: ~3.2 GB; INT4: ~1.75 GB
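As a rough sanity check on these figures (and on the 405B numbers quoted above): weight memory is roughly parameter count × bytes per weight, and the published values sit a little above that raw product because of KV cache and other runtime overhead. A quick back-of-the-envelope:
# approximate weight memory in GB = parameters * bytes-per-weight / 1e9
awk 'BEGIN {
  printf "Llama 3.2 3B   FP16: ~%.1f GB\n", 3e9   * 2   / 1e9   # table: ~6.5 GB
  printf "Llama 3.2 3B   INT4: ~%.1f GB\n", 3e9   * 0.5 / 1e9   # table: ~1.75 GB
  printf "Llama 3.1 405B FP16: ~%.0f GB\n", 405e9 * 2   / 1e9   # text: ~854 GB
  printf "Llama 3.1 405B INT8: ~%.0f GB\n", 405e9 * 1   / 1e9   # text: ~427 GB
}'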
Key Features:
- Multilingual Support: Trained on up to 9 trillion tokens, supporting languages like English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Optimized for Efficiency: Designed for on-device applications such as prompt rewriting and knowledge retrieval.
- High Performance: Outperforms many existing open-access models of similar sizes and is competitive with larger models.
Customize a model (.gguf)
Import from GGUF
Ollama supports importing GGUF models in the Modelfile:
Create a file named Modelfile, with a FROM instruction that points to the local file path of the model you want to import:
FROM ./vicuna-33b.Q4_0.gguf
Create the model in Ollama
ollama create example -f Modelfile
Run the model
ollama run example
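The Modelfile can carry more than the FROM line. A small sketch written as a shell here-doc so the whole thing can be pasted at once; the temperature value and system prompt are arbitrary examples, and the GGUF path is the same placeholder used above:
# write a Modelfile with a couple of extra directives, then build and run it
cat > Modelfile <<'EOF'
FROM ./vicuna-33b.Q4_0.gguf
PARAMETER temperature 0.7
SYSTEM You are a concise assistant running fully offline.
EOF
ollama create example -f Modelfile
ollama run example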