Running large language models locally on a laptop is becoming increasingly feasible, and Ollama makes it accessible. The key to a good experience is choosing a model that matches your laptop’s hardware, primarily its RAM and whether it has a dedicated GPU.
Here is a breakdown of Ollama models that can be run on different categories of laptops, from basic machines to those with more powerful specifications.
For Basic Laptops (8GB of RAM, Integrated Graphics)
These laptops are suitable for smaller, highly efficient models. Performance will be slower because inference runs entirely on the CPU, but it is perfectly usable for many tasks.
| Model Name | Parameters | Typical Size (GB) | Key Characteristics |
|---|---|---|---|
| phi3:mini | 3.8 Billion | ~2.3 GB | Excellent performance for its size, often matching larger models on certain benchmarks. A great starting point. |
| gemma:2b | 2 Billion | ~1.4 GB | A lightweight and capable model from Google, ideal for devices with limited resources. |
| llama3.2:3b | 3 Billion | ~2.0 GB | A compact member of Meta’s Llama 3.2 series, offering good general capabilities. |
| tinyllama | 1.1 Billion | ~0.6 GB | One of the smallest and fastest models, suitable for the most resource-constrained systems. |
| qwen:0.5b | 0.5 Billion | ~0.3 GB | An extremely lightweight model from Alibaba’s Qwen family, good for very basic tasks. |
Recommendation: Start with `phi3:mini`. It offers a fantastic balance of performance and resource usage for laptops without a dedicated GPU.
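If you want to try this category right away, here is a minimal shell sketch that downloads the model and asks it a one-off question. It assumes Ollama is already installed and uses the `phi3:mini` tag as listed in the Ollama library; the prompt is just an illustration.

```bash
# One-time download of the quantized model weights (~2.3 GB)
ollama pull phi3:mini

# Ask a single question without entering the interactive chat session
ollama run phi3:mini "Explain the difference between RAM and VRAM in two sentences."
```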
For Modern Laptops (16GB of RAM, Integrated or Basic Dedicated Graphics)
With 16GB of RAM, you can comfortably run the popular and highly capable 7-billion-parameter models. These offer a significant jump in reasoning and instruction-following capabilities.
| Model Name | Parameters | Typical Size (GB) | Key Characteristics |
|---|---|---|---|
| llama3:8b | 8 Billion | ~4.7 GB | The latest and most capable model in its class from Meta. Highly recommended for general use. |
| mistral:7b | 7.3 Billion | ~4.1 GB | A very popular and efficient model known for its speed and strong performance. |
| gemma:7b | 7 Billion | ~4.8 GB | A well-balanced model from Google that provides a good blend of performance and resource requirements. |
| codellama:7b | 7 Billion | ~3.8 GB | Specialized for code generation and assistance. A must-have for developers. |
| llava:7b | 7 Billion | ~4.1 GB | A multimodal model that can understand both text and images. |
Recommendation: `llama3:8b` is currently the top performer in this category for general chat and instruction following.
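Beyond the interactive terminal chat, Ollama also serves a local HTTP API (by default on port 11434) that other tools on your laptop can call. The following is a minimal sketch, assuming `llama3:8b` has already been pulled and the Ollama server is running; the prompt is purely illustrative.

```bash
# Ask the local Ollama server for a single non-streamed completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Write a haiku about 16GB of RAM.",
  "stream": false
}'
```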
For Laptops with a Dedicated GPU (8GB+ VRAM) and 16GB+ RAM
If your laptop has a dedicated NVIDIA or AMD graphics card, you can leverage GPU acceleration for significantly faster performance. You can also run larger and more capable models.
| Model Name | Parameters | Typical Size (GB) | Key Characteristics |
|---|---|---|---|
| phi3:medium | 14 Billion | ~7.9 GB | A more powerful version of Phi-3 that can run well on GPUs with around 8GB of VRAM. |
| codellama:13b | 13 Billion | ~7.4 GB | A more capable version of the code generation model for more complex programming tasks. |
| mixtral:8x7b | 46.7 Billion | ~26 GB | A high-performance “Mixture-of-Experts” model. While large, it can be partially offloaded to a GPU with sufficient VRAM, with the rest handled by the CPU and RAM. Requires 32GB+ of system RAM for a decent experience. |
| llama3:70b | 70 Billion | ~40 GB | One of the most powerful open models available. Running this on a laptop is challenging and would require a high-end mobile GPU with significant VRAM (16GB+) and a large amount of system RAM (at least 64GB). |
Recommendation: For a powerful laptop with a good GPU, `phi3:medium` or `codellama:13b` will provide a very responsive and capable experience.
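To confirm that a model is actually using your GPU instead of spilling into system RAM, recent Ollama versions include an `ollama ps` command that shows how each loaded model is split between CPU and GPU. A quick check, assuming `phi3:medium` is the model you just started:

```bash
# Start the model in one terminal
ollama run phi3:medium

# In a second terminal, inspect the loaded model;
# the PROCESSOR column shows the CPU/GPU split (ideally "100% GPU")
ollama ps
```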
Important Considerations:
- Quantization: All the sizes mentioned are for quantized models (typically 4-bit), which are compressed to use less RAM and VRAM. This is what makes running them on laptops possible.
- Performance: The speed at which a model generates text (tokens per second) will be significantly higher on a laptop with a dedicated GPU. On CPU-only systems, expect a slower but still functional response.
- Getting Started: To run any of these models, first install Ollama. Then, in your terminal, type `ollama run <model_name>`; for example, `ollama run llama3:8b`. A short end-to-end walkthrough follows below.
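As a concrete example, here is one way to go from a fresh machine to a first chat. The install command is the official one-line installer for Linux; on macOS and Windows you would instead download the app from ollama.com, so treat that first line as platform-dependent.

```bash
# Install Ollama (Linux; macOS/Windows users should use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and start an interactive chat with the recommended 16GB-class model
ollama run llama3:8b

# List every model you have downloaded and how much disk space each uses
ollama list
```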