Ollama models that can be run on a laptop

Running large language models locally on a laptop is becoming increasingly feasible, and Ollama makes it accessible. The key to a good experience is choosing a model that matches your laptop’s hardware, primarily its RAM and whether it has a dedicated GPU.

Here is a breakdown of Ollama models that can be run on different categories of laptops, from basic machines to those with more powerful specifications.

For Basic Laptops (8GB of RAM, Integrated Graphics)

These laptops are best suited to smaller, highly efficient models. Inference runs on the CPU, so responses will be slower, but the experience is perfectly usable for many tasks.

| Model Name | Parameters | Typical Size | Key Characteristics |
| --- | --- | --- | --- |
| phi3:mini | 3.8 Billion | ~2.3 GB | Excellent performance for its size, often matching larger models on certain benchmarks. A great starting point. |
| gemma:2b | 2 Billion | ~1.4 GB | A lightweight and capable model from Google, ideal for devices with limited resources. |
| llama3.2:3b | 3 Billion | ~2.0 GB | A smaller model in Meta’s Llama 3.2 series, offering good general capabilities. |
| tinyllama | 1.1 Billion | ~0.6 GB | One of the smallest and fastest models, suitable for the most resource-constrained systems. |
| qwen:0.5b | 0.5 Billion | ~0.3 GB | An extremely lightweight model from Alibaba’s Qwen family, good for very basic tasks. |

Recommendation: Start with phi3:mini. It offers a fantastic balance of performance and resource usage for laptops without a dedicated GPU.
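To try this out, a minimal terminal session might look like the sketch below, assuming Ollama is already installed and using the phi3:mini tag from the table above:

```bash
# Download the model once (roughly 2.3 GB), then start an interactive chat.
ollama pull phi3:mini
ollama run phi3:mini

# Or pass a prompt directly for a one-off, non-interactive answer.
ollama run phi3:mini "Summarize this paragraph in one sentence: ..."
```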


For Modern Laptops (16GB of RAM, Integrated or Basic Dedicated Graphics)

With 16GB of RAM, you can comfortably run the popular and highly capable 7-billion-parameter models. These offer a significant jump in reasoning and instruction-following capabilities.

| Model Name | Parameters | Typical Size | Key Characteristics |
| --- | --- | --- | --- |
| llama3:8b | 8 Billion | ~4.7 GB | The latest and most capable model in its class from Meta. Highly recommended for general use. |
| mistral:7b | 7.3 Billion | ~4.1 GB | A very popular and efficient model known for its speed and strong performance. |
| gemma:7b | 7 Billion | ~4.8 GB | A well-balanced model from Google that provides a good blend of performance and resource requirements. |
| codellama:7b | 7 Billion | ~3.8 GB | Specialized for code generation and assistance. A must-have for developers. |
| llava:7b | 7 Billion | ~4.1 GB | A multimodal model that can understand both text and images. |

Recommendation: llama3:8b is currently the top performer in this category for general chat and instruction following.
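Before committing to one of these models, it can help to check what is already downloaded and how much memory a loaded model actually uses. A rough sketch using standard Ollama commands (output details vary by Ollama version):

```bash
# List downloaded models and their sizes on disk.
ollama list

# Load llama3:8b, then in a second terminal inspect how it is running.
ollama run llama3:8b
ollama ps    # in recent Ollama versions, shows memory use and the CPU/GPU split

# Remove a model you no longer need to reclaim disk space.
ollama rm mistral:7b
```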


For Laptops with a Dedicated GPU (8GB+ VRAM) and 16GB+ RAM

If your laptop has a dedicated NVIDIA or AMD graphics card, you can leverage GPU acceleration for significantly faster performance. You can also run larger and more capable models.

| Model Name | Parameters | Typical Size | Key Characteristics |
| --- | --- | --- | --- |
| phi3:medium | 14 Billion | ~7.9 GB | A more powerful version of Phi-3 that can run well on GPUs with around 8GB of VRAM. |
| codellama:13b | 13 Billion | ~7.4 GB | A more capable version of the code generation model for more complex programming tasks. |
| mixtral:8x7b | 46.7 Billion | ~26 GB | A high-performance “Mixture-of-Experts” model. While large, it can be partially offloaded to a GPU with sufficient VRAM, with the rest handled by the CPU and RAM. Requires 32GB+ of system RAM for a decent experience. |
| llama3:70b | 70 Billion | ~40 GB | One of the most powerful open models available. Running this on a laptop is challenging and would require a high-end mobile GPU with significant VRAM (16GB+) and a large amount of system RAM (at least 64GB). |

Recommendation: For a powerful laptop with a good GPU, phi3:medium or codellama:13b will provide a very responsive and capable experience.
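Ollama normally decides on its own how many model layers fit in VRAM and keeps the rest in system RAM. If you want to experiment with that split for a large model such as mixtral:8x7b, one option is the num_gpu setting exposed by the REST API; the sketch below assumes the default API port 11434, and the layer count of 20 is purely illustrative:

```bash
# Request a completion with only part of the model offloaded to the GPU.
# "num_gpu" is the number of layers placed in VRAM; tune it to your card.
curl http://localhost:11434/api/generate -d '{
  "model": "mixtral:8x7b",
  "prompt": "Explain mixture-of-experts models in two sentences.",
  "stream": false,
  "options": { "num_gpu": 20 }
}'

# Check how the loaded model ended up split between GPU and CPU memory.
ollama ps
```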

Important Considerations:

  • Quantization: All the sizes mentioned are for quantized models (typically 4-bit), which are compressed to use less RAM and VRAM. This is what makes running them on laptops possible; an example of pulling a specific quantization tag follows this list.
  • Performance: The speed at which a model generates text (tokens per second) will be significantly higher on a laptop with a dedicated GPU. On CPU-only systems, expect a slower but still functional response.
  • Getting Started: To run any of these models, you’ll first need to install Ollama. Then, in your terminal, simply type ollama run <model_name>. For example: ollama run llama3:8b.
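Most models in the Ollama library also publish explicitly tagged quantization variants, so you can trade quality against memory. The exact tag names differ from model to model (the ones below follow the pattern used on the llama3 library page; check the page for your model before pulling):

```bash
# The default tag is usually a 4-bit quantization; other levels are
# available as explicit tags (names vary per model).
ollama pull llama3:8b-instruct-q4_0   # ~4.7 GB, the usual laptop choice
ollama pull llama3:8b-instruct-q8_0   # ~8.5 GB, higher quality, needs more RAM

# Inspect a downloaded model's details, including its quantization level.
ollama show llama3:8b-instruct-q4_0
```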
