The Benefits of Running LLMs Locally: A Look at Ollama and LMStudio

Before we go into Ollama and LMStudio, let’s briefly discuss why you would want to run Large Language Models (LLMs) locally or host them on your own infrastructure.

Note that I’ll interchangeably refer to LLMs as models in this article.

Why do you want to run LLMs locally?

  • No internet required. You can still use AI chat in places without internet access, such as rural areas.
  • Chat with sensitive data. Since the models run on your computer or your own hosting service, you can chat about sensitive or confidential information without it leaving your infrastructure.
  • Alternative to online chat models. If ChatGPT is down and your work really depends on it, your productivity takes a hit, so it’s better to have a backup plan. Locally run models are a solid one.
  • Application integration. If you want or need to integrate AI into your own application through an API, running models locally gives you full control over that integration.
  • Save costs. OpenAI APIs and other online models are paid services. If you’re running an AI integration for your business workflows through backend processing that doesn’t need to happen in real time, this approach can save you money.
  • Flexibility. You can use any model you want for each specific use case, and you can design your system to query multiple models to get the best results. Since some models are tuned for specific expertise, you can take advantage of that. You are not tied to a single model.
  • Control. Since you are running on your own infrastructure, you control the security and scalability of your LLM APIs.
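As a small sketch of the flexibility point above: because you are not tied to a single model, you can route each task to whichever model suits it best. The task names and model names below are purely illustrative examples of popular open-source models; substitute whatever you have downloaded.

```python
# Hypothetical mapping from task type to a locally available model tuned for it.
# These model names are examples; swap in the ones you actually run.
MODEL_FOR_TASK = {
    "code": "codellama",
    "chat": "llama3",
    "summarize": "mistral",
}

def pick_model(task: str, default: str = "llama3") -> str:
    """Route a task to the model best suited for it, falling back to a default."""
    return MODEL_FOR_TASK.get(task, default)
```

A dispatcher like this is all it takes to query different models for coding questions versus summarization, or to fan a prompt out to several models and compare their answers.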

Things to consider when running LLMs locally

  • Local resources. The models run locally, so they use your computer’s resources: CPU, GPU, RAM, and storage (models are typically around 4–5 GB in size).
  • Self-hosting. If you want to use the APIs in production, you need to run the tool on your own self-hosted server.
  • Maintenance. You need to keep the model versions, your system’s OS, and the supporting software up to date.

Based on my research, two tools stand out for running LLMs locally: Ollama and LMStudio. Let’s take a look at both!

Ollama

Ollama is a tool that allows you to run LLMs (large language models) on your computer. It supports macOS, Windows, and Linux.

You can download Ollama here https://ollama.com/

Ollama supports different models, and you can see the full list on their website.
https://ollama.com/library

If you’re interested in the source code, you can check this out.
https://github.com/ollama/ollama

Features of Ollama

  • Search and Download Models Ollama allows you to search for and download various pre-trained LLMs from popular open-source repositories.
  • Chat with a model locally You can use Ollama to chat with a downloaded LLM model locally on your machine, without the need for an internet connection or external API calls.
  • Command-Line Interface (CLI) Ollama provides a CLI that allows you to interact with LLMs directly from the terminal.
  • APIs Ollama offers APIs that enable you to integrate LLMs into your applications programmatically.
  • Popular open-source models available Ollama supports a variety of popular open-source LLMs, such as llama3, phi3, mistral, gemma and others, which can be easily downloaded and used.
  • Open-Source Chat UI A third-party tool provides an open-source chat user interface (UI) called “nextjs-ollama-llm-ui”. This UI lets you easily set up a chat interface for interacting with the downloaded LLMs. You can download it from its GitHub repo. https://github.com/jakobhoeg/nextjs-ollama-llm-ui
  • Best for hosting a Chat API Ollama is particularly well-suited for hosting a chat API service, allowing you to deploy and serve LLMs as APIs for chatbot applications or other use cases that require conversational interactions.
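To make the API bullet above concrete, here is a minimal sketch of calling Ollama’s REST API from Python using only the standard library. The endpoint and port (11434) are Ollama’s documented defaults; the model name llama3 is just an example and must already be pulled on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single JSON response instead of a
    stream of partial tokens."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send the prompt to a locally running Ollama server and return
    the generated text (raises if no server is running)."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    try:
        # Requires `ollama run llama3` (or another pulled model) on this machine.
        print(generate("llama3", "Why run LLMs locally? One sentence."))
    except OSError:
        print("No local Ollama server reachable; skipping live call.")
```

Because the API is just HTTP plus JSON, the same request works from any language or framework your application already uses.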

LMStudio

source: photo from LMStudio

LM Studio is a desktop application that allows you to run large language models (LLMs) locally on your computer.

You can download LMStudio here
https://lmstudio.ai/

Features of LMStudio

  • Built-in user interface via the Desktop App
    LMStudio provides a desktop application that includes a user interface for interacting with LLMs. This allows you to use the models without the need for additional coding or setup.
  • Search and Download Models
    Similar to Ollama, LMStudio allows you to search for and download various pre-trained LLMs from popular open-source repositories.
  • Downloading of models by URL (GGUF support)
    LMStudio supports downloading models directly by URL, including models in the GGUF file format, a binary format used for efficient storage and loading of LLMs. For example, you can download models from Hugging Face by pasting their URL.
  • Chat with model locally
    Like Ollama, LMStudio enables you to chat with downloaded LLMs locally on your machine, without relying on external services or APIs.
  • Multi-model chat
    One unique feature of LMStudio is the ability to prompt multiple models simultaneously in a single chat session. This allows you to compare and leverage the strengths of different models for a given task.
  • Command-Line Interface (CLI)
    LMStudio provides a CLI tool called lms that allows you to interact with LLMs directly from the terminal.
  • APIs
    Similar to Ollama, LMStudio offers APIs that enable you to integrate LLMs into your applications programmatically.
  • Popular open-source models available
    Like Ollama, LMStudio supports a variety of popular open-source LLMs, making it easy to download and use these models.
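As a sketch of the API bullet above: LMStudio’s local server exposes an OpenAI-compatible API, by default on port 1234. The snippet below builds and sends a chat completion request with only the Python standard library; the model identifier "local-model" is a placeholder and must match a model you have loaded in LMStudio.

```python
import json
import urllib.request

# LMStudio's local server speaks the OpenAI-compatible API; 1234 is its default port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion body for LMStudio's local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

def chat(model: str, user_message: str) -> str:
    """POST the chat request to LMStudio and return the assistant's reply
    (raises if the LMStudio server is not running)."""
    body = json.dumps(build_chat_payload(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

if __name__ == "__main__":
    try:
        print(chat("local-model", "Summarize why local LLMs save costs."))
    except OSError:
        print("No LMStudio server reachable; skipping live call.")
```

Because the server mimics the OpenAI API shape, existing OpenAI client code can often be pointed at LMStudio just by changing the base URL.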

Wrapping it up

Both Ollama and LMStudio are excellent tools for running large language models locally on your machine or self-hosted infrastructure. While they share some core features like model search and download, local chat, CLI, and API support, each tool offers unique capabilities.

Ollama shines with its focus on hosting a chat API service, making it well-suited for integrating LLMs into conversational applications or enriching your business workflow with AI.

On the other hand, LMStudio’s standout feature is its desktop app with a built-in user interface and the ability to chat with multiple models simultaneously, making it easy to compare models and leverage their different strengths. For companies, it can serve as a great alternative to ChatGPT or Claude AI.

Ultimately, the choice between Ollama and LMStudio will depend on your specific requirements, whether it’s hosting a chat API, leveraging a user-friendly desktop interface, or taking advantage of multi-model chat capabilities. Regardless of your preference, both tools empower you to harness the power of large language models locally, opening up a world of possibilities for offline AI applications, cost savings, data privacy, and customized model integration.

Share your thoughts in the comments section.

You can also follow me on LinkedIn and on Twitch, where I livestream Tech and AI.