Docker Model Runner: Run AI Models Locally with Seamless Integration

In the fast-paced world of AI development, tools that simplify the process of running and integrating AI models locally are in high demand. Docker’s latest beta feature, Model Runner, offers developers a seamless way to work with AI models directly within their existing Docker environment. This article explores the features, benefits, and practical applications of Docker Model Runner, making it an essential resource for developers looking to optimize their AI workflows.

What is Docker Model Runner?

Docker Model Runner is a beta feature designed for Docker Desktop users, enabling them to download, run, and manage AI models locally. By pulling models from Docker Hub, storing them locally, and loading them into memory only when needed, it optimizes system resources. For developers already familiar with Docker’s containerization tools, Model Runner provides a streamlined experience with OpenAI-compatible APIs for easy integration into applications.

Key Benefits of Docker Model Runner

  • Local AI model management: Pull, run, and remove models directly from the command line.
  • Resource optimization: Models are only loaded at runtime and unloaded when idle.
  • OpenAI-compatible APIs: Simplify integration with existing applications.
  • Familiar workflows: Leverage Docker commands you already know.

Key Features of Docker Model Runner

To make the most of Docker Model Runner, here are its standout features:

  • Pull AI models directly from Docker Hub.
  • Run models locally using simple commands.
  • Manage local models with options to add, list, or remove them.
  • Interact with models via prompts or chat mode.
  • Optimize resource usage, ensuring efficient memory management.
  • Access OpenAI-compatible APIs, enabling seamless integration into your applications.

These features make Docker Model Runner a game-changer for developers aiming to build custom AI assistants or agents.

How to Get Started with Docker Model Runner

Prerequisites

To start using Docker Model Runner, you’ll need:

  • Docker Desktop version 4.40 or later
  • A Mac with Apple Silicon (the only platform supported in the current beta)
  • Beta features enabled in Docker Desktop under “Features in development”

Basic Commands

Here’s a quick guide to essential commands:

# Check if Model Runner is active
docker model status

# Pull a model from Docker Hub
docker model pull ai/smollm2

# List downloaded models
docker model list

# Run a model with a single prompt
docker model run ai/smollm2 "What is Kubernetes?"

# Remove a model
docker model rm ai/smollm2

These commands allow you to efficiently manage and interact with AI models directly from your terminal.
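
You can also start an interactive chat session by omitting the prompt argument (a quick sketch; exact behavior may vary across beta releases):

# Start an interactive chat session (type /bye to exit)
docker model run ai/smollm2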

Building AI Assistants with Docker Model Runner

One of the most exciting use cases for Docker Model Runner is building custom AI assistants. Developers can integrate these assistants into applications using OpenAI-compatible APIs. Here’s how you can access these APIs:

From Within Containers:

   #!/bin/sh

   curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
         "model": "ai/smollm2",
         "messages": [
               {
                  "role": "system",
                  "content": "You are a helpful assistant."
               },
               {
                  "role": "user",
                  "content": "What is Kubernetes?"
               }
         ]
      }'

From the Host (Unix Socket):

   #!/bin/sh

   curl --unix-socket $HOME/.docker/run/docker.sock \
      localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
         "model": "ai/smollm2",
         "messages": [
               {
                  "role": "system",
                  "content": "You are a helpful assistant."
               },
               {
                  "role": "user",
                  "content": "What is Kubernetes?"
               }
         ]
      }'

From the Host (TCP):

If you prefer to interact with the API directly from your host machine over TCP rather than through the Docker socket, you can enable this functionality. TCP support can be activated either through the Docker Desktop graphical interface or from the Docker Desktop command line:

   docker desktop enable model-runner --tcp <port>

Once TCP support is enabled, you can communicate with the API over localhost on either your specified port or the default port (12434, used in the example below), following the same request format shown in the previous examples.

   #!/bin/sh

   curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
         "model": "ai/smollm2",
         "messages": [
               {
                  "role": "system",
                  "content": "You are a helpful assistant."
               },
               {
                  "role": "user",
                  "content": "What is Kubernetes?"
               }
         ]
      }'
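
Building on these raw HTTP calls, a small shell helper can turn the endpoint into a reusable assistant. The sketch below is illustrative: it assumes TCP mode is enabled on the default port 12434 and that jq is installed, and prompts containing double quotes would need additional escaping.

   #!/bin/sh

   # Minimal assistant helper (sketch; assumes TCP on port 12434 and jq installed)
   ask() {
      curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
         -H "Content-Type: application/json" \
         -d "{
            \"model\": \"ai/smollm2\",
            \"messages\": [
               {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},
               {\"role\": \"user\", \"content\": \"$1\"}
            ]
         }" | jq -r '.choices[0].message.content'
   }

   ask "What is Kubernetes?"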

For hands-on examples, check out Docker's official hello-genai repository on GitHub (https://github.com/docker/hello-genai.git). It includes sample applications in Python, Node.js, and Go.

Where to Find Models

Docker provides an extensive collection of pre-trained AI models on its Gen AI Catalog at https://hub.docker.com/catalogs/gen-ai. Popular options include:

  • SmolLM2: Tiny LLM built for speed, edge devices, and local development.
  • Llama Models: Available in various sizes for different use cases.
  • Other optimized models tailored for specific applications.
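
For example, pulling and running one of the Llama variants follows the same workflow as any other model (the tag below is illustrative; check the catalog for current names):

# Pull and run a Llama variant (tag is illustrative; see the Gen AI Catalog)
docker model pull ai/llama3.2
docker model run ai/llama3.2 "Summarize the benefits of containers."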

This centralized hub simplifies finding and deploying the right model for your needs.

Known Limitations

While Docker Model Runner shows great promise, it’s important to note some current limitations:

  • Lack of safeguards for oversized models that may exceed system resources.
  • Chat interface may still launch even if the model pull fails.
  • Progress reporting during model pulls can be inconsistent.

These issues are expected to improve as the feature evolves beyond its beta phase.

Why Choose Docker Model Runner?

For developers already working within the Docker ecosystem, Model Runner offers several compelling advantages:

  1. Unified platform: Manage both containers and AI models in one environment.
  2. Familiar commands: No steep learning curve for existing Docker users.
  3. Resource efficiency: Load models only when needed to save memory.
  4. Seamless integration: Easily connect AI capabilities to your applications via OpenAI-compatible APIs.

By leveraging these benefits, developers can enhance their productivity while simplifying their workflows.

Conclusion

As artificial intelligence becomes increasingly integral to modern applications, tools like Docker Model Runner are paving the way for more accessible and efficient development processes. With its ability to integrate seamlessly into existing workflows while optimizing resource usage, this beta feature holds immense potential for developers and DevOps engineers alike.

Start exploring Docker Model Runner today and take your AI development workflow to the next level!
