Ollama Integration

Bilge provides first-class support for Ollama, allowing you to run a fully local AI coding copilot without any external API calls. This is ideal for privacy-sensitive work, offline development, or reducing API costs.


Setup

1. Install Ollama

Follow the instructions at ollama.ai to install Ollama on your system.

2. Pull a Model

# Recommended models for coding tasks
ollama pull qwen3
ollama pull llama3.1
ollama pull codellama

3. Start Bilge

using Bilge

bilge(ollama=true, model="qwen3")

Verifying the Connection

Before starting Bilge, you can check that Ollama is running:

using Bilge

# Check if Ollama server is reachable
check_ollama_connection()  # Returns true/false

# List available models
list_ollama_models()  # Returns Vector{String}

Ollama Must Be Running

Ollama runs as a background service. If check_ollama_connection() returns false, start Ollama with:

ollama serve
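
If you want to guard against this in a script, a minimal pre-flight sketch (using only the check_ollama_connection helper described above; the exact startup flow is your choice) might look like:

using Bilge

# Minimal pre-flight sketch: only start Bilge if the Ollama server responds.
if check_ollama_connection()
    bilge(ollama=true, model="qwen3")
else
    @warn "Ollama is not reachable. Start it with `ollama serve` and retry."
end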

Model Selection

Model              Size         Strengths
qwen3              8B           Strong tool calling, good reasoning
qwen3-coder        30B+         Specialized for code generation
llama3.1           8B/70B       General-purpose, good tool support
codellama          7B/13B/34B   Code-focused, fast
deepseek-coder-v2  Various      Strong code generation

Model Size vs Quality

Larger models generally produce better results but require more RAM and are slower. For most coding tasks, 8B-30B models offer a good balance of quality and speed.
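
As a rough illustration of that trade-off, you could let total system memory steer the choice of tier. The 32 GB cutoff below is an assumption made for this sketch, not a Bilge or Ollama requirement:

# Hypothetical heuristic: pick a larger model only when there is plenty of RAM.
ram_gb = Sys.total_memory() / 2^30
model = ram_gb >= 32 ? "qwen3-coder" : "qwen3"   # 32 GB cutoff is illustrative
println("~$(round(Int, ram_gb)) GB RAM detected; suggested model: $model")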


Protocol Options

Ollama supports two API protocols. Bilge works with both.

Native API (Default)

Uses Ollama's native /api/chat endpoint:

bilge(ollama=true, model="qwen3")
# Equivalent to:
bilge(ollama=true, model="qwen3", use_openai_compat=false)

OpenAI-Compatible API

Uses Ollama's /v1/chat/completions endpoint:

bilge(ollama=true, model="qwen3", use_openai_compat=true)

When to Use OpenAI-Compatible Mode

The native API is recommended for most cases. Use OpenAI-compatible mode if:

  • The model has better tool-calling support through the OpenAI format
  • You're debugging protocol issues
  • You want consistent behavior with OpenAI backends

Remote Ollama Server

You can connect to an Ollama instance running on a different machine:

bilge(
    ollama = true,
    model = "qwen3",
    host = "http://192.168.1.100:11434"
)

Or using the programmatic API:

config = BilgeConfig(
    ollama = OllamaConfig(
        model = "qwen3",
        host = "http://192.168.1.100:11434"
    )
)

agent = BilgeAgent(config, pwd())

Utility Functions

check_ollama_connection

check_ollama_connection(; host="http://localhost:11434") -> Bool

Checks if the Ollama server is running and reachable. Uses a 5-second connection timeout.
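
Example REPL session (the remote host and return values are illustrative):

julia> check_ollama_connection()
true

julia> check_ollama_connection(host="http://192.168.1.100:11434")
false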

list_ollama_models

list_ollama_models(; host="http://localhost:11434") -> Vector{String}

Returns a list of model names available on the Ollama server.

julia> list_ollama_models()
3-element Vector{String}:
 "qwen3:latest"
 "llama3.1:latest"
 "codellama:latest"

Troubleshooting

Problem: HTTP.IOError or connection refused

  • Solution: Make sure Ollama is running (ollama serve)

Problem: Model not found

  • Solution: Pull the model first (ollama pull model-name)

Problem: Slow responses

  • Solution: Use a smaller model or ensure sufficient RAM. Ollama requires roughly the model size in RAM (e.g., 8GB for an 8B model).

Problem: Poor tool calling

  • Solution: Some models have limited tool-calling ability. Try qwen3 or llama3.1, which have strong tool-calling support.