Qiskit Code Assistant

Qiskit Code Assistant LLMs aim to make quantum computing more accessible to new Qiskit adopters and to improve the coding experience for current users. It is trained using millions of text tokens from Qiskit SDK, years of Qiskit code examples, and IBM Quantum® features. Qiskit Code Assistant can help your quantum development workflow by offering LLM-generated suggestions based on IBM Granite and other open-source models, which incorporate the latest features and functionalities from IBM®.

Notes

Want to skip to the installation instructions? Go to the Install Qiskit Code Assistant section.
If you have feedback or want to contact the developer team, use the Qiskit Slack Workspace channel or the related public GitHub repositories.

The Large Language Model (LLM) behind Qiskit Code Assistant

To provide code suggestions, Qiskit Code Assistant uses a Large Language Model (LLM). In this case, Qiskit Code Assistant currently relies on the model mistral-small-3.2-24b-qiskit, built on the Mistral-Small-3.2-24B-Qiskit model. The mistral-small-3.2-24b-qiskit model improves the Mistral-Small-3.2-24B-Instruct-2506 model's code generation capabilities for Qiskit through extended pretraining and fine-tuning it on high-quality Qiskit data, as well as Python commits and chat. For more information about the Mistral AI models family, refer to Mistral AI documentation. For more details about the .*-qiskit models, see Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code.

Our LLMs specialized for Qiskit are available also as open-source models. Check all the models available at https://huggingface.co/Qiskit.

The Qiskit HumanEval and Qiskit HumanEval Hard benchmarks

To test the mistral-small-3.2-24b-qiskit and other models, we collaborated with Qiskit Advocates and experts to create the execution-based benchmarks called Qiskit HumanEval (QHE) and Qiskit HumanEval Hard (QHE Hard), and ran them on the models. These benchmarks are similar to HumanEval, including multiple challenging code problems to solve, all based on the official Qiskit libraries.

The benchmarks are composed of approximately 150 tests, each one made from a function definition, followed by a docstring that details the task the model is required to solve. Each example also includes a reference canonical solution, as well as unit tests, to evaluate the correctness of the generated solutions. There are three levels of difficulty for tests: basic, intermediate, and difficult. The Qiskit HumanEval Hard benchmark is a variation of the Qiskit HumanEval one, but removes information related to code imports, so the LLM needs to figure out the right method or class imports. This change makes the dataset much more challenging for LLMs, according to our tests and initial results.

The datasets for Qiskit HumanEval and Qiskit HumanEval Hard are available at these websites: Qiskit HumanEval and Qiskit HumanEval. You can contribute to the development of these benchmarks at the GitHub repository.

Install Qiskit Code Assistant

Learn how to install, configure, and use any of Qiskit Code Assistant models on your local machine.

Download from the Hugging Face website

Follow these steps to download any Qiskit Code Assistant-related model from the Hugging Face website:

Navigate to the desired Qiskit model page on Hugging Face.
Go to the Files and Versions tab and download the safetensors or GGUF model files.

Download using the Hugging Face CLI

To download any of the available Qiskit Code Assistant models using the Hugging Face CLI, follow these steps:

Install the Hugging Face CLI
Log in to your Hugging Face account
```
huggingface-cli login
```

Download the model you prefer from the previous list

huggingface-cli download <HF REPO NAME> <MODEL PATH> --local-dir <LOCAL PATH>

Manually deploy the Qiskit Code Assistant models in local through Ollama

There are multiple ways to deploy and interact with the downloaded Qiskit Code Assistant model. This guide demonstrates using Ollama as follows: either with the Ollama application by using the Hugging Face Hub integration or local model, or with the llama-cpp-python package.

Using the Ollama application

The Ollama application provides a simple solution to run the LLMs locally. It is easy to use, with a CLI that makes the whole setup process, model management, and interaction fairly straightforward. It’s ideal for quick experimentation and for users that want fewer technical details to handle.

Install Ollama

Download the Ollama application
Install the downloaded file
Launch the installed Ollama application

info
The application is running successfully when the Ollama icon appears in the desktop menu bar. You can also verify the service is running by going to http://localhost:11434/.
Try Ollama in your terminal and start running models. For example:
```
ollama run hf.co/Qiskit/Qwen2.5-Coder-14B-Qiskit
```

Set up Ollama using the Hugging Face Hub integration

The Ollama/Hugging Face Hub integration provides a way to interact with models hosted on the Hugging Face Hub without needing to create a new modelfile nor manually downloading the GGUF or safetensors files. The default template and params files are already included for the model on the Hugging Face Hub.

Make sure the Ollama application is running.
Go the desired model page, and copy the URL. For example, https://huggingface.co/Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF.

From your terminal, run the command:

ollama run hf.co/Qiskit/Qwen2.5-Coder-14B-Qiskit

You can use the hf.co/Qiskit/Qwen2.5-Coder-14B-Qiskit model or any of the other currently recommended GGUF official models hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF or hf.co/Qiskit/granite-3.3-8b-qiskit-GGUF.

Set up Ollama with a manually downloaded Qiskit Code Assistant GGUF model

If you have manually downloaded a GGUF model such as https://huggingface.co/Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF and you want to experiment with different templates and parameters, you can follow these steps to load it into your local Ollama application.

Create a Modelfile entering the following content and be sure to update <PATH-TO-GGUF-FILE> to the actual path of your downloaded model.

FROM <PATH-TO-GGUF-FILE>
TEMPLATE """{{ if .System }}
System:
{{ .System }}

{{ end }}{{ if .Prompt }}Question:
{{ .Prompt }}

{{ end }}Answer:
```python{{ .Response }}
"""

PARAMETER stop "Question:"
PARAMETER stop "Answer:"
PARAMETER stop "System:"
PARAMETER stop "```"

PARAMETER temperature 0
PARAMETER top_k 1

Run the following command to create a custom model instance based on the Modelfile.
```
ollama create Qwen2.5-Coder-14B-Qiskit -f ./path-to-model-file
```
note
This process may take some time for Ollama to read the model file, initialize the model instance, and configure it according to the specifications provided.

Run the Qiskit Code Assistant model manually downloaded in Ollama

After the Qwen2.5-Coder-14B-Qiskit model has been set up in Ollama, run the following command to launch the model and interact with it in the terminal (in chat mode).

ollama run Qwen2.5-Coder-14B-Qiskit

Some useful commands:

ollama list - List models on your computer
ollama rm Qwen2.5-Coder-14B-Qiskit - Delete the model
ollama show Qwen2.5-Coder-14B-Qiskit - Show model information
ollama stop Qwen2.5-Coder-14B-Qiskit - Stop a model that is currently running
ollama ps - List which models are currently loaded

Manually deploy the Qiskit Code Assistant models in local through the llama-cpp-python package

An alternative to the Ollama application is the llama-cpp-python package, which is a Python binding for llama.cpp. It gives you more control and flexibility to run the GGUF model locally, and is ideal for users who wish to integrate the local model in their workflows and Python applications.

Install llama-cpp-python
Interact with the model from within your application using llama_cpp. For example:

from llama_cpp import Llama

model_path = <PATH-TO-GGUF-FILE>

model = Llama(
        model_path,
        seed=17,
        n_ctx=10000,
        n_gpu_layers=37, # to offload in gpu, but put 0 if all in cpu
    )

input = 'Generate a quantum circuit with 2 qubits'
raw_pred = model(input)["choices"][0]["text"]

You can also add text generation parameters to the model to customize the inference:

generation_kwargs = {
        "max_tokens": 512,
        "echo": False, # Echo the prompt in the output
        "top_k": 1
    }

raw_pred = model(input, **generation_kwargs)["choices"][0]["text"]

Manually deploy the Qiskit Code Assistant models in local through llama.cpp

Use the `llama.cpp` library

Another alternative is to use llama.cpp, an open-source library for performing LLM inference on a CPU with minimal setup. It provides low-level control over the model execution and is typically run from the command line, pointing to a local GGUF model file.

There are several ways to install llama.cpp on your machine:

Install llama.cpp using brew, nix, or winget
Run with Docker: See out the Docker documentation by llama.cpp team
Download pre-built binaries from the releases page
Build from source by cloning this repository

Once installed, you can use llama.cpp to interact with GGUF models in conversation mode as follows:

# Use a local model file
llama-cli -m my_model.gguf -cnv

# Or download and run a model directly from Hugging Face
llama-cli -hf Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF -cnv

You can also launch an OpenAI-compatible API server for the model in the following way:

llama-server -hf Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF

Advanced parameters

With the llama-cli program, you can control the model generation using command-line options. For example, you can provide an initial “system” prompt using the -p/--prompt flag. In conversation mode (-cnv), this initial prompt acts as the system message. Otherwise, you can simply prepend any desired instruction to your prompt text. You can also adjust sampling parameters - for instance: temperature (--temp), top-k (--top-k), top-p (--top-p), repetition penalty (--repeat-penalty), and the seed to use (--seed). The following is an example invocation using these options:

llama-cli -hf Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF \
  -p "You are a friendly assistant." -cnv \
  --temp 0.7 \
  --top-k 50 \
  --top-p 0.95 \
  --repeat-penalty 1.1 \
  --seed 42

To ensure proper functionality of our Qiskit models, we recommend using the system prompt provided in our HF GGUF repositories: system prompt for mistral-small-3.2-24b-qiskit-GGUF, Qwen2.5-Coder-14B-Qiskit-GGUF, granite-3.3-8b-qiskit-GGUF, and granite-3.2-8b-qiskit-GGUF.

Manually connect Continue (VS Code)

Continue (VS Code)

1. Install the extension

Open VS Code, go to Extensions (Cmd+Shift+X), search Continue, install it.

2. Open the config

Click the Continue icon in the sidebar, then click the gear icon, or open the command palette (Cmd+Shift+P) and run Continue: Open Config File.

This opens ~/.continue/config.yaml (or config.json in older versions).

3. Configure the model

Add the following to config.yaml:

models:
  - name: Qiskit Code Assistant
    provider: ollama
    model: mistral-small-3.2-24b-qiskit
    apiBase: http://localhost:11434

This makes the Qiskit model available in the chat panel (sidebar conversations, inline Q&A) and for inline edit commands.

4. Test it

Chat: Open the Continue panel in the sidebar and ask a question (e.g., "How do I create a parameterized circuit in Qiskit?")
Inline edit: Select a block of code, press Cmd+I (Mac) or Ctrl+I (Linux/Windows)

Manually connect Jupyter AI (JupyterLab)

Jupyter AI (JupyterLab)

Note: These instructions cover Jupyter AI v2.x.

1. Install Jupyter AI and the Ollama provider

pip install "jupyter-ai<3" langchain-ollama

The "jupyter-ai<3" pin ensures you get v2.x. The langchain-ollama package is required for Jupyter AI to detect Ollama as a provider. Without it, Ollama will not appear in the settings panel.

Then restart JupyterLab.

2. Configure the chat model

Open JupyterLab and click the chat icon in the left sidebar. In the settings panel:

Under Language model, select Ollama as the provider.
Enter mistral-small-3.2-24b-qiskit as the model name.
No API key is needed for Ollama (leave the field empty).
Click the back arrow to start chatting.

3. Use the `%%ai` magic command

The %%ai magic lets you query the model directly in notebook cells.

%load_ext jupyter_ai_magics

Then in a cell:

%%ai ollama:mistral-small-3.2-24b-qiskit
Write a function that implements Grover's algorithm using Qiskit

4. Custom Ollama host (optional)

By default, Jupyter AI connects to http://127.0.0.1:11434. If your Ollama server runs on a different address or port:

In the chat UI: Set the "Base API URL" field in the AI settings panel.

Manually connect OpenCode (Terminal)

OpenCode (Terminal)

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Configure the Qiskit model

Create an opencode.json file in your project root (or ~/.config/opencode/opencode.json for a global config):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "mistral-small-3.2-24b-qiskit": {
          "name": "Qiskit Code Assistant"
        }
      }
    }
  }
}

3. Select the model

Launch OpenCode in your project directory:

opencode

Inside the TUI, run the /models command and select Qiskit Code Assistant from the list.

4. Test it

Ask a question directly in the chat, for example: "Define a Bell circuit and run it using QiskitRuntimeService"

Available models

Current models

These are the latest recommended models for use with Qiskit Code Assistant:

Qiskit/mistral-small-3.2-24b-qiskit - Released October 2025
Qiskit/Qwen2.5-Coder-14B-Qiskit - Released June 2025
qiskit/granite-3.3-8b-qiskit - Released June 2025
qiskit/granite-3.2-8b-qiskit - Released June 2025

GGUF models (recommended for personal environments/laptops)

GGUF format models are optimized for local use and require fewer computational resources:

mistral-small-3.2-24b-qiskit-GGUF – Released October 2025
Trained with Qiskit data up to version 2.1
Qiskit/Qwen2.5-Coder-14B-Qiskit-GGUF – Released June 2025
Trained with Qiskit data up to version 2.0
qiskit/granite-3.3-8b-qiskit-GGUF – Released June 2025
Trained with Qiskit data up to version 2.0
qiskit/granite-3.2-8b-qiskit-GGUF – Released June 2025
Trained with Qiskit data up to version 2.0

The Open Source Qiskit Code Assistant models are available in safetensors or GGUF file format and can be downloaded from the Hugging Face as explained below.

Qiskit versions used for training

Model						Benchmark Metrics					Release date	Trained on Qiskit version
	QiskitHumanEval-Hard	QiskitHumanEval	HumanEval	ASDiv	MathQA	SciQ	MBPP	IFEval	CrowsPairs (English)	TruthfulQA (MC1 acc)
mistral-small-3.2-24b-qiskit	32.45	47.02	77.49	3.77	49.68	97.50	64.00	48.44	67.08	39.41	January 2026	2.2
Qwen2.5-Coder-14B-Qiskit	25.17	49.01	91.46	4.21	53.90	97.00	77.60	49.64	65.18	37.82	June 2025	2.0
granite-3.3-8b-qiskit	14.57	27.15	62.80	0.48	38.66	93.30	52.40	59.71	59.75	39.05	June 2025	2.0
granite-3.2-8b-qiskit	9.93	24.50	57.32	0.09	41.41	96.30	51.80	60.79	66.79	40.51	June 2025	2.0
granite-8b-qiskit-rc-0.10	15.89	38.41	59.76	—	—	—	—	—	—	—	February 2025	1.3
granite-8b-qiskit	17.88	44.37	53.66	—	—	—	—	—	—	—	November 2024	1.2

Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.

Deprecated models

These models are no longer actively maintained but remain available:

qiskit/granite-8b-qiskit-rc-0.10 - Released February 2025 (deprecated)
qiskit/granite-8b-qiskit - Released November 2024 (deprecated)

More information and citations

To learn more about Qiskit Code Assistant, the Qiskit HumanEval, or Qiskit HumanEval Hard benchmarks, and cite them in your scientific publications, review these recommended citations:

@misc{2405.19495,
Author = {Nicolas Dupuis and Luca Buratti and Sanjay Vishwakarma and Aitana Viudes Forrat and David Kremer and Ismael Faro and Ruchir Puri and Juan Cruz-Benito},
Title = {Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code},
Year = {2024},
Eprint = {arXiv:2405.19495},
}

@misc{2406.14712,
Author = {Sanjay Vishwakarma and Francis Harkins and Siddharth Golecha and Vishal Sharathchandra Bajpe and Nicolas Dupuis and Luca Buratti and David Kremer and Ismael Faro and Ruchir Puri and Juan Cruz-Benito},
Title = {Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models},
Year = {2024},
Eprint = {arXiv:2406.14712},
}

@misc{2508.20907,
Author = {Nicolas Dupuis and Adarsh Tiwari and Youssef Mroueh and David Kremer and Ismael Faro and Juan Cruz-Benito},
Title = {Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant},
Year = {2025},
Eprint = {arXiv:2508.20907},
}

The Large Language Model (LLM) behind Qiskit Code Assistant​

The Qiskit HumanEval and Qiskit HumanEval Hard benchmarks​

Install Qiskit Code Assistant​

Using the Ollama application​

Install Ollama​

Set up Ollama using the Hugging Face Hub integration​

Set up Ollama with a manually downloaded Qiskit Code Assistant GGUF model​

Run the Qiskit Code Assistant model manually downloaded in Ollama​

Use the llama.cpp library​

Advanced parameters​

Continue (VS Code)​

1. Install the extension​

2. Open the config​

3. Configure the model​

4. Test it​

Jupyter AI (JupyterLab)​

1. Install Jupyter AI and the Ollama provider​

2. Configure the chat model​

3. Use the %%ai magic command​

4. Custom Ollama host (optional)​

OpenCode (Terminal)​

1. Install OpenCode​

2. Configure the Qiskit model​

3. Select the model​

4. Test it​

Available models​

Current models​

GGUF models (recommended for personal environments/laptops)​

Qiskit versions used for training​

Deprecated models​

More information and citations​

The Large Language Model (LLM) behind Qiskit Code Assistant

The Qiskit HumanEval and Qiskit HumanEval Hard benchmarks

Install Qiskit Code Assistant

Using the Ollama application

Install Ollama

Set up Ollama using the Hugging Face Hub integration

Set up Ollama with a manually downloaded Qiskit Code Assistant GGUF model

Run the Qiskit Code Assistant model manually downloaded in Ollama

Use the `llama.cpp` library

Advanced parameters

Continue (VS Code)

1. Install the extension

2. Open the config

3. Configure the model

4. Test it

Jupyter AI (JupyterLab)

1. Install Jupyter AI and the Ollama provider

2. Configure the chat model

3. Use the `%%ai` magic command

4. Custom Ollama host (optional)

OpenCode (Terminal)

1. Install OpenCode

2. Configure the Qiskit model

3. Select the model

4. Test it

Available models

Current models

GGUF models (recommended for personal environments/laptops)

Qiskit versions used for training

Deprecated models

More information and citations