---
title: Multimodal context | Adaption
description: Supply images as context columns in Adaptive Data so models can reason over visual reference material alongside the prompt.
---

**Context columns** aren’t limited to text. A context column can carry **images**—visual reference material the model reasons over alongside the prompt, the same way it would a document or a passage of retrieved text. If you are new to the concept, [Column selection](/guides/column-selection/index.md) covers how context columns work in general; this guide focuses on the image case.

## Use cases

Image context columns power a range of vision and multimodal tasks:

- **Visual question answering** — answer questions about the contents of an image.
- **Image captioning** — generate a natural-language description of an image.
- **Visual reasoning** — reason over relationships, counts, or spatial layout within an image.
- **Image classification via prompting** — assign labels to an image using a prompt instead of a trained classifier.
- **Document question answering** — extract or answer questions from scanned documents, forms, and other image-based text.

Use image context columns when your data is **visual**: for example, when building vision or multimodal datasets, extracting text from images (OCR-style), or generating image captions and descriptions as training data.

## Supported image inputs

How an image column is handled depends on where the dataset comes from. Each source currently supports the following image representations:

### HuggingFace

- **`bytes`** — raw image bytes
- **`hf_struct`** — HuggingFace `Image()` feature
- **`list`** — array of image bytes
- **`url`** — absolute `http(s)` URLs
- **`path`** — relative paths (e.g. an `imagefolder`)
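
To check which of these representations a column holds before you connect a dataset, a quick heuristic over one sample cell can help. The sketch below is an illustration only, not the platform's detection logic: `guess_image_representation` is a hypothetical helper, and the `hf_struct` branch assumes the `{"bytes": ..., "path": ...}` dict layout that the `datasets` library uses for an undecoded `Image()` cell.

```python
from urllib.parse import urlparse


def guess_image_representation(value):
    """Hypothetical helper: guess how one cell of an image column is encoded.

    Illustrative heuristic only; it is not the platform's detection logic.
    """
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    # Assumption: an undecoded Image() cell is a dict with "bytes" and/or "path".
    if isinstance(value, dict) and ("bytes" in value or "path" in value):
        return "hf_struct"
    if isinstance(value, (list, tuple)) and value and all(
        isinstance(item, (bytes, bytearray)) for item in value
    ):
        return "list"
    if isinstance(value, str):
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "url"
        # No scheme and no leading slash: treat it as a relative path.
        if value and not parsed.scheme and not value.startswith("/"):
            return "path"
    return "unknown"


print(guess_image_representation("https://example.com/figure.png"))  # url
print(guess_image_representation("train/cat/001.png"))               # path
```

Checking a single cell is a shortcut; a column with mixed encodings would need every row inspected.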

### Kaggle

- **Image folders** — no manifest (e.g. `train/<class>/*.png`)
- **CSV / Parquet manifest** with an image column
- **Tabular file** with a bytes or URL column
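
In a folder-only dataset, the class label lives in the directory name. If you would rather make that structure explicit, the sketch below shows one way to turn a `train/<class>/*.png` tree into a CSV manifest. This step is optional, since image folders without a manifest are supported, and it does not describe how the platform ingests folders. The `build_manifest` helper and the `image` and `label` column names are hypothetical.

```python
import csv
from pathlib import Path


def build_manifest(root, manifest_path, pattern="*.png"):
    """Hypothetical helper: write a CSV manifest for a <root>/<class>/<file> tree."""
    root = Path(root)
    rows = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.glob(pattern)):
            rows.append(
                {
                    # Relative POSIX-style path keeps the manifest portable.
                    "image": image_path.relative_to(root).as_posix(),
                    # The parent directory name doubles as the class label.
                    "label": class_dir.name,
                }
            )
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["image", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_manifest("train", "manifest.csv")` would then produce one row per image, with paths relative to `train`.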

### Direct file upload

For files uploaded directly (`csv`, `parquet`, `json`, `jsonl`, `xlsx`, and similar):

- **`bytes` / `hf_struct` / `list`**
- **`url`** — absolute `http(s)` URLs
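
Because the `url` representation expects absolute `http(s)` URLs, it may be worth checking a URL column before uploading a file. A minimal sketch, assuming a CSV with a hypothetical `image_url` column; `find_invalid_urls` is an illustrative helper and not part of any Adaption tooling.

```python
import csv
import io
from urllib.parse import urlparse


def is_absolute_http_url(value):
    """True for URLs with an http or https scheme and a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def find_invalid_urls(csv_text, column="image_url"):
    """Return (row_number, value) pairs whose URL is not absolute http(s).

    Row numbers are 1-based and count data rows only (the header is skipped).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (index, row[column])
        for index, row in enumerate(reader, start=1)
        if not is_absolute_http_url(row[column] or "")
    ]


sample = (
    "id,image_url\n"
    "1,https://example.com/a.png\n"
    "2,images/b.png\n"
    "3,ftp://example.com/c.png\n"
)
print(find_invalid_urls(sample))
# [(2, 'images/b.png'), (3, 'ftp://example.com/c.png')]
```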

## Using images in the app

Image context is configured in the **web app’s Adaptive Data wizard**. To see how it works, walk through the [MathVision](https://huggingface.co/datasets/MathLLMs/MathVision) dataset, a collection of competition math problems in which each question references a diagram or figure and cannot be solved without it.

Each row holds the question text, multiple-choice options, the figure the question refers to, and the correct answer. In the column-selection step, map them like this:

- `question` → **prompt** column. The per-row question, e.g. *“Which number should be written in place of the question mark? `<image1>`”*.
- `decoded_image` → **context** column. The figure the question references, supplied to the model as an image.
- `answer` → **completion** column. The expected answer.

![Mapping the MathVision columns in the column selection step of the adaptation wizard: question as the prompt, decoded\_image as image context, and answer as the completion.](/multimodal-context/column-selection.png)

Because every row carries its own question, you map the `question` column as the prompt rather than writing one. If your rows instead shared a single instruction—say *“Describe what’s shown in this image”*—you’d use a [universal prompt](/guides/universal-prompts/index.md) and supply only the image as context.
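
The choice between a mapped prompt column and a universal prompt comes down to whether the instruction varies from row to row. As a rough illustration, the sketch below runs that check over hypothetical in-memory rows shaped like the examples above. The `suggest_prompt_strategy` helper is made up for this sketch and is not part of the wizard, and the row contents are placeholders rather than real dataset records.

```python
def suggest_prompt_strategy(rows, prompt_column):
    """Hypothetical helper: suggest how to supply the prompt for a dataset.

    Returns "prompt column" when the text varies across rows, and
    "universal prompt" when every row carries the same instruction.
    """
    distinct = {row[prompt_column].strip() for row in rows}
    return "universal prompt" if len(distinct) <= 1 else "prompt column"


# Illustrative rows only; the image bytes are placeholders, not real figures.
math_rows = [
    {"question": "Which number should be written in place of the question mark? <image1>",
     "decoded_image": b"...", "answer": "11"},
    {"question": "How many triangles are in the figure? <image1>",
     "decoded_image": b"...", "answer": "5"},
]
caption_rows = [
    {"instruction": "Describe what's shown in this image", "image": b"..."},
    {"instruction": "Describe what's shown in this image", "image": b"..."},
]

print(suggest_prompt_strategy(math_rows, "question"))        # prompt column
print(suggest_prompt_strategy(caption_rows, "instruction"))  # universal prompt
```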

## Continue through the wizard

With your columns mapped, finish the remaining wizard steps—**diagnosis**, **recipes**, **brand**, and **summary**—then launch the run. The platform processes your dataset asynchronously; the dataset status updates when the job completes.

## Review results in the View tab

When processing finishes, open the dataset and switch to the **View** tab. Each row lays out the adapted output side by side with your source data:

- **Original completion** — the short answer from your `answer` column (e.g. `11`).
- **Enhanced prompt** — the per-row question after adaptation, with any `<image1>` reference intact.
- **Enhanced completion** — the platform-generated response, typically a fuller step-by-step solution.
- **Image column** — the figure from your context mapping, rendered inline so you can verify visual reasoning against the diagram.

![The View tab for a completed MathVision run: original completion, enhanced prompt, step-by-step enhanced completion, and the image context column showing the geometry diagram.](/multimodal-context/view-tab.png)

*The screenshot above is from a run that used **Expand** to translate MathVision into Spanish.*
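
To spot-check that image references survive adaptation across many rows, comparing the placeholders before and after is usually enough. The sketch below assumes you already have the original and enhanced prompts as plain strings (how you obtain them is outside the scope of this guide) and that placeholders follow the `<imageN>` pattern seen in MathVision. `placeholders_preserved` is a hypothetical helper, and the Spanish prompt is an invented example.

```python
import re
from collections import Counter

# Matches placeholders such as <image1> or <image12>.
IMAGE_TAG = re.compile(r"<image\d+>")


def placeholders_preserved(original_prompt, enhanced_prompt):
    """True when both prompts contain the same <imageN> tags, the same number of times."""
    return Counter(IMAGE_TAG.findall(original_prompt)) == Counter(
        IMAGE_TAG.findall(enhanced_prompt)
    )


original = "Which number should be written in place of the question mark? <image1>"
enhanced = "¿Qué número debe escribirse en lugar del signo de interrogación? <image1>"

print(placeholders_preserved(original, enhanced))                         # True
print(placeholders_preserved(original, "¿Qué número debe escribirse?"))  # False
```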

## Next steps

From here the workflow is the same as any other adapted dataset:

- [Evaluating dataset quality](/guides/evaluating-dataset-quality/index.md) — read quality scores and metrics after the run.
- [Reasoning traces](/guides/reasoning-traces/index.md) — add a chain-of-thought alongside each completion when the reasoning matters as much as the answer.