---
title: Multimodal context | Adaption
description: Supply images as context columns in Adaptive Data so models can reason over visual reference material alongside the prompt.
---

**Context columns** aren’t limited to text. A context column can carry **images**: visual reference material the model reasons over alongside the prompt, the same way it would a document or a passage of retrieved text. If you are new to the concept, [Column selection](/guides/column-selection/index.md) covers how context columns work in general; this guide focuses on the image case.

## Use cases

Image context columns power a range of vision and multimodal tasks:

- **Visual question answering**: answer questions about the contents of an image.
- **Image captioning**: generate a natural-language description of an image.
- **Visual reasoning**: reason over relationships, counts, or spatial layout within an image.
- **Image classification via prompting**: assign labels to an image using a prompt instead of a trained classifier.
- **Document question answering**: extract or answer questions from scanned documents, forms, and other image-based text.

Use image context columns when your data is **visual**: building vision or multimodal datasets, OCR-style extraction from images, or generating image captions and descriptions as training data.

## Supported image inputs

How an image column is handled depends on where the dataset comes from. Each source currently supports the following image representations:

### Hugging Face

- **`bytes`**: raw image bytes
- **`hf_struct`**: Hugging Face `Image()` feature
- **`list`**: array of image bytes
- **`url`**: absolute `http(s)` URLs
- **`path`**: relative paths (e.g. an `imagefolder`)

### Kaggle

- **Image folders**: no manifest (e.g.
`train//*.png`)
- **CSV / Parquet manifest** with an image column
- **Tabular file** with a bytes or URL column

### Direct file upload

For files uploaded directly (`csv`, `parquet`, `json`, `jsonl`, `xlsx`, and similar):

- **`bytes` / `hf_struct` / `list`**
- **`url`**: absolute `http(s)` URLs

## Using images in the app

Image context is configured in the **web app’s Adaptive Data wizard**. To see how it works, walk through the [MathVision](https://huggingface.co/datasets/MathLLMs/MathVision) dataset, a collection of competition math problems where each question references a diagram or figure it can’t be solved without.

Each row holds the question text, multiple-choice options, the figure the question refers to, and the correct answer. In the column-selection step, map them like this:

- `question` → **prompt** column. The per-row question, e.g. *“Which number should be written in place of the question mark? ``”*.
- `decoded_image` → **context** column. The figure the question references, supplied to the model as an image.
- `answer` → **completion** column. The expected answer.

![Mapping the MathVision columns in the column selection step of the adaptation wizard: question as the prompt, decoded\_image as image context, and answer as the completion.](/multimodal-context/column-selection.png)

Because every row carries its own question, you map the `question` column as the prompt rather than writing one. If your rows instead shared a single instruction, say *“Describe what’s shown in this image”*, you’d use a [universal prompt](/guides/universal-prompts/index.md) and supply only the image as context.

## Continue through the wizard

With your columns mapped, finish the remaining wizard steps (**diagnosis**, **recipes**, **brand**, and **summary**), then launch the run. The platform processes your dataset asynchronously; the dataset status updates when the job completes.
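
Before launching, a quick local check can catch rows whose image cell is empty or in an unexpected form. The following is a minimal Python sketch and not part of the platform: `COLUMN_ROLES`, `image_representation`, and `check_row` are hypothetical names, and the detection rules are assumptions (an undecoded Hugging Face `Image()` cell is taken to be a dict with `bytes` and `path` keys, and `list` is taken to mean a list of byte strings).

```python
from __future__ import annotations

from typing import Any

# Hypothetical local mirror of the MathVision mapping chosen in the wizard.
# This is an illustration only, not a platform API or config format.
COLUMN_ROLES = {
    "question": "prompt",
    "decoded_image": "context",
    "answer": "completion",
}


def image_representation(value: Any) -> str:
    """Guess which image representation a cell holds (assumed rules)."""
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    # Assumption: an undecoded Image() feature is a {"bytes", "path"} struct.
    if isinstance(value, dict) and ("bytes" in value or "path" in value):
        return "hf_struct"
    # Assumption: "list" means a non-empty list of byte strings.
    if isinstance(value, (list, tuple)) and value and all(
        isinstance(item, (bytes, bytearray)) for item in value
    ):
        return "list"
    if isinstance(value, str) and value:
        if value.lower().startswith(("http://", "https://")):
            return "url"
        return "path"
    return "unknown"


def check_row(row: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one row; empty means it looks fine."""
    problems = []
    for column, role in COLUMN_ROLES.items():
        value = row.get(column)
        if value is None or value == "" or value == b"":
            problems.append(f"{column} ({role}): empty or missing")
        elif role == "context" and image_representation(value) == "unknown":
            problems.append(f"{column} ({role}): unrecognized image value")
    return problems


# Example row with made-up values shaped like the mapping above.
sample = {"question": "Which number is missing?", "decoded_image": b"\x89PNG", "answer": "11"}
print(check_row(sample))  # prints []
```

A `path` result is only expected for Hugging Face datasets; for directly uploaded files, the supported forms listed earlier are `bytes`, `hf_struct`, `list`, and `url`.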

## Review results in the View tab

When processing finishes, open the dataset and switch to the **View** tab. Each row lays out the adapted output side by side with your source data:

- **Original completion**: the short answer from your `answer` column (e.g. `11`).
- **Enhanced prompt**: the per-row question after adaptation, with any `` reference intact.
- **Enhanced completion**: the platform-generated response, typically a fuller step-by-step solution.
- **Image column**: the figure from your context mapping, rendered inline so you can verify visual reasoning against the diagram.

![The View tab for a completed MathVision run: original completion, enhanced prompt, step-by-step enhanced completion, and the image context column showing the geometry diagram.](/multimodal-context/view-tab.png)

*The screenshot above is from a run that used **Expand** to translate MathVision into Spanish.*

## Next steps

From here the workflow is the same as any other adapted dataset:

- [Evaluating dataset quality](/guides/evaluating-dataset-quality/index.md): read quality scores and metrics after the run.
- [Reasoning traces](/guides/reasoning-traces/index.md): add a chain-of-thought alongside each completion when the reasoning matters as much as the answer.