---
title: Column selection | Adaption
description: Map your dataset's columns to prompt, completion, context, and chat roles — the structure Adaptive Data uses to adapt your data.
---

A dataset is just rows and named columns. Adaptive Data needs to know **which column carries the instruction**, **which carries the answer**, and **which (if any) carries supporting material**. That mapping is **column selection**, and it is the foundation every later step—recipes, brand controls, evaluation, export—builds on.

In the Python SDK you express this with **`column_mapping`** on `datasets.run`. The same mapping lives behind the **columns** step of the web app’s adaptation wizard.

## Required: a prompt column or a completion column

A run normally needs at least one of two anchors: a **prompt column** or a **completion column**. **One** is enough: if the other is missing, the platform can synthesize it for you.

- Have **prompt** but no **completion** → the platform generates completions for each prompt.
- Have **completion** but no **prompt** → the platform synthesizes a matching prompt for each completion.
- Have **neither**, but want the same instruction applied to every row → use a **universal prompt** in the web app. See [Universal prompts](/guides/universal-prompts/index.md).
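
The cases above amount to a small decision table. The sketch below is purely illustrative: `plan_for_mapping` is a hypothetical helper, not part of the SDK, and the strings it returns only paraphrase the behavior described in the list.

```python
def plan_for_mapping(column_mapping: dict) -> str:
    """Describe what the platform is expected to do for a given mapping.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    has_prompt = bool(column_mapping.get("prompt"))
    has_completion = bool(column_mapping.get("completion"))

    if has_prompt and has_completion:
        return "use both columns as provided"
    if has_prompt:
        return "generate a completion for each prompt"
    if has_completion:
        return "synthesize a prompt for each completion"
    return "no anchor column: define a universal prompt in the web app"


print(plan_for_mapping({"prompt": "instruction"}))
print(plan_for_mapping({"completion": "response"}))
print(plan_for_mapping({}))
```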

The simplest, most common case is a dataset that already has both:

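```python
# Assumes `client` is an initialized SDK client and `dataset_id`
# identifies a dataset you have already uploaded.
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",   # column holding the task or question
        "completion": "response",  # column holding the expected answer
    },
)
```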

## Prompt column

The **prompt column** contains the **question, instruction, or task** that needs to be completed. Each row’s value becomes the input the model is asked to respond to.

```python
column_mapping={"prompt": "instruction", "completion": "response"}
```

Cells can be **plain text** or **chat-turn arrays**—see [Plain text vs chat-turn arrays](#plain-text-vs-chat-turn-arrays) below.

## Completion column

The **completion column** contains the **expected responses or answers** to the prompt column. When present, completions act as **ground truth** that recipes (deduplication, ranking, fusion, reasoning traces) can build on.

```python
column_mapping={"prompt": "instruction", "completion": "response"}
```

If your dataset has completions but no prompts—for example, a corpus of high-quality answers you want to teach a model to produce—map only the **completion** column and let the platform synthesize prompts.
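
A completion-only mapping might look like the sketch below. The column name `response` is carried over from the earlier examples, and the commented-out call assumes the same `client` and `dataset_id` as before; check `ColumnMapping` in the API Reference for the exact accepted shape.

```python
# Map only the completion column. No "prompt" key is supplied, so the
# platform is expected to synthesize a prompt for each completion.
completion_only_mapping = {"completion": "response"}

# Assumed usage, mirroring the earlier datasets.run example:
# run = client.datasets.run(dataset_id, column_mapping=completion_only_mapping)
```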

## Context columns

A **context column** holds **background information, reference material, or documents** that the model needs to answer the prompt. Unlike `prompt` and `completion`, **`context` is a list**: you can tag **multiple** columns as context and the platform passes them all alongside the prompt at generation time.

```python
column_mapping={
    "prompt": "instruction",
    "completion": "response",
    "context": ["document_title", "document_body"],
}
```

Common patterns:

- **RAG-style data**: a `query` column as `prompt`, the retrieved passages as `context`, and the gold answer as `completion`.
- **Per-row metadata**: `context` columns carry the row-specific guidance (audience, tone, product line) that should ride alongside each prompt.
- **Universal prompt + context**: when no `prompt` column exists and you write a single instruction in the app, your context columns supply the per-row content that instruction operates on. See [Universal prompts](/guides/universal-prompts/index.md).
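
As a rough illustration of the RAG-style pattern, the sketch below builds a mapping and checks it against a dataset's column names before a run. The column names (`query`, `gold_answer`, `passage_1`, `passage_2`) and the `missing_columns` helper are hypothetical; this is a local sanity check, not an SDK feature.

```python
# Hypothetical column names for a RAG-style dataset.
rag_mapping = {
    "prompt": "query",
    "completion": "gold_answer",
    "context": ["passage_1", "passage_2"],  # context is a list of column names
}


def missing_columns(column_mapping: dict, available: set) -> list:
    """Return mapped column names that are absent from the dataset.

    Illustrative local check only; not an SDK function.
    """
    mapped = []
    for value in column_mapping.values():
        # "context" holds a list of names; the other roles hold a single name.
        mapped.extend(value if isinstance(value, list) else [value])
    return [name for name in mapped if name not in available]


dataset_columns = {"query", "gold_answer", "passage_1", "notes"}
print(missing_columns(rag_mapping, dataset_columns))  # ['passage_2']
```

Running a check like this before submitting a run can surface a misspelled column name early.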

## Chat column

The **chat column** carries **multi-turn conversational exchanges or dialogue history**—the same shape OpenAI-style chat APIs use. Each cell is a JSON array of turns:

```json
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "How do I reset my password?" },
  { "role": "assistant", "content": "Open Settings → Account → Reset password." }
]
```

```python
column_mapping={"chat": "messages"}
```

The chat column is **mutually exclusive** with `prompt`, `completion`, and `context`. When you map a chat column, the platform extracts the prompt and completion from the conversation itself, typically taking the last assistant turn as the completion and everything before it as the prompt. Use **`chat`** when your data is already conversational; use **`prompt`** + **`completion`** when each row is a single instruction-and-answer pair.
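
To make that extraction concrete, here is a minimal sketch of the "last assistant turn is the completion" convention. `split_chat` is a hypothetical function written for this page; the platform's actual extraction logic may differ, and how it handles a conversation with no assistant turn is an assumption here.

```python
import json


def split_chat(turns: list) -> tuple:
    """Split a chat into (prompt_turns, completion_text).

    Assumes the last assistant turn is the completion and every turn
    before it is the prompt. Illustrative only.
    """
    # Walk backwards to find the final assistant turn.
    for index in range(len(turns) - 1, -1, -1):
        if turns[index]["role"] == "assistant":
            return turns[:index], turns[index]["content"]
    # Assumption: with no assistant turn, everything is prompt.
    return turns, None


cell = json.loads("""
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "How do I reset my password?"},
  {"role": "assistant", "content": "Open Settings > Account > Reset password."}
]
""")

prompt_turns, completion = split_chat(cell)
print(len(prompt_turns))  # 2
print(completion)         # Open Settings > Account > Reset password.
```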

## Plain text vs chat-turn arrays

Both **`prompt`** and **`completion`** cells accept either:

- **Plain text** — the most common case. A string in each cell.
- **Chat-turn arrays** — a JSON array of `{"role": ..., "content": ...}` objects, when a single row already represents a structured exchange. The platform parses these into chat turns automatically; cells that aren’t valid JSON arrays of turns are treated as plain text.

```json
[
  { "role": "user", "content": "Summarize this email." },
  { "role": "assistant", "content": "..." }
]
```
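
The fallback rule described above (parse as chat turns when possible, otherwise keep the plain text) can be sketched as follows. `parse_cell` is a hypothetical helper, not the platform's parser, and treating an empty array as plain text is an assumption.

```python
import json


def parse_cell(cell: str):
    """Return a list of turns if the cell is a JSON array of turns, else the raw text.

    Illustrative sketch only; not the platform's parser.
    """
    try:
        value = json.loads(cell)
    except (TypeError, ValueError):
        return cell  # not valid JSON: treat as plain text

    is_turn_array = (
        isinstance(value, list)
        and len(value) > 0  # assumption: an empty array is not a chat
        and all(
            isinstance(turn, dict) and "role" in turn and "content" in turn
            for turn in value
        )
    )
    return value if is_turn_array else cell


print(parse_cell("Summarize this email."))
print(parse_cell('[{"role": "user", "content": "Summarize this email."}]'))
```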

When a column holds full multi-turn conversations, prefer the **`chat`** column above, which is purpose-built for that shape. See **`ColumnMapping`** in the [API Reference](/api/index.md) for the exact schema.

## Don’t have a prompt column?

If every row should be evaluated against the **same instruction**—and that instruction isn’t already a column in your dataset—define it once as a **universal prompt** in the web app instead of engineering per-row prompts upstream. Walk-through: [Universal prompts](/guides/universal-prompts/index.md).

## Where this fits

Column selection is one early step in the adaptation lifecycle. Once your mapping is set, the same dataset flows through recipes, [Brand controls](/guides/safety-and-length-constraints/index.md), evaluation, and export. If you are new to the rest of the lifecycle, [Getting started](/introduction/getting-started/index.md) walks through it end to end.