---
title: Column selection | Adaption
description: Map your dataset's columns to prompt, completion, context, and chat roles, the structure Adaptive Data uses to adapt your data.
---

A dataset is just rows and named columns. Adaptive Data needs to know **which column carries the instruction**, **which carries the answer**, and **which (if any) carries supporting material**. That mapping is **column selection**, and it is the foundation every later step (recipes, brand controls, evaluation, export) builds on.

In the Python SDK you express this with **`column_mapping`** on `datasets.run`. The same mapping lives behind the **columns** step of the web app’s adaptation wizard.

## Required: a prompt column or a completion column

Every run needs at least one of two anchors: a **prompt column** or a **completion column**. You only need **one**: if the other is missing, the platform can synthesize it for you.

- Have **prompt** but no **completion** → the platform generates completions for each prompt.
- Have **completion** but no **prompt** → the platform synthesizes a matching prompt for each completion.
- Have **neither** per row, but want the same instruction applied to every row → use a **universal prompt** in the web app. See [Universal prompts](/guides/universal-prompts/index.md).

The simplest, most common case is a dataset that already has both:

```
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",
        "completion": "response",
    },
)
```

## Prompt column

The **prompt column** is the column that contains the **question, instruction, or task** that needs to be completed. Each row’s value becomes the input the model is asked to respond to.

```
column_mapping={"prompt": "instruction", "completion": "response"}
```

Cells can be **plain text** or **chat-turn arrays**; see [Plain text vs chat-turn arrays](#plain-text-vs-chat-turn-arrays) below.

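The anchor rule from "Required: a prompt column or a completion column" can be checked locally before you submit a run. The sketch below is a hypothetical helper, not part of the Adaptive Data SDK: the name `synthesized_side` is an assumption, and it only inspects a plain `column_mapping` dict with `prompt` and `completion` keys. A `chat` mapping follows different rules and is not covered here.

```python
from typing import Optional


def synthesized_side(column_mapping: dict) -> Optional[str]:
    """Return which anchor the platform would need to synthesize, if any.

    Hypothetical local helper (an assumption for illustration), not an SDK call.
    """
    has_prompt = bool(column_mapping.get("prompt"))
    has_completion = bool(column_mapping.get("completion"))

    if has_prompt and has_completion:
        return None  # both anchors are mapped, nothing to synthesize
    if has_prompt:
        return "completion"  # completions would be generated for each prompt
    if has_completion:
        return "prompt"  # a matching prompt would be synthesized per completion
    # Neither anchor is mapped: the documented alternative is a universal
    # prompt defined in the web app.
    raise ValueError(
        "map a prompt column or a completion column, "
        "or define a universal prompt in the web app"
    )


print(synthesized_side({"prompt": "instruction", "completion": "response"}))
print(synthesized_side({"prompt": "instruction"}))
print(synthesized_side({"completion": "response"}))
```

Running a check like this on your side only catches an empty mapping early; the platform remains the source of truth for what a run accepts.
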
## Completion column

The **completion column** contains the **expected responses or answers** to the prompt column. When present, completions act as **ground truth** that recipes (deduplication, ranking, fusion, reasoning traces) can build on.

```
column_mapping={"prompt": "instruction", "completion": "response"}
```

If your dataset has completions but no prompts (for example, a corpus of high-quality answers you want to teach a model to produce), map only the **completion** column and let the platform synthesize prompts.

## Context columns

A **context column** holds **background information, reference material, or documents** that the model needs to answer the prompt. Unlike `prompt` and `completion`, **`context` is a list**: you can tag **multiple** columns as context and the platform passes them all alongside the prompt at generation time.

```
column_mapping={
    "prompt": "instruction",
    "completion": "response",
    "context": ["document_title", "document_body"],
}
```

Common patterns:

- **RAG-style data**: a `query` column as `prompt`, the retrieved passages as `context`, and the gold answer as `completion`.
- **Per-row metadata**: `context` columns carry the row-specific guidance (audience, tone, product line) that should ride alongside each prompt.
- **Universal prompt + context**: when no `prompt` column exists and you write a single instruction in the app, your context columns supply the per-row content that instruction operates on. See [Universal prompts](/guides/universal-prompts/index.md).

## Chat column

The **chat column** carries **multi-turn conversational exchanges or dialogue history**, the same shape OpenAI-style chat APIs use. Each cell is a JSON array of turns:

```
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "How do I reset my password?" },
  { "role": "assistant", "content": "Open Settings → Account → Reset password." }
]
```

```
column_mapping={"chat": "messages"}
```

The chat column is **mutually exclusive** with `prompt`, `completion`, and `context`: when you map a chat column, the platform extracts the prompt and completion from the conversation itself (typically the last assistant turn as the completion and everything before as the prompt).

Use **`chat`** when your data is already conversational; use **`prompt`** + **`completion`** when each row is a single instruction-and-answer pair.

## Plain text vs chat-turn arrays

Both **`prompt`** and **`completion`** cells accept either:

- **Plain text**: the most common case. A string in each cell.
- **Chat-turn arrays**: a JSON array of `{"role": ..., "content": ...}` objects, when a single row already represents a structured exchange. The platform parses these into chat turns automatically; cells that aren’t valid JSON arrays of turns are treated as plain text.

```
[
  { "role": "user", "content": "Summarize this email." },
  { "role": "assistant", "content": "..." }
]
```

For multi-turn conversations spread across a column, prefer the **`chat`** column above, which is purpose-built for that shape.

See **`ColumnMapping`** in the [API Reference](/api/index.md) for the exact schema.

## Don’t have a prompt column?

If every row should be evaluated against the **same instruction**, and that instruction isn’t already a column in your dataset, define it once as a **universal prompt** in the web app instead of engineering per-row prompts upstream.

Walk-through: [Universal prompts](/guides/universal-prompts/index.md).

## Where this fits

Column selection is one early step in the adaptation lifecycle. Once your mapping is set, the same dataset flows through recipes, [Brand controls](/guides/safety-and-length-constraints/index.md), evaluation, and export. If you are new to the rest of the lifecycle, [Getting started](/introduction/getting-started/index.md) walks through it end to end.
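
To preview how your own cells might be read, the behavior described under "Chat column" and "Plain text vs chat-turn arrays" can be approximated locally. This is a rough sketch under stated assumptions, not the platform's parser: `parse_cell` and `split_chat` are hypothetical names, an empty array is treated as plain text here, and the split follows only the "typically the last assistant turn" rule quoted above.

```python
import json
from typing import Union


def parse_cell(cell: str) -> Union[str, list]:
    """Return a list of turns if the cell is a JSON array of role/content
    objects; otherwise return the cell unchanged as plain text."""
    try:
        value = json.loads(cell)
    except (TypeError, ValueError):
        return cell  # not JSON at all, so treat it as plain text
    is_turns = (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(turn, dict) and "role" in turn and "content" in turn
            for turn in value
        )
    )
    return value if is_turns else cell


def split_chat(turns: list) -> tuple:
    """Split at the last assistant turn: the turns before it form the
    prompt, and that turn's content is the completion."""
    for i in range(len(turns) - 1, -1, -1):
        if turns[i]["role"] == "assistant":
            return turns[:i], turns[i]["content"]
    return turns, None  # no assistant turn, so there is no completion


cell = (
    '[{"role": "user", "content": "How do I reset my password?"},'
    ' {"role": "assistant", "content": "Open Settings."}]'
)
prompt_turns, completion = split_chat(parse_cell(cell))
print(prompt_turns)
print(completion)
print(parse_cell("Summarize this email."))
```

A local preview like this is mainly useful for spotting cells that would fall back to plain text, for example arrays whose objects are missing a `role` or `content` key.
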