
Column selection

Map your dataset's columns to prompt, completion, context, and chat roles — the structure Adaptive Data uses to adapt your data.

A dataset is just rows and named columns. Adaptive Data needs to know which column carries the instruction, which carries the answer, and which (if any) carries supporting material. That mapping is column selection, and it is the foundation every later step—recipes, brand controls, evaluation, export—builds on.

In the Python SDK you express this with column_mapping on datasets.run. The same mapping lives behind the columns step of the web app’s adaptation wizard.

Required: a prompt column or a completion column


Every run needs at least one of two anchors: a prompt column or a completion column. You only need one—if the other is missing, the platform can synthesize it for you.

  • Have prompt but no completion → the platform generates completions for each prompt.
  • Have completion but no prompt → the platform synthesizes a matching prompt for each completion.
  • Have neither per row, but want the same instruction applied to every row → use a universal prompt in the web app. See Universal prompts.
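The decision rules above can be summarized as a small local check. This is a minimal sketch, not part of the SDK. `describe_run` is a hypothetical helper that only inspects a `column_mapping` dict and reports what the platform would fill in. It assumes the role keys are the `"prompt"`, `"completion"`, and `"chat"` names used on this page; the chat column itself is described further down.

```python
def describe_run(column_mapping: dict) -> str:
    """Hypothetical helper: report what the platform would fill in for a mapping.

    Mirrors the rules described on this page; it does not call the SDK.
    """
    if "chat" in column_mapping:
        # A chat column supplies both prompt and completion from the conversation.
        return "extract prompt and completion from the conversation"
    has_prompt = "prompt" in column_mapping
    has_completion = "completion" in column_mapping
    if has_prompt and has_completion:
        return "use both columns as-is"
    if has_prompt:
        return "generate completions for each prompt"
    if has_completion:
        return "synthesize a prompt for each completion"
    # Neither anchor: this page points to a universal prompt in the web app instead.
    raise ValueError("map a prompt or completion column, or use a universal prompt")


print(describe_run({"prompt": "instruction"}))
print(describe_run({"completion": "response"}))
```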

The simplest, most common case is a dataset that already has both:

```python
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",   # dataset column holding the instruction
        "completion": "response",  # dataset column holding the answer
    },
)
```
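Because `column_mapping` refers to columns by name, a typo in a column name is easy to miss. The following is a minimal pre-flight check. It assumes you can read the dataset's column names locally, for example from a CSV header. `missing_columns` is a hypothetical helper, not an SDK function. It accepts either a single column name or a list of names per role, as the context role uses.

```python
def missing_columns(column_mapping: dict, dataset_columns: list[str]) -> list[str]:
    """Hypothetical pre-flight check: list mapped column names absent from the dataset."""
    available = set(dataset_columns)
    missing = []
    for role, names in column_mapping.items():
        # Context takes a list of column names; the other roles take a single name.
        for name in [names] if isinstance(names, str) else names:
            if name not in available:
                missing.append(f"{role}: {name}")
    return missing


mapping = {"prompt": "instruction", "completion": "response"}
print(missing_columns(mapping, ["instruction", "answer"]))
# ['completion: response']
```

Running a check like this before `datasets.run` surfaces naming mistakes locally rather than partway through a run.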

Prompt column

The prompt column contains the question, instruction, or task to be completed. Each row’s value becomes the input the model is asked to respond to.

```python
column_mapping={"prompt": "instruction", "completion": "response"}
```

Cells can be plain text or chat-turn arrays—see Plain text vs chat-turn arrays below.

Completion column

The completion column contains the expected response to each row’s prompt. When present, completions act as ground truth that recipes (deduplication, ranking, fusion, reasoning traces) can build on.

```python
column_mapping={"prompt": "instruction", "completion": "response"}
```

If your dataset has completions but no prompts—for example, a corpus of high-quality answers you want to teach a model to produce—map only the completion column and let the platform synthesize prompts.

Context columns

A context column holds background information, reference material, or documents that the model needs to answer the prompt. Unlike prompt and completion, context is a list. You can tag multiple columns as context, and the platform passes them all alongside the prompt at generation time.

```python
column_mapping={
    "prompt": "instruction",
    "completion": "response",
    "context": ["document_title", "document_body"],
}
```
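The sketch below shows what "passes them all alongside the prompt" means for a single row by collecting the mapped values. The output layout, a dict with the prompt text plus a list of context values in mapping order, is an assumption made for illustration. This page does not specify how the platform formats context at generation time, and `gather_row` is a hypothetical name.

```python
def gather_row(row: dict, column_mapping: dict) -> dict:
    """Hypothetical: collect the prompt and every context value for one row."""
    return {
        "prompt": row[column_mapping["prompt"]],
        # Context maps to a list of column names, so keep every value in mapping order.
        "context": [row[col] for col in column_mapping.get("context", [])],
    }


row = {
    "instruction": "Summarize the document.",
    "response": "A short summary.",
    "document_title": "Q3 report",
    "document_body": "Revenue grew in all regions.",
}
mapping = {
    "prompt": "instruction",
    "completion": "response",
    "context": ["document_title", "document_body"],
}
print(gather_row(row, mapping))
```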

Common patterns:

  • RAG-style data: a query column as prompt, the retrieved passages as context, and the gold answer as completion.
  • Per-row metadata: context columns carry the row-specific guidance (audience, tone, product line) that should ride alongside each prompt.
  • Universal prompt + context: when no prompt column exists and you write a single instruction in the app, your context columns supply the per-row content that instruction operates on. See Universal prompts.

Chat column

The chat column carries multi-turn conversational exchanges or dialogue history, in the same shape that OpenAI-style chat APIs use. Each cell is a JSON array of turns:

```json
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "How do I reset my password?" },
  { "role": "assistant", "content": "Open Settings → Account → Reset password." }
]
```

```python
column_mapping={"chat": "messages"}
```

The chat column is mutually exclusive with prompt, completion, and context: when you map a chat column, the platform extracts the prompt and completion from the conversation itself (typically the last assistant turn as the completion and everything before as the prompt). Use chat when your data is already conversational; use prompt + completion when each row is a single instruction-and-answer pair.
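The extraction rule described above treats the last assistant turn as the completion and everything before it as the prompt. It can be sketched in a few lines. This illustrates only the typical behavior and assumes turns are dicts with `role` and `content` keys. `split_chat` is hypothetical, and the platform's real handling of edge cases (for example, conversations with no assistant turn) may differ.

```python
def split_chat(turns: list[dict]) -> tuple[list[dict], str]:
    """Hypothetical: split a conversation at its last assistant turn.

    Returns (prompt_turns, completion_text). Any turns after the last
    assistant turn are dropped in this sketch.
    """
    for i in range(len(turns) - 1, -1, -1):
        if turns[i]["role"] == "assistant":
            return turns[:i], turns[i]["content"]
    raise ValueError("conversation has no assistant turn to use as the completion")


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Open Settings > Account > Reset password."},
]
prompt_turns, completion = split_chat(messages)
print(len(prompt_turns), completion)
# 2 Open Settings > Account > Reset password.
```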

Plain text vs chat-turn arrays

Both prompt and completion cells accept either:

  • Plain text: the most common case, a string in each cell.
  • Chat-turn arrays: a JSON array of {"role": ..., "content": ...} objects, for when a single row already represents a structured exchange. The platform parses these into chat turns automatically; cells that aren’t valid JSON arrays of turns are treated as plain text.

```json
[
  { "role": "user", "content": "Summarize this email." },
  { "role": "assistant", "content": "..." }
]
```

If a column holds full multi-turn conversations, prefer the chat column described above, which is purpose-built for that shape. See ColumnMapping in the API Reference for the exact schema.
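You can approximate the cell fallback rule locally to find cells that will quietly be treated as plain text. Under that rule, valid JSON arrays of turns become chat turns and everything else stays plain text. This sketch assumes "valid" means a non-empty JSON array whose items are objects with string `role` and `content` fields. The platform's exact validation may be stricter or looser, and `parse_cell` is hypothetical.

```python
import json


def parse_cell(cell: str):
    """Hypothetical: return a list of turns, or the original string as plain text."""
    try:
        value = json.loads(cell)
    except json.JSONDecodeError:
        return cell  # Not JSON at all: plain text.
    # Assumption: an empty array is not treated as chat turns.
    is_turns = isinstance(value, list) and len(value) > 0 and all(
        isinstance(t, dict)
        and isinstance(t.get("role"), str)
        and isinstance(t.get("content"), str)
        for t in value
    )
    return value if is_turns else cell  # JSON, but not turns: still plain text.


print(parse_cell('[{"role": "user", "content": "Summarize this email."}]'))
print(parse_cell("Summarize this email."))
```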

If the same instruction should apply to every row and that instruction isn’t already a column in your dataset, define it once as a universal prompt in the web app instead of engineering per-row prompts upstream. Walk-through: Universal prompts.

Column selection is one early step in the adaptation lifecycle. Once your mapping is set, the same dataset flows through recipes, Brand controls, evaluation, and export. If you are new to the rest of the lifecycle, Getting started walks through it end to end.