
Start an augmentation run (or estimate cost)

POST /api/v1/datasets/{dataset_id}/run

Validates the column mapping and recipe configuration, reserves credits, and starts the augmentation pipeline. Set estimate=true to validate the request and get a cost quote without starting a run. When the mapped image column is also included in the context columns (column_mapping.context), output rows are billed at the multimodal rate of 10 credits per 100 rows (1–100 rows cost 10 credits); see multimodalPricingApplied and creditMultiplier on the response.
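The "1–100 rows cost 10 credits" example suggests the multimodal rate is charged per started 100-row block. A minimal Python sketch of that arithmetic, assuming ceiling rounding; the helper name is hypothetical, and the server's estimatedCreditsConsumed remains the authoritative figure:

```python
# Multimodal rate stated in this reference: 10 credits per 100 output rows,
# with 1-100 rows costing 10 credits. Billing every started 100-row block is
# an assumption inferred from that example, not a documented rule.
CREDITS_PER_BLOCK = 10
ROWS_PER_BLOCK = 100


def multimodal_credits(output_rows: int) -> int:
    """Hypothetical helper: estimate multimodal credits for an output row count."""
    if output_rows < 0:
        raise ValueError("output_rows must be non-negative")
    blocks = (output_rows + ROWS_PER_BLOCK - 1) // ROWS_PER_BLOCK  # ceiling division
    return blocks * CREDITS_PER_BLOCK


for rows in (1, 100, 101, 250):
    print(rows, multimodal_credits(rows))
```

Integer ceiling division avoids floating-point rounding surprises for large row counts.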

Path Parameters
dataset_id: string
Body Parameters (JSON)
brand_controls: optional object { blueprint, hallucination_mitigation, length, safety_categories }

Brand and quality controls for generated completions. Covers response length, content safety categories, web-search grounding, and a freeform blueprint system prompt.

blueprint: optional string

Freeform brand/style instructions injected as a system prompt for every generated completion. Use this to enforce tone, language, persona, or any guideline that does not fit the structured length/safety/grounding controls.

hallucination_mitigation: optional boolean

Enable web-search grounding to reduce hallucinations in generated completions.

length: optional "minimal" or "concise" or "detailed" or "extensive"

Target response length. Controls verbosity of generated completions.

One of the following:
"minimal"
"concise"
"detailed"
"extensive"
safety_categories: optional array of string

Content safety categories to enforce. Completions violating any listed category are filtered from the output.
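A sketch of a brand_controls object as it might be sent in the request body, in Python. The blueprint text is illustrative, and "example_category" is a placeholder: this reference does not list the accepted safety category names, so confirm them before use.

```python
import json

# Hypothetical brand_controls object. Field names follow this reference; the
# blueprint text is illustrative and "example_category" is a placeholder,
# because the accepted safety category names are not listed here.
brand_controls = {
    "blueprint": "Write in a friendly, professional tone using British English.",
    "hallucination_mitigation": True,  # enable web-search grounding
    "length": "concise",
    "safety_categories": ["example_category"],
}

# Client-side check against the documented length values.
ALLOWED_LENGTHS = {"minimal", "concise", "detailed", "extensive"}
if brand_controls["length"] not in ALLOWED_LENGTHS:
    raise ValueError(f"unsupported length: {brand_controls['length']}")

print(json.dumps({"brand_controls": brand_controls}, indent=2))
```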

column_mapping: optional object { chat, completion, context, image, prompt, universal_prompt }

Column role assignments for augmentation. Required for real runs, optional for estimate-only requests.

chat: optional string

Column containing chat/conversation data. Chat mode is exclusive: when chat is set, all other column-mapping fields must be omitted, with the exception of image (see image below).

completion: optional string

Column to use as the completion/response field. Optional in prompt mode and universal-prompt mode; not allowed in chat mode.

context: optional array of string

Columns to include as context. Optional in prompt mode; required (at least one entry) in universal-prompt mode; not allowed in chat mode.

image: optional string

Column containing per-row images (URLs, file paths, or encoded bytes) to fold into the augmentation as multimodal context. Allowed in any mode (prompt, universal-prompt, or chat) as long as text framing is also present; it cannot be the only mapping. If the image column is not listed in context, it is added automatically. Note: opting a dataset into multimodal context disqualifies it from finetuning.

prompt: optional string

Column to use as the prompt/instruction field. Setting it selects prompt mode (with optional completion and context); it cannot be combined with chat or universal_prompt.

universal_prompt: optional string

Dataset-wide instruction folded into every row alongside the context columns. Use when there is no per-row prompt column but rows share a common task framing. Requires at least one entry in context.
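The mode rules above can be checked client-side before calling the endpoint. A minimal Python sketch, assuming a mapping with none of chat, prompt, or universal_prompt lacks text framing (this reference does not state that case explicitly) and following the image description in allowing image alongside chat. The function name is hypothetical; the server performs the authoritative validation.

```python
def detect_mode(mapping: dict) -> str:
    """Hypothetical helper: return 'chat', 'prompt', or 'universal_prompt', or raise ValueError."""

    def present(key: str) -> bool:
        # Treat missing, empty-string, and empty-list values as "not set".
        return mapping.get(key) not in (None, "", [])

    if present("chat"):
        # Chat mode is exclusive; only the image column may accompany it.
        extra = [k for k in ("prompt", "universal_prompt", "completion", "context") if present(k)]
        if extra:
            raise ValueError(f"chat mode does not allow: {', '.join(extra)}")
        return "chat"

    if present("prompt") and present("universal_prompt"):
        raise ValueError("prompt and universal_prompt cannot be combined")

    if present("prompt"):
        # completion and context are optional in prompt mode.
        return "prompt"

    if present("universal_prompt"):
        if not present("context"):
            raise ValueError("universal_prompt requires at least one context column")
        return "universal_prompt"

    # Assumption: without chat, prompt, or universal_prompt there is no text framing,
    # which also rejects an image-only mapping.
    raise ValueError("column_mapping needs chat, prompt, or universal_prompt")


print(detect_mode({"prompt": "question", "completion": "answer", "context": ["source_doc"]}))
```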

estimate: optional boolean

When true, validates the request and returns the estimated credit cost without starting a run.

job_specification: optional object { idempotency_key, max_rows }

Job execution parameters.

idempotency_key: optional string

Client-generated idempotency key for safe retries. If a launch with the same key already exists, the original response is returned.

max_rows: optional number

Maximum number of rows to process in this run.

minimum: 1
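A sketch of a retry-safe job_specification in Python, assuming a UUID4 string is an acceptable idempotency key (this reference does not constrain the key format). The helper name is hypothetical, and the fragment omits column_mapping, which real runs require.

```python
import json
import uuid

# Generate the key once per logical launch and reuse it on every retry, so a
# retried POST returns the original response instead of starting a second run.
IDEMPOTENCY_KEY = str(uuid.uuid4())


def build_job_specification(max_rows: int) -> dict:
    """Hypothetical helper: return a job_specification object for the request body."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return {"idempotency_key": IDEMPOTENCY_KEY, "max_rows": max_rows}


first_attempt = {"job_specification": build_job_specification(500)}
retry_attempt = {"job_specification": build_job_specification(500)}
print(json.dumps(first_attempt) == json.dumps(retry_attempt))  # True
```

The key point is that the key is created outside the retry loop; generating a fresh key per attempt would defeat the deduplication.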
language_expansion: optional object { sample_rate, type, languages, pairs }

Translation/localization expansion. When set, the pipeline produces additional rows in the requested languages or country/language variants. Output row count ≈ input × (1 + sample_rate × target_count), where target_count is the number of entries in languages (translate) or pairs (localize); credits are billed on output rows.

Three-state field:

  • omitted — leaves any prior expansion config on the dataset unchanged (no-op). A wizard-saved expansion is preserved across SDK-driven /run calls that do not pass language_expansion.
  • language_expansion: null — explicitly clears any prior expansion config on this dataset (overrides wizard state).
  • object — overwrites the prior expansion config with the given spec.
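In a JSON client, omitting the field and sending null are different requests. A Python sketch of the three request-body fragments, where None serializes to null; training_type is included only so each fragment is non-empty, and the language codes are illustrative.

```python
import json

# 1. Omit the key: any existing expansion config on the dataset is left unchanged.
keep_existing = {"training_type": "instruction_dataset"}

# 2. Explicit null: clears any existing expansion config.
clear_existing = {"training_type": "instruction_dataset", "language_expansion": None}

# 3. Object: replaces the existing expansion config with this spec.
replace_existing = {
    "training_type": "instruction_dataset",
    "language_expansion": {"type": "translate", "sample_rate": 0.5, "languages": ["fr", "de"]},
}

for body in (keep_existing, clear_existing, replace_existing):
    print(json.dumps(body))
```

Helpers that strip None-valued keys before sending would silently turn case 2 into case 1, so the clearing behavior depends on the serializer keeping explicit nulls.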
sample_rate: number

Required. Fraction (0.01–1) of input rows that are expanded for each target. Output rows ≈ input × (1 + sample_rate × target_count), and credits are billed on the expanded output row count.

minimum: 0.01
maximum: 1
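A worked version of the output-row formula in Python. The reference gives the formula as an approximation, so rounding to the nearest whole row here is an assumption, and the function name is hypothetical.

```python
def estimate_output_rows(input_rows: int, sample_rate: float, target_count: int) -> int:
    """Hypothetical helper: approximate output rows as input x (1 + sample_rate x target_count).

    Rounding to the nearest row is an assumption; estimatedCreditsConsumed on
    the response is the authoritative cost figure.
    """
    if not 0.01 <= sample_rate <= 1:
        raise ValueError("sample_rate must be between 0.01 and 1")
    return round(input_rows * (1 + sample_rate * target_count))


# 1,000 input rows, 25% sampled per target, translated into 3 languages:
# 1,000 x (1 + 0.25 x 3) = 1,750 output rows.
print(estimate_output_rows(1000, 0.25, 3))
```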
type: "translate" or "localize"

Expansion mode. translate produces one new row variant per target language; localize produces one new row variant per country/language pair.

One of the following:
"translate"
"localize"
languages: optional array of string

Target ISO 639-1 language codes (required when type=translate, forbidden when type=localize). Validated against the supported language list at request time; unknown values return 400 with a sample of supported codes.

pairs: optional array of object { country, language }

Country/language pairs (required when type=localize, forbidden when type=translate). Each pair is validated against the supported country/language pair list at request time.

country: string

ISO 3166-1 alpha-2 country code.

minLength: 2
maxLength: 2
language: string

ISO 639-1 language code.

minLength: 2
maxLength: 2
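A Python sketch of assembling language_expansion client-side, enforcing the rules above: languages only for translate, pairs only for localize, and two-character codes in each pair. The function name is hypothetical; whether a code or pair is actually supported is still decided by the server at request time.

```python
from typing import Optional


def build_language_expansion(
    sample_rate: float,
    expansion_type: str,
    languages: Optional[list] = None,
    pairs: Optional[list] = None,
) -> dict:
    """Hypothetical helper: assemble a language_expansion object (shape checks only)."""
    if not 0.01 <= sample_rate <= 1:
        raise ValueError("sample_rate must be between 0.01 and 1")
    spec = {"sample_rate": sample_rate, "type": expansion_type}
    if expansion_type == "translate":
        if not languages or pairs:
            raise ValueError("translate requires languages and forbids pairs")
        spec["languages"] = list(languages)
    elif expansion_type == "localize":
        if not pairs or languages:
            raise ValueError("localize requires pairs and forbids languages")
        for pair in pairs:
            # country (ISO 3166-1 alpha-2) and language (ISO 639-1) are exactly 2 characters.
            if len(pair["country"]) != 2 or len(pair["language"]) != 2:
                raise ValueError(f"invalid country/language pair: {pair}")
        spec["pairs"] = [{"country": p["country"], "language": p["language"]} for p in pairs]
    else:
        raise ValueError("type must be 'translate' or 'localize'")
    return spec


# Localize into Brazilian Portuguese and Mexican Spanish, sampling half of the input rows per target.
print(build_language_expansion(0.5, "localize", pairs=[
    {"country": "BR", "language": "pt"},
    {"country": "MX", "language": "es"},
]))
```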
recipe_specification: optional object { recipes, version }

Augmentation recipe configuration. Omitted recipes use backend defaults.

recipes: optional object { deduplication, prompt_rephrase, reasoning_traces }

Augmentation recipe toggles. Omitted recipes use backend defaults.

deduplication: optional boolean

Remove near-duplicate rows.

prompt_rephrase: optional boolean

Rephrase prompts for variety and clarity.

reasoning_traces: optional boolean

Add reasoning traces (chain-of-thought) to completions.

version: optional string

Recipe schema version. Allows recipe options to evolve across releases.
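A Python sketch of a recipe_specification that forces only some toggles. Recipes left out fall back to backend defaults, so prompt_rephrase is omitted on purpose; version is also omitted because this reference does not list valid values.

```python
import json

# Hypothetical recipe_specification. Only the listed toggles are sent;
# prompt_rephrase is deliberately left out so the backend default applies.
recipe_specification = {
    "recipes": {
        "deduplication": True,
        "reasoning_traces": False,
    },
}

print(json.dumps({"recipe_specification": recipe_specification}))
```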

training_type: optional "instruction_dataset" or "preference_pairs"

How to adapt the dataset. instruction_dataset (default) produces enhanced prompt/completion pairs for SFT; preference_pairs generates chosen/rejected pairs for DPO. Both types read the same source columns, column_mapping.prompt (and optionally completion); the chosen/rejected fields are produced by the augmentation pipeline.
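A Python sketch of a request body for a preference_pairs (DPO) run. "question" and "answer" are placeholder column names; only source columns are mapped, since the chosen/rejected fields come from the pipeline.

```python
import json

# Hypothetical DPO request body. Column names are placeholders, and no
# chosen/rejected columns are mapped because the pipeline generates them.
body = {
    "training_type": "preference_pairs",
    "column_mapping": {"prompt": "question", "completion": "answer"},
}

print(json.dumps(body, indent=2))
```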

One of the following:
"instruction_dataset"
"preference_pairs"
Returns
estimate: boolean

Whether this was an estimate-only request (no run started).

estimatedCreditsConsumed: number

Estimated number of credits this run will consume.

estimatedMinutes: number

Estimated processing time in minutes.

multimodalPricingApplied: boolean

True when an image column is mapped and also included in the context columns (column_mapping.context); output rows are then billed at the multimodal rate of 10 credits per 100 rows.

creditMultiplier: optional number

Credit multiplier for multimodal pricing (10 credits per 100 output rows). Present only when multimodalPricingApplied is true; omitted for text-only pricing.

run_id: optional string

Unique identifier for this pipeline run (for example, dataset-550e8400-e29b-41d4-a716-446655440000-1712234567890). Null for estimate-only requests.
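A Python sketch of how a client might branch on these fields. The helper name is hypothetical, and the sample payload is hand-written with made-up numbers, not real API output.

```python
import json


def summarize_response(raw: str) -> str:
    """Hypothetical helper: one-line summary of a /run response body."""
    resp = json.loads(raw)
    cost = resp["estimatedCreditsConsumed"]
    minutes = resp["estimatedMinutes"]
    pricing = "multimodal" if resp["multimodalPricingApplied"] else "text-only"
    if resp["estimate"]:
        # Estimate-only request: no run was started, so run_id is null.
        return f"estimate: {cost} credits, ~{minutes} min ({pricing} pricing)"
    return f"run {resp.get('run_id')} started: {cost} credits, ~{minutes} min ({pricing} pricing)"


# Illustrative payload; the numbers are made up for the example.
sample = (
    '{"estimate": true, "estimatedCreditsConsumed": 120, '
    '"estimatedMinutes": 15, "multimodalPricingApplied": false, "run_id": null}'
)
print(summarize_response(sample))
```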

Example request (estimate only)

```shell
curl https://api.prod.adaptionlabs.ai/api/v1/datasets/$DATASET_ID/run \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $ADAPTION_API_KEY" \
    -d '{
          "estimate": true,
          "training_type": "instruction_dataset"
        }'
```

Example response

```json
{
  "estimate": true,
  "estimatedCreditsConsumed": 0,
  "estimatedMinutes": 0,
  "multimodalPricingApplied": false,
  "run_id": null
}
```
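A standard-library Python equivalent of the curl call, building an estimate-only request (estimate: true) so nothing is started. The helper name and the fallback values for DATASET_ID and ADAPTION_API_KEY are hypothetical placeholders; the sketch builds the request without sending it.

```python
import json
import os
import urllib.request

API_BASE = "https://api.prod.adaptionlabs.ai/api/v1"


def build_run_request(dataset_id: str, body: dict, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to the /run endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{dataset_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_run_request(
    os.environ.get("DATASET_ID", "example-dataset-id"),
    {"estimate": True, "training_type": "instruction_dataset"},
    os.environ.get("ADAPTION_API_KEY", "placeholder-key"),
)
# To send it: json.load(urllib.request.urlopen(req)) parses the response body.
```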