Safety and length constraints

Guides

Encode length and content safety into each run with brand_controls—specification-first data, not cosmetic post-processing.

Adaptive Data treats specification as foundational: goals such as length and safety become objectives that steer what the platform produces. Data configuration should be structural, not a layer of preferences applied after the fact.

In the Python SDK today, brand_controls on datasets.run is where you express much of that specification: target verbosity (length) and content safety categories. Together they help ensure adapted data matches how much your product says and what it is allowed to say.

Length

Sets how verbose adapted completions should be—aligned with UI copy limits, documentation depth, or coaching-style responses.

run = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={
        "length": "concise",
    },
)

Safety categories

Completions that violate selected categories are handled according to platform policy—so training data stays closer to rules you can audit, not ad hoc filtering downstream.

run = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={
        "safety_categories": ["harassment", "hate"],
    },
)

Example values appear in tests; for production, treat the API schema as source of truth.

Combined example

run = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={
        "length": "detailed",
        "safety_categories": ["harassment", "hate"],
    },
)

Full script

import os
import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key=os.environ["ADAPTION_API_KEY"])
dataset_id = os.environ.get("ADAPTION_DATASET_ID")

if not dataset_id:
    result = client.datasets.upload_file("training_data.csv")
    dataset_id = result.dataset_id
    while True:
        st = client.datasets.get_status(dataset_id)
        if st.row_count is not None:
            break
        time.sleep(2)

run = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={
        "length": "concise",
        "safety_categories": ["harassment", "hate"],
    },
)
print(f"Run started: {run.run_id}")

try:
    final = client.datasets.wait_for_completion(dataset_id, timeout=3600)
    print(f"Finished: {final.status}")
    if final.error:
        raise RuntimeError(final.error.message)
except DatasetTimeout:
    print("Timed out")

url = client.datasets.download(dataset_id)
print(f"Download: {url}")

When to use each control

length — Match assistant response depth to product constraints (short replies vs long explanations).
safety_categories — Encode policy-aligned training data for moderated or customer-facing models.