Mitigating hallucinations

Guides

Reduce fabricated content in adapted data with hallucination mitigation and web-search grounding via brand_controls.

Generative models can produce hallucinations—plausible text that is not grounded in your data or verifiable facts. When that noise lands in training or evaluation sets, it works against durable, transferable model behavior—the kind of control Adaptive Data is designed to reinforce.

Hallucination mitigation steers generations toward verifiable information using web-search grounding when enabled, so adapted completions are less likely to invent facts you will later have to filter or relabel.

Configure a run

Set hallucination_mitigation to true inside brand_controls:

run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",
        "completion": "response",
    },
    brand_controls={
        "hallucination_mitigation": True,
    },
)

You can combine this with other brand_controls (length, safety categories, and related options) on the same run.

Full example

Same structure as Getting started: ingest (or reuse a dataset_id), wait until the dataset is ready, adapt with mitigation enabled, wait for completion, then export.

import os
import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key=os.environ["ADAPTION_API_KEY"])
dataset_id = os.environ.get("ADAPTION_DATASET_ID")

if not dataset_id:
    result = client.datasets.upload_file("training_data.csv")
    dataset_id = result.dataset_id
    while True:
        st = client.datasets.get_status(dataset_id)
        if st.row_count is not None:
            break
        time.sleep(2)

estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={"hallucination_mitigation": True},
    estimate=True,
)
print(f"Estimated credits: {estimate.estimated_credits_consumed}")

run = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    brand_controls={"hallucination_mitigation": True},
)
print(f"Run started: {run.run_id}")

try:
    final = client.datasets.wait_for_completion(dataset_id, timeout=3600)
    print(f"Finished: {final.status}")
    if final.error:
        raise RuntimeError(final.error.message)
except DatasetTimeout:
    print("Timed out — poll datasets.get or get_status in your environment")

url = client.datasets.download(dataset_id)
print(f"Download: {url}")

When to use it

Enable mitigation when outputs must stay fact-aligned: customer support, RAG-style workflows with external truth, compliance-sensitive text, or any pipeline where invented details are expensive to catch later. For creative or purely subjective tasks, you may omit it to save latency or credits; use estimate=True to compare cost before you scale.