---
title: Getting started | Adaption
description: Install the Adaption Python SDK, ingest data, run Adaptive Data adaptation jobs, and export results—same lifecycle in code as in the product.
---

**Adaptive Data** is designed so the full path from raw file to augmented output is repeatable in code: a short sequence of API calls, whether you run once in a notebook or at scale in a pipeline.

This page uses the SDK's **convenience methods** (upload, import, run, wait, download). For granular control—raw `create`, presigned upload steps, and every field—see the [API Reference](/api/index.md).

## Install

```
pip install adaption
```

## Authentication

```
from adaption import Adaption

client = Adaption(api_key="pt_live_...")
# Or set the ADAPTION_API_KEY environment variable and omit the argument
```

---

## Ingest data (three ways)

**Ingest** data from where it already lives—local files, Hugging Face, or Kaggle—without hand-built conversion scripts for supported formats. Each path returns a `dataset_id` you pass into `datasets.run` for the **adapt** step.

### 1. Upload a local file

`upload_file` creates the dataset, uploads via a presigned URL, and confirms the upload. Supported extensions: **`.csv`**, **`.json`**, **`.jsonl`**, **`.parquet`**.

```
result = client.datasets.upload_file("training_data.csv")
print(result.dataset_id)
```

Optional custom name (defaults to the filename without extension):

```
result = client.datasets.upload_file("data.csv", name="my-dataset")
```

### 2. Import from Hugging Face

Point at a Hugging Face dataset URL and the file(s) to import.

```
resp = client.datasets.create_from_huggingface(
    url="https://huggingface.co/datasets/org/repo",
    files=["train.csv"],
)
print(resp.dataset_id)
```

Import runs **asynchronously on the server**. Poll status or call `wait_for_completion()` before you start an adaptation run so ingestion has finished and the dataset is ready to adapt.

### 3. Import from Kaggle

Use the Kaggle dataset page URL and the files to pull.

```
resp = client.datasets.create_from_kaggle(
    url="https://www.kaggle.com/datasets/org/dataset-name",
    files=["data.csv"],
)
print(resp.dataset_id)
```

**Kaggle credentials** must be registered in Adaption: open **[API keys settings](https://frontend-pharos-app-dev.vercel.app/app/settings?tab=api_keys)** (sign in if prompted) and add your Kaggle API credentials there before importing.

---

## Adapt: start a run

**Adapt** applies the platform's recipes and optimization to your dataset. Map columns to the roles the API expects (`prompt` is required; others are optional). Call with **`estimate=True`** first to see estimated cost and duration **before** you commit—no surprises on large jobs.

```
run = client.datasets.run(
    dataset_id,
    column_mapping={
        "prompt": "instruction",         # required — your prompt column
        "completion": "response",        # optional — completion column
        # "chat": "conversation",        # optional — chat column
        # "context": ["source", "ref"],  # optional — context columns
    },
)
print(run.run_id)
print(run.estimated_credits_consumed)
```

Estimate cost **without** starting a run:

```
estimate = client.datasets.run(
    dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
    estimate=True,
)
print(f"Would cost {estimate.estimated_credits_consumed} credits")
```

---

## Wait for completion

```
from adaption import DatasetTimeout

try:
    status = client.datasets.wait_for_completion(dataset_id, timeout=600)
    print(f"Done: {status.status}")  # "succeeded" or "failed"
    if status.error:
        print(f"Error: {status.error.message}")
except DatasetTimeout as e:
    print(f"Still running after {e.timeout}s (last status: {e.last_status})")
```

Polling uses exponential backoff (2s → 4s → 8s → … up to 30s). Default timeout is one hour.
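If you need the same polling behavior outside the SDK (for example in a monitoring script), the schedule above is easy to reproduce. A minimal sketch, assuming `get_status` returns a terminal `"succeeded"` or `"failed"` string; `backoff_delays` and `poll_until_done` are illustrative helpers, not part of the SDK:

```python
import time
from typing import Callable


def backoff_delays(initial: float = 2.0, factor: float = 2.0, cap: float = 30.0):
    """Yield poll intervals on the documented schedule: 2s, 4s, 8s, ... capped at 30s."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def poll_until_done(get_status: Callable[[], str], timeout: float = 3600.0) -> str:
    """Poll get_status with exponential backoff until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"last status: {status}")
        time.sleep(delay)


# Against the SDK it might be driven like this:
# poll_until_done(lambda: client.datasets.get_status(dataset_id).status)
```

This mirrors what `wait_for_completion` does for you; prefer the built-in method unless you need custom scheduling.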
---

## List datasets

Auto-paginated iterator:

```
for dataset in client.datasets.list():
    print(f"{dataset.dataset_id} — {dataset.status}")
```

With filters:

```
for dataset in client.datasets.list(status="succeeded", limit=10):
    print(dataset.name)
```

---

## Export: download results

**Export** augmented data for training pipelines, evaluation harnesses, or downstream storage. `download` returns a presigned URL you can fetch with your HTTP client.

```
url = client.datasets.download(dataset_id)  # presigned S3 download URL
```

---

## End-to-end example (file upload)

```
import time

from adaption import Adaption, DatasetTimeout

client = Adaption(api_key="pt_live_...")

# 1. Upload
result = client.datasets.upload_file("training_data.csv")

# 2. Wait for file processing
while True:
    status = client.datasets.get_status(result.dataset_id)
    if status.row_count is not None:
        break
    time.sleep(2)

# 3. Adapt (start run)
run = client.datasets.run(
    result.dataset_id,
    column_mapping={"prompt": "instruction", "completion": "response"},
)
print(
    f"Run started: {run.run_id}, ~{run.estimated_minutes} min, "
    f"{run.estimated_credits_consumed} credits"
)

# 4. Wait for completion
try:
    final = client.datasets.wait_for_completion(result.dataset_id, timeout=1800)
    print(f"Finished: {final.status}")
except DatasetTimeout:
    print("Timed out — check status manually")

# 5. Download
url = client.datasets.download(result.dataset_id)
print(f"Download: {url}")
```

---

## Low-level and async

The convenience methods delegate to lower-level APIs (for example `datasets.create`, `upload.initiate`, `upload.complete_by_id`, `get`, `get_status`). Browse the [API Reference](/api/index.md) for full signatures.
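The presigned URL from `datasets.download` needs no SDK or extra dependency to fetch; the Python standard library is enough. A minimal sketch, where the helper name and chunk size are illustrative choices of this example, not part of the SDK:

```python
import urllib.request


def save_presigned(url: str, dest_path: str) -> None:
    """Stream a presigned download URL to a local file in 64 KiB chunks."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)


# Usage, with client and dataset_id from the sections above:
# save_presigned(client.datasets.download(dataset_id), "adapted_output")
```

Streaming in chunks keeps memory flat even when the exported dataset is large.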
**Async** variants exist for all methods:

```
from adaption import AsyncAdaption

client = AsyncAdaption(api_key="pt_live_...")

result = await client.datasets.upload_file("data.csv")
status = await client.datasets.wait_for_completion(result.dataset_id)

async for dataset in client.datasets.list():
    print(dataset.dataset_id)
```

---

## Going further

Encode **how data should behave**—not only what it contains—via runs:

- [Processing large datasets](/guides/processing-large-datasets/index.md) — subsample with `job_specification.max_rows` before a full run.
- [Evaluating dataset quality](/guides/evaluating-dataset-quality/index.md) — read evaluation status and metrics after a run.
- [Mitigating hallucinations](/guides/mitigating-hallucinations/index.md) — grounding and fact-aligned completions.
- [Reasoning traces](/guides/reasoning-traces/index.md) — recipe-driven reasoning for auditable, trainable outputs.
- [Safety and length constraints](/guides/safety-and-length-constraints/index.md) — length and safety as **specification**, aligned with [Blueprint](https://www.adaptionlabs.ai/blog/blueprint).
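The first guide above mentions `job_specification.max_rows`. Assuming `datasets.run` accepts a `job_specification` mapping as that guide describes (an assumption; check the guide for the authoritative shape), a subsample run could be sketched by building the call's keyword arguments first. The helper below is illustrative, not part of the SDK:

```python
def subsample_run_kwargs(dataset_id: str, column_mapping: dict, max_rows: int) -> dict:
    """Build keyword arguments for a run limited to max_rows rows.

    Assumes datasets.run accepts a job_specification mapping with a
    max_rows key, as described in the large-datasets guide.
    """
    return {
        "dataset_id": dataset_id,
        "column_mapping": column_mapping,
        "job_specification": {"max_rows": max_rows},
    }


# Hypothetical usage for a quick trial before a full run:
# run = client.datasets.run(**subsample_run_kwargs(dataset_id, {"prompt": "instruction"}, 1000))
```

Pair this with `estimate=True` from the adapt section to bound both rows and cost before committing.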