# Datasets

## Create a dataset from file upload, HuggingFace, or Kaggle

`client.datasets.create(body: DatasetCreateParams, options?: RequestOptions): DatasetCreateResponse`

**post** `/api/v1/datasets`

Unified ingest endpoint, discriminated by `source.type`: `"file"` returns upload instructions for a presigned S3 PUT; `"huggingface"` and `"kaggle"` start an async import.

### Parameters

- `body: DatasetCreateParams`
  - `source: FileSourceDto | HuggingfaceSourceDto | KaggleSourceDto` Dataset source configuration. Discriminated by `type`: file, huggingface, or kaggle.
    - `FileSourceDto`
      - `file_format: "csv" | "json" | "jsonl" | "parquet"` Format of the file being uploaded
        - `"csv"`
        - `"json"`
        - `"jsonl"`
        - `"parquet"`
      - `name: string` Human-readable name for the dataset
      - `type: "file"` Source type
        - `"file"`
    - `HuggingfaceSourceDto`
      - `files: Array<string>` File paths to download from the repository
      - `type: "huggingface"` Source type
        - `"huggingface"`
      - `url: string` HuggingFace dataset repository URL
    - `KaggleSourceDto`
      - `files: Array<string>` File paths to download from the dataset
      - `type: "kaggle"` Source type
        - `"kaggle"`
      - `url: string` Kaggle dataset URL

### Returns

- `DatasetCreateResponse`
  - `dataset_id: string` ID of the newly created dataset
  - `status: string` Current dataset status
  - `upload_instructions?: UploadInstructions` Upload instructions for file sources. PUT your file to the provided URL.
    - `method: string` HTTP method to use
    - `s3_key: string` S3 object key — pass this back in the complete request if needed for verification
    - `url: string` Pre-signed URL for uploading the file

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const dataset = await client.datasets.create({
  source: {
    file_format: 'csv',
    name: 'my-training-data',
    type: 'file',
  },
});

console.log(dataset.dataset_id);
```

#### Response

```json
{
  "dataset_id": "dataset_id",
  "status": "status",
  "upload_instructions": {
    "method": "PUT",
    "s3_key": "s3_key",
    "url": "https://s3.amazonaws.com/bucket/key?X-Amz-Signature=..."
  }
}
```

## Get a dataset by ID

`client.datasets.get(datasetID: string, options?: RequestOptions): Dataset`

**get** `/api/v1/datasets/{dataset_id}`

Get a dataset by ID

### Parameters

- `datasetID: string`

### Returns

- `Dataset`
  - `configured_column_mapping: ConfiguredColumnMapping | null` User-configured column mapping. Null if not yet configured.
    - `chat: string | null`
    - `completion: string | null`
    - `context: Array<string>`
    - `prompt: string | null`
  - `created_at: string` Timestamp when the dataset was created
  - `dataset_id: string` Unique dataset identifier
  - `error: Error | null` Error details if the dataset failed. Null otherwise.
    - `message: string` Error message
  - `evaluation_summary: EvaluationSummary | null` Compact evaluation summary. Null if evaluation has not completed.
    - `grade_after: string | null` Letter grade (A-E) after augmentation
    - `grade_before: string | null` Letter grade (A-E) before augmentation
    - `improvement_percent: number | null` Relative improvement percentage
    - `score_after: number | null` Quality score after augmentation
    - `score_before: number | null` Quality score before augmentation
  - `name: string | null` Human-readable name for the dataset
  - `progress: Progress | null` Processing progress. Null when no run is active.
    - `percent: number | null` Progress percentage (0-100)
    - `processed_rows: number | null` Number of rows processed so far
    - `total_rows: number | null` Total rows to process (samples_to_process or row_count)
  - `row_count: number | null` Total number of rows in the dataset
  - `run_id: string | null` ID of the currently active run
  - `status: "pending" | "running" | "succeeded" | "failed"` Lifecycle status: pending, running, succeeded, or failed
    - `"pending"`
    - `"running"`
    - `"succeeded"`
    - `"failed"`
  - `updated_at: string` Timestamp of the last update

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const dataset = await client.datasets.get('dataset_id');

console.log(dataset.dataset_id);
```

#### Response

```json
{
  "configured_column_mapping": {
    "chat": "chat",
    "completion": "completion",
    "context": [
      "string"
    ],
    "prompt": "prompt"
  },
  "created_at": "2019-12-27T18:11:19.117Z",
  "dataset_id": "dataset_id",
  "error": {
    "message": "message"
  },
  "evaluation_summary": {
    "grade_after": "grade_after",
    "grade_before": "grade_before",
    "improvement_percent": 0,
    "score_after": 0,
    "score_before": 0
  },
  "name": "name",
  "progress": {
    "percent": 0,
    "processed_rows": 0,
    "total_rows": 0
  },
  "row_count": 0,
  "run_id": "run_id",
  "status": "pending",
  "updated_at": "2019-12-27T18:11:19.117Z"
}
```

## List datasets

`client.datasets.list(query?: DatasetListParams, options?: RequestOptions): Cursor<DatasetListResponse>`

**get** `/api/v1/datasets`

List datasets

### Parameters

- `query: DatasetListParams`
  - `created_after?: string` ISO 8601 datetime — datasets created after this time.
  - `created_before?: string` ISO 8601 datetime — datasets created before this time.
  - `cursor?: string` Cursor from the previous response's `next_cursor` field.
  - `limit?: number` Number of results (max 100, default 20). Used with cursor pagination.
  - `q?: string` Search by dataset name (case-insensitive contains).
  - `sort?: string` Sort field: created_at | updated_at | name (default: created_at).
  - `sort_direction?: string` Sort direction: asc | desc (default: desc).
  - `status?: string` Filter by status: pending | running | succeeded | failed

### Returns

- `DatasetListResponse`
  - `created_at: string` Timestamp when the dataset was created
  - `dataset_id: string` Dataset ID
  - `status: "pending" | "running" | "succeeded" | "failed"` Dataset status
    - `"pending"`
    - `"running"`
    - `"succeeded"`
    - `"failed"`
  - `updated_at: string` Last updated timestamp
  - `description?: string | null` Auto-generated description of the dataset contents
  - `name?: string | null` Dataset name
  - `row_count?: number | null` Total number of rows

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

// Automatically fetches more pages as needed.
for await (const datasetListResponse of client.datasets.list()) {
  console.log(datasetListResponse.dataset_id);
}
```

#### Response

```json
{
  "datasets": [
    {
      "created_at": "2019-12-27T18:11:19.117Z",
      "dataset_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "pending",
      "updated_at": "2019-12-27T18:11:19.117Z",
      "description": "description",
      "name": "My training data",
      "row_count": 1000
    }
  ],
  "next_cursor": "550e8400-e29b-41d4-a716-446655440000"
}
```

## Get the processing status of a dataset

`client.datasets.getStatus(datasetID: string, options?: RequestOptions): DatasetGetStatusResponse`

**get** `/api/v1/datasets/{dataset_id}/status`

Get the processing status of a dataset

### Parameters

- `datasetID: string`

### Returns

- `DatasetGetStatusResponse`
  - `dataset_id: string` Dataset ID
  - `error: Error | null` Error details if the dataset failed. Null otherwise.
    - `message: string` Error message
  - `progress: Progress | null` Processing progress. Null when no run is active.
    - `percent: number | null` Progress percentage (0-100)
    - `processed_rows: number | null` Number of rows processed so far
    - `total_rows: number | null` Total rows to process (samples_to_process or row_count)
  - `row_count: number | null` Number of rows in the dataset
  - `status: "pending" | "running" | "succeeded" | "failed"` Current processing status
    - `"pending"`
    - `"running"`
    - `"succeeded"`
    - `"failed"`

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.getStatus('dataset_id');

console.log(response.dataset_id);
```

#### Response

```json
{
  "dataset_id": "dataset_id",
  "error": {
    "message": "message"
  },
  "progress": {
    "percent": 0,
    "processed_rows": 0,
    "total_rows": 0
  },
  "row_count": 0,
  "status": "pending"
}
```

## Download the processed dataset

`client.datasets.download(datasetID: string, query?: DatasetDownloadParams, options?: RequestOptions): DatasetDownloadResponse`

**get** `/api/v1/datasets/{dataset_id}/download`

Download the processed dataset

### Parameters

- `datasetID: string`
- `query: DatasetDownloadParams`
  - `fileFormat?: "csv" | "json" | "jsonl" | "parquet"` Output file format. Defaults to the original upload format if omitted.
    - `"csv"`
    - `"json"`
    - `"jsonl"`
    - `"parquet"`

### Returns

- `DatasetDownloadResponse = Uploadable`

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.download('dataset_id');

console.log(response);
```

#### Response

```json
"Example data"
```

## Publish a dataset to an external platform

`client.datasets.publish(datasetID: string, body: DatasetPublishParams, options?: RequestOptions): DatasetPublishResponse`

**post** `/api/v1/datasets/{dataset_id}/publish`

Publishes the processed dataset to Hugging Face or Kaggle.
Currently returns 501 — not yet implemented.

### Parameters

- `datasetID: string`
- `body: DatasetPublishParams`
  - `target: "huggingface" | "kaggle"` Destination platform for publishing the dataset
    - `"huggingface"`
    - `"kaggle"`
  - `target_spec?: Record<string, unknown>` Target-specific configuration (e.g. repo name for HuggingFace, slug for Kaggle)

### Returns

- `DatasetPublishResponse`
  - `publish_id: string` Unique identifier for the publish job
  - `status: string` Status of the publish job
  - `message?: string` Additional information about the publish request

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.publish('dataset_id', { target: 'huggingface' });

console.log(response.publish_id);
```

#### Response

```json
{
  "publish_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "message": "message"
}
```

## Start an augmentation run (or estimate cost)

`client.datasets.run(datasetID: string, body: DatasetRunParams, options?: RequestOptions): DatasetRunResponse`

**post** `/api/v1/datasets/{dataset_id}/run`

Validates column mapping and recipe configuration, reserves credits, and starts the augmentation pipeline. Set `estimate=true` to validate and get a cost quote without starting a run.

### Parameters

- `datasetID: string`
- `body: DatasetRunParams`
  - `brand_controls?: BrandControls` Brand and quality controls for generated completions (length, safety, hallucination grounding).
    - `hallucination_mitigation?: boolean` Enable web-search grounding to reduce hallucinations in generated completions
    - `length?: "minimal" | "concise" | "detailed" | "extensive"` Target response length. Controls verbosity of generated completions.
      - `"minimal"`
      - `"concise"`
      - `"detailed"`
      - `"extensive"`
    - `safety_categories?: Array<string>` Content safety categories to enforce. Completions violating these are filtered.
  - `column_mapping?: ColumnMapping` Column role assignments for augmentation. Required for real runs, optional for estimate-only requests.
    - `prompt: string` Column to use as the prompt/instruction field
    - `chat?: string` Column containing chat/conversation data (alternative to prompt+completion)
    - `completion?: string` Column to use as the completion/response field
    - `context?: Array<string>` Columns to include as context
  - `estimate?: boolean` When true, validates the request and returns the estimated credit cost without starting a run.
  - `job_specification?: JobSpecification` Job execution parameters
    - `idempotency_key?: string` Client-generated idempotency key for safe retries. If a launch with the same key already exists, the original response is returned.
    - `max_rows?: number` Maximum number of rows to process in this run
  - `recipe_specification?: RecipeSpecification` Augmentation recipe configuration. Omitted recipes use backend defaults.
    - `recipes?: Recipes` Augmentation recipe toggles. Omitted recipes use backend defaults.
      - `deduplication?: boolean` Remove near-duplicate rows
      - `preference_pairs?: boolean` Generate DPO-style preference pairs (chosen/rejected) instead of instruction completions
      - `prompt_metadata_injection?: boolean` Inject context and constraints into prompts
      - `prompt_rephrase?: boolean` Rephrase prompts for variety and clarity
      - `reasoning_traces?: boolean` Add reasoning traces (chain-of-thought) to completions
    - `version?: string` Recipe schema version. Allows recipe options to evolve across releases.

### Returns

- `DatasetRunResponse`
  - `estimate: boolean` Whether this was an estimate-only request (no run started)
  - `estimatedCreditsConsumed: number` Estimated number of credits that will be consumed by this run
  - `estimatedMinutes: number` Estimated processing time in minutes
  - `run_id?: string | null` Unique identifier for this pipeline run. Null for estimate-only requests.
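Because `estimate=true` turns this endpoint into a cost quote, a common pattern is to quote first and launch only when the cost fits a budget. A minimal sketch against the raw REST endpoint (the budget helper, the Bearer-token auth header, and the column names are illustrative assumptions, not part of the SDK):

```typescript
import { randomUUID } from 'node:crypto';

// Pure budget policy, kept separate so it can be tested without the API.
function withinBudget(estimatedCredits: number, creditBudget: number): boolean {
  return estimatedCredits <= creditBudget;
}

// Quote the run, then launch it only if the estimate is affordable.
// Returns the run_id, or null when the quote exceeds the budget.
async function estimateThenRun(
  baseUrl: string,
  apiKey: string,
  datasetId: string,
  creditBudget: number,
): Promise<string | null> {
  const headers = { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' };
  // Hypothetical column names; use your dataset's actual columns.
  const params = { column_mapping: { prompt: 'prompt', completion: 'completion' } };

  // 1. estimate=true validates the request and returns the credit cost
  //    without starting a run.
  const quoteRes = await fetch(`${baseUrl}/api/v1/datasets/${datasetId}/run`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ ...params, estimate: true }),
  });
  const quote = await quoteRes.json();
  if (!withinBudget(quote.estimatedCreditsConsumed, creditBudget)) return null;

  // 2. Launch for real; an idempotency key makes retries safe.
  const runRes = await fetch(`${baseUrl}/api/v1/datasets/${datasetId}/run`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      ...params,
      job_specification: { idempotency_key: randomUUID() },
    }),
  });
  const run = await runRes.json();
  return run.run_id;
}
```

Separating the budget check from the HTTP calls keeps the launch policy deterministic even though the quote itself comes from the server.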
### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.run('dataset_id');

console.log(response.run_id);
```

#### Response

```json
{
  "estimate": false,
  "estimatedCreditsConsumed": 0,
  "estimatedMinutes": 0,
  "run_id": "dataset-550e8400-e29b-41d4-a716-446655440000-1712234567890"
}
```

## Get evaluation results for a dataset

`client.datasets.getEvaluation(datasetID: string, options?: RequestOptions): DatasetGetEvaluationResponse`

**get** `/api/v1/datasets/{dataset_id}/evaluation`

Get evaluation results for a dataset

### Parameters

- `datasetID: string`

### Returns

- `DatasetGetEvaluationResponse`
  - `dataset_id: string` Dataset ID
  - `quality: Quality | null` Structured quality metrics. Null until evaluation completes.
    - `grade_after: string | null` Letter grade (A-E) after augmentation
    - `grade_before: string | null` Letter grade (A-E) before augmentation
    - `improvement_percent: number | null` Relative quality improvement as a percentage
    - `percentile_after: number | null` Percentile rank (0-100) after augmentation
    - `score_after: number | null` Quality score (0-10) after augmentation
    - `score_before: number | null` Quality score (0-10) before augmentation
  - `raw_results: Record<string, unknown> | null` Raw evaluation results payload for advanced use. Null until evaluation completes.
  - `status: string | null` Evaluation pipeline status: pending | running | succeeded | failed | skipped

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.getEvaluation('dataset_id');

console.log(response.dataset_id);
```

#### Response

```json
{
  "dataset_id": "dataset_id",
  "quality": {
    "grade_after": "A",
    "grade_before": "C",
    "improvement_percent": 37.1,
    "percentile_after": 92.3,
    "score_after": 8.5,
    "score_before": 6.2
  },
  "raw_results": {
    "foo": "bar"
  },
  "status": "succeeded"
}
```

## Domain Types

### Dataset

- `Dataset`
  - `configured_column_mapping: ConfiguredColumnMapping | null` User-configured column mapping. Null if not yet configured.
    - `chat: string | null`
    - `completion: string | null`
    - `context: Array<string>`
    - `prompt: string | null`
  - `created_at: string` Timestamp when the dataset was created
  - `dataset_id: string` Unique dataset identifier
  - `error: Error | null` Error details if the dataset failed. Null otherwise.
    - `message: string` Error message
  - `evaluation_summary: EvaluationSummary | null` Compact evaluation summary. Null if evaluation has not completed.
    - `grade_after: string | null` Letter grade (A-E) after augmentation
    - `grade_before: string | null` Letter grade (A-E) before augmentation
    - `improvement_percent: number | null` Relative improvement percentage
    - `score_after: number | null` Quality score after augmentation
    - `score_before: number | null` Quality score before augmentation
  - `name: string | null` Human-readable name for the dataset
  - `progress: Progress | null` Processing progress. Null when no run is active.
    - `percent: number | null` Progress percentage (0-100)
    - `processed_rows: number | null` Number of rows processed so far
    - `total_rows: number | null` Total rows to process (samples_to_process or row_count)
  - `row_count: number | null` Total number of rows in the dataset
  - `run_id: string | null` ID of the currently active run
  - `status: "pending" | "running" | "succeeded" | "failed"` Lifecycle status: pending, running, succeeded, or failed
    - `"pending"`
    - `"running"`
    - `"succeeded"`
    - `"failed"`
  - `updated_at: string` Timestamp of the last update

# Upload

## Initiate a dataset upload

`client.datasets.upload.initiate(body: UploadInitiateParams, options?: RequestOptions): UploadInitiateResponse`

**post** `/api/v1/datasets/upload/initiate`

Initiate a dataset upload

### Parameters

- `body: UploadInitiateParams`
  - `file_format: "csv" | "json" | "jsonl" | "parquet"` Format of the file being uploaded
    - `"csv"`
    - `"json"`
    - `"jsonl"`
    - `"parquet"`
  - `name: string` Human-readable name for the dataset

### Returns

- `UploadInitiateResponse`
  - `upload_url: string` Pre-signed S3 URL — upload the file directly to this URL via HTTP PUT

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.upload.initiate({
  file_format: 'csv',
  name: 'my-training-data',
});

console.log(response.upload_url);
```

#### Response

```json
{
  "upload_url": "https://s3.amazonaws.com/bucket/key?X-Amz-Signature=..."
}
```

## Complete a dataset upload and trigger processing

`client.datasets.upload.complete(body: UploadCompleteParams, options?: RequestOptions): UploadCompleteResponse`

**post** `/api/v1/datasets/upload/complete`

Complete a dataset upload and trigger processing

### Parameters

- `body: UploadCompleteParams`
  - `file_format: "csv" | "json" | "jsonl" | "parquet"` Format of the uploaded file
    - `"csv"`
    - `"json"`
    - `"jsonl"`
    - `"parquet"`
  - `file_size_bytes: number` Size of the uploaded file in bytes
  - `name: string` Human-readable name for the dataset
  - `s3_key: string` S3 object key returned in the pre-signed URL response from `/upload/initiate`

### Returns

- `UploadCompleteResponse`
  - `dataset_id: string` ID of the newly created dataset

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.upload.complete({
  file_format: 'csv',
  file_size_bytes: 1048576,
  name: 'my-training-data',
  s3_key: 'uploads/550e8400-e29b-41d4-a716-446655440000/my-training-data.csv',
});

console.log(response.dataset_id);
```

#### Response

```json
{
  "dataset_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

## Complete a file upload and trigger processing

`client.datasets.upload.completeByID(datasetID: string, body: UploadCompleteByIDParams, options?: RequestOptions): UploadCompleteByIDResponse`

**post** `/api/v1/datasets/{dataset_id}/upload/complete`

File uploads only. Call after uploading bytes to the presigned URL from POST `/datasets`. Verifies the file exists in S3, then triggers the preprocessing pipeline.
### Parameters

- `datasetID: string`
- `body: UploadCompleteByIDParams`
  - `file_size_bytes: number` Size of the uploaded file in bytes (for verification)
  - `sha256?: string` SHA-256 hex digest of the uploaded file (for integrity verification)

### Returns

- `UploadCompleteByIDResponse`
  - `dataset_id: string` ID of the dataset
  - `status: string` Current status of the dataset after completing upload

### Example

```typescript
import Adaption from 'adaption';

const client = new Adaption({
  apiKey: process.env['ADAPTION_API_KEY'], // This is the default and can be omitted
});

const response = await client.datasets.upload.completeByID('dataset_id', {
  file_size_bytes: 1048576,
});

console.log(response.dataset_id);
```

#### Response

```json
{
  "dataset_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing"
}
```
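Tying the endpoints together: after completing an upload, a client typically polls GET `/api/v1/datasets/{dataset_id}/status` until the lifecycle status settles, then downloads or evaluates the dataset. A minimal polling sketch (the poll interval and the Bearer-token auth header are assumptions; adapt to your client setup):

```typescript
// Statuses that mean the pipeline has settled and polling can stop.
const TERMINAL = new Set(['succeeded', 'failed']);

function isTerminal(status: string): boolean {
  return TERMINAL.has(status);
}

// Poll the status endpoint until the dataset reaches a terminal state,
// logging progress along the way. Returns the final status.
async function waitForDataset(
  baseUrl: string,
  apiKey: string,
  datasetId: string,
  intervalMs = 5000,
): Promise<string> {
  for (;;) {
    const res = await fetch(`${baseUrl}/api/v1/datasets/${datasetId}/status`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const { status, progress } = await res.json();
    if (isTerminal(status)) return status;
    console.log(`processing: ${progress?.percent ?? 0}%`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed interval keeps the sketch simple; a production client might add exponential backoff and a timeout so a stuck run does not poll forever.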