# Evaluation Test Cases

## Create

`client.agents.evaluationTestCases.create(body?: EvaluationTestCaseCreateParams, options?: RequestOptions): EvaluationTestCaseCreateResponse`

**post** `/v2/gen-ai/evaluation_test_cases`

To create an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_test_cases`.

### Parameters

- `body: EvaluationTestCaseCreateParams`
  - `dataset_uuid?: string` Dataset against which the test case is executed.
  - `description?: string` Description of the test case.
  - `metrics?: Array` Full metric list to use for the evaluation test case.
  - `name?: string` Name of the test case.
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `workspace_uuid?: string` The workspace UUID.

### Returns

- `EvaluationTestCaseCreateResponse`
  - `test_case_uuid?: string` Test case UUID.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.create();

console.log(evaluationTestCase.test_case_uuid);
```

## Retrieve

`client.agents.evaluationTestCases.retrieve(testCaseUuid: string, query?: EvaluationTestCaseRetrieveParams, options?: RequestOptions): EvaluationTestCaseRetrieveResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To retrieve information about an existing evaluation test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- `testCaseUuid: string`
- `query: EvaluationTestCaseRetrieveParams`
  - `evaluation_test_case_version?: number` Version of the test case.
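For illustration, the optional version query above could be passed as a second argument; the UUID and version number below are placeholders, and the network call itself is left commented out so the sketch stays self-contained:

```typescript
// Placeholder identifiers; substitute real values from your account.
const testCaseUuid = '123e4567-e89b-12d3-a456-426614174000';

// Pin the request to a specific version of the test case;
// omit the query object entirely to fetch the latest version.
const query = { evaluation_test_case_version: 2 };

// const evaluationTestCase = await client.agents.evaluationTestCases.retrieve(
//   testCaseUuid,
//   query,
// );
```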
### Returns

- `EvaluationTestCaseRetrieveResponse`
  - `evaluation_test_case?: APIEvaluationTestCase`
    - `archived_at?: string`
    - `created_at?: string`
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `dataset?: Dataset`
      - `created_at?: string` Time created at.
      - `dataset_name?: string` Name of the dataset.
      - `dataset_uuid?: string` UUID of the dataset.
      - `file_size?: string` The size of the dataset uploaded file in bytes.
      - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
      - `row_count?: number` Number of rows in the dataset.
    - `dataset_name?: string`
    - `dataset_uuid?: string`
    - `description?: string`
    - `latest_version_number_of_runs?: number`
    - `metrics?: Array`
      - `description?: string`
      - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
      - `metric_name?: string`
      - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - `metric_uuid?: string`
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `range_max?: number` The maximum value for the metric.
      - `range_min?: number` The minimum value for the metric.
    - `name?: string`
    - `star_metric?: APIStarMetric`
      - `metric_uuid?: string`
      - `name?: string`
      - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
      - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
    - `test_case_uuid?: string`
    - `total_runs?: number`
    - `updated_at?: string`
    - `updated_by_user_email?: string`
    - `updated_by_user_id?: string`
    - `version?: number`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.retrieve(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(evaluationTestCase.evaluation_test_case);
```

## Update

`client.agents.evaluationTestCases.update(testCaseUuid: string, body?: EvaluationTestCaseUpdateParams, options?: RequestOptions): EvaluationTestCaseUpdateResponse`

**put** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To update an evaluation test case, send a PUT request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- `testCaseUuid: string`
- `body: EvaluationTestCaseUpdateParams`
  - `dataset_uuid?: string` Dataset against which the test case is executed.
  - `description?: string` Description of the test case.
  - `metrics?: Metrics`
    - `metric_uuids?: Array`
  - `name?: string` Name of the test case.
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `test_case_uuid?: string` Test case UUID to update.

### Returns

- `EvaluationTestCaseUpdateResponse`
  - `test_case_uuid?: string`
  - `version?: number` The new version of the test case.
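A typical update payload built from the parameters above might look like the following sketch. All field values are hypothetical, and the SDK call is commented out so the example runs standalone:

```typescript
// Hypothetical values shaped like EvaluationTestCaseUpdateParams.
const body = {
  name: 'support-bot regression suite',
  description: 'Checks answer quality against the curated dataset.',
  metrics: { metric_uuids: ['00000000-0000-0000-0000-000000000001'] },
  star_metric: {
    metric_uuid: '00000000-0000-0000-0000-000000000001',
    name: 'correctness',
    // Percentage threshold: documented as a value between 0 and 100.
    success_threshold_pct: 80,
  },
};

// const updated = await client.agents.evaluationTestCases.update(
//   '123e4567-e89b-12d3-a456-426614174000',
//   body,
// );
```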
### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.update(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(evaluationTestCase.test_case_uuid);
```

## List

`client.agents.evaluationTestCases.list(options?: RequestOptions): EvaluationTestCaseListResponse`

**get** `/v2/gen-ai/evaluation_test_cases`

To list all evaluation test cases, send a GET request to `/v2/gen-ai/evaluation_test_cases`.

### Returns

- `EvaluationTestCaseListResponse`
  - `evaluation_test_cases?: Array` List of evaluation test cases.
    - `archived_at?: string`
    - `created_at?: string`
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `dataset?: Dataset`
      - `created_at?: string` Time created at.
      - `dataset_name?: string` Name of the dataset.
      - `dataset_uuid?: string` UUID of the dataset.
      - `file_size?: string` The size of the dataset uploaded file in bytes.
      - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
      - `row_count?: number` Number of rows in the dataset.
    - `dataset_name?: string`
    - `dataset_uuid?: string`
    - `description?: string`
    - `latest_version_number_of_runs?: number`
    - `metrics?: Array`
      - `description?: string`
      - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
      - `metric_name?: string`
      - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - `metric_uuid?: string`
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `range_max?: number` The maximum value for the metric.
      - `range_min?: number` The minimum value for the metric.
    - `name?: string`
    - `star_metric?: APIStarMetric`
      - `metric_uuid?: string`
      - `name?: string`
      - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
      - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
    - `test_case_uuid?: string`
    - `total_runs?: number`
    - `updated_at?: string`
    - `updated_by_user_email?: string`
    - `updated_by_user_id?: string`
    - `version?: number`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCases = await client.agents.evaluationTestCases.list();

console.log(evaluationTestCases.evaluation_test_cases);
```

## List Evaluation Runs

`client.agents.evaluationTestCases.listEvaluationRuns(evaluationTestCaseUuid: string, query?: EvaluationTestCaseListEvaluationRunsParams, options?: RequestOptions): EvaluationTestCaseListEvaluationRunsResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`

To list all evaluation runs by test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`.
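Once runs are listed, the response can be filtered locally, for example to isolate runs that missed the star-metric threshold. The run objects below are illustrative stand-ins for `evaluation_runs` entries, not real API output:

```typescript
// Minimal stand-in for the evaluation run fields used here.
type EvaluationRun = {
  run_name?: string;
  pass_status?: boolean;
  status?: string;
};

// Illustrative data; a real list comes from listEvaluationRuns().
const runs: EvaluationRun[] = [
  { run_name: 'baseline', pass_status: true, status: 'EVALUATION_RUN_SUCCESSFUL' },
  { run_name: 'nightly', pass_status: false, status: 'EVALUATION_RUN_FAILED' },
];

// Runs whose star metric did not reach its success threshold.
const failing = runs.filter((run) => run.pass_status === false);
```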
### Parameters

- `evaluationTestCaseUuid: string`
- `query: EvaluationTestCaseListEvaluationRunsParams`
  - `evaluation_test_case_version?: number` Version of the test case.

### Returns

- `EvaluationTestCaseListEvaluationRunsResponse`
  - `evaluation_runs?: Array` List of evaluation runs.
    - `agent_deleted?: boolean` Whether the agent is deleted.
    - `agent_name?: string` Agent name.
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash.
    - `agent_workspace_uuid?: string` Agent workspace UUID.
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description.
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID.
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name.
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name.
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | "EVALUATION_RUN_EVALUATING_RESULTS" | "EVALUATION_RUN_CANCELLING" | "EVALUATION_RUN_CANCELLED" | "EVALUATION_RUN_SUCCESSFUL" | "EVALUATION_RUN_PARTIALLY_SUCCESSFUL" | "EVALUATION_RUN_FAILED"` Evaluation run statuses.
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationTestCases.listEvaluationRuns(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(response.evaluation_runs);
```

## Domain Types

### API Evaluation Test Case

- `APIEvaluationTestCase`
  - `archived_at?: string`
  - `created_at?: string`
  - `created_by_user_email?: string`
  - `created_by_user_id?: string`
  - `dataset?: Dataset`
    - `created_at?: string` Time created at.
    - `dataset_name?: string` Name of the dataset.
    - `dataset_uuid?: string` UUID of the dataset.
    - `file_size?: string` The size of the dataset uploaded file in bytes.
    - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
    - `row_count?: number` Number of rows in the dataset.
  - `dataset_name?: string`
  - `dataset_uuid?: string`
  - `description?: string`
  - `latest_version_number_of_runs?: number`
  - `metrics?: Array`
    - `description?: string`
    - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
    - `metric_name?: string`
    - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
      - `"METRIC_TYPE_UNSPECIFIED"`
      - `"METRIC_TYPE_GENERAL_QUALITY"`
      - `"METRIC_TYPE_RAG_AND_TOOL"`
    - `metric_uuid?: string`
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `range_max?: number` The maximum value for the metric.
    - `range_min?: number` The minimum value for the metric.
  - `name?: string`
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `test_case_uuid?: string`
  - `total_runs?: number`
  - `updated_at?: string`
  - `updated_by_user_email?: string`
  - `updated_by_user_id?: string`
  - `version?: number`

### API Star Metric

- `APIStarMetric`
  - `metric_uuid?: string`
  - `name?: string`
  - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
  - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
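As a worked example of the `APIStarMetric` constraints above, `success_threshold_pct` must fall within 0 to 100; the values here are hypothetical:

```typescript
// Hypothetical star metric; success_threshold_pct is a percentage in [0, 100].
const starMetric = {
  metric_uuid: '00000000-0000-0000-0000-000000000001',
  name: 'correctness',
  success_threshold_pct: 80,
};

// Simple client-side sanity check mirroring the documented range.
const thresholdInRange =
  starMetric.success_threshold_pct >= 0 && starMetric.success_threshold_pct <= 100;
```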