# Evaluation Test Cases

## Create

`client.agents.evaluationTestCases.create(body?: EvaluationTestCaseCreateParams, options?: RequestOptions): EvaluationTestCaseCreateResponse`

**post** `/v2/gen-ai/evaluation_test_cases`

To create an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_test_cases`.

### Parameters

- `body: EvaluationTestCaseCreateParams`
  - `dataset_uuid?: string` Dataset against which the test case is executed.
  - `description?: string` Description of the test case.
  - `metrics?: Array` Full metric list to use for the evaluation test case.
  - `name?: string` Name of the test case.
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `workspace_uuid?: string` The workspace UUID.

### Returns

- `EvaluationTestCaseCreateResponse`
  - `test_case_uuid?: string` Test case UUID.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.create();

console.log(evaluationTestCase.test_case_uuid);
```

## Retrieve

`client.agents.evaluationTestCases.retrieve(testCaseUuid: string, query?: EvaluationTestCaseRetrieveParams, options?: RequestOptions): EvaluationTestCaseRetrieveResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To retrieve information about an existing evaluation test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- `testCaseUuid: string`
- `query: EvaluationTestCaseRetrieveParams`
  - `evaluation_test_case_version?: number` Version of the test case.
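For illustration, the optional version query above could be passed as a second argument; the UUID and version number below are placeholders, and the network call itself is left commented out so the sketch stays self-contained:

```typescript
// Placeholder identifiers; substitute real values from your account.
const testCaseUuid = '123e4567-e89b-12d3-a456-426614174000';

// Pin the request to a specific version of the test case;
// omit the query object entirely to fetch the latest version.
const query = { evaluation_test_case_version: 2 };

// const evaluationTestCase = await client.agents.evaluationTestCases.retrieve(
//   testCaseUuid,
//   query,
// );
```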
### Returns

- `EvaluationTestCaseRetrieveResponse`
  - `evaluation_test_case?: APIEvaluationTestCase`
    - `archived_at?: string`
    - `created_at?: string`
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `dataset?: Dataset`
      - `created_at?: string` Time created at.
      - `dataset_name?: string` Name of the dataset.
      - `dataset_uuid?: string` UUID of the dataset.
      - `file_size?: string` The size of the dataset uploaded file in bytes.
      - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
      - `row_count?: number` Number of rows in the dataset.
    - `dataset_name?: string`
    - `dataset_uuid?: string`
    - `description?: string`
    - `latest_version_number_of_runs?: number`
    - `metrics?: Array`
      - `description?: string`
      - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
      - `metric_name?: string`
      - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - `metric_uuid?: string`
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `range_max?: number` The maximum value for the metric.
      - `range_min?: number` The minimum value for the metric.
    - `name?: string`
    - `star_metric?: APIStarMetric`
      - `metric_uuid?: string`
      - `name?: string`
      - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
      - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
    - `test_case_uuid?: string`
    - `total_runs?: number`
    - `updated_at?: string`
    - `updated_by_user_email?: string`
    - `updated_by_user_id?: string`
    - `version?: number`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.retrieve(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(evaluationTestCase.evaluation_test_case);
```

## Update

`client.agents.evaluationTestCases.update(testCaseUuid: string, body?: EvaluationTestCaseUpdateParams, options?: RequestOptions): EvaluationTestCaseUpdateResponse`

**put** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To update an evaluation test case, send a PUT request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- `testCaseUuid: string`
- `body: EvaluationTestCaseUpdateParams`
  - `dataset_uuid?: string` Dataset against which the test case is executed.
  - `description?: string` Description of the test case.
  - `metrics?: Metrics`
    - `metric_uuids?: Array`
  - `name?: string` Name of the test case.
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `test_case_uuid?: string` Test case UUID to update.

### Returns

- `EvaluationTestCaseUpdateResponse`
  - `test_case_uuid?: string`
  - `version?: number` The new version of the test case.
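A typical update payload built from the parameters above might look like the following sketch. All field values are hypothetical, and the SDK call is commented out so the example runs standalone:

```typescript
// Hypothetical values shaped like EvaluationTestCaseUpdateParams.
const body = {
  name: 'support-bot regression suite',
  description: 'Checks answer quality against the curated dataset.',
  metrics: { metric_uuids: ['00000000-0000-0000-0000-000000000001'] },
  star_metric: {
    metric_uuid: '00000000-0000-0000-0000-000000000001',
    name: 'correctness',
    // Percentage threshold: documented as a value between 0 and 100.
    success_threshold_pct: 80,
  },
};

// const updated = await client.agents.evaluationTestCases.update(
//   '123e4567-e89b-12d3-a456-426614174000',
//   body,
// );
```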
### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCase = await client.agents.evaluationTestCases.update(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(evaluationTestCase.test_case_uuid);
```

## List

`client.agents.evaluationTestCases.list(options?: RequestOptions): EvaluationTestCaseListResponse`

**get** `/v2/gen-ai/evaluation_test_cases`

To list all evaluation test cases, send a GET request to `/v2/gen-ai/evaluation_test_cases`.

### Returns

- `EvaluationTestCaseListResponse`
  - `evaluation_test_cases?: Array` List of evaluation test cases.
    - `archived_at?: string`
    - `created_at?: string`
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `dataset?: Dataset`
      - `created_at?: string` Time created at.
      - `dataset_name?: string` Name of the dataset.
      - `dataset_uuid?: string` UUID of the dataset.
      - `file_size?: string` The size of the dataset uploaded file in bytes.
      - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
      - `row_count?: number` Number of rows in the dataset.
    - `dataset_name?: string`
    - `dataset_uuid?: string`
    - `description?: string`
    - `latest_version_number_of_runs?: number`
    - `metrics?: Array`
      - `description?: string`
      - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
      - `metric_name?: string`
      - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - `metric_uuid?: string`
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `range_max?: number` The maximum value for the metric.
      - `range_min?: number` The minimum value for the metric.
    - `name?: string`
    - `star_metric?: APIStarMetric`
      - `metric_uuid?: string`
      - `name?: string`
      - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
      - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
    - `test_case_uuid?: string`
    - `total_runs?: number`
    - `updated_at?: string`
    - `updated_by_user_email?: string`
    - `updated_by_user_id?: string`
    - `version?: number`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationTestCases = await client.agents.evaluationTestCases.list();

console.log(evaluationTestCases.evaluation_test_cases);
```

## List Evaluation Runs

`client.agents.evaluationTestCases.listEvaluationRuns(evaluationTestCaseUuid: string, query?: EvaluationTestCaseListEvaluationRunsParams, options?: RequestOptions): EvaluationTestCaseListEvaluationRunsResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`

To list all evaluation runs by test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`.
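Once runs are listed, the response can be filtered locally, for example to isolate runs that missed the star-metric threshold. The run objects below are illustrative stand-ins for `evaluation_runs` entries, not real API output:

```typescript
// Minimal stand-in for the evaluation run fields used here.
type EvaluationRun = {
  run_name?: string;
  pass_status?: boolean;
  status?: string;
};

// Illustrative data; a real list comes from listEvaluationRuns().
const runs: EvaluationRun[] = [
  { run_name: 'baseline', pass_status: true, status: 'EVALUATION_RUN_SUCCESSFUL' },
  { run_name: 'nightly', pass_status: false, status: 'EVALUATION_RUN_FAILED' },
];

// Runs whose star metric did not reach its success threshold.
const failing = runs.filter((run) => run.pass_status === false);
```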
### Parameters

- `evaluationTestCaseUuid: string`
- `query: EvaluationTestCaseListEvaluationRunsParams`
  - `evaluation_test_case_version?: number` Version of the test case.

### Returns

- `EvaluationTestCaseListEvaluationRunsResponse`
  - `evaluation_runs?: Array` List of evaluation runs.
    - `agent_deleted?: boolean` Whether the agent is deleted.
    - `agent_name?: string` Agent name.
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash.
    - `agent_workspace_uuid?: string` Agent workspace UUID.
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description.
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID.
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name.
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name.
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | "EVALUATION_RUN_EVALUATING_RESULTS" | "EVALUATION_RUN_CANCELLING" | "EVALUATION_RUN_CANCELLED" | "EVALUATION_RUN_SUCCESSFUL" | "EVALUATION_RUN_PARTIALLY_SUCCESSFUL" | "EVALUATION_RUN_FAILED"` Evaluation run statuses.
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationTestCases.listEvaluationRuns(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(response.evaluation_runs);
```

## Domain Types

### API Evaluation Test Case

- `APIEvaluationTestCase`
  - `archived_at?: string`
  - `created_at?: string`
  - `created_by_user_email?: string`
  - `created_by_user_id?: string`
  - `dataset?: Dataset`
    - `created_at?: string` Time created at.
    - `dataset_name?: string` Name of the dataset.
    - `dataset_uuid?: string` UUID of the dataset.
    - `file_size?: string` The size of the dataset uploaded file in bytes.
    - `has_ground_truth?: boolean` Does the dataset have a ground truth column?
    - `row_count?: number` Number of rows in the dataset.
  - `dataset_name?: string`
  - `dataset_uuid?: string`
  - `description?: string`
  - `latest_version_number_of_runs?: number`
  - `metrics?: Array`
    - `description?: string`
    - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
    - `metric_name?: string`
    - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
      - `"METRIC_TYPE_UNSPECIFIED"`
      - `"METRIC_TYPE_GENERAL_QUALITY"`
      - `"METRIC_TYPE_RAG_AND_TOOL"`
    - `metric_uuid?: string`
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `range_max?: number` The maximum value for the metric.
    - `range_min?: number` The minimum value for the metric.
  - `name?: string`
  - `star_metric?: APIStarMetric`
    - `metric_uuid?: string`
    - `name?: string`
    - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
    - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
  - `test_case_uuid?: string`
  - `total_runs?: number`
  - `updated_at?: string`
  - `updated_by_user_email?: string`
  - `updated_by_user_id?: string`
  - `version?: number`

### API Star Metric

- `APIStarMetric`
  - `metric_uuid?: string`
  - `name?: string`
  - `success_threshold?: number` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
  - `success_threshold_pct?: number` The success threshold for the star metric. This is a percentage value between 0 and 100.
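As a worked example of the `APIStarMetric` constraints above, `success_threshold_pct` must fall within 0 to 100; the values here are hypothetical:

```typescript
// Hypothetical star metric; success_threshold_pct is a percentage in [0, 100].
const starMetric = {
  metric_uuid: '00000000-0000-0000-0000-000000000001',
  name: 'correctness',
  success_threshold_pct: 80,
};

// Simple client-side sanity check mirroring the documented range.
const thresholdInRange =
  starMetric.success_threshold_pct >= 0 && starMetric.success_threshold_pct <= 100;
```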