List Evaluation Runs

List Evaluation Runs by Test Case

client.agents.evaluationTestCases.listEvaluationRuns(evaluationTestCaseUuid, query?, options?): EvaluationTestCaseListEvaluationRunsResponse { evaluation_runs }

GET /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs

To list all evaluation runs by test case, send a GET request to /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs.

Parameters

evaluationTestCaseUuid: string

query: EvaluationTestCaseListEvaluationRunsParams { evaluation_test_case_version }

evaluation_test_case_version?: number

Version of the test case.
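The path and query parameters above combine into a single request URL. The following is a minimal sketch of that mapping, not the SDK's own implementation: `buildEvaluationRunsUrl` is a hypothetical helper, and the base URL is an assumption that may differ in your environment.

```typescript
// Assumption: the API is served from this base URL; adjust for your environment.
const ASSUMED_BASE_URL = 'https://api.digitalocean.com';

// Hypothetical helper (not part of the SDK): builds the request URL for this endpoint.
function buildEvaluationRunsUrl(
  evaluationTestCaseUuid: string,
  evaluationTestCaseVersion?: number,
  baseUrl: string = ASSUMED_BASE_URL,
): string {
  // The test case UUID is a path parameter.
  const path = `/v2/gen-ai/evaluation_test_cases/${encodeURIComponent(
    evaluationTestCaseUuid,
  )}/evaluation_runs`;
  const url = new URL(path, baseUrl);

  // evaluation_test_case_version is optional, so it is only added when provided.
  if (evaluationTestCaseVersion !== undefined) {
    url.searchParams.set('evaluation_test_case_version', String(evaluationTestCaseVersion));
  }
  return url.toString();
}

console.log(buildEvaluationRunsUrl('123e4567-e89b-12d3-a456-426614174000', 2));
```

Omitting the second argument leaves the query string off entirely, which mirrors the optional `evaluation_test_case_version` parameter.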

Returns

EvaluationTestCaseListEvaluationRunsResponse { evaluation_runs }

evaluation_runs?: Array<APIEvaluationRun { agent_deleted, agent_name, agent_uuid, 19 more } >

List of evaluation runs.

agent_deleted?: boolean

Whether the agent has been deleted.

agent_name?: string

Agent name

agent_uuid?: string

Agent UUID.

agent_version_hash?: string

Version hash

agent_workspace_uuid?: string

Agent workspace UUID.

created_by_user_email?: string

created_by_user_id?: string

error_description?: string

The error description

evaluation_run_uuid?: string

Evaluation run UUID.

evaluation_test_case_workspace_uuid?: string

Evaluation test case workspace UUID.

finished_at?: string

Run end time.

format: date-time

pass_status?: boolean

The pass status of the evaluation run based on the star metric.

queued_at?: string

Run queued time.

format: date-time

run_level_metric_results?: Array<APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more } >

error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"

Accepts one of the following:

"METRIC_VALUE_TYPE_UNSPECIFIED"

"METRIC_VALUE_TYPE_NUMBER"

"METRIC_VALUE_TYPE_STRING"

"METRIC_VALUE_TYPE_PERCENTAGE"

number_value?: number

The value of the metric as a number.

format: double

reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.

run_name?: string

Run name.

star_metric_result?: APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }

error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"

Accepts one of the following:

"METRIC_VALUE_TYPE_UNSPECIFIED"

"METRIC_VALUE_TYPE_NUMBER"

"METRIC_VALUE_TYPE_STRING"

"METRIC_VALUE_TYPE_PERCENTAGE"

number_value?: number

The value of the metric as a number.

format: double

reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.
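Because a metric result carries both `number_value` and `string_value`, a consumer typically uses `metric_value_type` to decide which one to read. The sketch below shows one way to do that. The local types mirror the fields listed above, `describeMetric` is a hypothetical helper, and the scaling of percentage values is an assumption that the reference does not confirm.

```typescript
type MetricValueType =
  | 'METRIC_VALUE_TYPE_UNSPECIFIED'
  | 'METRIC_VALUE_TYPE_NUMBER'
  | 'METRIC_VALUE_TYPE_STRING'
  | 'METRIC_VALUE_TYPE_PERCENTAGE';

// Local mirror of the APIEvaluationMetricResult fields documented above.
interface MetricResult {
  error_description?: string;
  metric_name?: string;
  metric_value_type?: MetricValueType;
  number_value?: number;
  reasoning?: string;
  string_value?: string;
}

// Hypothetical helper: renders one metric result as a single display line.
function describeMetric(result: MetricResult): string {
  const name = result.metric_name ?? '(unnamed metric)';

  // A populated error_description means the metric could not be calculated.
  if (result.error_description) {
    return `${name}: error (${result.error_description})`;
  }

  switch (result.metric_value_type) {
    case 'METRIC_VALUE_TYPE_NUMBER':
      return `${name}: ${result.number_value ?? 'n/a'}`;
    case 'METRIC_VALUE_TYPE_PERCENTAGE':
      // Assumption: percentages arrive in number_value already scaled to 0-100.
      return `${name}: ${result.number_value ?? 'n/a'}%`;
    case 'METRIC_VALUE_TYPE_STRING':
      return `${name}: ${result.string_value ?? 'n/a'}`;
    default:
      // Covers METRIC_VALUE_TYPE_UNSPECIFIED and a missing type.
      return `${name}: (no typed value)`;
  }
}

console.log(
  describeMetric({
    metric_name: 'example name',
    metric_value_type: 'METRIC_VALUE_TYPE_NUMBER',
    number_value: 123,
  }),
);
```

The same helper applies to each entry of `run_level_metric_results` and to `star_metric_result`, since both share the `APIEvaluationMetricResult` shape.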

started_at?: string

Run start time.

format: date-time

status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more

Evaluation Run Statuses

Accepts one of the following:

"EVALUATION_RUN_STATUS_UNSPECIFIED"

"EVALUATION_RUN_QUEUED"

"EVALUATION_RUN_RUNNING_DATASET"

"EVALUATION_RUN_EVALUATING_RESULTS"

"EVALUATION_RUN_CANCELLING"

"EVALUATION_RUN_CANCELLED"

"EVALUATION_RUN_SUCCESSFUL"

"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"

"EVALUATION_RUN_FAILED"

test_case_description?: string

Test case description.

test_case_name?: string

Test case name.

test_case_uuid?: string

Test case UUID.

test_case_version?: number

Test case version.

format: int64
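The `status` values listed above can be modelled as a string union so that the compiler flags typos. The sketch below also groups the statuses into finished and in-progress. That grouping is an assumption inferred from the status names, not something this reference states, and both helper functions are hypothetical.

```typescript
// The nine status values documented for APIEvaluationRun.status.
const EVALUATION_RUN_STATUSES = [
  'EVALUATION_RUN_STATUS_UNSPECIFIED',
  'EVALUATION_RUN_QUEUED',
  'EVALUATION_RUN_RUNNING_DATASET',
  'EVALUATION_RUN_EVALUATING_RESULTS',
  'EVALUATION_RUN_CANCELLING',
  'EVALUATION_RUN_CANCELLED',
  'EVALUATION_RUN_SUCCESSFUL',
  'EVALUATION_RUN_PARTIALLY_SUCCESSFUL',
  'EVALUATION_RUN_FAILED',
] as const;

type EvaluationRunStatus = (typeof EVALUATION_RUN_STATUSES)[number];

// Assumption: these four statuses are final and a run in one of them will not change again.
const ASSUMED_TERMINAL_STATUSES: ReadonlySet<EvaluationRunStatus> = new Set<EvaluationRunStatus>([
  'EVALUATION_RUN_CANCELLED',
  'EVALUATION_RUN_SUCCESSFUL',
  'EVALUATION_RUN_PARTIALLY_SUCCESSFUL',
  'EVALUATION_RUN_FAILED',
]);

// Hypothetical type guard: narrows an arbitrary string to a known status value.
function isEvaluationRunStatus(value: string): value is EvaluationRunStatus {
  return EVALUATION_RUN_STATUSES.some((status) => status === value);
}

// Hypothetical helper: true when the run is in one of the assumed final statuses.
function isTerminalStatus(status: EvaluationRunStatus | undefined): boolean {
  return status !== undefined && ASSUMED_TERMINAL_STATUSES.has(status);
}

console.log(isTerminalStatus('EVALUATION_RUN_RUNNING_DATASET'));
console.log(isTerminalStatus('EVALUATION_RUN_SUCCESSFUL'));
```

A check like `isTerminalStatus` is useful when deciding whether fields such as `finished_at` and `pass_status` are likely to be populated yet.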

List Evaluation Runs by Test Case

import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationTestCases.listEvaluationRuns(
  '123e4567-e89b-12d3-a456-426614174000',
);

console.log(response.evaluation_runs);
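Every field in the response is optional, including `evaluation_runs` itself, so code that consumes the result should tolerate missing values. As a rough sketch of that defensive handling, the hypothetical helper below splits runs by `pass_status`. The local interfaces only mirror the fields it reads and are not the SDK's exported types.

```typescript
// Minimal local mirrors of the response fields used here.
interface RunSummary {
  evaluation_run_uuid?: string;
  run_name?: string;
  pass_status?: boolean;
}

interface ListRunsResponse {
  evaluation_runs?: RunSummary[];
}

interface PartitionedRuns {
  passed: RunSummary[];
  failed: RunSummary[];
  undecided: RunSummary[];
}

// Hypothetical helper: groups runs by pass_status, tolerating absent fields.
function partitionByPassStatus(response: ListRunsResponse): PartitionedRuns {
  const result: PartitionedRuns = { passed: [], failed: [], undecided: [] };

  // Fall back to an empty list when evaluation_runs is missing.
  for (const run of response.evaluation_runs ?? []) {
    if (run.pass_status === true) {
      result.passed.push(run);
    } else if (run.pass_status === false) {
      result.failed.push(run);
    } else {
      // pass_status is absent, for example while a run is still in progress.
      result.undecided.push(run);
    }
  }
  return result;
}

const sampleGroups = partitionByPassStatus({
  evaluation_runs: [{ run_name: 'example name', pass_status: true }],
});
console.log(sampleGroups.passed.length, sampleGroups.failed.length, sampleGroups.undecided.length);
```

The `response` object returned by `listEvaluationRuns` can be passed to this helper directly, provided its shape matches the fields documented above.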

{
  "evaluation_runs": [
    {
      "agent_deleted": true,
      "agent_name": "example name",
      "agent_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "agent_version_hash": "example string",
      "agent_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "created_by_user_email": "user@example.com",
      "created_by_user_id": "12345",
      "error_description": "example string",
      "evaluation_run_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "evaluation_test_case_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "finished_at": "2023-01-01T00:00:00Z",
      "pass_status": true,
      "queued_at": "2023-01-01T00:00:00Z",
      "run_level_metric_results": [
        {
          "error_description": "example string",
          "metric_name": "example name",
          "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
          "number_value": 123,
          "reasoning": "example string",
          "string_value": "example string"
        }
      ],
      "run_name": "example name",
      "star_metric_result": {
        "error_description": "example string",
        "metric_name": "example name",
        "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
        "number_value": 123,
        "reasoning": "example string",
        "string_value": "example string"
      },
      "started_at": "2023-01-01T00:00:00Z",
      "status": "EVALUATION_RUN_STATUS_UNSPECIFIED",
      "test_case_description": "example string",
      "test_case_name": "example name",
      "test_case_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "test_case_version": 123
    }
  ]
}
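The `queued_at`, `started_at`, and `finished_at` values in the response are date-time strings, so they can be parsed to derive how long a run took. The following is a minimal sketch under that reading. `runDurationMs` is a hypothetical helper, and it returns `undefined` rather than guessing when a timestamp is missing or unparseable.

```typescript
// Local mirror of the timestamp fields on an evaluation run.
interface RunTimestamps {
  queued_at?: string;
  started_at?: string;
  finished_at?: string;
}

// Hypothetical helper: milliseconds between started_at and finished_at, if both are usable.
function runDurationMs(run: RunTimestamps): number | undefined {
  if (!run.started_at || !run.finished_at) {
    return undefined; // The run has not started or has not finished yet.
  }
  const started = Date.parse(run.started_at);
  const finished = Date.parse(run.finished_at);
  if (Number.isNaN(started) || Number.isNaN(finished)) {
    return undefined; // At least one value is not a parseable date-time string.
  }
  return finished - started;
}

// With the placeholder timestamps from the example response, both times are equal.
console.log(
  runDurationMs({
    started_at: '2023-01-01T00:00:00Z',
    finished_at: '2023-01-01T00:00:00Z',
  }),
); // prints 0
```

The same approach applied to `queued_at` and `started_at` gives the time a run spent waiting in the queue.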
