Evaluation Test Cases

Create Evaluation Test Case
agents.evaluation_test_cases.create(**kwargs: EvaluationTestCaseCreateParams) -> EvaluationTestCaseCreateResponse
POST /v2/gen-ai/evaluation_test_cases
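For example, a minimal create call might look like the sketch below. It assumes `client` is an already-configured instance of this SDK's client, and the keyword arguments are inferred from the APIEvaluationTestCase model documented under Models; check EvaluationTestCaseCreateParams for the exact accepted fields.

# `client` is assumed to be an already-configured SDK client; construct it per the SDK's quickstart.
# All keyword arguments below are illustrative assumptions, not confirmed parameter names.
response = client.agents.evaluation_test_cases.create(
    name="support-bot-regression",
    description="Regression checks for the support agent",
    dataset_uuid="example-dataset-uuid",
    metrics=["example-metric-uuid"],
)
print(response)
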
List Evaluation Test Cases
agents.evaluation_test_cases.list() -> EvaluationTestCaseListResponse
GET /v2/gen-ai/evaluation_test_cases
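A minimal usage sketch, assuming `client` is an already-configured SDK client:

response = client.agents.evaluation_test_cases.list()
# The response attribute name is an assumption; inspect EvaluationTestCaseListResponse for the exact shape.
for test_case in response.evaluation_test_cases or []:
    print(test_case.test_case_uuid, test_case.name)
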
List Evaluation Runs by Test Case
agents.evaluation_test_cases.list_evaluation_runs(evaluation_test_case_uuid: str, **kwargs: EvaluationTestCaseListEvaluationRunsParams) -> EvaluationTestCaseListEvaluationRunsResponse
GET /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs
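A sketch of listing runs for one test case, assuming a configured `client`; the test case UUID is passed positionally:

runs = client.agents.evaluation_test_cases.list_evaluation_runs(
    "example-test-case-uuid",
)
print(runs)
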
Retrieve Information About an Existing Evaluation Test Case
agents.evaluation_test_cases.retrieve(test_case_uuid: str, **kwargs: EvaluationTestCaseRetrieveParams) -> EvaluationTestCaseRetrieveResponse
GET /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
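A retrieval sketch, assuming a configured `client`:

test_case = client.agents.evaluation_test_cases.retrieve(
    "example-test-case-uuid",
)
print(test_case)
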
Update an Evaluation Test Case
agents.evaluation_test_cases.update(path_test_case_uuid: str, **kwargs: EvaluationTestCaseUpdateParams) -> EvaluationTestCaseUpdateResponse
PUT /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
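An update sketch, assuming a configured `client`; the path UUID is positional and the body fields shown are illustrative assumptions (see EvaluationTestCaseUpdateParams for the accepted fields):

updated = client.agents.evaluation_test_cases.update(
    "example-test-case-uuid",
    name="support-bot-regression-v2",  # assumed parameter
    description="Tightened success thresholds",  # assumed parameter
)
print(updated)
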
Models
class APIEvaluationTestCase:
archived_at: Optional[datetime]
created_at: Optional[datetime]
created_by_user_email: Optional[str]
created_by_user_id: Optional[str]
dataset: Optional[Dataset]
  created_at: Optional[datetime]

  Time the dataset was created.

  format: date-time
  dataset_name: Optional[str]

  Name of the dataset.

  dataset_uuid: Optional[str]

  UUID of the dataset.

  file_size: Optional[str]

  The size of the dataset's uploaded file, in bytes.

  format: uint64
  has_ground_truth: Optional[bool]

  Whether the dataset has a ground truth column.

  row_count: Optional[int]

  Number of rows in the dataset.

  format: int64
dataset_name: Optional[str]
dataset_uuid: Optional[str]
description: Optional[str]
latest_version_number_of_runs: Optional[int]
metrics: Optional[List[APIEvaluationMetric]]
  description: Optional[str]
  inverted: Optional[bool]

  If true, the metric is inverted, meaning that a lower value is better.

  metric_name: Optional[str]
  metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]

  Accepts one of the following:
    "METRIC_TYPE_UNSPECIFIED"
    "METRIC_TYPE_GENERAL_QUALITY"
    "METRIC_TYPE_RAG_AND_TOOL"

  metric_uuid: Optional[str]
  metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

  Accepts one of the following:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

  range_max: Optional[float]

  The maximum value for the metric.

  format: float
  range_min: Optional[float]

  The minimum value for the metric.

  format: float
name: Optional[str]
star_metric: Optional[APIStarMetric]
test_case_uuid: Optional[str]
total_runs: Optional[int]
updated_at: Optional[datetime]
updated_by_user_email: Optional[str]
updated_by_user_id: Optional[str]
version: Optional[int]
class APIStarMetric:
metric_uuid: Optional[str]
name: Optional[str]
success_threshold: Optional[float]

The success threshold for the star metric. This is a value that the metric must reach to be considered successful.

format: float
success_threshold_pct: Optional[int]

The success threshold for the star metric. This is a percentage value between 0 and 100.

format: int32
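
To illustrate how these models might be consumed, the sketch below reads a few of the fields documented above from a retrieved test case. The attribute path from the retrieve response to the APIEvaluationTestCase object is an assumption; verify it against EvaluationTestCaseRetrieveResponse.

response = client.agents.evaluation_test_cases.retrieve("example-test-case-uuid")
test_case = response.evaluation_test_case  # assumed response attribute
if test_case is not None:
    print(test_case.name, test_case.version)
    if test_case.dataset is not None:
        print(test_case.dataset.dataset_name, test_case.dataset.row_count)
    for metric in test_case.metrics or []:
        print(metric.metric_name, metric.metric_value_type)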