Evaluation Test Cases

Create Evaluation Test Case
agents.evaluation_test_cases.create(**kwargs: EvaluationTestCaseCreateParams) -> EvaluationTestCaseCreateResponse
POST /v2/gen-ai/evaluation_test_cases
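For example, a minimal create call might look like the sketch below. It assumes `client` is an already-configured instance of this SDK's client, and the keyword arguments are inferred from the APIEvaluationTestCase model documented under Models; check EvaluationTestCaseCreateParams for the exact accepted fields.

# `client` is assumed to be an already-configured SDK client; construct it per the SDK's quickstart.
# All keyword arguments below are illustrative assumptions, not confirmed parameter names.
response = client.agents.evaluation_test_cases.create(
    name="support-bot-regression",
    description="Regression checks for the support agent",
    dataset_uuid="example-dataset-uuid",
    metrics=["example-metric-uuid"],
)
print(response)
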
List Evaluation Test Cases
agents.evaluation_test_cases.list() -> EvaluationTestCaseListResponse
GET /v2/gen-ai/evaluation_test_cases
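A minimal usage sketch, assuming `client` is an already-configured SDK client:

response = client.agents.evaluation_test_cases.list()
# The response attribute name is an assumption; inspect EvaluationTestCaseListResponse for the exact shape.
for test_case in response.evaluation_test_cases or []:
    print(test_case.test_case_uuid, test_case.name)
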
List Evaluation Runs by Test Case
agents.evaluation_test_cases.list_evaluation_runs(evaluation_test_case_uuid: str, **kwargs: EvaluationTestCaseListEvaluationRunsParams) -> EvaluationTestCaseListEvaluationRunsResponse
GET /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs
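A sketch of listing runs for one test case, assuming a configured `client`; the test case UUID is passed positionally:

runs = client.agents.evaluation_test_cases.list_evaluation_runs(
    "example-test-case-uuid",
)
print(runs)
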
Retrieve Information About an Existing Evaluation Test Case
agents.evaluation_test_cases.retrieve(test_case_uuid: str, **kwargs: EvaluationTestCaseRetrieveParams) -> EvaluationTestCaseRetrieveResponse
GET /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
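A retrieval sketch, assuming a configured `client`:

test_case = client.agents.evaluation_test_cases.retrieve(
    "example-test-case-uuid",
)
print(test_case)
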
Update an Evaluation Test Case
agents.evaluation_test_cases.update(path_test_case_uuid: str, **kwargs: EvaluationTestCaseUpdateParams) -> EvaluationTestCaseUpdateResponse
PUT /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
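An update sketch, assuming a configured `client`; the path UUID is positional and the body fields shown are illustrative assumptions (see EvaluationTestCaseUpdateParams for the accepted fields):

updated = client.agents.evaluation_test_cases.update(
    "example-test-case-uuid",
    name="support-bot-regression-v2",  # assumed parameter
    description="Tightened success thresholds",  # assumed parameter
)
print(updated)
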
Models
class APIEvaluationTestCase:
archived_at: Optional[datetime]
created_at: Optional[datetime]
created_by_user_email: Optional[str]
created_by_user_id: Optional[str]
dataset: Optional[Dataset]
  created_at: Optional[datetime]

  Time the dataset was created.

  format: date-time
  dataset_name: Optional[str]

  Name of the dataset.

  dataset_uuid: Optional[str]

  UUID of the dataset.

  file_size: Optional[str]

  The size of the dataset's uploaded file, in bytes.

  format: uint64
  has_ground_truth: Optional[bool]

  Whether the dataset has a ground truth column.

  row_count: Optional[int]

  Number of rows in the dataset.

  format: int64
dataset_name: Optional[str]
dataset_uuid: Optional[str]
description: Optional[str]
latest_version_number_of_runs: Optional[int]
metrics: Optional[List[APIEvaluationMetric]]
  description: Optional[str]
  inverted: Optional[bool]

  If true, the metric is inverted, meaning that a lower value is better.

  metric_name: Optional[str]
  metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]

  Accepts one of the following:
    "METRIC_TYPE_UNSPECIFIED"
    "METRIC_TYPE_GENERAL_QUALITY"
    "METRIC_TYPE_RAG_AND_TOOL"

  metric_uuid: Optional[str]
  metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

  Accepts one of the following:
    "METRIC_VALUE_TYPE_UNSPECIFIED"
    "METRIC_VALUE_TYPE_NUMBER"
    "METRIC_VALUE_TYPE_STRING"
    "METRIC_VALUE_TYPE_PERCENTAGE"

  range_max: Optional[float]

  The maximum value for the metric.

  format: float
  range_min: Optional[float]

  The minimum value for the metric.

  format: float
name: Optional[str]
star_metric: Optional[APIStarMetric]
test_case_uuid: Optional[str]
total_runs: Optional[int]
updated_at: Optional[datetime]
updated_by_user_email: Optional[str]
updated_by_user_id: Optional[str]
version: Optional[int]
class APIStarMetric:
metric_uuid: Optional[str]
name: Optional[str]
success_threshold: Optional[float]

The success threshold for the star metric. This is a value that the metric must reach to be considered successful.

format: float
success_threshold_pct: Optional[int]

The success threshold for the star metric. This is a percentage value between 0 and 100.

format: int32
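
To illustrate how these models might be consumed, the sketch below reads a few of the fields documented above from a retrieved test case. The attribute path from the retrieve response to the APIEvaluationTestCase object is an assumption; verify it against EvaluationTestCaseRetrieveResponse.

response = client.agents.evaluation_test_cases.retrieve("example-test-case-uuid")
test_case = response.evaluation_test_case  # assumed response attribute
if test_case is not None:
    print(test_case.name, test_case.version)
    if test_case.dataset is not None:
        print(test_case.dataset.dataset_name, test_case.dataset.row_count)
    for metric in test_case.metrics or []:
        print(metric.metric_name, metric.metric_value_type)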