Evaluation Runs

Run an Evaluation Test Case
agents.evaluation_runs.create(**kwargs: EvaluationRunCreateParams) -> EvaluationRunCreateResponse
POST /v2/gen-ai/evaluation_runs
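A minimal usage sketch, not verbatim SDK documentation: `client` is assumed to be an already-initialized SDK client exposing the `agents.evaluation_runs` resource (construction omitted), and the keyword arguments shown are illustrative `EvaluationRunCreateParams` fields whose exact names should be checked against that params type.

```python
# Sketch only: `client` construction is omitted and the kwarg names below are
# assumed EvaluationRunCreateParams fields, not confirmed by this reference.
run = client.agents.evaluation_runs.create(
    agent_uuids=["11111111-2222-3333-4444-555555555555"],   # agent(s) to evaluate (assumed field)
    test_case_uuid="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",   # test case to run (assumed field)
    run_name="nightly-regression",                           # display name for the run (assumed field)
)
print(run)  # EvaluationRunCreateResponse
```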
Retrieve Results of an Evaluation Run
agents.evaluation_runs.list_results(evaluation_run_uuid: str, **kwargs: EvaluationRunListResultsParams) -> EvaluationRunListResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results
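A hedged sketch of fetching per-prompt results for a run; `client` is the same assumed client object as above, and any additional keywords would come from `EvaluationRunListResultsParams`.

```python
# Sketch only: evaluation_run_uuid is positional per the signature above.
results = client.agents.evaluation_runs.list_results(
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",  # evaluation_run_uuid: str
)
print(results)  # EvaluationRunListResultsResponse
```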
Retrieve Information About an Existing Evaluation Run
agents.evaluation_runs.retrieve(evaluation_run_uuid: str) -> EvaluationRunRetrieveResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}
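A sketch of retrieving run metadata, for example to poll a run you started earlier; `client` is the same assumed client object.

```python
# Sketch only: fetch run metadata (see the APIEvaluationRun model below for its fields).
run_info = client.agents.evaluation_runs.retrieve("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
print(run_info)  # EvaluationRunRetrieveResponse
```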
Retrieve Results of an Evaluation Run Prompt
agents.evaluation_runs.retrieve_results(prompt_id: int, **kwargs: EvaluationRunRetrieveResultsParams) -> EvaluationRunRetrieveResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}
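A sketch of pulling the results for a single prompt. The run UUID appears in the path but not as a positional argument in the signature, so it is assumed here to be passed as a keyword via `EvaluationRunRetrieveResultsParams`.

```python
# Sketch only: prompt_id is positional; passing evaluation_run_uuid as a keyword
# is an assumption about EvaluationRunRetrieveResultsParams.
prompt_result = client.agents.evaluation_runs.retrieve_results(
    1,  # prompt_id: int
    evaluation_run_uuid="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
)
print(prompt_result)  # EvaluationRunRetrieveResultsResponse
```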
Models
class APIEvaluationMetric:
description: Optional[str]
inverted: Optional[bool]

If true, the metric is inverted, meaning that a lower value is better.

metric_name: Optional[str]
metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]
Accepts one of the following:
"METRIC_TYPE_UNSPECIFIED"
"METRIC_TYPE_GENERAL_QUALITY"
"METRIC_TYPE_RAG_AND_TOOL"
metric_uuid: Optional[str]
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
range_max: Optional[float]

The maximum value for the metric.

Format: float
range_min: Optional[float]

The minimum value for the metric.

Format: float
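A small sketch of how a caller might compare two scores under a given metric, honoring `inverted` and the optional range bounds. The helper name and the duck-typed `metric` argument are illustrative, not part of the SDK.

```python
def is_improvement(metric, old: float, new: float) -> bool:
    """Return True if `new` beats `old` under this metric (sketch).

    `metric` is assumed to be an APIEvaluationMetric instance: `inverted`
    means lower is better; `range_min`/`range_max` bound valid values.
    """
    lo = metric.range_min if metric.range_min is not None else float("-inf")
    hi = metric.range_max if metric.range_max is not None else float("inf")
    old = min(max(old, lo), hi)  # clamp both scores into the metric's range
    new = min(max(new, lo), hi)
    return new < old if metric.inverted else new > old
```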
class APIEvaluationMetricResult:
error_description: Optional[str]

Error description if the metric could not be calculated.

metric_name: Optional[str]

Metric name

metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value: Optional[float]

The value of the metric as a number.

Format: double
reasoning: Optional[str]

Reasoning of the metric result.

string_value: Optional[str]

The value of the metric as a string.

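A sketch of reading the payload out of a result, choosing `string_value` or `number_value` based on `metric_value_type`; treating percentage results as numeric is an assumption, and the helper itself is illustrative.

```python
def metric_result_value(result):
    """Return the usable value of an APIEvaluationMetricResult (sketch).

    Assumes string-typed metrics populate `string_value` and that number and
    percentage metrics populate `number_value`; `error_description` is set
    when the metric could not be calculated.
    """
    if result.error_description:
        raise ValueError(f"{result.metric_name}: {result.error_description}")
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return result.string_value
    return result.number_value
```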
class APIEvaluationPrompt:
ground_truth: Optional[str]

The ground truth for the prompt.

input: Optional[str]
input_tokens: Optional[str]

The number of input tokens used in the prompt.

Format: uint64
output: Optional[str]
output_tokens: Optional[str]

The number of output tokens used in the prompt.

Format: uint64
prompt_chunks: Optional[List[PromptChunk]]

The list of prompt chunks. Each PromptChunk has the following fields:

chunk_usage_pct: Optional[float]

The usage percentage of the chunk.

Format: double
chunk_used: Optional[bool]

Indicates if the chunk was used in the prompt.

index_uuid: Optional[str]

The UUID of the knowledge base index the chunk came from.

source_name: Optional[str]

The source name for the chunk, e.g., the file name or document title.

text: Optional[str]

Text content of the chunk.

prompt_id: Optional[int]

Prompt ID

Format: int64
prompt_level_metric_results: Optional[List[APIEvaluationMetricResult]]

The metric results for the prompt.

Each item is an APIEvaluationMetricResult; see the fields documented under that class above.
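A sketch of summarizing which knowledge-base sources actually contributed to a prompt, using only the fields documented above; the helper itself is illustrative.

```python
def used_sources(prompt):
    """Names of sources whose chunks were marked as used for this prompt (sketch).

    `prompt` is assumed to be an APIEvaluationPrompt with `prompt_chunks` set.
    """
    return sorted(
        {
            chunk.source_name
            for chunk in (prompt.prompt_chunks or [])
            if chunk.chunk_used and chunk.source_name
        }
    )
```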

class APIEvaluationRun:
agent_deleted: Optional[bool]

Whether the agent is deleted.

agent_name: Optional[str]

Agent name

agent_uuid: Optional[str]

Agent UUID.

agent_version_hash: Optional[str]

Version hash

agent_workspace_uuid: Optional[str]

Agent workspace UUID.

created_by_user_email: Optional[str]
created_by_user_id: Optional[str]
error_description: Optional[str]

The error description

evaluation_run_uuid: Optional[str]

Evaluation run UUID.

evaluation_test_case_workspace_uuid: Optional[str]

Evaluation test case workspace UUID.

finished_at: Optional[datetime]

Run end time.

Format: date-time
pass_status: Optional[bool]

The pass status of the evaluation run based on the star metric.

queued_at: Optional[datetime]

Run queued time.

Format: date-time
run_level_metric_results: Optional[List[APIEvaluationMetricResult]]

Each item is an APIEvaluationMetricResult; see the fields documented under that class above.

run_name: Optional[str]

Run name.

star_metric_result: Optional[APIEvaluationMetricResult]
started_at: Optional[datetime]

Run start time.

Format: date-time
status: Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]

Evaluation Run Statuses

Accepts one of the following:
"EVALUATION_RUN_STATUS_UNSPECIFIED"
"EVALUATION_RUN_QUEUED"
"EVALUATION_RUN_RUNNING_DATASET"
"EVALUATION_RUN_EVALUATING_RESULTS"
"EVALUATION_RUN_CANCELLING"
"EVALUATION_RUN_CANCELLED"
"EVALUATION_RUN_SUCCESSFUL"
"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
"EVALUATION_RUN_FAILED"
test_case_description: Optional[str]

Test case description.

test_case_name: Optional[str]

Test case name.

test_case_uuid: Optional[str]

Test case UUID.

test_case_version: Optional[int]

Test case version.

Format: int64
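A sketch of summarizing a finished run from the fields above: check that `status` is terminal, then report `pass_status` and the star metric. The grouping of terminal statuses is an assumption inferred from the status names, and the helper is illustrative.

```python
# Assumed grouping of terminal statuses, inferred from the status names above.
TERMINAL_STATUSES = {
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
    "EVALUATION_RUN_CANCELLED",
}

def summarize_run(run) -> str:
    """One-line summary of an APIEvaluationRun (sketch)."""
    if run.status not in TERMINAL_STATUSES:
        return f"{run.run_name}: still in progress ({run.status})"
    verdict = "passed" if run.pass_status else "failed"
    star = run.star_metric_result
    star_text = f"{star.metric_name}={star.number_value}" if star else "no star metric"
    return f"{run.run_name}: {verdict} ({star_text})"
```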