Evaluation Runs
Run an Evaluation Test Case
Retrieve Results of an Evaluation Run
Retrieve Information About an Existing Evaluation Run
Retrieve Results of an Evaluation Run Prompt
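The sketch below shows roughly how these four operations might be invoked from the Python SDK. It is an illustration only: the package name, client class, and the evaluation_runs.create / .retrieve / .list_results / .retrieve_results method names are assumptions, not names confirmed by this reference, so check them against your installed SDK before use.

# Hypothetical usage sketch; package, client, and method names are assumptions.
import os
from gradient import Gradient  # assumed import path; adjust to your SDK

client = Gradient(api_key=os.environ["GRADIENT_API_KEY"])

# Run an Evaluation Test Case (assumed method: create).
client.agents.evaluation_runs.create(
    test_case_uuid="<test-case-uuid>",
    agent_uuids=["<agent-uuid>"],
    run_name="nightly-regression",
)

run_uuid = "<evaluation-run-uuid>"  # UUID returned by the create call

# Retrieve Information About an Existing Evaluation Run (assumed: retrieve).
info = client.agents.evaluation_runs.retrieve(run_uuid)

# Retrieve Results of an Evaluation Run (assumed: list_results).
results = client.agents.evaluation_runs.list_results(run_uuid)

# Retrieve Results of an Evaluation Run Prompt (assumed: retrieve_results).
prompt_result = client.agents.evaluation_runs.retrieve_results(
    prompt_id=1,
    evaluation_run_uuid=run_uuid,
)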
Models
class APIEvaluationMetric: …
If true, the metric is inverted, meaning that a lower value is better.
metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
The maximum value for the metric.
The minimum value for the metric.
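As a small illustration of the semantics above (an inverted metric treats lower values as better, and the metric defines maximum and minimum bounds), the helpers below compare and clamp raw scores. The attribute names inverted, range_min, and range_max are assumptions; this reference does not spell the field names out.

# Illustrative helpers; attribute names other than metric_type and
# metric_value_type are assumed, not confirmed by this reference.
def is_improvement(metric, old_value: float, new_value: float) -> bool:
    """Return True if new_value beats old_value for this metric."""
    if getattr(metric, "inverted", False):
        return new_value < old_value  # inverted metric: lower is better
    return new_value > old_value      # default: higher is better

def clamp_to_range(metric, value: float) -> float:
    """Clamp a raw score into the metric's [minimum, maximum] range, if set."""
    lo = getattr(metric, "range_min", None)  # assumed name for the minimum
    hi = getattr(metric, "range_max", None)  # assumed name for the maximum
    if lo is not None:
        value = max(lo, value)
    if hi is not None:
        value = min(hi, value)
    return value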
class APIEvaluationMetricResult: …
Error description if the metric could not be calculated.
Metric name
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
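A metric result carries either a numeric or a string value, and metric_value_type says which one (and whether a number should be read as a percentage). The sketch below shows one way to render a result; apart from metric_value_type, the attribute names (error_description, string_value, number_value) are assumptions inferred from the descriptions above.

# Sketch of rendering a metric result; attribute names other than
# metric_value_type are assumed from the field descriptions above.
def format_metric_result(result) -> str:
    if getattr(result, "error_description", None):
        return f"error: {result.error_description}"
    value_type = result.metric_value_type or "METRIC_VALUE_TYPE_UNSPECIFIED"
    if value_type == "METRIC_VALUE_TYPE_STRING":
        return str(getattr(result, "string_value", ""))
    number = getattr(result, "number_value", None)
    if number is None:
        return "n/a"
    if value_type == "METRIC_VALUE_TYPE_PERCENTAGE":
        return f"{number:.1%}"  # assumes the number is stored as a 0-1 fraction
    return f"{number:g}"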
class APIEvaluationPrompt: …
The ground truth for the prompt.
The number of input tokens used in the prompt.
The number of output tokens used in the prompt.
prompt_chunks: Optional[List[PromptChunk]]
The list of prompt chunks.
The usage percentage of the chunk.
Indicates if the chunk was used in the prompt.
The index uuid (Knowledge Base) of the chunk.
The source name for the chunk, e.g., the file name or document title.
Text content of the chunk.
Prompt ID
The metric results for the prompt.
Error description if the metric could not be calculated.
Metric name
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
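The sketch below walks a prompt's chunks to show which knowledge-base chunks were actually used, then lists the per-prompt metric results. prompt_chunks is documented above; the chunk attribute names (chunk_used, source_name) and the name of the metric-results field are assumptions inferred from the descriptions.

# Illustrative walk over an APIEvaluationPrompt; most attribute names here
# are assumptions based on the field descriptions above.
def summarize_prompt(prompt) -> None:
    """Print which chunks were used and the per-prompt metric results."""
    for chunk in prompt.prompt_chunks or []:
        used = getattr(chunk, "chunk_used", False)           # assumed name
        source = getattr(chunk, "source_name", "<unknown>")  # assumed name
        print(f"[{'USED' if used else 'unused'}] {source}")
    results = getattr(prompt, "prompt_level_metric_results", None) or []  # assumed name
    for result in results:
        print(getattr(result, "metric_name", "?"), result)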
class APIEvaluationRun: …
Whether the agent is deleted.
Agent name.
Agent UUID.
Version hash.
Agent workspace UUID.
The error description.
Evaluation run UUID.
Evaluation test case workspace UUID.
Run end time.
The pass status of the evaluation run based on the star metric.
Run queued time.
Error description if the metric could not be calculated.
Metric name
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
Run name.
star_metric_result: Optional[APIEvaluationMetricResult]
Error description if the metric could not be calculated.
Metric name
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
Run start time.
status: Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", 6 more]]
Evaluation run statuses.
Test case description.
Test case name.
Test case UUID.
Test case version.
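Tying the run model together, the sketch below reports a run's status and its star-metric verdict. status and star_metric_result are documented field names above; the other attributes used here (run_name, pass_status, metric_name) are assumptions taken from the field descriptions.

# Sketch of summarizing an APIEvaluationRun; see the assumption notes above.
def summarize_run(run) -> str:
    status = run.status or "EVALUATION_RUN_STATUS_UNSPECIFIED"
    lines = [f"{getattr(run, 'run_name', '<run>')}: {status}"]
    star = run.star_metric_result
    if star is not None:
        passed = getattr(run, "pass_status", None)  # assumed name
        verdict = "n/a" if passed is None else ("PASS" if passed else "FAIL")
        lines.append(f"star metric {getattr(star, 'metric_name', '?')}: {verdict}")
    return "\n".join(lines)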