Evaluation Runs
Run an Evaluation Test Case
Retrieve Results of an Evaluation Run
Retrieve Information About an Existing Evaluation Run
Retrieve Results of an Evaluation Run Prompt
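The exact request paths, base URL, and authentication scheme for these operations are not shown on this page. The sketch below assumes a conventional REST layout (an evaluation_runs collection addressed by run UUID and prompt ID) with bearer-token authentication; treat BASE_URL, the paths, and the request body as assumptions to replace with the values from the full endpoint reference.

```python
# Minimal sketch of the four operations above. BASE_URL, the paths, and the
# request body are assumptions; substitute the real values from the endpoint docs.
import os
import requests

BASE_URL = os.environ.get("API_BASE_URL", "https://api.example.com")  # assumption
HEADERS = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

def run_test_case(test_case_uuid: str, agent_uuids: list[str]) -> dict:
    """Run an Evaluation Test Case (hypothetical request body)."""
    resp = requests.post(
        f"{BASE_URL}/evaluation_runs",
        headers=HEADERS,
        json={"test_case_uuid": test_case_uuid, "agent_uuids": agent_uuids},
    )
    resp.raise_for_status()
    return resp.json()

def get_run(run_uuid: str) -> dict:
    """Retrieve Information About an Existing Evaluation Run."""
    resp = requests.get(f"{BASE_URL}/evaluation_runs/{run_uuid}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def get_run_results(run_uuid: str) -> dict:
    """Retrieve Results of an Evaluation Run."""
    resp = requests.get(f"{BASE_URL}/evaluation_runs/{run_uuid}/results", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def get_prompt_result(run_uuid: str, prompt_id: int) -> dict:
    """Retrieve Results of an Evaluation Run Prompt."""
    resp = requests.get(
        f"{BASE_URL}/evaluation_runs/{run_uuid}/results/{prompt_id}", headers=HEADERS
    )
    resp.raise_for_status()
    return resp.json()
```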
Models
APIEvaluationMetric = object { description, inverted, metric_name, 5 more }
If true, the metric is inverted, meaning that a lower value is better.
metric_type: optional "METRIC_TYPE_UNSPECIFIED" or "METRIC_TYPE_GENERAL_QUALITY" or "METRIC_TYPE_RAG_AND_TOOL"
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
The maximum value for the metric.
The minimum value for the metric.
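As a reading aid, here is a minimal Python model of APIEvaluationMetric built from the fields listed above. The metric_type and metric_value_type literals come from this page; the names range_max and range_min for the maximum/minimum value fields are assumptions, since only their descriptions appear here.

```python
# Sketch of APIEvaluationMetric as a dataclass. range_max and range_min are
# assumed field names; the enum literals are as listed on this page.
from dataclasses import dataclass
from typing import Optional

METRIC_TYPES = {
    "METRIC_TYPE_UNSPECIFIED",
    "METRIC_TYPE_GENERAL_QUALITY",
    "METRIC_TYPE_RAG_AND_TOOL",
}
METRIC_VALUE_TYPES = {
    "METRIC_VALUE_TYPE_UNSPECIFIED",
    "METRIC_VALUE_TYPE_NUMBER",
    "METRIC_VALUE_TYPE_STRING",
    "METRIC_VALUE_TYPE_PERCENTAGE",
}

@dataclass
class APIEvaluationMetric:
    description: Optional[str] = None
    inverted: Optional[bool] = None          # True means a lower value is better
    metric_name: Optional[str] = None
    metric_type: Optional[str] = None        # one of METRIC_TYPES
    metric_value_type: Optional[str] = None  # one of METRIC_VALUE_TYPES
    range_max: Optional[float] = None        # assumed name: maximum value for the metric
    range_min: Optional[float] = None        # assumed name: minimum value for the metric
```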
APIEvaluationMetricResult = object { error_description, metric_name, metric_value_type, 3 more }
Error description if the metric could not be calculated.
Metric name.
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
The value of the metric as a number.
The reasoning behind the metric result.
The value of the metric as a string.
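A similar sketch for APIEvaluationMetricResult. Only error_description, metric_name, and metric_value_type are named above; number_value, reasoning, and string_value are assumed names for the remaining three fields, chosen to match their descriptions.

```python
# Sketch of APIEvaluationMetricResult. number_value, reasoning and string_value
# are assumed names for the three fields not spelled out on this page.
from dataclasses import dataclass
from typing import Optional

@dataclass
class APIEvaluationMetricResult:
    error_description: Optional[str] = None  # set when the metric could not be calculated
    metric_name: Optional[str] = None
    metric_value_type: Optional[str] = None  # e.g. "METRIC_VALUE_TYPE_NUMBER"
    number_value: Optional[float] = None     # assumed name: metric value as a number
    reasoning: Optional[str] = None          # assumed name: reasoning behind the result
    string_value: Optional[str] = None       # assumed name: metric value as a string

def metric_value(result: APIEvaluationMetricResult):
    """Return whichever representation of the value is populated."""
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return result.string_value
    return result.number_value
```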
APIEvaluationPrompt = object { ground_truth, input, input_tokens, 5 more }
The ground truth for the prompt.
The number of input tokens used in the prompt.
The number of output tokens used in the prompt.
prompt_chunks: optional array of object { chunk_usage_pct, chunk_used, index_uuid, 2 more }
The list of prompt chunks.
The usage percentage of the chunk.
Indicates if the chunk was used in the prompt.
The UUID of the index (Knowledge Base) the chunk came from.
The source name for the chunk, e.g., the file name or document title.
Text content of the chunk.
Prompt ID.
prompt_level_metric_results: optional array of APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }
The metric results for the prompt.
Error description if the metric could not be calculated.
Metric name.
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
The value of the metric as a number.
The reasoning behind the metric result.
The value of the metric as a string.
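The snippet below shows one way to consume an APIEvaluationPrompt payload returned as JSON, for example tallying which retrieved chunks were actually used and which prompt-level metrics failed. It relies only on key names documented above (input, ground_truth, input_tokens, prompt_chunks with chunk_used and index_uuid, and prompt_level_metric_results); anything beyond these should be checked against the full schema.

```python
# Sketch of summarizing an APIEvaluationPrompt payload; uses only the key
# names documented on this page.
from typing import Any

def summarize_prompt(prompt: dict[str, Any]) -> dict[str, Any]:
    """Condense a prompt result into the fields most useful for reporting."""
    chunks = prompt.get("prompt_chunks") or []
    used_chunks = [c for c in chunks if c.get("chunk_used")]
    return {
        "input": prompt.get("input"),
        "ground_truth": prompt.get("ground_truth"),
        "input_tokens": prompt.get("input_tokens"),
        "chunks_total": len(chunks),
        "chunks_used": len(used_chunks),
        "knowledge_base_indexes": sorted(
            {c.get("index_uuid") for c in used_chunks if c.get("index_uuid")}
        ),
        "failed_metrics": [
            m.get("metric_name")
            for m in prompt.get("prompt_level_metric_results") or []
            if m.get("error_description")
        ],
    }
```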
APIEvaluationRun = object { agent_deleted, agent_name, agent_uuid, 19 more }
Whether the agent is deleted.
Agent name.
Agent UUID.
Version hash.
Agent workspace UUID.
The error description.
Evaluation run UUID.
Evaluation test case workspace UUID.
Run end time.
The pass status of the evaluation run based on the star metric.
Run queued time.
run_level_metric_results: optional array of APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }
Error description if the metric could not be calculated.
Metric name.
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
The value of the metric as a number.
The reasoning behind the metric result.
The value of the metric as a string.
Run name.
star_metric_result: optional APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }
Error description if the metric could not be calculated.
Metric name.
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
The value of the metric as a number.
The reasoning behind the metric result.
The value of the metric as a string.
Run start time.
status: optional "EVALUATION_RUN_STATUS_UNSPECIFIED" or "EVALUATION_RUN_QUEUED" or "EVALUATION_RUN_RUNNING_DATASET" or 6 more
Evaluation run status.
Test case description.
Test case name.
Test case UUID.
Test case version.
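To close, a small helper for inspecting an APIEvaluationRun payload: it reports the agent, the run status, the star metric result, and any run-level metrics that errored. Only three of the nine status values are spelled out on this page, so the in-progress check below is an approximation, and the number_value key used for the star metric is an assumed name.

```python
# Sketch of inspecting an APIEvaluationRun payload. The in-progress status set
# covers only the values listed on this page; the remaining six statuses are
# not named here. number_value is an assumed key for the star metric's value.
from typing import Any

IN_PROGRESS_STATUSES = {"EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET"}

def report_run(run: dict[str, Any]) -> str:
    """Render a one-line summary of an evaluation run."""
    agent = run.get("agent_name") or run.get("agent_uuid")
    if run.get("agent_deleted"):
        agent = f"{agent} (deleted)"
    status = run.get("status", "EVALUATION_RUN_STATUS_UNSPECIFIED")
    if status in IN_PROGRESS_STATUSES:
        return f"{agent}: still running ({status})"

    star = run.get("star_metric_result") or {}
    star_value = star.get("number_value")  # assumed key name
    parts = [f"{agent}: {status}", f"star metric {star.get('metric_name')} = {star_value}"]
    errors = [
        m.get("metric_name")
        for m in run.get("run_level_metric_results") or []
        if m.get("error_description")
    ]
    if errors:
        parts.append(f"metrics with errors: {', '.join(errors)}")
    return "; ".join(parts)
```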