Prompt Optimization Supported Metrics
Prompt Optimization metrics are specialized evaluation metrics designed to assess the quality of the prompts generated by the Prompt Optimization API. The following metrics are supported:
Metric Name | Description | Required dataset fields
---|---|---
Correctness | Measures how close the generated answer is to the ground truth reference answers. | question, ground_truth_answers
Exact Match | Similar to Correctness and Output Correctness, but performs a string comparison. | ground_truth_answers
Token Recall | Calculates how much of the ground truth answer is covered by the generated answer (token overlap). | ground_truth_answers
Rouge | Measures the longest common subsequence between the generated answer and the ground truth answers. | ground_truth_answers
Style Consistency | Assesses style aspects such as tone, verbosity, formality, complexity, and terminology, as well as the completeness of the generated answer relative to the ground truth answer. | ground_truth_answers
Faithfulness | Measures how faithful the generated answer is to the ground truth context (i.e., that it is not hallucinating). | question, ground_truth_context
Relevance | Measures how relevant the generated answer is to the question. | question
SQL Correctness | Measures how close the generated SQL query is to the ground truth SQL query. | question, ground_truth_answers
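To make the string-based metrics in the table concrete, here is a minimal sketch of how Exact Match, Token Recall, and Rouge (LCS-based) could be computed. The function names, whitespace tokenization, and lowercasing are illustrative assumptions, not the Prompt Optimization API's actual implementation.

```python
def exact_match(generated: str, ground_truths: list[str]) -> float:
    """1.0 if the generated answer string-matches any reference answer, else 0.0."""
    return float(any(generated.strip() == gt.strip() for gt in ground_truths))


def token_recall(generated: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens that also appear in the generated answer."""
    gt_tokens = ground_truth.lower().split()
    gen_tokens = set(generated.lower().split())
    if not gt_tokens:
        return 0.0
    return sum(t in gen_tokens for t in gt_tokens) / len(gt_tokens)


def rouge_l_recall(generated: str, ground_truth: str) -> float:
    """Length of the longest common token subsequence, normalized by the
    ground-truth length (ROUGE-L recall)."""
    a = generated.lower().split()
    b = ground_truth.lower().split()
    if not b:
        return 0.0
    # Classic dynamic-programming LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a):
        for j, tb in enumerate(b):
            if ta == tb:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1] / len(b)


print(exact_match("Paris", ["Paris", "paris"]))                              # 1.0
print(token_recall("the capital is paris", "paris is the capital of france"))  # ≈ 0.667
print(rouge_l_recall("the capital is paris", "the capital of france is paris"))  # ≈ 0.667
```

Note that Token Recall ignores word order (a bag-of-tokens overlap), while the LCS behind Rouge rewards references whose tokens appear in the same order in the generated answer.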