Prompt Optimization Supported Metrics
Prompt Optimization metrics are specialized evaluation metrics designed to assess the quality of the prompts generated by the Prompt Optimization API. The following metrics are supported:
Metric Name | Description | Required dataset fields
---|---|---
Correctness | Measures how close the generated answer is to the ground truth reference answers. | question, ground_truth_answers
Exact Match | Similar to Correctness and Output Correctness, but performs a string comparison. | ground_truth_answers
Token Recall | Calculates how much of the ground truth answer is covered by the generated answer (token overlap). | ground_truth_answers
Rouge | Measures the longest common subsequence between the generated answer and the ground truth answers. | ground_truth_answers
Style Consistency | Assesses style aspects such as tone, verbosity, formality, complexity, and terminology, as well as the completeness of the generated answer relative to the ground truth answer. | ground_truth_answers
Faithfulness | Measures how faithful the generated answer is to the ground truth context (i.e., that it is not hallucinating). | question, ground_truth_context
Relevance | Measures how relevant the generated answer is to the question. | question
SQL Correctness | Measures how close the generated SQL query is to the ground truth SQL query. | question, ground_truth_answers
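To make the string-based metrics in the table concrete, here is a minimal sketch of how Exact Match, Token Recall, and Rouge (LCS-based) could be computed. The function names, whitespace tokenization, and lowercasing are illustrative assumptions, not the Prompt Optimization API's actual implementation.

```python
def exact_match(generated: str, ground_truths: list[str]) -> float:
    """1.0 if the generated answer string-matches any reference answer, else 0.0."""
    return float(any(generated.strip() == gt.strip() for gt in ground_truths))


def token_recall(generated: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens that also appear in the generated answer."""
    gt_tokens = ground_truth.lower().split()
    gen_tokens = set(generated.lower().split())
    if not gt_tokens:
        return 0.0
    return sum(t in gen_tokens for t in gt_tokens) / len(gt_tokens)


def rouge_l_recall(generated: str, ground_truth: str) -> float:
    """Length of the longest common token subsequence, normalized by the
    ground-truth length (ROUGE-L recall)."""
    a = generated.lower().split()
    b = ground_truth.lower().split()
    if not b:
        return 0.0
    # Classic dynamic-programming LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a):
        for j, tb in enumerate(b):
            if ta == tb:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1] / len(b)


print(exact_match("Paris", ["Paris", "paris"]))                              # 1.0
print(token_recall("the capital is paris", "paris is the capital of france"))  # ≈ 0.667
print(rouge_l_recall("the capital is paris", "the capital of france is paris"))  # ≈ 0.667
```

Note that Token Recall ignores word order (a bag-of-tokens overlap), while the LCS behind Rouge rewards references whose tokens appear in the same order in the generated answer.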