Prompt Optimization Supported Metrics

Prompt Optimization metrics are specialized variants of the evaluation metrics, designed to evaluate the quality of the prompts generated by the Prompt Optimization API. Each metric requires specific fields in the evaluation dataset, listed in the table below and illustrated in the sketch that follows it. The following metrics are supported:

| Metric Name | Description | Required dataset fields |
| --- | --- | --- |
| Correctness | Measures how close the generated answer is to the ground truth reference answers. | question, ground_truth_answers |
| Exact Match | Similar to Correctness and Output Correctness but performs a string comparison. | ground_truth_answers |
| Token Recall | Calculates how much of the ground truth answer is covered by the generated answer (token overlap). | ground_truth_answers |
| Rouge | Measures the longest common subsequence between the generated answer and the ground truth answers. | ground_truth_answers |
| Style Consistency | Assesses style aspects (tone, verbosity, formality, complexity, terminology) and the completeness of the generated answer relative to the ground truth answer. | ground_truth_answers |
| Faithfulness | Measures how faithful the generated answer is to the ground truth context (i.e., it does not hallucinate). | question, ground_truth_context |
| Relevance | Measures how relevant the generated answer is to the question. | question |
| SQL Correctness | Measures how close the generated SQL query is to the ground truth SQL query. | question, ground_truth_answers |
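To make the required dataset fields concrete, here is a minimal sketch. The field names (question, ground_truth_answers, ground_truth_context) come from the table above, but the record layout and the `token_recall` helper are assumptions for illustration only; they are not the Prompt Optimization API's actual data format or implementation.

```python
def token_recall(generated: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens that also appear in the generated answer.

    Uses simple whitespace tokenization (an assumption for illustration); the
    actual Token Recall metric may tokenize and normalize differently.
    """
    generated_tokens = set(generated.lower().split())
    ground_truth_tokens = ground_truth.lower().split()
    if not ground_truth_tokens:
        return 0.0
    covered = sum(1 for tok in ground_truth_tokens if tok in generated_tokens)
    return covered / len(ground_truth_tokens)


# Hypothetical dataset rows carrying the fields required by the metrics above.
dataset = [
    {
        # question + ground_truth_answers: enough for Correctness, Exact Match,
        # Token Recall, Rouge, Style Consistency, and Relevance.
        "question": "What is the capital of France?",
        "ground_truth_answers": ["Paris is the capital of France."],
    },
    {
        # Faithfulness additionally requires ground_truth_context.
        "question": "When was the Eiffel Tower completed?",
        "ground_truth_context": "The Eiffel Tower was completed in 1889.",
        "ground_truth_answers": ["1889"],
    },
]

# Toy check of the token-overlap idea behind Token Recall.
print(token_recall("The capital of France is Paris.",
                   dataset[0]["ground_truth_answers"][0]))
```

The helper only illustrates the token-overlap intuition described in the table; for scoring real runs, rely on the metrics provided by the Prompt Optimization API rather than this sketch.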