LLM-based Answer Relevance
Definition
LLM-based Answer Relevance outputs a score between 0.0 - 1.0 assessing the consistency of the generated answer based on the reference ground truth answers.
Scoring rubric in LLM Prompt:
- 0.0 means that the answer is completely irrelevant to the question.
- 0.5 means that the answer is partially relevant to the question or it only partially answers the question.
- 1.0 means that the answer is relevant to the question and completely answers the question.
Example Usage
Required data items: question
, answer
from continuous_eval.metrics.generation.text import LLMBasedAnswerRelevancefrom continuous_eval.llm_factory import LLMFactory
datum = { "question": "Who wrote 'Romeo and Juliet'?", "answer": "Shakespeare wrote 'Romeo and Juliet'",}
metric = LLMBasedAnswerRelevance(LLMFactory("gpt-4-1106-preview"))print(metric(**datum))
Sample Output
{ 'LLM_based_answer_relevance': 1.0, 'LLM_based_answer_relevance_reasoning': "The answer is relevant to the question and completely answers the question by correctly identifying Shakespeare as the author of 'Romeo and Juliet'."}