Overview
Metric Overview
The Relari API offers the following metrics:
| Module | Category | Metrics |
|---|---|---|
| Retrieval | Deterministic | PrecisionRecallF1, RankedRetrievalMetrics |
| Retrieval | LLM-based | LLMBasedContextPrecision, LLMBasedContextCoverage |
| Text Generation | Deterministic | DeterministicAnswerCorrectness, DeterministicFaithfulness, FleschKincaidReadability |
| Text Generation | LLM-based | LLMBasedFaithfulness, LLMBasedAnswerCorrectness, LLMBasedAnswerRelevance, LLMBasedStyleConsistency |
| Code Generation | Deterministic | CodeStringMatch, PythonASTSimilarity, SQLSyntaxMatch, SQLASTSimilarity |
| Classification | Deterministic | SingleLabelClassification |
| Agent Tools | Deterministic | ToolSelectionAccuracy |
Metric Definitions and Inputs
Brief definitions and the required inputs for each available metric are listed below. Please check the individual metric pages for specific examples.
Retrieval metrics
Deterministic
PrecisionRecallF1
- Definition: Rank-agnostic metrics including Precision, Recall, and F1 of Retrieved Contexts
- Inputs: `retrieved_context`, `ground_truth_context`
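For intuition, here is a minimal sketch of the computation, treating a retrieved chunk as relevant only if it exactly matches a ground-truth chunk (the hosted metric may use a softer matching rule):

```python
def precision_recall_f1(retrieved_context, ground_truth_context):
    """Rank-agnostic overlap metrics; exact-match chunk comparison is an
    illustrative simplification."""
    retrieved, relevant = set(retrieved_context), set(ground_truth_context)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(precision_recall_f1(["chunk_a", "chunk_b"], ["chunk_b", "chunk_c"]))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```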
RankedRetrievalMetrics
- Definition: Rank-aware metrics including Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG) of the Retrieved Contexts
- Inputs: `retrieved_context`, `ground_truth_context`
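A sketch of the rank-aware side (NDCG omitted for brevity; the "map" value below is the average precision of a single query, which the full metric averages over a dataset):

```python
def ranked_retrieval_metrics(retrieved_context, ground_truth_context):
    """Reciprocal rank and average precision with exact-match relevance,
    an illustrative simplification."""
    relevant = set(ground_truth_context)
    reciprocal_rank, hits, precisions = 0.0, 0, []
    for rank, chunk in enumerate(retrieved_context, start=1):
        if chunk in relevant:
            if not reciprocal_rank:
                reciprocal_rank = 1.0 / rank  # rank of the first relevant hit
            hits += 1
            precisions.append(hits / rank)    # precision@k at each hit
    average_precision = sum(precisions) / len(relevant) if relevant else 0.0
    return {"mrr": reciprocal_rank, "map": average_precision}

print(ranked_retrieval_metrics(["a", "x", "b"], ["a", "b"]))
# {'mrr': 1.0, 'map': 0.833...}
```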
LLM-based
LLMBasedContextPrecision
- Definition: Precision and Mean Average Precision (MAP) based on context relevance as classified by an LLM
- Inputs: `question`, `retrieved_context`
LLMBasedContextCoverage
- Definition: Proportion of statements in the Ground Truth Answer that can be attributed to the Retrieved Contexts, as determined by an LLM
- Inputs: `question`, `retrieved_context`, `ground_truth_answers`
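As a usage sketch, the open-source continuous-eval companion library exposes these metrics as callables that take the inputs listed above as keyword arguments. The import path and return shape here are assumptions and may differ across versions:

```python
# A usage sketch; the import path and exact output format are assumptions.
from continuous_eval.metrics.retrieval import LLMBasedContextCoverage

metric = LLMBasedContextCoverage()
result = metric(
    question="Who wrote 'Pride and Prejudice'?",
    retrieved_context=["Jane Austen was an English novelist known for..."],
    ground_truth_answers=["Jane Austen"],
)
print(result)  # e.g. a coverage score plus the LLM's per-statement verdicts
```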
Text Generation metrics
Deterministic
DeterministicAnswerCorrectness
- Definition: Includes Token Overlap (Precision, Recall, F1), ROUGE-L (Precision, Recall, F1), and BLEU score of the Generated Answer vs. the Ground Truth Answer
- Inputs: `answer`, `ground_truth_answers`
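A sketch of the Token Overlap component (ROUGE-L and BLEU omitted; whitespace tokenization and set-based overlap are illustrative simplifications):

```python
def token_overlap_f1(answer, ground_truth_answers):
    """Best token-overlap F1 of the answer against any reference."""
    def f1(pred, ref):
        pred_tokens, ref_tokens = pred.lower().split(), ref.lower().split()
        common = len(set(pred_tokens) & set(ref_tokens))
        if not common:
            return 0.0
        precision, recall = common / len(pred_tokens), common / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)
    return max(f1(answer, ref) for ref in ground_truth_answers)

print(token_overlap_f1("Paris is the capital of France",
                       ["The capital of France is Paris"]))  # 1.0
```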
DeterministicFaithfulness
- Definition: Proportion of sentences in Answer that can be matched to Retrieved Contexts using ROUGE-L precision, Token Overlap precision, and BLEU score
- Inputs: `retrieved_context`, `answer`
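A sketch of the token-overlap variant (the full metric also reports ROUGE-L precision and BLEU; the 0.5 support threshold and naive sentence splitting are assumptions):

```python
def deterministic_faithfulness(retrieved_context, answer):
    """Fraction of answer sentences mostly covered by context tokens."""
    context_tokens = set(" ".join(retrieved_context).lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        precision = sum(t in context_tokens for t in tokens) / len(tokens)
        supported += precision >= 0.5  # assumed support threshold
    return supported / len(sentences) if sentences else 0.0
```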
FleschKincaidReadability
- Definition: How easy or difficult it is to understand the LLM-generated answer
- Inputs: `answer`
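The underlying formula is the standard Flesch-Kincaid Grade Level, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59; the vowel-group syllable counter below is a rough stand-in for a proper one:

```python
import re

def flesch_kincaid_grade(answer):
    """Flesch-Kincaid Grade Level with a crude syllable estimate."""
    sentences = max(1, len(re.findall(r"[.!?]+", answer)))
    words = re.findall(r"[A-Za-z']+", answer)
    if not words:
        return 0.0
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```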
LLM-based
LLMBasedFaithfulness
- Definition: Binary classification of whether the statements in the Generated Answer can be attributed to the Retrieved Contexts, as judged by an LLM
- Inputs: `question`, `retrieved_context`, `answer`
LLMBasedAnswerCorrectness
- Definition: Overall correctness of the Generated Answer given the Question and Ground Truth Answer, as assessed by an LLM
- Inputs: `question`, `answer`, `ground_truth_answers`
LLMBasedAnswerRelevance
- Definition: Relevance of the Generated Answer with respect to the Question
- Inputs: `question`, `answer`
LLMBasedStyleConsistency
- Definition: Consistency of style between the Generated Answer and the Ground Truth Answer(s)
- Inputs: `answer`, `ground_truth_answers`
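Usage follows the same pattern as the LLM-based retrieval metrics: instantiate the metric and call it with the listed inputs as keyword arguments (import path assumed, as before):

```python
# A usage sketch; the import path and exact output format are assumptions.
from continuous_eval.metrics.generation.text import LLMBasedAnswerCorrectness

metric = LLMBasedAnswerCorrectness()
result = metric(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    ground_truth_answers=["Paris"],
)
print(result)  # e.g. a correctness score with the LLM's reasoning
```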
Classification metrics
Deterministic
SingleLabelClassification
- Definition: Proportion of correctly identified items out of the total items
- Inputs: `predicted_class`, `ground_truth_class`
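The inputs above are per-example; over a dataset the metric reduces to plain accuracy, as in this sketch:

```python
def classification_accuracy(predicted_classes, ground_truth_classes):
    """Proportion of predictions matching the ground truth labels."""
    correct = sum(p == g for p, g in zip(predicted_classes, ground_truth_classes))
    return correct / len(ground_truth_classes) if ground_truth_classes else 0.0

print(classification_accuracy(["spam", "ham", "spam"],
                              ["spam", "ham", "ham"]))  # 0.666...
```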
Code Generation metrics
Deterministic
CodeStringMatch
- Definition: Exact and fuzzy match scores between generated code strings and the ground truth code strings
- Inputs: `answer`, `ground_truth_answers`
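A sketch using difflib for the fuzzy side (SequenceMatcher is a stand-in; the hosted metric's fuzzy-matching algorithm is not specified here):

```python
import difflib

def code_string_match(answer, ground_truth_answers):
    """Exact match (0/1) and best fuzzy similarity against the references."""
    exact = float(any(answer == gt for gt in ground_truth_answers))
    fuzzy = max(difflib.SequenceMatcher(None, answer, gt).ratio()
                for gt in ground_truth_answers)
    return {"exact_match": exact, "fuzzy_score": fuzzy}

print(code_string_match("def add(a, b): return a + b",
                        ["def add(x, y): return x + y"]))
```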
PythonASTSimilarity
- Definition: Similarity of Abstract Syntax Trees (ASTs) for Python code, comparing the generated code to the ground truth code
- Inputs: `answer`, `ground_truth_answers`
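For intuition, comparing AST dumps shows why this metric ignores formatting and comments; the real metric computes a graded similarity rather than this all-or-nothing check:

```python
import ast

def python_ast_match(answer, ground_truth_answers):
    """1.0 if the generated code parses to the same AST as any reference
    (references are assumed to be well-formed Python)."""
    def normalize(code):
        return ast.dump(ast.parse(code))
    try:
        target = normalize(answer)
    except SyntaxError:
        return 0.0
    return float(any(target == normalize(gt) for gt in ground_truth_answers))

print(python_ast_match("x=1+2", ["x = 1 + 2  # same tree"]))  # 1.0
```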
SQLSyntaxMatch
- Definition: Syntactic equivalence between generated SQL queries and a set of ground truth queries
- Inputs: `answer`, `ground_truth_answers`
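A crude sketch of syntactic matching via string normalization (the hosted metric may normalize more aggressively, e.g. with a real SQL parser):

```python
def sql_syntax_match(answer, ground_truth_answers):
    """Case- and whitespace-insensitive equivalence of SQL strings."""
    def normalize(sql):
        return " ".join(sql.lower().replace(";", " ").split())
    return float(any(normalize(answer) == normalize(gt)
                     for gt in ground_truth_answers))

print(sql_syntax_match("SELECT a FROM t;", ["select a\nfrom t"]))  # 1.0
```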
SQLASTSimilarity
- Definition: Similarity of Abstract Syntax Trees (ASTs) for SQL queries, comparing the generated query to the ground truth queries
- Inputs: `answer`, `ground_truth_answers`
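A sketch using the third-party sqlglot parser to canonicalize queries before comparison (the choice of sqlglot and the all-or-nothing comparison are assumptions; the actual metric returns a graded AST similarity):

```python
import sqlglot  # third-party parser, an assumed tool choice

def sql_ast_match(answer, ground_truth_answers):
    """1.0 if the query parses to the same canonical form as any reference."""
    def canonical(sql):
        return sqlglot.parse_one(sql).sql()  # parse, then re-render
    target = canonical(answer)
    return float(any(target == canonical(gt) for gt in ground_truth_answers))

print(sql_ast_match("select a from t", ["SELECT a\nFROM t"]))  # 1.0
```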
Agent Tools metrics
Deterministic
ToolSelectionAccuracy
- Definition: Accuracy of the agent in selecting the correct tool(s) for a given task
- Inputs: `tools`, `ground_truths`
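A sketch comparing tool names only (the input schema and set-based comparison are assumptions; argument checking and call order are ignored):

```python
def tool_selection_accuracy(tools, ground_truths):
    """Fraction of expected tools that the agent actually invoked.
    The {"name": ...} schema is a hypothetical illustration."""
    called = {t["name"] for t in tools}            # tools the agent called
    expected = {t["name"] for t in ground_truths}  # tools it should have called
    return len(called & expected) / len(expected) if expected else 1.0

print(tool_selection_accuracy(
    [{"name": "search"}, {"name": "calculator"}],
    [{"name": "search"}],
))  # 1.0
```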