
Rank-Aware Metrics

Definitions

Rank-aware metrics take into account the order in which the contexts are retrieved.

Average Precision (AP) considers all relevant chunks retrieved and computes a score weighted by their ranks. The mean of AP across a dataset is frequently referred to as MAP.
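As an illustration, here is a minimal sketch of AP under binary relevance (the helper name average_precision and the normalization by the number of relevant chunks retrieved are assumptions for this example, not the library's implementation):

def average_precision(relevance):
    # relevance: 0/1 flags for each retrieved chunk, in rank order
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant rank
    return precision_sum / hits if hits else 0.0

print(average_precision([0, 1]))  # 0.5: the only relevant chunk sits at rank 2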


Reciprocal Rank (RR) is the reciprocal of the rank at which the first relevant chunk appears in the retrieval. The mean of RR across a dataset is frequently referred to as MRR.
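A corresponding sketch for RR (again an illustrative helper, not the library's code):

def reciprocal_rank(relevance):
    # relevance: 0/1 flags for each retrieved chunk, in rank order
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank  # reciprocal of the first relevant rank
    return 0.0  # no relevant chunk was retrieved

print(reciprocal_rank([0, 1]))  # 0.5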


Normalized Discounted Cumulative Gain (NDCG) accounts for cases where the relevance judgment is non-binary.
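NDCG discounts each chunk's relevance by its rank and normalizes by the best possible ordering. A minimal sketch, assuming a log2 discount and binary (0/1) scores here, though graded scores work the same way (the helper name ndcg is hypothetical):

import math

def ndcg(relevance):
    # relevance: relevance scores for each retrieved chunk, in rank order
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance, start=1))
    ideal = sorted(relevance, reverse=True)  # best possible ordering of the same scores
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

print(ndcg([0, 1]))  # ~0.6309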


Matching Strategy

Please see the explanation of matching strategies in Matching Strategy.

Example Usage

Required data items: retrieved_context, ground_truth_context

from continuous_eval.metrics.retrieval import RankedRetrievalMetrics, RougeChunkMatch

datum = {
    "retrieved_context": [
        "Lyon is a major city in France.",
        "Paris is the capital of France and also the largest city in the country.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}

# Use ROUGE-based chunk matching to decide which retrieved chunks count as relevant
metric = RankedRetrievalMetrics(RougeChunkMatch())
print(metric(**datum))

Example Output

{
    'average_precision': 0.5,
    'reciprocal_rank': 0.5,
    'ndcg': 0.6309297535714574
}
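Here the only relevant chunk (the Paris one) is retrieved at rank 2, so both AP and RR equal 1/2 = 0.5, and NDCG is 1/log2(3) ≈ 0.6309 once the discounted gain is normalized by the ideal ordering.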