Extended Context Precision & Recall

Definitions

This metric is similar to Context Precision & Recall, but uses additional matching strategies to determine relevance.

Matching Strategy

The ground truth contexts may be defined differently from the exact chunks retrieved. For example, a ground truth context can be a single sentence that contains the relevant information, while the retrieved contexts are uniform 512-token chunks. The following matching strategies determine relevance:

| Match Type | Retrieved Component | Considered relevant if |
| --- | --- | --- |
| Rouge Chunk Match | Chunk | Match to a Ground Truth Context Chunk with ROUGE-L Recall > 0.7 |
| Rouge Sentence Match | Sentence | Match to a Ground Truth Context Sentence with ROUGE-L Recall > 0.8 |
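To make the matching rule concrete, here is a minimal sketch of ROUGE-L recall: the length of the longest common subsequence of tokens between the ground truth and the retrieved text, divided by the ground-truth length. The tokenization (lowercase, alphanumeric only) and the function name are illustrative assumptions; the library's actual ROUGE implementation may differ in detail.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric tokens only (a simplification;
    # real ROUGE implementations may tokenize differently).
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge_l_recall(ground_truth: str, retrieved: str) -> float:
    """ROUGE-L recall: LCS length divided by ground-truth token count."""
    gt, rt = tokenize(ground_truth), tokenize(retrieved)
    # Longest common subsequence via dynamic programming.
    dp = [[0] * (len(rt) + 1) for _ in range(len(gt) + 1)]
    for i, g in enumerate(gt):
        for j, r in enumerate(rt):
            dp[i + 1][j + 1] = dp[i][j] + 1 if g == r else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1] / len(gt) if gt else 0.0

gt = "Paris is the capital of France."
chunk_1 = "Paris is the capital of France and also the largest city in the country."
chunk_2 = "Lyon is a major city in France."

print(rouge_l_recall(gt, chunk_1))  # 1.0  -> relevant (> 0.7)
print(rouge_l_recall(gt, chunk_2))  # ~0.33 -> not relevant
```

With this rule, the first chunk fully covers the ground truth sentence (recall 1.0) and counts as relevant, while the second shares only a couple of tokens and does not.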

Example Usage

Required data items: retrieved_context, ground_truth_context

res = client.metrics.compute(
    Metric.PrecisionRecallF1Ext,
    args={
        "retrieved_context": [
            "Paris is the capital of France and also the largest city in the country.",
            "Lyon is a major city in France.",
        ],
        "ground_truth_context": ["Paris is the capital of France."],
    },
)
print(res)

Example Output

{
    'sentence_precision': 1.0,
    'sentence_recall': 1.0,
    'sentence_f1': 1.0,
    'chunk_precision': 1.0,
    'chunk_recall': 1.0,
    'chunk_f1': 1.0
}
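One common way to turn per-component match decisions into scores like these is: precision is the fraction of retrieved components that match some ground truth component, recall is the fraction of ground truth components that were matched, and F1 is their harmonic mean. The helper below is a hypothetical sketch of that aggregation; the metric's exact internal computation may differ.

```python
def precision_recall_f1(matched_retrieved: int, total_retrieved: int,
                        matched_ground_truth: int, total_ground_truth: int) -> dict:
    # Precision: share of retrieved components that matched a ground truth component.
    precision = matched_retrieved / total_retrieved if total_retrieved else 0.0
    # Recall: share of ground truth components covered by some retrieved component.
    recall = matched_ground_truth / total_ground_truth if total_ground_truth else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# E.g. if both retrieved components matched and the single ground truth was covered:
print(precision_recall_f1(2, 2, 1, 1))  # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```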