Extended Context Precision & Recall
Definitions
This metric is similar to Context Precision & Recall, but it adds matching strategies for determining relevance.
Matching Strategy
The ground truth contexts can be defined differently from the exact chunks retrieved. For example, a ground truth context can be a single sentence that contains the information, while the retrieved contexts are uniform 512-token chunks. The following matching strategies determine relevance:
| Match Type | Retrieved Component | Considered relevant if |
|---|---|---|
| Rouge Chunk Match | Chunk | It matches a ground truth context chunk with ROUGE-L Recall > 0.7. |
| Rouge Sentence Match | Sentence | It matches a ground truth context sentence with ROUGE-L Recall > 0.8. |
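The relevance test above can be sketched with a plain longest-common-subsequence implementation of ROUGE-L recall. This is an illustrative approximation, not the library's internal code; the tokenizer and the helper names (`rouge_l_recall`, `is_relevant_chunk`) are assumptions for the sketch.

```python
import re

def _tokenize(text):
    # Simple word tokenizer: lowercase, drop punctuation (an assumption;
    # the actual metric may tokenize differently).
    return re.findall(r"\w+", text.lower())

def _lcs_len(a, b):
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(retrieved, ground_truth):
    # ROUGE-L recall = LCS length / number of tokens in the ground truth text.
    ref = _tokenize(ground_truth)
    cand = _tokenize(retrieved)
    return _lcs_len(cand, ref) / len(ref) if ref else 0.0

def is_relevant_chunk(chunk, gt_chunks, threshold=0.7):
    # A retrieved chunk counts as relevant if its ROUGE-L recall against
    # any ground truth chunk exceeds the threshold (0.7 for chunks).
    return any(rouge_l_recall(chunk, gt) > threshold for gt in gt_chunks)
```

With this sketch, a retrieved chunk that fully contains a ground truth sentence scores recall 1.0 and is relevant, while an unrelated chunk falls below the threshold.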
Example Usage
Required data items: `retrieved_context`, `ground_truth_context`
```python
res = client.metrics.compute(
    Metric.PrecisionRecallF1Ext,
    args={
        "retrieved_context": [
            "Paris is the capital of France and also the largest city in the country.",
            "Lyon is a major city in France.",
        ],
        "ground_truth_context": ["Paris is the capital of France."],
    },
)
print(res)
```
Example Output
```python
{
    'sentence_precision': 1.0,
    'sentence_recall': 1.0,
    'sentence_f1': 1.0,
    'chunk_precision': 1.0,
    'chunk_recall': 1.0,
    'chunk_f1': 1.0
}
```