Context Precision, Recall, and F1
Definitions
Context Precision: measures signal vs. noise — what proportion of the retrieved contexts are relevant?
Context Recall: measures completeness — what proportion of all relevant contexts are retrieved?
F1: the harmonic mean of context precision and context recall
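Formally, writing R for the set of retrieved contexts and G for the set of ground-truth (relevant) contexts, these follow the standard precision/recall formulation:

$$
\text{context\_precision} = \frac{|R \cap G|}{|R|}, \qquad
\text{context\_recall} = \frac{|R \cap G|}{|G|}, \qquad
\text{context\_f1} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$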
Tip
Context Recall should be the North Star metric for retrieval: a retrieval system is only acceptable for generation if you are confident that the retrieved context is complete enough to answer the question.
Example Usage
Required data items: retrieved_context, ground_truth_context
# `client` is an initialized SDK client and `Metric` is the metric enum
# from the same SDK; their import and setup are not shown here.
res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        "retrieved_context": [
            "Paris is the capital of France and also the largest city in the country.",
            "Lyon is a major city in France.",
        ],
        "ground_truth_context": ["Paris is the capital of France."],
    },
)
print(res)
Example Output
{
    'context_precision': 0.5,
    'context_recall': 1.0,
    'context_f1': 0.6666666666666666
}
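To see where these numbers come from, here is a minimal sketch of the arithmetic, assuming each retrieved context is judged simply relevant or not against the ground truth (the library's actual matching may be semantic rather than a binary judgment):

```python
# Minimal sketch of the arithmetic behind the example output.
retrieved = [
    "Paris is the capital of France and also the largest city in the country.",
    "Lyon is a major city in France.",
]
ground_truth = ["Paris is the capital of France."]

# Judged relevance (an assumption for this sketch): the first retrieved
# context covers the ground-truth context, the second does not.
relevant_retrieved = 1

precision = relevant_retrieved / len(retrieved)      # 1 / 2 = 0.5
recall = relevant_retrieved / len(ground_truth)      # 1 / 1 = 1.0
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(precision, recall, f1)  # 0.5 1.0 0.6666666666666666
```

In this example, recall is perfect because the single ground-truth context is covered, while precision is 0.5 because the Lyon context is noise; the F1 of about 0.667 reflects that trade-off.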