Context Precision, Recall, and F1

Definitions

Context Precision: measures signal vs. noise — what proportion of the retrieved contexts are relevant?

$$\text{Context Precision} = \frac{\text{Relevant Retrieved Contexts}}{\text{All Retrieved Contexts}}$$

Context Recall: measures completeness — what proportion of all relevant contexts are retrieved?

$$\text{Context Recall} = \frac{\text{Relevant Retrieved Contexts}}{\text{All Ground Truth Contexts}}$$

F1: the harmonic mean of Context Precision and Context Recall, balancing the two

$$\text{F1 Score} = 2 \times \frac{\text{Context Precision} \times \text{Context Recall}}{\text{Context Precision} + \text{Context Recall}}$$
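
As a reference, the arithmetic behind these three formulas can be sketched in a few lines of Python. This is only an illustrative helper, not part of the SDK: the function name and count-based inputs are assumptions, and the hosted metric judges which retrieved contexts are relevant for you.

# Illustrative only: computes the three scores from counts that are assumed
# to be known already (the metric itself determines relevance per context).
def precision_recall_f1(relevant_retrieved: int, retrieved: int, ground_truth: int) -> dict:
    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / ground_truth if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"context_precision": precision, "context_recall": recall, "context_f1": f1}

# 1 relevant context out of 2 retrieved, measured against 1 ground-truth context:
print(precision_recall_f1(1, 2, 1))
# {'context_precision': 0.5, 'context_recall': 1.0, 'context_f1': 0.6666666666666666}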
tip

Context Recall should be the North Star metric for retrieval. This is because a retrieval system is only acceptable for generation if there is confidence that the retrieved context is complete enough to answer the question.

Example Usage

Required data items: retrieved_context, ground_truth_context

# Assumes `client` is an initialized metrics client and `Metric` is the metric enum from the SDK.
res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        # Contexts returned by the retriever for the query.
        "retrieved_context": [
            "Paris is the capital of France and also the largest city in the country.",
            "Lyon is a major city in France.",
        ],
        # Reference contexts needed to answer the question.
        "ground_truth_context": ["Paris is the capital of France."],
    },
)
print(res)

Example Output

{
    'context_precision': 0.5,
    'context_recall': 1.0,
    'context_f1': 0.6666666666666666
}
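
Here one of the two retrieved contexts matches the ground truth, so Context Precision is 1/2 = 0.5; the single ground-truth context is covered, so Context Recall is 1/1 = 1.0; and F1 = 2 × 0.5 × 1.0 / (0.5 + 1.0) ≈ 0.67.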