Context Precision, Recall, and F1
Definitions
Context Precision: measures signal vs. noise — what proportion of the retrieved contexts are relevant?
Context Recall: measures completeness — what proportion of all relevant contexts are retrieved?
F1: the harmonic mean of context precision and context recall
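Formally, writing R for the set of retrieved contexts and G for the set of ground-truth (relevant) contexts, these follow the standard precision/recall formulation:

$$
\text{context\_precision} = \frac{|R \cap G|}{|R|}, \qquad
\text{context\_recall} = \frac{|R \cap G|}{|G|}, \qquad
\text{context\_f1} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$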
Tip
Context Recall should be the North Star metric for retrieval: a retrieval system is only acceptable for generation if you are confident that the retrieved context is complete enough to answer the question.
Example Usage
Required data items: retrieved_context, ground_truth_context
# `client` is an initialized SDK client and `Metric` is the metric enum
# from the same SDK; their import and setup are not shown here.
res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        "retrieved_context": [
            "Paris is the capital of France and also the largest city in the country.",
            "Lyon is a major city in France.",
        ],
        "ground_truth_context": ["Paris is the capital of France."],
    },
)
print(res)
Example Output
{
    'context_precision': 0.5,
    'context_recall': 1.0,
    'context_f1': 0.6666666666666666
}
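To see where these numbers come from, here is a minimal sketch of the arithmetic, assuming each retrieved context is judged simply relevant or not against the ground truth (the library's actual matching may be semantic rather than a binary judgment):

```python
# Minimal sketch of the arithmetic behind the example output.
retrieved = [
    "Paris is the capital of France and also the largest city in the country.",
    "Lyon is a major city in France.",
]
ground_truth = ["Paris is the capital of France."]

# Judged relevance (an assumption for this sketch): the first retrieved
# context covers the ground-truth context, the second does not.
relevant_retrieved = 1

precision = relevant_retrieved / len(retrieved)      # 1 / 2 = 0.5
recall = relevant_retrieved / len(ground_truth)      # 1 / 1 = 1.0
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(precision, recall, f1)  # 0.5 1.0 0.6666666666666666
```

In this example, recall is perfect because the single ground-truth context is covered, while precision is 0.5 because the Lyon context is noise; the F1 of about 0.667 reflects that trade-off.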