
Tool Selection Accuracy

Definitions

Tool Selection Accuracy measures how well an LLM selects the right tool (function) to call in a given module.

The tools the LLM called are compared with the expected tools (ground_truths), and the metric outputs:

num_correct: the number of tools that are selected AND called with the correct arguments
score: num_correct divided by the total number of tools in ground_truths
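
The definition above can be illustrated with a minimal sketch. This is not the library's implementation: the ToolCall dataclass and the tool_selection_accuracy function below are hypothetical stand-ins written for illustration. The sketch assumes that matching ignores call order, that each called tool can satisfy at most one expected tool, and that an empty ground_truths list scores 0.0. None of these details is confirmed by the definition itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    """Hypothetical stand-in for the library's ToolCall (illustration only)."""
    name: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


def tool_selection_accuracy(
    tools: List[ToolCall], ground_truths: List[ToolCall]
) -> Dict[str, Any]:
    """Count expected calls matched by name AND arguments, then normalize."""
    remaining = list(tools)  # copy so each called tool is matched at most once
    num_correct = 0
    for expected in ground_truths:
        # Dataclass equality compares both name and kwargs.
        if expected in remaining:
            remaining.remove(expected)
            num_correct += 1
    # Assumption: with no expected tools, report 0.0 instead of dividing by zero.
    score = num_correct / len(ground_truths) if ground_truths else 0.0
    return {"num_correct": num_correct, "score": score}


if __name__ == "__main__":
    tools = [
        ToolCall(name="useless", kwargs={}),
        ToolCall(name="multiply", kwargs={"a": 2, "b": 3}),
    ]
    ground_truths = [
        ToolCall(name="useless", kwargs={}),
        ToolCall(name="add", kwargs={"a": 2, "b": 3}),
    ]
    print(tool_selection_accuracy(tools, ground_truths))
```

Under these assumptions, a call with the right name but wrong arguments does not count as correct, and extra calls that are not in ground_truths do not lower the score.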

Example Usage

Required data items: tools, ground_truths

tools = [
    ToolCall(name="useless", kwargs={}),
    ToolCall(name="multiply", kwargs={"a": 2, "b": 3}),
]

ground_truths = [
    ToolCall(name="useless", kwargs={}),
    ToolCall(name="add", kwargs={"a": 2, "b": 3}),
]

res = client.metrics.compute(
    Metric.ToolCallAccuracy,
    args={
        "tools": tools,
        "ground_truths": ground_truths,
    },
)
print(res)

Example Output

{
    "num_correct": 1,
    "score": 0.5
}

Only the useless call matches its ground truth in both name and arguments. The LLM called multiply where add was expected, so num_correct is 1 out of 2 expected tools, giving a score of 0.5.
