Tool Selection Accuracy
Definitions
Tool Selection Accuracy measures how accurately an LLM selects and calls tools (functions) within a given module.
The tools the LLM actually called are compared with the expected tools, and the metric outputs:
num_correct: the number of tools that are selected AND called with the correct arguments
score: num_correct / the total number of tools in ground_truths
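The definitions above can be sketched in plain Python. This is a minimal sketch, not the library's implementation. The `ToolCall` dataclass here is a hypothetical stand-in for the SDK's type, and `tool_selection_accuracy` is a hypothetical helper. The sketch assumes that a call counts as correct only when its name and keyword arguments exactly equal those of a not-yet-matched ground-truth call (argument order ignored, each expected call matched at most once), that argument values are hashable, and that an empty `ground_truths` list scores 0.0.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    # Hypothetical stand-in for the SDK's ToolCall: a tool name plus keyword arguments.
    name: str
    kwargs: dict[str, Any] = field(default_factory=dict)

    def key(self) -> tuple:
        # Hashable form: name plus sorted (arg, value) pairs, so argument order is ignored.
        # Assumes argument values are hashable.
        return (self.name, tuple(sorted(self.kwargs.items())))


def tool_selection_accuracy(tools: list[ToolCall], ground_truths: list[ToolCall]) -> dict[str, int | float]:
    # Multiset of expected calls; each expected call can be matched at most once.
    remaining = Counter(gt.key() for gt in ground_truths)
    num_correct = 0
    for call in tools:
        k = call.key()
        if remaining[k] > 0:  # same name AND same arguments as an unmatched expected call
            remaining[k] -= 1
            num_correct += 1
    # Assumption: with no expected calls, report a score of 0.0 instead of dividing by zero.
    score = num_correct / len(ground_truths) if ground_truths else 0.0
    return {"num_correct": num_correct, "score": score}


tools = [ToolCall("useless"), ToolCall("multiply", {"a": 2, "b": 3})]
ground_truths = [ToolCall("useless"), ToolCall("add", {"a": 2, "b": 3})]
print(tool_selection_accuracy(tools, ground_truths))
# {'num_correct': 1, 'score': 0.5}
```

Under these assumptions, extra calls in `tools` do not lower the score, because the denominator counts only the entries in `ground_truths`.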
Example Usage
Required data items: tools, ground_truths
# Assumes `client`, `Metric`, and `ToolCall` come from the evaluation SDK in use.
# Tool calls the LLM actually made
tools = [
    ToolCall(name="useless", kwargs={}),
    ToolCall(name="multiply", kwargs={"a": 2, "b": 3}),
]
# Tool calls the LLM was expected to make
ground_truths = [
    ToolCall(name="useless", kwargs={}),
    ToolCall(name="add", kwargs={"a": 2, "b": 3}),
]
res = client.metrics.compute(
    Metric.ToolCallAccuracy,
    args={
        "tools": tools,
        "ground_truths": ground_truths,
    },
)
print(res)
Example Output
{
    "num_correct": 1,
    "score": 0.5
}
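The example output follows directly from the definitions: of the two expected calls, only `useless` is matched exactly. The `multiply` call does not count, because the expected tool was `add`, even though the arguments agree.

```latex
\text{score} = \frac{\text{num\_correct}}{\lvert \text{ground\_truths} \rvert} = \frac{1}{2} = 0.5
```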