Experiments (Evaluations)
Evaluations are used to run asynchronous metric computations over datasets. You can create a new evaluation, list all evaluations, and get the status of an evaluation.
Evaluations on user-provided data
- Python
- Node
from relari import RelariClient
from relari import Metric

client = RelariClient()

# Each datum carries the fields required by the metrics in the pipeline.
data = [
    {
        "ground_truth_context": [
            "Software as a service is a way of delivering applications remotely over the internet."
        ],
        "retrieved_context": [
            "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
        ],
    },
    {
        "ground_truth_context": [
            "Python's asyncio module provides a framework for writing concurrent code using async/await syntax."
        ],
        "retrieved_context": [
            "The asyncio module in Python is used for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives."
        ],
    },
]

# Submit the evaluation; the call returns the evaluation ID.
eval_id = client.evaluations.submit(
    project_id="665ddb7bbdce1320a58e2ce7",
    name=None,
    metadata=dict(),
    pipeline=[Metric.PrecisionRecallF1, Metric.RankedRetrievalMetrics],
    data=data,
)
import { RelariClient, MetricName } from "relari-sdk"

const relariClient = new RelariClient()

// Each datum carries the fields required by the metrics in the pipeline.
const data = [
  {
    "ground_truth_context": [
      "Software as a service is a way of delivering applications remotely over the internet."
    ],
    "retrieved_context": [
      "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
    ],
  },
  {
    "ground_truth_context": [
      "Python's asyncio module provides a framework for writing concurrent code using async/await syntax."
    ],
    "retrieved_context": [
      "The asyncio module in Python is used for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives."
    ],
  },
]

// Look up the project to evaluate against ("My Project" is a placeholder name).
const project = await relariClient.projects.findOne("My Project")

const createdEval = await relariClient.experiments.submit(
  project.id,
  undefined, // Name
  [MetricName.PrecisionRecallF1, MetricName.RankedRetrievalMetrics],
  data,
)
const evalId = createdEval.id
In the pipeline field, you can specify the metrics you want to compute. The data field is a list of dictionaries, each representing a datum. Each datum must have all the fields needed by the metrics you want to compute.
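Since every datum must carry the fields its metrics need, it can help to check the data locally before submitting. The following is a minimal sketch, not part of the Relari SDK: the `REQUIRED_FIELDS` table and the `find_missing_fields` helper are hypothetical, and the field requirements listed are an assumption based on the example data above.

```python
from typing import Any

# Hypothetical mapping from metric name to the fields it needs. The real
# requirements are defined by the Relari metrics, not by this table.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "PrecisionRecallF1": {"ground_truth_context", "retrieved_context"},
    "RankedRetrievalMetrics": {"ground_truth_context", "retrieved_context"},
}


def find_missing_fields(
    data: list[dict[str, Any]], metrics: list[str]
) -> dict[int, set[str]]:
    """Return {datum index: missing field names} for every incomplete datum."""
    # Collect the union of fields needed by all requested metrics.
    needed: set[str] = set()
    for name in metrics:
        needed |= REQUIRED_FIELDS[name]

    problems: dict[int, set[str]] = {}
    for index, datum in enumerate(data):
        missing = needed - set(datum)
        if missing:
            problems[index] = missing
    return problems
```

An empty result means every datum has the fields listed in the table; any other result names the data to fix before calling submit.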
The snippet above returns the evaluation ID. You can then check the status of the evaluation using the CLI or the SDK. With the CLI:
relari-cli evaluations status EVALUATION_ID
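Because evaluations run asynchronously, a script usually has to wait for the status to change before fetching results. The sketch below shows one way to poll. It makes no assumption about how the status is fetched: `fetch_status` is a hypothetical callable you supply (for example, a wrapper around the CLI command above or an SDK call), and `wait_for_completion` is not part of the SDK. Only the COMPLETED status comes from this page; other status values may exist and would need their own handling.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll fetch_status() until it returns COMPLETED or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status == "COMPLETED":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation still {status!r} after {timeout_seconds}s"
            )
        # Wait before asking again to avoid hammering the service.
        time.sleep(poll_seconds)
```

Passing the status lookup in as a callable keeps the loop independent of the client, so the same helper works with the CLI or either SDK.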
If the evaluation's status is COMPLETED, you can download the results with the CLI (which saves a JSON file with the results in the current directory) or the SDK:
- CLI: relari-cli evaluations get EVALUATION_ID
- Python: eval_data = client.evaluations.get(EVALUATION_ID)
- Node: const evalData = await relariClient.experiments.get(evalId)
The results of the evaluation are in eval_data['results'] (or evalData.results in Node). This is a dictionary indexed by uid (the unique identifier of the datum). Each element contains:
- datum: Dictionary with the datum
- metrics: Dictionary with the metrics computed
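As an illustration of working with this structure, the sketch below averages each metric across all data. It assumes the results dictionary is keyed by uid and that each element's metrics dictionary maps metric names directly to numbers; the `average_metrics` helper, the sample uids, and the metric names are all hypothetical and hand-written for this example, not output from a real evaluation.

```python
from collections import defaultdict
from statistics import mean


def average_metrics(results: dict) -> dict:
    """Average every numeric metric across all data in a results dictionary."""
    values = defaultdict(list)
    for entry in results.values():
        for name, value in entry["metrics"].items():
            # Skip non-numeric values; bool is excluded because it subclasses int.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values[name].append(value)
    return {name: mean(samples) for name, samples in values.items()}


# Hand-written sample in the assumed shape; names and numbers are made up.
sample_results = {
    "uid-1": {
        "datum": {"retrieved_context": ["first retrieved passage"]},
        "metrics": {"context_precision": 1.0, "context_recall": 0.5},
    },
    "uid-2": {
        "datum": {"retrieved_context": ["second retrieved passage"]},
        "metrics": {"context_precision": 0.5, "context_recall": 0.5},
    },
}

averages = average_metrics(sample_results)
```

If the real metrics dictionaries are nested rather than flat, the inner loop would need to be adapted accordingly.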
Evaluation over Relari-hosted Datasets
You can also run evaluations over Relari-hosted datasets. The process is similar to the one described above, but you provide the dataset ID and use DatasetDatum to specify the pipeline results.
- Python
- Node
from relari import RelariClient
from relari import Metric
from relari.core.types import DatasetDatum

client = RelariClient()

# Only the pipeline output is provided; the ground truth lives in the dataset.
data = [
    DatasetDatum(
        label="22",  # unique identifier for the datum
        data={
            "retrieved_context": [
                "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
            ],
        },
    ),
    DatasetDatum(
        label="23",  # unique identifier for the datum
        data={
            "retrieved_context": [
                "The asyncio module in Python is used for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives."
            ],
        },
    ),
]

# PROJECT_ID and DATASET_ID are placeholders for your own identifiers.
eval_id = client.evaluations.submit(
    project_id=PROJECT_ID,
    dataset=DATASET_ID,
    name=None,
    metadata=dict(),  # optional metadata
    pipeline=[Metric.PrecisionRecallF1, Metric.RankedRetrievalMetrics],
    data=data,
)
import { RelariClient, MetricName } from "relari-sdk"

const relariClient = new RelariClient()

// Only the pipeline output is provided; the ground truth lives in the dataset.
const data = [
  {
    label: "22", // unique identifier for the datum
    data: {
      "retrieved_context": [
        "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
      ]
    }
  },
  {
    label: "23", // unique identifier for the datum
    data: {
      "retrieved_context": [
        "The asyncio module in Python is used for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives."
      ]
    }
  }
]

const project = await relariClient.projects.findOne("My Project")
const dataset = await relariClient.datasets.findOne(project.id, "My Dataset")

const experiment = await relariClient.experiments.submit(
  project.id,
  undefined, // Name
  [MetricName.PrecisionRecallF1, MetricName.RankedRetrievalMetrics],
  data,
  dataset.id,
  { category: 'Enterprise' }, // optional metadata
)
The label field is the unique identifier for the datum in the dataset. We did not have to provide the ground truth context, as it is already present in the dataset.
Note that this time, in the Python SDK, we used DatasetDatum to specify the pipeline results.
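In practice the pipeline results usually come out of your own code keyed by dataset label, rather than being written by hand. A minimal sketch of turning such a mapping into the label/data shape used above follows. The `build_dataset_payloads` helper and its input format are hypothetical, and it produces plain dictionaries so that it runs without the SDK installed.

```python
def build_dataset_payloads(
    retrieved_by_label: dict[str, list[str]],
) -> list[dict]:
    """Turn {label: retrieved contexts} into label/data payloads."""
    return [
        # Copy the list so later changes to the input do not leak into the payload.
        {"label": label, "data": {"retrieved_context": list(contexts)}}
        for label, contexts in retrieved_by_label.items()
    ]


payloads = build_dataset_payloads(
    {
        "22": ["first retrieved passage"],
        "23": ["second retrieved passage", "third retrieved passage"],
    }
)
```

With the Python SDK, each payload could then be wrapped as DatasetDatum(label=p["label"], data=p["data"]), matching the constructor call shown in the example above.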