Quick Start
If you haven't installed relari yet, go here.
The Relari package offers a Python SDK and a CLI for interacting with the API. You can use the CLI to create projects, upload data, and evaluate your LLM projects.
Core concepts of the Relari API:
- Projects: A project is a container for your datasets and evaluations.
- Datasets: A dataset is a collection of data to be evaluated.
- Evaluations: An evaluation is a set of metrics computed on a dataset. You can think of it as an experiment.
- Datum: A datum is a single data point to be evaluated (an example is shown after this list).
- Metrics: A metric is a function that computes a score on a datum.
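For example, for a retrieval pipeline a datum might pair the contexts your system retrieved with the ground-truth contexts it should have retrieved. A minimal sketch (the field names below match the metric examples later in this guide):
# A single datum for retrieval metrics: what the pipeline retrieved,
# plus the ground truth to compare it against.
datum = {
    "retrieved_context": ["Software as a service (SaaS) is a way of delivering applications over the internet."],
    "ground_truth_context": ["Software as a service is a way of delivering applications remotely over the internet."],
}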
If using the Node or Python SDKs, you'll need to instantiate the client before running any of the code snippets in this documentation:
- Python
- Node
from relari import RelariClient
client = RelariClient()
import { RelariClient } from "relari-sdk"
const client = new RelariClient()
Projects
After setting up the relari package, you can create a new project through the CLI:
relari-cli projects new "My RAG"
You should see it in the list of projects:
relari-cli projects ls
for example:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Project ID               ┃ Name   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 6655e1b462e11bab6820edd9 │ My RAG │
└──────────────────────────┴────────┘
Every project, dataset, and evaluation has a unique ID that you can use to interact with the API.
To use the SDK:
- Python
- Node
res = client.projects.create("My Project Name")
const res = await client.projects.create("My Project Name")
In both cases res will contain the unique ID of your new project.
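As a quick sketch, assuming the response exposes the ID under an id key (inspect res in your SDK version to confirm the exact shape), you can keep it for later calls:
project_id = res["id"]  # assumed key name; print(res) to check the actual response shape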
Use Synchronous Metrics to Calculate a Single Metric
You can compute individual metrics with our SDKs.
- Python
- Node
from relari import RelariClient
from relari import Metric

client = RelariClient()

res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        "ground_truth_context": [
            "Software as a service is a way of delivering applications remotely over the internet."
        ],
        "retrieved_context": [
            "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
        ],
    },
)
print(res)
import { RelariClient, PrecisionRecallF1 } from "relari-sdk"

const client = new RelariClient()

const res = await client.metrics.computeMetric(
  new PrecisionRecallF1({
    retrieved_context: [
      "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
    ],
    ground_truth_context: [
      "Software as a service is a way of delivering applications remotely over the internet."
    ],
  }),
)
In this snippet we used the PrecisionRecallF1 metric, which computes the precision, recall, and F1 score for a retrieval system.
You need to pass the entire datum (a single data point to be evaluated) as the args parameter.
The endpoint is synchronous, so your code will block until the response is received.
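To build intuition for what this metric reports, here is a minimal, self-contained sketch of the standard set-based precision, recall, and F1 definitions. This illustrates the general formulas only, not Relari's exact matching logic, which may compare contexts at a finer granularity (e.g., token or sentence level):
# Toy set-based example of precision / recall / F1 (for intuition only).
retrieved = {"doc1", "doc2", "doc3"}  # what the retriever returned
relevant = {"doc2", "doc3", "doc4"}   # what it should have returned

true_positives = len(retrieved & relevant)          # 2
precision = true_positives / len(retrieved)         # 2/3
recall = true_positives / len(relevant)             # 2/3
f1 = 2 * precision * recall / (precision + recall)  # 2/3

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")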
Run Asynchronous Metrics to Evaluate a Dataset
Asynchronous metrics are useful for running experiments over large datasets; they are grouped into "evaluations".
To start a new evaluation from the SDK you can use the following snippet:
- Python
- Node
from relari import RelariClient
from relari import Metric

client = RelariClient()

data = [
    {
        "retrieved_context": ["Context_1"],
        "ground_truth_context": ["Context_2"],
    },
    {
        "retrieved_context": ["Context_3"],
        "ground_truth_context": ["Context_3"],
    },
]

eval_id = client.evaluations.submit(
    project_id=PROJECT_ID,  # You can get it from the CLI (relari-cli projects ls)
    name=None,  # Optional, if not provided a random name will be generated
    metadata=dict(),  # Optional, metadata to be stored with the evaluation
    pipeline=[Metric.PrecisionRecallF1, Metric.RankedRetrievalMetrics],  # List of metrics to compute
    data=data,  # List of datum to compute the metrics on, same as the batch_compute
)
import { RelariClient, MetricName } from "relari-sdk"

const client = new RelariClient()

const data = [
  {
    label: "Label 1",
    data: {
      "retrieved_context": ["Context_1"],
      "ground_truth_context": ["Context_2"],
    },
  },
  {
    label: "Label 2",
    data: {
      "retrieved_context": ["Context_3"],
      "ground_truth_context": ["Context_3"],
    },
  },
]

const { id: evalId } = await client.experiments.submit(
  projectId, // You can get it from the CLI (relari-cli projects ls)
  `My experiment name`,
  [MetricName.PrecisionRecallF1, MetricName.RankedRetrievalMetrics], // List of metrics to compute
  data, // List of datum to compute the metrics on
  dataset.id, // ID of an existing dataset
  {}, // metadata
)
This will create a new job in the backend and return the evaluation ID. You will not receive the results immediately; you can check the status of the evaluation with the CLI or the SDK.
relari-cli evaluations ls PROJECT_ID
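From the SDK, a simple polling loop might look like the sketch below. Note that the status accessor used here is an assumption, not a documented call; check your SDK version for the exact method:
import time

# Hypothetical polling loop; `client.evaluations.status` is an assumed
# method name and may differ in your SDK version.
while client.evaluations.status(eval_id) != "COMPLETED":
    time.sleep(10)  # wait before polling again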
If the evaluation's status is COMPLETED, you can download the results with the CLI or the SDK.
relari-cli evaluations get EVALUATION_ID
which will save the results to a file named evaluation-name.json (e.g., stubbe-tendencies-fluorine.json).
Alternatively, using the SDK:
- Python
- Node
eval_data = client.evaluations.get(EVALUATION_ID)
const evalData = await relariClient.experiments.get(evalId)
In eval_data['results'] you will find the results of the evaluation.
It's a dictionary indexed by uid (the unique identifier of the datum); each element contains:
- datum: Dictionary with the datum
- metrics: Dictionary with the metrics computed
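For example, based on the structure above, you can iterate over the results and print the metrics for each datum:
# Walk the results dictionary described above: uid -> {datum, metrics}.
for uid, entry in eval_data["results"].items():
    print(uid, entry["metrics"])  # the original datum is available under entry["datum"]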
Next Steps
Now that you have a basic understanding of how to interact with the Relari API, you can start creating your own projects, uploading data, and evaluating your LLM application pipelines.