Quick Start

If you haven't installed relari yet, see the installation guide first.

The Relari package also offers a Python SDK and a CLI to interact with the API. You can use the CLI to create projects, upload data, and evaluate your LLM projects.

Core concepts of the Relari API:

  • Datum: A datum is a single data point to be evaluated (see the example after this list).
  • Project: A project is a container for your datasets and evaluations.
  • Dataset: A dataset is a collection of data to be evaluated.
  • Metric: A metric is a function that computes a score on a datum.
  • Evaluation: An evaluation is a set of metrics computed on a dataset. You can think of it as an experiment.
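
For illustration, a datum is just a dictionary of fields; the field names below match the retrieval metrics used later in this guide:

# A single datum for a retrieval metric: the expected and the retrieved contexts
datum = {
    "ground_truth_context": ["Paris is the capital of France."],
    "retrieved_context": ["Paris is the capital and largest city of France."],
}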

Projects

After setting up the relari package, you can create a new project through the CLI:

relari-cli projects new "My RAG"

You should see it in the list of projects:

relari-cli projects ls

for example:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Project ID               ┃ Name   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 6655e1b462e11bab6820edd9 │ My RAG │
└──────────────────────────┴────────┘

Every project, dataset, and evaluation has a unique ID that you can use to interact with the API.

Use Synchronous Metrics to Calculate a Single Metric

You can interact with the Relari API using the RelariClient class from the Python SDK. For example, you can compute an individual metric:

from relari import RelariClient
from relari import Metric

client = RelariClient()

res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        "ground_truth_context": [
            "Software as a service is a way of delivering applications remotely over the internet."
        ],
        "retrieved_context": [
            "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
        ],
    },
)
print(res)

In this snippet we used the PrecisionRecallF1 metric, which computes the precision, recall, and F1 score of a retrieval system. You pass the entire datum (a single data point to be evaluated) as the args parameter. The endpoint is synchronous, so your code will block until the response is received.
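
The SDK also refers to a batch endpoint, batch_compute (mentioned in the evaluation snippet below), for scoring several datums in one call. A minimal sketch, assuming it takes the metric and a list of datums; the exact signature is not shown in this guide:

# Hypothetical sketch: batch_compute is referenced later in this guide,
# but its exact signature is an assumption here.
res = client.metrics.batch_compute(
    Metric.PrecisionRecallF1,
    data=[
        {"ground_truth_context": ["Context_1"], "retrieved_context": ["Context_2"]},
        {"ground_truth_context": ["Context_3"], "retrieved_context": ["Context_3"]},
    ],
)
print(res)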

Run Asynchronous Metrics to Evaluate a Dataset

Asynchronous metrics are useful for running experiments over large datasets; they are grouped into "evaluations".

To start a new evaluation from the SDK you can use the following snippet:

from relari import RelariClient
from relari import Metric

client = RelariClient()

data = [
    {
        "retrieved_context": ["Context_1"],
        "ground_truth_context": ["Context_2"],
    },
    {
        "retrieved_context": ["Context_3"],
        "ground_truth_context": ["Context_3"],
    },
]

eval_id = client.evaluations.submit(
    project_id=PROJECT_ID,  # You can get it from the CLI (relari-cli projects ls)
    name=None,  # Optional; if not provided, a random name will be generated
    metadata=dict(),  # Optional; metadata to be stored with the evaluation
    pipeline=[Metric.PrecisionRecallF1, Metric.RankedRetrievalMetrics],  # List of metrics to compute
    data=data,  # List of datums to compute the metrics on, same as batch_compute
)

This will create a new job in the backend and return the evaluation ID. You will not receive the results immediately; you can check the status of the evaluation with the CLI or the SDK.

relari-cli evaluations ls PROJECT_ID
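
From the SDK, a hypothetical polling sketch; the status-checking method name is an assumption, not confirmed by this guide:

import time

# Hypothetical: poll until the evaluation reaches the COMPLETED status;
# the exact status-checking call is an assumption.
while client.evaluations.status(eval_id) != "COMPLETED":
    time.sleep(10)  # wait before checking again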

If the evaluation's status is COMPLETED, you can download the results with the CLI or the SDK:

relari-cli evaluations get EVALUATION_ID

which will save the results to a file named evaluation-name.json (e.g., stubbe-tendencies-fluorine.json).

or, with the SDK:

eval_data = client.evaluations.get(EVALUATION_ID)

In eval_data['results'] you will find the results of the evaluation. It is a dictionary indexed by uid (the unique identifier of each datum); each element contains:

  • datum: Dictionary with the datum
  • metrics: Dictionary with the metrics computed
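
For example, based on that structure, you can print the metric scores for each datum:

# Each entry in eval_data["results"] holds the original datum and its metric scores
for uid, entry in eval_data["results"].items():
    print(uid, entry["metrics"])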

Next Steps

Now that you have a basic understanding of how to interact with the Relari API, you can start creating your own projects, uploading data, and evaluating your LLM application pipelines.