Quick Start

If you haven't installed relari yet, see the installation guide first.

The Relari package also offers a Python SDK and a CLI to interact with the API. You can use the CLI to create projects, upload data, and evaluate your LLM projects.

Core concepts of the Relari API:

  • Datum: A datum is a single data point to be evaluated (see the example after this list).
  • Project: A project is a container for your datasets and evaluations.
  • Dataset: A dataset is a collection of data to be evaluated.
  • Metric: A metric is a function that computes a score on a datum.
  • Evaluation: An evaluation is a set of metrics computed on a dataset. You can think of it as an experiment.
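
For illustration, a datum is just a dictionary of fields; the field names below match the retrieval metrics used later in this guide:

# A single datum for a retrieval metric: the expected and the retrieved contexts
datum = {
    "ground_truth_context": ["Paris is the capital of France."],
    "retrieved_context": ["Paris is the capital and largest city of France."],
}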

Projects

After setting up the relari package, you can create a new project through the CLI:

relari-cli projects new "My RAG"

You should see it in the list of projects:

relari-cli projects ls

for example:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Project ID               ┃ Name   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 6655e1b462e11bab6820edd9 │ My RAG │
└──────────────────────────┴────────┘

Every project, dataset, and evaluation has a unique ID that you can use to interact with the API.

Use Synchronous Metrics to Calculate a Single Metric

You can interact with the Relari API using the RelariClient class from the Python SDK. For example, you can compute an individual metric:

from relari import RelariClient
from relari import Metric

client = RelariClient()

res = client.metrics.compute(
    Metric.PrecisionRecallF1,
    args={
        "ground_truth_context": [
            "Software as a service is a way of delivering applications remotely over the internet."
        ],
        "retrieved_context": [
            "Software as a service (SaaS) is a way of delivering applications remotely over the internet instead of locally on machines (known as “on-premise” software)."
        ],
    },
)
print(res)

In this snippet we used the PrecisionRecallF1 metric, which computes the precision, recall, and F1 score of a retrieval system. You pass the entire datum (a single data point to be evaluated) as the args parameter. The endpoint is synchronous, so your code will block until the response is received.
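
The SDK also refers to a batch endpoint, batch_compute (mentioned in the evaluation snippet below), for scoring several datums in one call. A minimal sketch, assuming it takes the metric and a list of datums; the exact signature is not shown in this guide:

# Hypothetical sketch: batch_compute is referenced later in this guide,
# but its exact signature is an assumption here.
res = client.metrics.batch_compute(
    Metric.PrecisionRecallF1,
    data=[
        {"ground_truth_context": ["Context_1"], "retrieved_context": ["Context_2"]},
        {"ground_truth_context": ["Context_3"], "retrieved_context": ["Context_3"]},
    ],
)
print(res)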

Run Asynchronous Metrics to Evaluate a Dataset

Asynchronous metrics are useful for running experiments over large datasets; they are grouped into "evaluations".

To start a new evaluation from the SDK you can use the following snippet:

from relari import RelariClient
from relari import Metric

client = RelariClient()

data = [
    {
        "retrieved_context": ["Context_1"],
        "ground_truth_context": ["Context_2"],
    },
    {
        "retrieved_context": ["Context_3"],
        "ground_truth_context": ["Context_3"],
    },
]

eval_id = client.evaluations.submit(
    project_id=PROJECT_ID,  # You can get it from the CLI (relari-cli projects ls)
    name=None,  # Optional; if not provided, a random name will be generated
    metadata=dict(),  # Optional; metadata to be stored with the evaluation
    pipeline=[Metric.PrecisionRecallF1, Metric.RankedRetrievalMetrics],  # List of metrics to compute
    data=data,  # List of datums to compute the metrics on, same as batch_compute
)

This will create a new job in the backend and return the evaluation ID. You will not receive the results immediately; you can check the status of the evaluation with the CLI or the SDK.

relari-cli evaluations ls PROJECT_ID
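
From the SDK, a hypothetical polling sketch; the status-checking method name is an assumption, not confirmed by this guide:

import time

# Hypothetical: poll until the evaluation reaches the COMPLETED status;
# the exact status-checking call is an assumption.
while client.evaluations.status(eval_id) != "COMPLETED":
    time.sleep(10)  # wait before checking again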

If the evaluation's status is COMPLETED, you can download the results with the CLI or the SDK:

relari-cli evaluations get EVALUATION_ID

which will save the results to a file named evaluation-name.json (e.g., stubbe-tendencies-fluorine.json).

or, with the SDK:

eval_data = client.evaluations.get(EVALUATION_ID)

In eval_data['results'] you will find the results of the evaluation. It is a dictionary indexed by uid (the unique identifier of each datum); each element contains:

  • datum: Dictionary with the datum
  • metrics: Dictionary with the metrics computed
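
For example, based on that structure, you can print the metric scores for each datum:

# Each entry in eval_data["results"] holds the original datum and its metric scores
for uid, entry in eval_data["results"].items():
    print(uid, entry["metrics"])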

Next Steps

Now that you have a basic understanding of how to interact with the Relari API, you can start creating your own projects, uploading data, and evaluating your LLM application pipelines.