
Intro to Relari

🚀 Welcome to Relari Docs

In the Getting Started section, we will walk through:

  • The high-level Philosophy and Concepts behind Relari's data-driven evaluation and improvement toolkit.
  • How to use the Relari Cloud UI to generate datasets, run experiments, and auto-improve your LLM applications.

For specific references and code examples, please visit the Python SDK & CLI, REST API, and Metrics sections.

🎯 Purpose

LLMs are incredibly powerful but tricky to work with because they are non-deterministic. This often means LLM applications get stuck in the alpha/beta phases, with no clear path to true production, especially for mission-critical applications.

To get to production, AI developers have tons of options to sift through—different models, prompting strategies, RAG architectures, agent setups, and more. The best choice often depends on the specific use case. For example, a customer support chatbot and a legal copilot will likely need very different system designs. With so many options, it's hard to figure out the right path forward.

Relari is here to take the guesswork out of this process. We use a data-driven approach to help AI teams systematically make these design choices and get from demo to production faster.

📊 Why a Data-Driven Approach?

Data can be your secret weapon to make your AI app stand out. Leveraging use-case-specific data helps you understand your app's performance better, automate parts of the iteration process, and make your application more reliable.

Using data throughout your developmentโ€”from early experiments to after launchโ€”lets you make confident decisions and boosts user satisfaction.

How Relari Helps AI Teams:

  • Rigorous Evaluation and Benchmarking: Use comprehensive, tailored metrics to measure and compare performance.
  • Golden Dataset Generation: Curate high-quality datasets from human labels or synthetically generate at scale.
  • Automated Prompt Optimization: Automatically optimize prompts to capture nuanced requirements.
  • Systematic LLM Fine-Tuning: Take the guesswork out of fine-tuning domain-specific LLMs to outperform larger models.
  • Run-time Monitoring: Monitor your LLM app's performance in production and use data to inform adjustments.
Figure: Relari data-driven evaluation and improvement toolkit
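
To make the data-driven loop concrete, here is a minimal sketch in Python of the core pattern behind evaluation and benchmarking: run your application over a golden dataset and score each output with a metric. All names below (run_app, exact_match, the toy dataset) are hypothetical stand-ins for illustration, not the Relari SDK API.

```python
# Illustrative only: run_app and exact_match are hypothetical stand-ins,
# not Relari SDK functions.

def exact_match(expected: str, actual: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(expected.strip().lower() == actual.strip().lower())

def run_app(question: str) -> str:
    """Stand-in for your LLM application (chain, agent, RAG pipeline, ...)."""
    return "Paris" if "capital of France" in question else "unknown"

# A "golden dataset": each input paired with a reference answer.
dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is the capital of Spain?", "expected": "Madrid"},
]

# Evaluation loop: run the app on every example and score the outputs.
scores = [exact_match(row["expected"], run_app(row["question"])) for row in dataset]
print(f"exact_match accuracy: {sum(scores) / len(scores):.2f}")
```

In practice you would swap the toy metric for tailored metrics, grow the dataset via human labels or synthetic generation, and rerun the same loop after every prompt, model, or architecture change to compare results on equal footing.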

๐Ÿ” Backgroundโ€‹

Relari started as an open-source LLM evaluation framework, continuous-eval, offering 30+ metrics covering a wide range of applications. As we iterated with continuous feedback from the developer community, we realized that AI builders need much more than standard metrics, and that there is far more potential to unlock by building a comprehensive toolkit that enables the data flywheel in LLM products.

Whether you're already using continuous-eval or just starting with Relari Cloud, we're excited to help you build better LLM applications faster. We look forward to hearing your feedback as we shape this new category of data-driven development infrastructure.