4 docs tagged with "experiments"

Run experiments

Running experiments (or evaluations) is a systematic way to measure the performance of an AI system across a fixed set of samples. By varying prompts, models, or hyperparameters, you can observe how different settings affect performance. Experiments can be run on single data points or entire datasets to quickly understand the impact of changes across a variety of scenarios.
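
Below is a minimal, illustrative sketch of an offline experiment loop in Python. The function names (`run_experiment`, `generate_fn`, `score_fn`) and the toy dataset are assumptions for illustration, not part of any specific SDK; the prompt, model, and hyperparameters under test are captured inside the `generate_fn` callable.

```python
# A minimal sketch of an offline experiment: generate an output per sample,
# score it against the reference, and aggregate the results.
from statistics import mean
from typing import Callable

def run_experiment(
    dataset: list[dict],                    # each item: {"input": ..., "reference": ...}
    generate_fn: Callable[[str], str],      # the prompt/model/hyperparameters under test
    score_fn: Callable[[str, str], float],  # compares an output against its reference
) -> dict:
    scores = []
    for sample in dataset:
        output = generate_fn(sample["input"])
        scores.append(score_fn(output, sample["reference"]))
    return {"mean_score": mean(scores), "n_samples": len(scores)}

if __name__ == "__main__":
    # Toy dataset and a trivial exact-match metric, just to make the sketch runnable.
    dataset = [
        {"input": "2 + 2", "reference": "4"},
        {"input": "capital of France", "reference": "Paris"},
    ]
    result = run_experiment(dataset, lambda x: "4", lambda out, ref: float(out == ref))
    print(result)  # e.g. {'mean_score': 0.5, 'n_samples': 2}
```

Re-running the same loop with a different `generate_fn` (a new prompt, model, or temperature) gives a directly comparable score, which is the core of comparing experiment runs.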

Runtime monitor (online evaluation)

The Runtime Monitor feature lets you evaluate results on production data in real time. Because results are evaluated on the fly, there are no reference outputs in a dataset to compare against, so runtime monitors should use reference-free metrics.
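
The sketch below illustrates the idea of a reference-free metric in a runtime monitor. The heuristic scorer and the `monitor` function are assumptions for illustration only; in practice the score might come from an LLM-as-judge or any check computed from the input/output pair alone, with no reference answer required.

```python
# A minimal sketch of a runtime monitor using a reference-free metric:
# the score is derived solely from the production input/output pair.
def reference_free_score(user_input: str, model_output: str) -> float:
    if not model_output.strip():
        return 0.0   # empty responses score zero
    if "i cannot help" in model_output.lower():
        return 0.5   # flag likely refusals for review
    return 1.0

def monitor(user_input: str, model_output: str) -> None:
    score = reference_free_score(user_input, model_output)
    # In a real system this record would be sent to your observability backend.
    print({"input": user_input, "output": model_output, "score": score})

monitor("Summarize this article", "Here is a short summary ...")
```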