Compare evaluation results
After running multiple experiments, you can compare their results to see which prompt, model, or configuration performs best.
Once you submit your experiments, you can view them in the UI under the Project > Experiments Tab.
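Conceptually, comparing experiments amounts to aggregating per-sample metric scores for each run and putting the summaries side by side, which is what the Experiments Tab surfaces. The sketch below is a minimal, hypothetical illustration of that idea in Python using pandas; the experiment names, metric column, and scores are assumed example data, not output from any specific SDK.

import pandas as pd

# Hypothetical per-sample scores from two experiments (assumed data, not real output).
results = pd.DataFrame(
    {
        "experiment": ["baseline"] * 3 + ["new-prompt"] * 3,
        "sample_id": [1, 2, 3, 1, 2, 3],
        "correctness": [0.6, 0.7, 0.5, 0.8, 0.9, 0.7],
    }
)

# Aggregate per experiment to see which configuration performs best overall.
summary = results.groupby("experiment")["correctness"].agg(["mean", "min", "max"])
print(summary)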
Running experiments (or evaluations) is a systematic way to measure the performance of an AI system across a fixed set of samples. By altering prompts, models, or hyperparameters, you can observe how different settings affect performance. Experiments can be run on single data points or on entire datasets to quickly understand the effect of a change across a variety of scenarios.
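As a rough sketch of what such a sweep looks like, the snippet below runs two prompt variants over a small dataset and scores each output. The dataset, prompt templates, and the run_model and score_sample helpers are hypothetical placeholders standing in for your model call and metric, not part of any real API.

# Minimal sketch of an experiment sweep over prompt variants (assumed helpers and data).
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "Capital of France", "expected": "Paris"},
]

prompt_variants = {
    "terse": "Answer briefly: {input}",
    "step_by_step": "Think step by step, then answer: {input}",
}

def run_model(prompt: str) -> str:
    # Placeholder for a real model call (e.g., an LLM API request).
    return "4" if "2 + 2" in prompt else "Paris"

def score_sample(output: str, expected: str) -> float:
    # Simple exact-match metric; real experiments can use richer metrics.
    return 1.0 if output.strip() == expected else 0.0

for name, template in prompt_variants.items():
    scores = [
        score_sample(run_model(template.format(input=s["input"])), s["expected"])
        for s in dataset
    ]
    print(f"{name}: accuracy={sum(scores) / len(scores):.2f}")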
The Runtime Monitor feature lets you evaluate results on production data in real time. Use reference-free metrics in runtime monitors: because results are evaluated on the fly, there are no reference outputs in a dataset to compare against.
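To make the distinction concrete, here is a minimal sketch of a reference-free check of the kind a monitor could apply: it scores a production response on properties of the output alone (a simple non-empty, length, and refusal heuristic), with no expected answer involved. The function, heuristics, and threshold are illustrative assumptions, not a built-in metric.

# Hypothetical reference-free check: it inspects only the model's response,
# so no reference output from a dataset is needed.
def reference_free_score(response: str) -> dict:
    is_refusal = response.lower().startswith(("i can't", "i cannot", "sorry"))
    return {
        "non_empty": 1.0 if response.strip() else 0.0,
        "length_ok": 1.0 if len(response) <= 2000 else 0.0,  # assumed length limit
        "answered": 0.0 if is_refusal else 1.0,
    }

# Example: score a live production response as it passes through the monitor.
print(reference_free_score("Sorry, I cannot help with that."))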