Welcome!

Start today with continuous-eval and make your LLM development a science, not an art!

🚀 Getting Started: Install the package and learn how to get started quickly.

🚰 Pipeline: Define your GenAI application pipeline and run evaluation over a tailored dataset.

📊 Metrics: Explore the available metrics and learn how to combine multiple metrics effectively.

🔍 Datasets: Explore sample datasets and try generating a synthetic evaluation dataset from documents.

💡 Examples: Discover code snippets and examples to help you understand and implement different evaluation pipelines.

Blog Posts:
- Practical Guide to RAG Pipeline Evaluation: Part 1: Retrieval
- Practical Guide to RAG Pipeline Evaluation: Part 2: Generation
- How important is a Golden Dataset for LLM evaluation?
- How to evaluate complex GenAI Apps: a granular approach
- How to make the most out of LLM production data: simulated user feedback
- Generate synthetic data to test LLM applications
Discord: Join our community of LLM developers on Discord
Reach out to the founders: Email or Schedule a chat