

Start today with continuous-eval and make your LLM development a science, not an art!

🚀 Getting Started: Install the package and learn how to get started quickly.
🚰 Pipeline: Define your GenAI application pipeline and run evaluation over a tailored dataset.
📊 Metrics: Explore the available metrics and learn how to combine multiple metrics effectively.
🔍 Datasets: Explore sample datasets and try generating a synthetic evaluation dataset from documents.
💡 Examples: Discover code snippets and examples that help you understand and implement different evaluation pipelines.

Other Resources

  • Blog Posts:

    • Practical Guide to RAG Pipeline Evaluation: Part 1: Retrieval
    • Practical Guide to RAG Pipeline Evaluation: Part 2: Generation
    • How important is a Golden Dataset for LLM evaluation?
    • How to evaluate complex GenAI Apps: a granular approach
    • How to make the most out of LLM production data: simulated user feedback
    • Generate synthetic data to test LLM applications
  • Discord: Join our community of LLM developers

  • Reach out to the founders: email or schedule a chat