
Generate synthetic datasets

Architecture of Synthetic Data Generation Pipeline

Here's the high-level flow for generating synthetic datasets. The pipeline uses your existing application logic, optionally together with environment data and seed example data, to create synthetic inputs and expected outputs (both intermediate and final).
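To make the inputs and outputs of this flow concrete, here is a minimal Python sketch that models them as data classes. All class and field names are hypothetical illustrations, not Relari's schema. The only facts taken from this page are that application logic is required, environment data and seed examples are optional, and each generated sample carries both intermediate and final expected outputs.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# NOTE: hypothetical names for illustration only; this is not the Relari data schema.

@dataclass
class GenerationRequest:
    application_logic: str                                              # required, e.g. "rag"
    environment_data: Optional[dict[str, Any]] = None                   # optional context and parameters
    seed_examples: list[dict[str, Any]] = field(default_factory=list)   # optional seed samples

    def optional_inputs(self) -> list[str]:
        # Report which of the optional inputs were supplied.
        provided = []
        if self.environment_data is not None:
            provided.append("environment_data")
        if self.seed_examples:
            provided.append("seed_examples")
        return provided


@dataclass
class SyntheticSample:
    input: str                              # synthetic user input
    expected_intermediate: dict[str, Any]   # expected output of each pipeline stage
    expected_final: str                     # expected final answer


request = GenerationRequest(application_logic="rag", seed_examples=[{"input": "What is RAG?"}])
print(request.optional_inputs())  # ['seed_examples']
```

Keeping intermediate and final expectations in separate fields mirrors the page's distinction between the two kinds of expected output.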


In your project, open the Datasets tab and click the Generate Synthetic Dataset button to start the process.


Step 1: Define the use case application logic

Select one of the common LLM application types to define the application logic for your use case. In the enterprise version, you can instead select the Custom option to define your own application logic.
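Application logic here means the sequence of steps your LLM application runs. As an illustration only, the Python sketch below models a RAG-style application as two hypothetical modules, a keyword retriever and a trivial generator, and records each module's output. Those recorded per-stage values correspond to the intermediate outputs described at the top of this page. The toy corpus, module names, and matching rule are assumptions, not how Relari represents application logic.

```python
from typing import Callable

# A module reads the shared state and returns a new state with its own output added.
Module = tuple[str, Callable[[dict], dict]]

# Hypothetical toy corpus for the retriever.
CORPUS = [
    "Relari Cloud sends an email when generation finishes.",
    "Seed examples are optional.",
]


def retrieve(state: dict) -> dict:
    # Toy retriever: keep documents that share at least one word with the query.
    words = set(state["query"].lower().split())
    hits = [doc for doc in CORPUS if words & set(doc.lower().split())]
    return {**state, "retrieved_contexts": hits}


def generate(state: dict) -> dict:
    # Toy generator: answer with the first retrieved document, if any.
    contexts = state["retrieved_contexts"]
    answer = contexts[0] if contexts else "I don't know."
    return {**state, "answer": answer}


RAG_LOGIC: list[Module] = [("retrieve", retrieve), ("generate", generate)]


def run(logic: list[Module], query: str) -> tuple[dict, dict]:
    # Run each module in order and snapshot the state after every stage.
    state = {"query": query}
    trace = {}
    for name, module in logic:
        state = module(state)
        trace[name] = state
    return state, trace


final, trace = run(RAG_LOGIC, "are seed examples optional")
print(final["answer"])  # Seed examples are optional.
```

Because each module returns a fresh dictionary, every entry in `trace` is an independent snapshot, so `trace["retrieve"]` holds the intermediate output and `final["answer"]` the final output.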


Step 2: Define the environment data

Specify the environment data: the contextual information and parameters that will shape the generated dataset.
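For a RAG application, one plausible form of environment data is the document collection the retriever searches, plus a few generation parameters. The Python sketch below assumes that interpretation: it splits documents into fixed-size word chunks and, using a fixed random seed, samples the passages that synthetic questions could be grounded in. The `EnvironmentData` fields and the chunking strategy are illustrative assumptions, not Relari settings.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvironmentData:
    # Hypothetical fields for illustration only.
    documents: list[str]      # contextual information, e.g. a knowledge base
    num_samples: int = 10     # parameter: how many contexts to draw
    chunk_size: int = 50      # parameter: maximum words per chunk
    seed: int = 0             # parameter: makes sampling reproducible


def chunk(text: str, size: int) -> list[str]:
    # Split text into consecutive windows of at most `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sample_contexts(env: EnvironmentData) -> list[str]:
    # Draw (with replacement) the passages each synthetic sample would be grounded in.
    chunks = [c for doc in env.documents for c in chunk(doc, env.chunk_size)]
    if not chunks:
        raise ValueError("environment data contains no text")
    rng = random.Random(env.seed)
    return [rng.choice(chunks) for _ in range(env.num_samples)]


env = EnvironmentData(documents=["one two three four five"], num_samples=3, chunk_size=2)
print(chunk(env.documents[0], env.chunk_size))  # ['one two', 'three four', 'five']
```

Using a dedicated `random.Random(seed)` instance keeps the sampled contexts reproducible without touching the global random state.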


Step 3: Seed example data (optional)

Provide initial samples (inputs and/or expected outputs) that serve as a foundation for generating a larger synthetic dataset.
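If you keep seed examples in a file, a quick local check can catch records that have neither an input nor an expected output before you provide them. The Python sketch below assumes a hypothetical JSON Lines layout with `input` and `expected_output` keys; the format the Relari UI actually accepts may differ.

```python
import json


def load_seed_examples(jsonl_text: str) -> list[dict]:
    """Parse seed examples, one JSON object per line (hypothetical format)."""
    seeds = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Each seed needs an input, an expected output, or both.
        if not (record.get("input") or record.get("expected_output")):
            raise ValueError(f"line {line_no}: needs 'input' and/or 'expected_output'")
        seeds.append(record)
    return seeds


seeds = load_seed_examples(
    '{"input": "What does the Datasets tab contain?"}\n'
    "\n"
    '{"input": "Is seed data required?", "expected_output": "No, it is optional."}\n'
)
print(len(seeds))  # 2
```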


Step 4: Generate synthetic data

Submit the request to Relari Cloud for generation. You will receive an email notification once the synthetic data is ready.

[Figure: Example synthetic RAG dataset]