1. The user describes the data they need.
2. The LLM is called with the user's description and the schema of the variable.
3. The synthetic dataset is returned.
4. The user can use the dataset to test their prompt.
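
The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `build_prompt`, `generate_dataset`, and `fake_llm` are hypothetical names, the schema is assumed to be a mapping from variable name to a short type or description, and the LLM is assumed to reply with a JSON array of objects. The real LLM call is passed in as a plain callable so the sketch runs without any specific provider.

```python
import json
from typing import Callable, Dict, List


def build_prompt(description: str, schema: Dict[str, str], n_rows: int) -> str:
    """Combine the user's description and the variable schema into one request."""
    return (
        f"Generate {n_rows} synthetic examples.\n"
        f"Description: {description}\n"
        f"Schema (variable -> type): {json.dumps(schema)}\n"
        "Reply with a JSON array of objects that use exactly these keys."
    )


def generate_dataset(
    description: str,
    schema: Dict[str, str],
    call_llm: Callable[[str], str],  # assumption: takes a prompt, returns raw text
    n_rows: int = 5,
) -> List[dict]:
    """Steps 1-3: take the description, call the LLM, return the parsed rows."""
    raw = call_llm(build_prompt(description, schema, n_rows))
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError("expected the LLM to return a JSON array")
    # Keep only rows whose keys match the schema exactly.
    expected = set(schema)
    return [r for r in rows if isinstance(r, dict) and set(r) == expected]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return json.dumps([
        {"question": "How do I reset my password?", "language": "en"},
        {"question": "Where is my invoice?", "language": "en"},
        {"question": "This row is missing a key"},  # dropped by the schema check
    ])


if __name__ == "__main__":
    schema = {"question": "str", "language": "str"}
    dataset = generate_dataset("customer support questions", schema, fake_llm, n_rows=2)

    # Step 4: fill the user's prompt template with each synthetic row.
    template = "Answer in {language}: {question}"
    for row in dataset:
        print(template.format(**row))
```

Passing the model call in as a function keeps the generation logic independent of any particular LLM client, and the schema check guards against rows that do not match the variables the prompt expects.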