- Define “Good”: You start by defining a test scenario in plain language, including the user’s goal and a clear description of the successful outcome you expect. This becomes your objective quality bar.
- Simulate and Collect Data: The AI User Simulator acts as a test user, interacting with your application based on your scenario. It repeats these interactions many times to collect a dataset of conversations large enough for statistical analysis.
- Judge and Analyze: The AI Judge scores each conversation against your definition of success. SigmaEval then applies statistical methods to these scores to determine if your quality bar has been met with a specified level of confidence.
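The three steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not SigmaEval's API: `Scenario`, `run_simulations`, `binomial_upper_tail`, `quality_bar_met`, `fake_simulate`, and `fake_judge` are hypothetical names, the simulator and judge are random stubs standing in for LLM calls, and the statistical step assumes an exact one-sided binomial test (is the true rate of conversations scoring at least `min_score` above `min_proportion`?). SigmaEval's own statistical methods may differ.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    # Step 1 ("Define Good"): a plain-language scenario (hypothetical type).
    user_goal: str
    expected_outcome: str


def run_simulations(scenario: Scenario,
                    simulate: Callable[[Scenario, int], str],
                    n: int) -> List[str]:
    # Step 2: run the simulated user against the app n times,
    # collecting one transcript per run (seed = run index).
    return [simulate(scenario, seed) for seed in range(n)]


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    # Exact P(X >= k) for X ~ Binomial(n, p).
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def quality_bar_met(scores: List[int], min_score: int,
                    min_proportion: float, alpha: float = 0.05) -> bool:
    # Step 3 (assumed method): one-sided exact binomial test.
    # H0: true success rate <= min_proportion.
    # The bar counts as "met" only if H0 is rejected at level alpha.
    successes = sum(1 for s in scores if s >= min_score)
    p_value = binomial_upper_tail(successes, len(scores), min_proportion)
    return p_value < alpha


# --- Hypothetical stubs standing in for the AI User Simulator and AI Judge ---
def fake_simulate(scenario: Scenario, seed: int) -> str:
    rng = random.Random(seed)
    return "resolved" if rng.random() < 0.95 else "unresolved"


def fake_judge(transcript: str, scenario: Scenario) -> int:
    # Scores on a 1-10 scale.
    return 9 if transcript == "resolved" else 3


if __name__ == "__main__":
    scenario = Scenario(
        user_goal="Get a refund for a damaged order",
        expected_outcome="The bot confirms the refund and gives a timeline",
    )
    transcripts = run_simulations(scenario, fake_simulate, n=50)
    scores = [fake_judge(t, scenario) for t in transcripts]
    print("Quality bar met:",
          quality_bar_met(scores, min_score=7, min_proportion=0.8))
```

Reducing each judged conversation to pass/fail makes the binomial test a natural fit for this sketch: it is exact for small sample sizes, and the `alpha` parameter plays the role of the "specified level of confidence" (confidence = 1 - alpha).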
