# End-to-End Testing of Conversational AI

> SigmaEval is an open-source Python framework for the end-to-end testing of conversational AI, chatbots, virtual assistants, and other LLM-based applications.

## Beyond "It Seems to Work"

Testing Generative AI applications is fundamentally different from testing traditional software. Methodologies designed for predictable, deterministic systems are ill-equipped to handle the non-deterministic nature of LLM-based apps. The core difficulty stems from two interconnected problems:

* **The Infinite Input Space:** The range of possible user inputs is endless, making it impossible to write enough static test cases to cover every scenario.
* **Non-Deterministic & "Fuzzy" Outputs:** An LLM can produce a wide variety of responses to the same prompt, and quality itself is often subjective.

This reality means we must move beyond simple pass/fail checks and adopt a more robust, statistical approach to evaluation.

**SigmaEval** is a Python framework for the **statistical**, **end-to-end** evaluation of Gen AI apps, agents, and bots that helps you move from "it seems to work" to making rigorous, data-driven statements about your AI's quality. It allows you to set and enforce objective quality bars by making statements like:

> *"We are confident that at least 90% of user issues coming into our customer support chatbot will be resolved with a quality score of 8/10 or higher."*

> *"With a high degree of confidence, the median response time of our new AI-proposal generator will be lower than our 5-second SLO."*

## How it Works

At its core, SigmaEval uses two AI agents to automate evaluation: an **AI User Simulator** that realistically tests your application, and an **AI Judge** that scores its performance.

The process is as follows:

1. You start by defining a test scenario in plain language, including the user's goal and a clear description of the successful outcome you expect. This becomes your objective quality bar.
2. The **AI User Simulator** acts as a test user, interacting with your application based on your scenario. It runs these interactions many times to collect a robust dataset of conversations.
3. The **AI Judge** scores each conversation against your definition of success. SigmaEval then applies statistical methods to these scores to determine if your quality bar has been met with a specified level of confidence.

[Figure: SigmaEval architecture diagram]
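
To make the final statistical step concrete, here is a minimal sketch of one way a claim such as "at least 90% of conversations score 8/10 or higher" could be checked against a set of judge scores. This page does not specify which statistical methods SigmaEval uses, so the exact one-sided binomial test below is an assumption chosen for illustration. The function names, parameters, and sample scores are hypothetical and are not part of SigmaEval's API.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p0: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, i) * p0**i * (1 - p0) ** (trials - i)
        for i in range(successes, trials + 1)
    )


def meets_quality_bar(scores, min_score=8, min_proportion=0.90, alpha=0.05):
    """Exact one-sided binomial test (illustrative, not SigmaEval's method).

    H0: the true share of conversations scoring >= min_score is at most
        min_proportion.
    H1: the true share is greater than min_proportion.

    Returns (passed, p_value); passed is True when H0 is rejected at
    significance level alpha.
    """
    trials = len(scores)
    successes = sum(score >= min_score for score in scores)
    p_value = binomial_tail(successes, trials, min_proportion)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Hypothetical judge scores for 50 simulated conversations.
    scores = [9] * 49 + [6]
    passed, p_value = meets_quality_bar(scores)
    print(f"passed={passed}, p_value={p_value:.4f}")
```

Note that observing 45 of 50 conversations at or above the threshold (exactly 90%) would not pass this test: the sample rate has to exceed the bar by enough that sampling noise is an unlikely explanation. This is why the simulated interactions are run many times rather than once.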

## Key Features

* Perform comprehensive statistical analyses of your Gen AI applications with confidence intervals and rigorous testing.
* Test all aspects of your Gen AI app performance, from response quality to latency and reliability.
* Drop SigmaEval directly into your existing test suites. It's fully compatible with popular frameworks like Pytest and Unittest.
* Support for over 100 LLM providers for the AI Judge and User Simulator.
* Move from intuition-based decisions to quantifiable, objective assessments of your AI's capabilities.

## Resources

* View the source code, contribute, and report issues.
* Install and view package details on PyPI.

## Get Started

Start using SigmaEval in minutes with a quickstart guide.