
# End-to-End Testing of Conversational AI

> SigmaEval is an open-source Python framework for the end-to-end testing of conversational AI, chatbots, virtual assistants, and other LLM-based applications.

## Beyond "It Seems to Work"

Testing Generative AI applications is fundamentally different from testing traditional software. Methodologies designed for predictable, deterministic systems are ill-equipped to handle the non-deterministic nature of LLM-based apps.

The core difficulty stems from two interconnected problems:

* **The Infinite Input Space:** The range of possible user inputs is endless, making it impossible to write enough static test cases to cover every scenario.
* **Non-Deterministic & "Fuzzy" Outputs:** An LLM can produce a wide variety of responses to the same prompt, and quality itself is often subjective.

This reality means we must move beyond simple pass/fail checks and adopt a more robust, statistical approach to evaluation.

**SigmaEval** is a Python framework for the **statistical**, **end-to-end** evaluation of Gen AI apps, agents, and bots. It helps you move from "it seems to work" to rigorous, data-driven statements about your AI's quality, and lets you set and enforce objective quality bars with statements like:

> *"We are confident that at least 90% of user issues coming into our customer
> support chatbot will be resolved with a quality score of 8/10 or higher."*

> *"With a high degree of confidence, the median response time of our new
> AI-proposal generator will be lower than our 5-second SLO."*

<Card title="Get Started" icon="rocket" href="/quickstart" horizontal>
  Start using SigmaEval in minutes with a quickstart guide.
</Card>
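
To illustrate the second example statement above (a median response time below a 5-second SLO), a claim about a median can be checked with a one-sided sign test. The sketch below uses only the Python standard library. The function name `median_below_pvalue` and the latency values are hypothetical, chosen for illustration; this is not SigmaEval's API, and it is not necessarily the test SigmaEval runs internally.

```python
from math import comb


def median_below_pvalue(samples, threshold):
    """One-sided sign test p-value for H0: median >= threshold.

    A small p-value is evidence that the true median is below the threshold.
    """
    # Samples exactly equal to the threshold carry no sign and are dropped.
    below = sum(1 for x in samples if x < threshold)
    above = sum(1 for x in samples if x > threshold)
    n = below + above
    if n == 0:
        return 1.0
    # At the boundary of H0, each remaining sample falls below the threshold
    # with probability 0.5, so the count below follows Binomial(n, 0.5).
    # The p-value is the probability of seeing at least `below` such samples.
    return sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n


# Hypothetical response times in seconds (illustrative data only).
latencies = [3.1, 4.2, 2.8, 4.9, 3.7, 5.6, 4.0, 3.3, 4.4, 3.9, 2.9, 4.6]
p_value = median_below_pvalue(latencies, threshold=5.0)
print(f"p-value: {p_value:.4f}, SLO met at alpha=0.05: {p_value < 0.05}")
```

A sign test makes no assumption about the shape of the latency distribution, which suits response times that are often skewed by occasional slow calls.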

## How it Works

At its core, SigmaEval uses two AI agents to automate evaluation: an **AI User Simulator** that realistically tests your application, and an **AI Judge** that scores its performance. The process is as follows:

<Steps>
  <Step title="Define 'Good'">
    You start by defining a test scenario in plain language, including the
    user's goal and a clear description of the successful outcome you expect.
    This becomes your objective quality bar.
  </Step>

  <Step title="Simulate and Collect Data">
    The **AI User Simulator** acts as a test user, interacting with your
    application based on your scenario. It runs these interactions many times to
    collect a robust dataset of conversations.
  </Step>

  <Step title="Judge and Analyze">
    The **AI Judge** scores each conversation against your definition of
    success. SigmaEval then applies statistical methods to these scores to
    determine if your quality bar has been met with a specified level of
    confidence.
  </Step>
</Steps>
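
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not SigmaEval's actual API: `simulate_conversation` and `judge` are hypothetical stand-ins for the AI User Simulator and the AI Judge (here they return random data), and the analysis step is assumed to be a one-sided exact binomial test on the share of conversations scoring at or above a minimum score.

```python
import random
from math import comb


def simulate_conversation(rng):
    """Hypothetical stand-in for the AI User Simulator talking to your app."""
    return f"conversation-{rng.randrange(10**6)}"


def judge(conversation, rng):
    """Hypothetical stand-in for the AI Judge: returns a score from 1 to 10."""
    return rng.randint(6, 10)


def run_scenario(n_runs, rng):
    """Simulate n_runs conversations and collect one judge score for each."""
    return [judge(simulate_conversation(rng), rng) for _ in range(n_runs)]


def proportion_at_least_pvalue(successes, n, p0):
    """Exact one-sided binomial p-value for H0: true success rate <= p0."""
    # Probability of observing `successes` or more out of n if the rate is p0.
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


def evaluate(scores, min_score=8, proportion=0.90, alpha=0.05):
    """Return (bar_met, p_value) for the claim
    'at least `proportion` of conversations score >= min_score'."""
    successes = sum(1 for s in scores if s >= min_score)
    p_value = proportion_at_least_pvalue(successes, len(scores), proportion)
    # The bar counts as met only if H0 is rejected at significance level alpha.
    return p_value < alpha, p_value


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the illustration is repeatable
    scores = run_scenario(100, rng)
    bar_met, p_value = evaluate(scores)
    print(f"bar met: {bar_met} (p-value {p_value:.4f})")
```

Note that observing exactly 90 passing conversations out of 100 would not be enough to support a "at least 90%" claim with confidence; the sample has to beat the bar by a margin that shrinks as the number of runs grows.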

<Frame>
  <img src="https://mintcdn.com/sigmaeval/KMRma32cDuC4d6nn/images/sigmaeval-architecture.jpg?fit=max&auto=format&n=KMRma32cDuC4d6nn&q=85&s=29fc426670e50b98b1bc84ec56ddc3d9" alt="SigmaEval Architecture Diagram" width="1754" height="2762" data-path="images/sigmaeval-architecture.jpg" />
</Frame>

## Key Features

<CardGroup cols={2}>
  <Card title="Statistical Evaluation" icon="chart-line">
    Perform comprehensive statistical analyses of your Gen AI applications with
    confidence intervals and rigorous testing.
  </Card>

  <Card title="End-to-End Testing" icon="list-check">
    Test all aspects of your Gen AI app's performance, from response quality to
    latency and reliability.
  </Card>

  <Card title="Pytest & Unittest Ready" icon="python">
    Drop SigmaEval directly into your existing test suites. It's fully
    compatible with popular frameworks like pytest and unittest.
  </Card>

  <Card title="100+ LLM Providers" icon="cubes">
    Use any of more than 100 LLM providers to power the AI Judge and the AI User Simulator.
  </Card>

  <Card title="Data-Driven Decisions" icon="brain">
    Move from intuition-based decisions to quantifiable, objective assessments
    of your AI's capabilities.
  </Card>
</CardGroup>

## Resources

<CardGroup cols={2}>
  <Card title="GitHub Repository" icon="github" href="https://github.com/SigmaEval/SigmaEval">
    View the source code, contribute, and report issues.
  </Card>

  <Card title="PyPI Package" icon="box" href="https://pypi.org/project/sigmaeval-framework/">
    Install and view package details on PyPI.
  </Card>
</CardGroup>

