Prerequisites:
- Python 3.10+
- An API key for an LLM provider (e.g., OpenAI, Anthropic, Google)
- pytest

We’ll start by building a simple AI assistant so that we have an application to test. By the end, you will have a complete, runnable example that you can adapt for your own projects.
Installation
First, install the necessary libraries from PyPI. We recommend creating a virtual environment to avoid dependency conflicts. The code in this tutorial uses Gemini as the LLM provider. If you want to use a different provider, replace the model name in the code with the name of the model you want to use. See the LiteLLM documentation for more information.
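A typical setup looks like the following. The `sigmaeval` package name is an assumption here; check the project’s documentation for the published name on PyPI.

```bash
# Create and activate an isolated environment
python -m venv .venv
source .venv/bin/activate

# Install the evaluation framework, the LLM client, and the test runner
pip install sigmaeval litellm pytest

# LiteLLM reads the Gemini API key from this environment variable
export GEMINI_API_KEY="your-api-key-here"
```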
Step 1: Build a Simple AI Assistant
First, let’s create a simple but complete AI application that we can test. This assistant is for an e-commerce store and uses a system prompt to define its capabilities. Create a file named app.py:
app.py
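One minimal way to implement this is with LiteLLM, so the model string stays provider-agnostic. The `get_response` helper name, the system prompt wording, and the `gemini/gemini-2.0-flash` model string below are illustrative choices, not requirements:

```python
import litellm

# System prompt that defines what the assistant is allowed to help with.
SYSTEM_PROMPT = """You are a helpful customer support assistant for an online store.
You can answer questions about shipping times, return policies, and order status.
If a question is outside these topics, politely say you cannot help with it."""

# Illustrative model name; swap in any LiteLLM-supported model string.
MODEL = "gemini/gemini-2.0-flash"


def get_response(user_message: str) -> str:
    """Send a single user message to the LLM and return the assistant's reply."""
    response = litellm.completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Quick manual check: ask a question the assistant should be able to handle.
    print(get_response("What is your return policy?"))
```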
Step 2: Write Your First Evaluation with Pytest
Now that we have an application, we can write a test for it using SigmaEval and pytest. This test will verify that our assistant not only provides the correct information but also does so in a timely manner.
Create a file named test_app.py in the same directory:
test_app.py
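As a rough sketch of what such a test checks, here is a plain-pytest stand-in that mirrors the two ideas: a behavioral expectation (what `expect_behavior` captures) and a performance expectation (what `expect_metric` captures). The question text, the naive keyword check, and the 10-second budget are illustrative assumptions; replace this stand-in with SigmaEval’s own constructs from its documentation, which judge behavior with an LLM over multiple samples rather than a keyword match.

```python
import time

from app import get_response


def test_assistant_answers_shipping_question_quickly():
    """Behavioral check (stand-in for expect_behavior) plus a latency check
    (stand-in for expect_metric). Thresholds and keywords are illustrative."""
    start = time.monotonic()
    reply = get_response("How long does standard shipping take?")
    elapsed = time.monotonic() - start

    # Behavioral expectation: the reply should actually address shipping.
    assert "ship" in reply.lower()

    # Performance expectation: the assistant should respond within 10 seconds.
    assert elapsed < 10.0
```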
Step 3: Run the Test and Interpret the Results
With your app.py and test_app.py files in place, you can run the evaluation from your terminal.
terminal
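For example (the `-v` flag just makes the per-test output easier to read):

```bash
pytest test_app.py -v
```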
If the test passes, it means that both the behavioral (expect_behavior) and the performance (expect_metric) expectations were met with statistical confidence.
Next Steps
Now that you’ve run your first end-to-end evaluation, you can start applying SigmaEval to your own Gen AI applications. Try modifying the system prompt in app.py or the expectations in test_app.py to see how the results change.