Now that you have a working agent, you should write tests to ensure that the quality of its answers does not degrade as you add context. To add a test to your agent, add the following to your .agent.yml file:
tests:
  - type: consistency
    n: 10
    task_description: "how many nights did I get high quality sleep?"
You can add as many tests as you like, covering as many prompts as you like. For example:
tests:
  - type: consistency
    n: 10
    task_description: "how many nights did I get high quality sleep?"
  - type: consistency
    n: 10
    task_description: "how many hours do I sleep on average?"
  - type: consistency
    n: 10
    task_description: "what day do I typically get the most sleep?"
You can then run these tests using the following command:
oxy test my-agent.agent.yml
This will generate a final accuracy score and surface any consistency errors that the LLM detects.

Advanced Testing Options

CI/CD Integration

For automated testing in CI/CD pipelines, use the JSON output format:
oxy test my-agent.agent.yml --format json
This outputs machine-readable JSON like {"accuracy": 0.855} that can be parsed by your CI tools.
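As a minimal sketch of how a CI step might read that output, the Python snippet below parses the JSON text and extracts the score. It assumes the command prints a single JSON object containing an `accuracy` key, as in the example above; the function name `parse_accuracy` and the `sample` string are illustrative and not part of Oxy.

```python
import json


def parse_accuracy(raw_output: str) -> float:
    """Return the accuracy value from JSON text such as '{"accuracy": 0.855}'."""
    report = json.loads(raw_output)
    accuracy = report["accuracy"]
    # Reject anything that is not a plain number (bool is a subclass of int).
    if isinstance(accuracy, bool) or not isinstance(accuracy, (int, float)):
        raise ValueError(f"unexpected accuracy value: {accuracy!r}")
    return float(accuracy)


# Stand-in for captured stdout; a real pipeline would capture it from
# `oxy test my-agent.agent.yml --format json`.
sample = '{"accuracy": 0.855}'
print(f"accuracy: {parse_accuracy(sample):.1%}")
```

If the real output carries additional fields, they are ignored here; only the `accuracy` key is read.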

Quality Gates

Enforce minimum accuracy thresholds to prevent regressions:
# Fail the build if accuracy drops below 80%
oxy test my-agent.agent.yml --format json --min-accuracy 0.8
The command exits with code 1 if the threshold isn’t met, so a CI step that runs it fails automatically and can serve as a quality gate.
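Most CI systems fail a step on any non-zero exit code, so no wrapper is strictly needed. If you drive the check from a script instead, a sketch like the following shows the idea. The helper `gate_passed` and the constant `OXY_GATE_COMMAND` are hypothetical names, and the script assumes `oxy` is on the PATH of the CI job.

```python
import subprocess
import sys

# The same invocation as shown above, split into an argument list.
OXY_GATE_COMMAND = [
    "oxy", "test", "my-agent.agent.yml",
    "--format", "json", "--min-accuracy", "0.8",
]


def gate_passed(command: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    result = subprocess.run(command, check=False)
    return result.returncode == 0


def main() -> None:
    """Propagate a failed quality gate as a non-zero exit code."""
    if not gate_passed(OXY_GATE_COMMAND):
        sys.exit(1)


# Demonstration with a stand-in child process that exits with code 1,
# mimicking an unmet threshold without needing oxy installed.
print(gate_passed([sys.executable, "-c", "raise SystemExit(1)"]))
```

In a real pipeline you would call `main()` at the end of the script rather than the stand-in demonstration.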

Multiple Test Management

If you have multiple tests in your agent file, control how thresholds are evaluated:
# Average mode: average of all tests must meet threshold (default)
oxy test my-agent.agent.yml --min-accuracy 0.8 --threshold-mode average

# All mode: every individual test must meet threshold
oxy test my-agent.agent.yml --min-accuracy 0.8 --threshold-mode all
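The two modes can reach different verdicts on the same results. The Python sketch below illustrates the semantics described in the comments above; it is not Oxy's implementation, and the function name `meets_threshold` and the scores are made up for the example.

```python
from statistics import mean


def meets_threshold(accuracies, min_accuracy, mode="average"):
    """Check per-test accuracies against a threshold in the given mode."""
    if not accuracies:
        raise ValueError("at least one test accuracy is required")
    if mode == "average":
        # The mean across all tests must reach the threshold.
        return mean(accuracies) >= min_accuracy
    if mode == "all":
        # Every individual test must reach the threshold.
        return all(score >= min_accuracy for score in accuracies)
    raise ValueError(f"unknown threshold mode: {mode}")


# Hypothetical per-test accuracies for three consistency tests.
scores = [0.95, 0.9, 0.7]
print(meets_threshold(scores, 0.8, mode="average"))
print(meets_threshold(scores, 0.8, mode="all"))
```

Here the average clears the 0.8 threshold, but the third test does not, so only the average mode passes. The all mode is the stricter choice when no single prompt is allowed to regress.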
For complete documentation on testing features, see the Testing Guide.

At this point, you have a working agent as well as the ability to modify and test this agent. Congratulations!