Tests are a core part of oxy. They can be written as part of either agents or workflows. At present, we support a single type of test, type: consistency, which measures the consistency between two results. Within agents, this can be implemented as follows:
tests:
  - type: consistency
    n: 5  # number of runs to test
    task_description: "how many users do we have?"
The task_description field is the question on which you want to test the LLM’s performance. (We don’t call this field prompt because the task_description is nested within a separate prompt that runs the evaluation, so prompt would be ambiguous here.) n is the number of times the agent is run to produce a response to the task_description request.
For workflows, task_description is not required; provide a task_ref value instead, as shown below:
tests:
  - type: consistency
    task_ref: task_name
    n: 5  # number of runs to test
The task_ref field names the task to be tested. No task_description is required because the prompt already defined on that task is used for evaluation.
To run the tests for an agent:
oxy test agent-name.agent.yml
Or, for a workflow:
oxy test workflow-name.workflow.yml
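To build intuition for what a consistency test over n runs measures, here is a minimal Python sketch. It is not oxy's implementation: the text above describes the evaluation as running through a separate prompt, whereas this sketch swaps in a plain string comparison as a stand-in judge. The names normalize, exact_match, and consistency_score, as well as the sample responses, are hypothetical and exist only for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def exact_match(a: str, b: str) -> bool:
    """Stand-in judge (assumption): two answers agree if identical after normalization."""
    return normalize(a) == normalize(b)


def consistency_score(
    answers: Sequence[str],
    agree: Callable[[str, str], bool] = exact_match,
) -> float:
    """Return the fraction of answer pairs that the judge considers consistent."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        raise ValueError("need at least two answers to measure consistency")
    return sum(agree(a, b) for a, b in pairs) / len(pairs)


# Five made-up responses to "how many users do we have?", mirroring n: 5.
runs = [
    "We have 1,204 users.",
    "we have 1,204 users.",
    "We have  1,204 users.",
    "We have 1,204 users.",
    "We have 1,310 users.",
]
print(consistency_score(runs))  # 6 of the 10 pairs agree, so this prints 0.6
```

The agree parameter is kept pluggable so the exact-match stand-in could be replaced by any other pairwise judge, such as one that asks a model whether two answers say the same thing.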