`.agent.yml` files abstract away much of this complexity, allowing you to focus on the logic of the prompted LLM rather than the details of code execution.
## Agent components
Specifically, in an `.agent.yml` file, you need to specify the following:
Component | Description | Required |
---|---|---|
`model` | LLM model to use (defined in `config.yml`, referenced by name) | Required |
`system_instructions` | System instructions passed to the LLM | Required |
`context` | A list of files that can be injected into `system_instructions` | Optional |
`tools` | Tools to use; see the Tools section below | Optional |
`database` | Database to use (defined in `config.yml`, referenced by name) | Optional |
## Context
The `context` object allows for deterministic injection of files into `system_instructions`. It is primarily an organizational and code-reuse utility: rather than prompting your agents with long text blobs, you can divide these prompts into logical sections, broken into distinct files.
Each context entry in the list requires three fields:
Field | Description | Example |
---|---|---|
`name` | Identifier for the context, used in `system_instructions` | `anon_youtube` |
`type` | Type of the context file (e.g., `file`, `semantic_model`) | `semantic_model` |
`src` | Path to the source file relative to the project root (single file or list) | `data/acme.sem.yml` |
You can reference `file`-type context objects in your `system_instructions` using the following syntax:
{{ context.name }}
For objects of type `semantic_model`, you can access their properties using:
{{ context.name.property }}
For example, if you have a semantic model context named `acme`, you can access its entities using:
{{ context.acme.entities }}
For smaller projects, we encourage saving any pertinent SQL files and injecting them as context objects, rather than leveraging retrieval to pull in relevant sources. We find this tends to produce much more deterministic outputs.
This can be accomplished by adding a `context` section to your `.agent.yml` file and then referencing each entry in `system_instructions`.
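A minimal sketch of such a section (the file path and context name here are illustrative, not part of any real project):

```yaml
context:
  - name: top_customers          # hypothetical name, referenced below
    type: file
    src: data/top_customers.sql  # hypothetical path to a saved SQL file
system_instructions: |
  Use this reference query when relevant:
  {{ context.top_customers }}
```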
## Tools
To enable the LLM to flexibly accomplish a wider range of tasks, our internal chain logic is as follows:

1. We render your `system_instructions` using all retrieved queries and context.
2. We feed the rendered `system_instructions` and prompt into the LLM tool-calling API.
3. We repeat steps 1 and 2 with the results of step 2 until the request specified in `system_instructions` is fulfilled.
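The loop above can be sketched in Python. This is a simplified illustration of the render-call-repeat cycle, not oxy's actual implementation; `render`, `call_llm`, and the tool registry are all stand-ins:

```python
def render(template: str, context: dict) -> str:
    """Step 1: inject context values into system_instructions (simplified)."""
    for key, value in context.items():
        template = template.replace("{{ context.%s }}" % key, str(value))
    return template


def run_agent(instructions: str, context: dict, call_llm, tools: dict, max_steps: int = 10):
    """Steps 2-3: call the tool-calling API, run requested tools, repeat until done."""
    results = []
    for _ in range(max_steps):
        rendered = render(instructions, context)   # step 1: render with context
        reply = call_llm(rendered, results)        # step 2: LLM tool-calling API (stubbed)
        if reply["done"]:                          # step 3: stop once the request is fulfilled
            return reply["answer"]
        tool = tools[reply["tool"]]                # otherwise run the requested tool...
        results.append(tool(**reply["args"]))      # ...and loop with its result
    raise RuntimeError("agent did not finish within max_steps")
```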
You can define a `tools` section with specific tools that the agent may use. The following tools are available:
### `type: execute_sql` / `validate_sql`
There are two kinds of SQL-writing tools: `execute_sql`, which executes the SQL and returns the results to the LLM, and `validate_sql`, which attempts to write queries until one executes successfully, sending only the query itself back to the LLM (not the result set).
Both can be configured by specifying a `database` to connect to, as shown below:
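A hypothetical configuration sketch (the tool names and `acme_db` are placeholders; only the `type` and `database` fields are taken from this document):

```yaml
tools:
  - name: run_query        # hypothetical name
    type: execute_sql
    database: acme_db      # must match a database defined in config.yml
  - name: check_query      # hypothetical name
    type: validate_sql
    database: acme_db
```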
### `type: retrieval`
While we generally recommend avoiding retrieval (and using `context` instead), so as not to inject additional nondeterminism into the system, we do support in-process retrieval against embeddings. The `retrieval` tool spins up an in-process vector DB and can be configured as follows:
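A sketch of what such an entry might look like (the tool name and path are illustrative; only `type` and `src` are described in this document):

```yaml
tools:
  - name: search_docs      # hypothetical name
    type: retrieval
    src: data/queries/     # directory to be embedded and searched
```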
`src` specifies the directory that will be searched through and embedded.
To use this tool, embeddings must be built before any `oxy run` command. In short, you'll need to log into the Hugging Face CLI and then build the embeddings; see Embedding management for the exact command.
The accepted format of these parameters will likely change in the future.
## Database
Database information can be accessed within `system_instructions` by using the `databases` namespace, then referencing the database by `name`, as follows:
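By analogy with the `{{ context.name }}` syntax above, a reference might look like the following sketch (the database name is a placeholder, and the exact properties exposed under the `databases` namespace are an assumption):

```yaml
system_instructions: |
  You answer questions about the Acme warehouse.
  Database details: {{ databases.acme_db }}
```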
## Sample config
`semantic_model.agent.yml`
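A hypothetical sample pulling the components above together; the model, database, file names, and tool name are placeholders and must match the entries in your own `config.yml`:

```yaml
model: gpt-4o                  # must match a model defined in config.yml
system_instructions: |
  You are a SQL analyst for Acme.
  The available entities are: {{ context.acme.entities }}
context:
  - name: acme
    type: semantic_model
    src: data/acme.sem.yml
tools:
  - name: run_query            # hypothetical name
    type: execute_sql
    database: acme_db          # must match a database defined in config.yml
database: acme_db
```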