Prompted LLM chatbots that can reason about data
`.agent.yml` files abstract away much of this complexity, allowing you to focus on the logic of the prompted LLM rather than the details of code execution.
In `.agent.yml` files, you need to specify the following:
| Component | Description | Required |
|---|---|---|
| `model` | LLM model to use (defined in `config.yml`, referenced by `name`) | Required |
| `system_instructions` | System instructions passed to the LLM | Required |
| `context` | A list of files that can be injected into `system_instructions` | Optional |
| `tools` | Tools to use; see the Tools section below | Optional |
| `database` | Database to use (defined in `config.yml`, referenced by `name`) | Optional |
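Putting these components together, a minimal `.agent.yml` might look like the following sketch. All names and values here are illustrative, and the `model` and `database` entries are assumed to match names defined in `config.yml`:

```yaml
# Illustrative sketch of a minimal .agent.yml -- names and values are examples only.
model: gpt-4o # must match a model name defined in config.yml
system_instructions: |
  You are a data analyst. Answer questions using the acme semantic model.
context:
  - name: acme
    type: semantic_model
    src: data/acme.sem.yml
database: local # must match a database name defined in config.yml
```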
The `context` key defines a list of files that can be injected into `system_instructions`. Primarily, this is an organizational and code-reuse utility: rather than prompting your agents with long text blobs, you can divide these prompts into logical sections, broken into distinct files.
Each context entry in the list requires three fields:
| Field | Description | Example |
|---|---|---|
| `name` | Identifier for the context that will be used in `system_instructions` | `"anon_youtube"` |
| `type` | Type of the context file (e.g., `file`, `semantic_model`) | `"semantic_model"` |
| `src` | Path to the source file relative to the project root (can be a single file or a list) | `"data/acme.sem.yml"` |
You can reference `file`-type context objects in your `system_instructions` using the following syntax:

`{{ context.name }}`
For objects of type `semantic_model`, you can access their properties using:

`{{ context.name.property }}`
For example, if you have a semantic model context named `acme`, you can access its entities using:

`{{ context.acme.entities }}`
For smaller projects, we encourage saving any pertinent SQL files and injecting them in as context objects, rather than relying on retrieval to pull in relevant sources. We find that this tends to produce much more deterministic outputs.
This can be accomplished by adding the following section to an `.agent.yml` file:
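A sketch of such a section, with illustrative names and file paths:

```yaml
context:
  - name: sales_queries
    type: file
    src:
      - queries/monthly_revenue.sql
      - queries/top_customers.sql
```

The named context can then be injected with `{{ context.sales_queries }}`.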
At run time, the agent processes `system_instructions` as follows:

1. It renders `system_instructions` using all retrieved queries and context.
2. It passes the rendered `system_instructions` and prompt into the LLM tool-calling API.
3. It executes tool calls until the request in `system_instructions` is fulfilled.
You can configure a `tools` section with specific tools that can be used by the agent. The following tools are available:
`execute_sql` / `validate_sql`

There are two SQL tools: `execute_sql`, which will execute the SQL and return the results to the LLM, and `validate_sql`, which will attempt to write queries until the query successfully executes, sending only the query itself back to the LLM (not the result set).
Both can be configured by specifying a `database` to connect to, as shown below:
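The exact tool schema is not shown in this excerpt, so the following is only a plausible sketch, assuming each tool entry takes `name`, `type`, and `database` fields:

```yaml
tools:
  - name: execute_sql
    type: execute_sql
    database: local # must match a database name defined in config.yml
```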
`retrieval`

While we generally recommend injecting sources directly (via `context`) to avoid introducing additional nondeterminism into the system, we do support in-process retrieval against embeddings. The `retrieval` tool spins up an in-process vector db and can be configured as follows:
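A hedged sketch of this configuration, assuming the tool entry takes a `src` field pointing at the directory to embed (the other field names mirror the SQL tools and are illustrative):

```yaml
tools:
  - name: retrieval
    type: retrieval
    src:
      - queries/
```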
`src` specifies the directory that will be searched through and embedded.
To use this tool, embeddings must be built before any `oxy run` command. To learn how to build embeddings, see Embedding management. In short, you'll need to log into the Hugging Face CLI, then build embeddings using the following command:
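The command itself is missing from this excerpt; as a hedged sketch, assuming the Oxy CLI exposes a `build` subcommand for embeddings, the flow would be:

```shell
# Log in to Hugging Face so the embedding model can be downloaded.
huggingface-cli login
# Build embeddings (assumed subcommand; check `oxy --help` for the exact name).
oxy build
```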
Databases defined in `config.yml` can also be referenced in `system_instructions` by using the `databases` namespace, then referencing them by `name`, as follows:
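The example itself is missing from this excerpt; following the pattern of the context syntax above, such a reference presumably looks like this (the database name `local` is illustrative):

`{{ databases.local }}`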