Introduction to guidance¶
This notebook is a terse tutorial walkthrough of the syntax of guidance.
Models¶
At the core of any guidance program are model objects that handle the actual text generation. You can create a model object using any of the constructors under guidance.models. Models can be run on local hardware or accessed through cloud APIs.
Support for Guidance's constrained generation features varies by provider. The table below summarizes current model support across different providers. Refer to other notebooks for detailed documentation on using specific models with Guidance.
Model Support Table¶
Provider | Type | Constrained Generation Support |
---|---|---|
OpenAI | Remote | JSON generation |
Azure OpenAI | Remote | JSON generation |
LlamaCpp | Local | Full support |
HuggingFace Transformers | Local | Full support (may be slower and more memory-intensive than LlamaCpp) |
Support for more remote providers is a work in progress.
Suggested Local Models¶
For the smoothest experience, use one of the models below with the Transformers library. These models are known to work well with Guidance.
Model | Constructor | Notes |
---|---|---|
GPT-2 | models.Transformers("openai-community/gpt2") | Runs on any hardware but does not generate very coherent text |
Qwen 2.5 Family | models.Transformers("Qwen/Qwen2.5-1.5B") | Modest hardware requirements; alternative sizes available in the Qwen 2.5 collection |
Mistral 7B v0.2 | models.Transformers("mistralai/Mistral-7B-Instruct-v0.2") | 16 GB of VRAM recommended |
Phi 4 mini | models.Transformers("microsoft/Phi-4-mini-instruct") | Solid all-around model; 16 GB of VRAM recommended |
Using LlamaCpp¶
LlamaCpp offers better performance but may require more troubleshooting. For LlamaCpp, you need to provide the path on disk to a .gguf model file. One option used internally by the Guidance team is the bartowski/Llama-3.2-3B-Instruct-GGUF repository, specifically the Llama-3.2-3B-Instruct-Q6_K_L.gguf file. See the LlamaCpp Guidance documentation for detailed instructions about LlamaCpp usage.
Example¶
from guidance import models
model = models.Transformers("Qwen/Qwen2.5-1.5B")
# If you want to use the suggested LlamaCpp model
# from huggingface_hub import hf_hub_download
# model = models.LlamaCpp(
#     hf_hub_download(
#         repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",
#         filename="Llama-3.2-3B-Instruct-Q6_K_L.gguf",
#     ),
#     verbose=True,
#     n_ctx=4096,
# )
Simple generation¶
Once you have an initial model object you can append text to it with the addition operator. This creates a new model object that has the same context (prompt) as the original model, but with the text appended at the end (just like what would happen if you add two strings together).
import guidance
lm = model + "Who won the last Kentucky derby and by how much?"
Once you have added some text to the model you can then ask the model to generate unconstrained text using the gen guidance function. Guidance functions represent executable components that can be appended to a model. When you append a guidance function to a model, the model extends its state by executing the guidance function.
Note that while the lm and model objects are semantically separate, for performance purposes they share the same model weights and KV cache, so the incremental creation of new lm objects is very cheap and reuses all the computation from prior objects.
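For example, appending gen directly to the lm object created above extends it with freely generated text. A minimal sketch (the exact output depends on the model):
from guidance import gen

# generate up to 10 unconstrained tokens continuing the prompt stored in lm
lm + gen(max_tokens=10)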
We can add the text and the gen function in one statement to follow the traditional prompt-then-generate pattern:
from guidance import gen
model + '''\
Q: Who won the last Kentucky derby and by how much?
A:''' + gen(stop="Q:", max_tokens=50)
Simple templates¶
You can define a template in guidance (v0.1+) using f-strings. You can interpolate both standard variables and guidance functions. Note that in Python 3.12 you can put anything into f-string slots, but in Python 3.11 and below a few characters (such as the backslash) are not allowed inside them.
query = "Who won the last Kentucky derby and by how much?"
model + f'''\
Q: {query}
A: {gen(stop="Q:", max_tokens=40)}'''
Capturing variables¶
Often when you are building a guidance program you will want to capture specific portions of the output generated by the model. You can do this by giving a name to the element you wish to capture.
query = "Who won the last Kentucky derby and by how much?"
lm = model + f'''\
Q: {query}
A: {gen(name="answer", stop="Q:", max_tokens=50)}'''
Then we can access the variable by indexing into the final model object.
lm["answer"]
Function encapsulation¶
When you have a set of model operations you want to group together, you can place them into a custom guidance function. To do this you define a decorated Python function that takes a model as its first positional argument and returns a new, updated model. You can add this guidance function to a model to execute it, just like the built-in guidance functions such as gen.
import guidance
@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    A: {gen(name="answer", stop="Q:")}'''
    return lm
query = "Who won the last Kentucky derby and by how much?"
model + qa_bot(query) # note we don't pass the `lm` arg here (that will get passed during execution when it gets added to the model)
Note that one atypical feature of guidance functions is that multi-line string literals defined inside a guidance function respect the Python indentation structure. This means that the whitespace before "Q:" and "A:" in the prompt above is stripped (but if they were indented 6 spaces instead of 4, only the first 4 spaces would be stripped, since that is the current Python indentation level). This allows us to define multi-line templates inside guidance functions while retaining indentation readability (if you ever want to disable this behavior you can use @guidance(dedent=False)).
# Demonstrating the dedent behavior: with dedent=False the leading
# whitespace inside the template is kept in the prompt
@guidance(dedent=False)
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    A: {gen(name="answer", stop="Q:")}'''
    return lm
query = "Who won the last Kentucky derby and by how much?"
model + qa_bot(query)
Selecting among alternatives¶
Guidance has lots of ways to constrain model generation, but the most basic building block is the select function, which forces the model to choose between a set of options (either strings or full grammars).
from guidance import select
model + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}'''
Note that since guidance is smart about when tokens are forced by the program (and so do not need to be predicted by the model), only one token was generated in the program above (the beginning of "SEARCH", highlighted in green).
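Because select also accepts a name, the chosen option can be read back by indexing into the resulting model object, just as with gen. A small sketch reusing the query variable from above:
lm = model + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}'''
lm["choice"]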
Interleaved generation and control¶
Because guidance is pure Python code you can interleave (constrained) generation commands with traditional python control statements. In the example below we first ask the model to decide if it should search the web or respond directly, then act accordingly.
@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    Now I will choose to either SEARCH the web or RESPOND.
    Choice: {select(["SEARCH", "RESPOND"], name="choice")}
    '''
    if lm["choice"] == "SEARCH":
        lm += "A: I don't know, Google it!"
    else:
        lm += f'A: {gen(stop="Q:", name="answer")}'
    return lm
model + qa_bot(query)
Generating lists¶
Whenever you want to generate a list of items you can use the list_append parameter, which will cause the captured value to be appended to a list instead of overwriting previous values.
lm = model + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}
'''
if lm["choice"] == "SEARCH":
lm += "Here are 3 search queries:\n"
for i in range(3):
lm += f'''{i+1}. "{gen(stop='"', name="queries", temperature=1.0, list_append=True)}"\n'''
lm["queries"]
Chat¶
You can control chat models using special with context blocks that wrap whatever is inside them with the special formats needed for the chat model you are using. This allows you to express chat programs without tying yourself to a single model backend.
from guidance import models
# to use role based chat tags you need a chat model, here we use gpt-4o-mini
gpt4o = models.OpenAI("gpt-4o-mini")
from guidance import system, user, assistant, gen
with system():
    lm = gpt4o + "You are a helpful assistant."
with user():
    lm += "What is the meaning of life?"
with assistant():
    lm += gen("response")
with user():
    lm += "And how about why the sky is blue?"
with assistant():
    lm += gen("response")
with user():
    lm += "Are you sure?"
with assistant():
    lm += gen("response")
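As before, the captured text can be read back by indexing into lm; because all three generations reuse the name "response" (and list_append is not set), this returns the most recent assistant reply:
lm["response"]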
Multistep¶
# you can create and guide multi-turn conversations by using a series of role tags
@guidance
def experts(lm, query):
    with system():
        lm += "You are a helpful assistant."
    with user():
        lm += f"""\
        I want a response to the following question:
        {query}
        Who are 3 world-class experts (past or present) who would be great at answering this?
        Please don't answer the question or comment on it yet."""
    with assistant():
        lm += gen(name='experts', max_tokens=300)
    with user():
        lm += f"""\
        Great, now please answer the question as if these experts had collaborated in writing a joint anonymous answer.
        In other words, their identity is not revealed, nor is the fact that there is a panel of experts answering the question.
        If the experts would disagree, just present their different positions as alternatives in the answer itself (e.g. 'some might argue... others might argue...').
        Please start your answer with ANSWER:"""
    with assistant():
        lm += gen(name='answer', max_tokens=500)
    return lm
gpt4o + experts(query='What is the meaning of life?')