Advanced Prompt Engineering Guide
This guide will walk you through advanced techniques in prompt design and prompt engineering. If you are new to prompt engineering, we recommend starting with the Introduction to Prompt Engineering Guide.
API Types and Prompt Structures
For Azure OpenAI GPT models, there are currently two different APIs where prompt engineering can be applied:
- Chat Completions API: Supports ChatGPT and GPT-4 models, designed to receive inputs in a specific chat-like script format stored in an array of dictionaries.
- Completions API: Supports older GPT-3 models and has more flexible input requirements, accepting text strings without specific format rules. Note: ChatGPT models can work with either API, but we strongly recommend using the Chat Completions API for these models.
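To illustrate the difference, here is a minimal sketch of the two request shapes, assuming the openai Python SDK (v1) and its AzureOpenAI client; the endpoint, key, and deployment names are placeholders, not values from this guide.

```python
from openai import AzureOpenAI

# Placeholder credentials and deployment names; substitute your own.
client = AzureOpenAI(
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
)

# Chat Completions API: input is an array of role/content dictionaries.
chat = client.chat.completions.create(
    model="gpt-4",  # chat model deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure OpenAI Service?"},
    ],
)
print(chat.choices[0].message.content)

# Completions API: input is a single free-form text string.
completion = client.completions.create(
    model="gpt-35-turbo-instruct",  # completions model deployment name
    prompt="Answer the question: What is Azure OpenAI Service?",
)
print(completion.choices[0].text)
```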
System Messages
System messages are included at the beginning of a prompt to provide context, instructions, or other use-case related information for the model. They can be used to describe the assistant’s persona, define what the model should and should not answer, and specify the format of the model’s responses.
Example:
```plaintext
System message: The assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Answer questions using only the provided context. If unsure of the answer, say "I don't know."
```
Other examples:
- “The assistant is a large language model trained by OpenAI.”
- “The assistant is an intelligent chatbot designed to help users answer their tax-related questions.”
- “You are an assistant designed to extract entities from text. Users will paste text strings, and you will respond with entities extracted from the text as a JSON object. Example output format:”

```plaintext
{
  "name": "",
  "company": "",
  "phone_number": ""
}
```
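Wired into a conversation, this system message might produce an exchange like the following; the user text and extracted values are invented for illustration.

```plaintext
User: Hi, this is Robert Smith calling from Contoso Insurance. You can reach me at 555-0100.
Assistant: {
  "name": "Robert Smith",
  "company": "Contoso Insurance",
  "phone_number": "555-0100"
}
```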
Important note: Even if you instruct the model in the system message to answer “I don’t know” when unsure, this does not guarantee compliance. A well-designed system message increases the likelihood of a specific outcome, but the model may still produce incorrect responses that contradict the intent of the instructions.
Few-Shot Learning
A common method to adapt language models to new tasks is through few-shot learning. In few-shot learning, you provide a set of training examples in the prompt to give the model additional context.
When using the Chat Completions API, a series of messages between the user and assistant (written in the new prompt format) can serve as examples for few-shot learning. These examples guide the model to respond in a specific way, mimic behaviors, and provide seed answers to common questions.
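A minimal sketch of few-shot examples in the Chat Completions message format; the tax-assistant persona echoes the system-message examples above, and the example answers are illustrative.

```python
# Few-shot examples are embedded as prior user/assistant turns.
messages = [
    {"role": "system", "content": "The assistant is an intelligent chatbot designed to help users answer their tax-related questions."},
    # Example 1: seeds the expected tone and level of detail.
    {"role": "user", "content": "When do I need to file my taxes by?"},
    {"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th, the day after the usual April 15th deadline."},
    # Example 2: seeds an answer to a common question.
    {"role": "user", "content": "How can I check the status of my tax refund?"},
    {"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds."},
    # The live question comes after the examples.
    {"role": "user", "content": "Can I deduct my home office?"},
]
```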
Non-Chat Scenarios
While the Chat Completions API is optimized for multi-turn conversations, it can also be used in non-chat scenarios. For example, in a sentiment analysis scenario, you might use a prompt along the following lines.
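The system message and transcript below are illustrative rather than verbatim:

```plaintext
System message: You are an assistant designed to analyze sentiment from call transcripts. Users will paste in a string of text, and you will respond with a rating of the speaker's sentiment on a scale of 1 to 10 (10 being highest), followed by a brief explanation of the rating.
User: Hi, thanks for calling Contoso Pharmacy. I can see why you're frustrated, and I'm going to make sure this prescription is refilled today.
```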
Prompt Design Techniques
1. Start with Clear Instructions
The order of information in a prompt matters. GPT-style models are structured in a way that defines how they process input. Our research shows that telling the model what task you want it to perform at the beginning of the prompt—before sharing other contextual information or examples—helps generate higher-quality outputs.
Note: While following this guidance is still generally recommended, our testing showed that, in contrast to earlier model versions (GPT-3 and prior), responses from ChatGPT and GPT-4 models were the same whether or not this technique was used. In our tests, adding the statement “Several news sources… outbreak” at the beginning or at the end of the prompt did not change the final model response.
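A simple illustration of stating the task first:

```plaintext
Summarize the article below in three bullet points aimed at a non-technical reader.

ARTICLE: <article text>
```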
2. Repeat Instructions at the End
Models may be susceptible to recency bias, meaning information at the end of a prompt may influence the output more than information at the beginning. Therefore, it’s worth trying to repeat instructions at the end of the prompt and evaluate the impact on the generated response.
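For instance, an illustrative prompt with the key instruction repeated at the end:

```plaintext
Extract every company name mentioned in the text below.

TEXT: <text>

Remember: respond only with the list of company names.
```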
3. Guide Output
This refers to including words or phrases at the end of a prompt to steer the model’s response into a desired form. For example, using a prompt like “Here’s a bulleted list of key points:\n- ” helps ensure the output is a bulleted list.
Similarly, ending a prompt with the text “A possible search query is:” guides the model to generate a single output; without this cue, the model might generate multiple search queries.
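An illustrative sketch of this cue in a full prompt:

```plaintext
Read the paragraph below and suggest how its claims could be verified.

PARAGRAPH: <paragraph to fact-check>

A possible search query is:
```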
4. Use Explicit Syntax
Using explicit syntax in prompts—including punctuation, headings, and section markers—helps convey intent and often makes outputs easier to analyze.
In the example below, delimiters (in this case, ---) are added between different information sources or steps, which also allows --- to serve as a stop sequence for generation. In addition, section headings and special variables are written in uppercase to differentiate them.
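A hedged reconstruction of such a prompt; the wording is illustrative:

```plaintext
You will read a paragraph and then issue queries to a search engine in order to fact-check it. Respond only with the queries.
---
PARAGRAPH: <paragraph to fact-check>
---
QUERIES:
```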
If you are unsure which syntax to use, consider Markdown or XML. These models have been trained on large amounts of web content written in XML and Markdown, which may yield better results.
5. Decompose Tasks
Large language models (LLMs) often perform better when tasks are broken down into smaller steps. For example, in the search query prompt referenced earlier, you could restructure the prompt to first instruct the model to extract relevant facts, then generate search queries to validate those facts.
Note that clear syntax should be used to distinguish different sections and guide output. In this simple example, decomposing the task from one step to two may not be dramatic, but it makes a significant difference for long texts with many factual claims.
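An illustrative two-step restructuring of that prompt:

```plaintext
You will first extract the factual claims made in the paragraph below, and then issue one search query per claim that could be used to verify it.
---
PARAGRAPH: <paragraph to fact-check>
---
FACTUAL CLAIMS:
```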
6. Use Affordances
Sometimes, instead of relying solely on the model’s parameters, we can enable the model to use affordances—such as search—to mitigate the risk of fabricated answers and obtain up-to-date information.
A simple way to use affordances is to stop generation when the model calls for an affordance (e.g., SEARCH), then paste the results back into the prompt. Below is an example of a follow-up call after executing a SEARCH request, showing how search results are inserted into the prompt to replace the original SEARCH call.
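A minimal sketch of this loop, assuming the openai Python SDK; the SEARCH("query") convention and the run_search() helper are assumptions made for illustration, not part of the Azure OpenAI API.

```python
import re
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
)

def run_search(query: str) -> str:
    """Hypothetical search backend; replace with a real search API."""
    return "<search results for: " + query + ">"

messages = [
    {"role": "system", "content": 'When you need current information, respond only with SEARCH("query").'},
    {"role": "user", "content": "Who won the most recent Formula 1 championship?"},
]

first = client.chat.completions.create(model="gpt-4", messages=messages)
text = first.choices[0].message.content or ""

# If the model invoked the SEARCH affordance, execute it and paste the
# results back into the conversation in place of the original call.
match = re.search(r'SEARCH\("([^"]+)"\)', text)
if match:
    results = run_search(match.group(1))
    messages.append({"role": "assistant", "content": text})
    messages.append({"role": "user", "content": "SEARCH RESULTS:\n" + results
                     + "\n\nAnswer the original question using only these results."})
    final = client.chat.completions.create(model="gpt-4", messages=messages)
    print(final.choices[0].message.content)
```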
7. Chain-of-Thought Prompting
This is a variant of the task decomposition technique. Instead of splitting a task into smaller steps, you instruct the model to respond incrementally, outlining all involved steps. This reduces the likelihood of inaccurate results and makes it easier to evaluate the model’s response.
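An illustrative chain-of-thought prompt:

```plaintext
Who was the most decorated individual athlete at the Olympic Games held in Sydney? Take a step-by-step approach: list the candidates and their medal counts, give your reasoning with sources, and only then share the final answer in the format below.

ANSWER is: <name>
```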
8. Specify Output Structure
Specifying the output structure in a prompt can significantly impact the nature and quality of results. Sometimes, system messages like “Only state true facts” or “Do not fabricate information” are insufficient to address issues. Instead, requiring the model to include citations with responses can reduce the probability of errors.
If you instruct the model to cite source material when making statements, those statements are far more likely to be grounded. Requiring citations means that for a response to be wrong, the model must make two errors: fabricating the response and fabricating a citation to support it. Note also that the closer a citation sits to the text it supports, the shorter the distance the model must look ahead to predict it, which suggests inline citations mitigate fabricated content more effectively than citations placed at the end of the response.
Similarly, if asked to extract factual statements from a paragraph, the model might extract compound statements like “X is performing Y and Z” (which may be harder to verify). This can be avoided by specifying an output structure, such as (Entity 1, Relationship, Entity 2).
The example below demonstrates the use of citations and guides the model’s response to fit a defined structure.
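A hedged reconstruction of such a prompt, combining inline citations with a fixed extraction structure; the wording is illustrative:

```plaintext
You are an assistant that extracts factual statements from the provided context. Output each statement as a triple in the form (Entity 1, Relationship, Entity 2), followed immediately by an inline citation in the form [Doc: <document name>]. Use only the provided context.

CONTEXT:
<source documents>
```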
Temperature and Top_p Parameters
Adjusting the temperature parameter changes the model’s output. The temperature parameter ranges from 0 to 2. Higher values (e.g., 0.7) make outputs more random and divergent, while lower values (e.g., 0.2) make outputs more focused and specific. Higher temperatures are suitable for generating fictional stories, while lower temperatures are recommended for legal documents.
Top_probability (Top_p) is another parameter that controls the randomness of model responses, similar to temperature but with a different control mechanism. It is generally recommended to adjust only one of these parameters at a time, not both simultaneously.
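For example, with the openai Python SDK (placeholders as before), both parameters are set per request; adjust one or the other, not both:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4",  # deployment name (placeholder)
    messages=[{"role": "user", "content": "Write a one-line tagline for a bakery."}],
    temperature=0.2,  # lower values: more focused, deterministic output
    # top_p=0.9,      # the alternative control; avoid tuning both at once
)
print(response.choices[0].message.content)
```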
Provide Grounding Context
One of the most effective ways to obtain reliable answers is to provide the model with data to derive responses from. If your use case relies on up-to-date, reliable information (and is not purely creative), we strongly recommend providing grounding data. Typically, the closer the source material is to the final form of the desired answer, the less work the model has to do—and the lower the chance of errors.
The example below provides the system with the “latest blog post describing the launch of GPT-4 in Azure OpenAI Service” and asks it to name some early customers.
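Structurally, such a grounded prompt looks like the sketch below; the blog post itself is not reproduced here:

```plaintext
System message: You are an assistant that answers questions using only the context provided below. If the answer is not in the context, say "I don't know."

CONTEXT:
<paste the blog post announcing GPT-4 in Azure OpenAI Service here>

User: Who are some of the early customers mentioned in the post?
```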
Original source: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/advanced-prompt-engineering