Harnessing Zero-Shot and Few-Shot Learning in LLMs
Large Language Models (LLMs), like OpenAI’s ChatGPT, exhibit remarkable abilities to carry out tasks they were not specifically trained for. Sometimes, they can manage this without any prior examples; other times, a few labeled instances are needed for the model to recognize the pattern.
Two notable methods that aid LLMs in tackling unfamiliar tasks are zero-shot and few-shot prompting. This article will delve into how these techniques function, provide examples, and outline appropriate contexts for their application.
Originally published on the neptune.ai blog.
Understanding Zero-Shot and Few-Shot Learning in LLMs
The primary aim of zero-shot and few-shot learning is to enable a machine-learning model to execute a task it hasn’t been trained on. A natural question arises: what exactly are LLMs trained to accomplish?
LLMs typically undergo two phases of training:
- Pre-training: The model is exposed to a vast collection of text, learning to predict the next word based on preceding context.
- Fine-tuning: The next-word prediction model is refined to behave like a chatbot, responding to user inquiries in a conversational style that meets human expectations.
For instance, let’s see if OpenAI’s ChatGPT (powered by GPT-4) can complete the best-known English pangram, “The quick brown fox jumps over the lazy dog,” when given only its opening words:
As anticipated, it accurately finishes the well-known sentence, likely having encountered it multiple times during pre-training. Users familiar with ChatGPT know that it tends to provide extensive factual knowledge and strives to be helpful while avoiding inappropriate language.
However, ChatGPT and similar LLM-driven chatbots can tackle a broader range of tasks, such as language translation, sentiment detection, and even coding, which they haven't been explicitly trained for.
To facilitate these new tasks, zero-shot and few-shot prompting techniques are employed.
Zero-Shot Prompting
Zero-shot prompting involves requesting the model to perform a task it hasn’t been trained on.
The term "zero" indicates that no examples are given for solving the new task. Instead, we simply ask the model to execute it, relying on its general language understanding and training background to formulate a response.
For example, if you were to ask the model to translate a sentence from one language to another, it could likely generate a reasonable translation, despite not having been specifically trained for this task. Likewise, many LLMs can discern between negative and positive sentiments without formal training in sentiment analysis.
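To make this concrete, here is a minimal sketch of a zero-shot request using the OpenAI Python SDK; the model name and the sample review are illustrative, not taken from the article:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Zero-shot: the prompt states the task but includes no solved examples.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {
            "role": "user",
            "content": "Classify the sentiment of this review as positive "
                       "or negative: 'The battery died after two days and "
                       "support never replied.'",
        }
    ],
)
print(response.choices[0].message.content)
```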
Few-Shot Prompting
In contrast, few-shot prompting involves asking an LLM to address a new task while supplying examples of how to tackle it.
This method allows the model to learn from user-provided examples, which act as a miniature training set embedded in the prompt. Unlike the pre-training or fine-tuning stages, this process does not modify the model’s weights. Instead, the model conditions on the context given in the prompt when generating its response; that context persists throughout the conversation, but nothing is retained for future sessions.
Variants of few-shot prompting are sometimes distinguished, especially when assessing and comparing model performance: "one-shot" means a single example is provided, "two-shot" means two examples, and so forth.
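In code, the only difference from a zero-shot request is that the prompt embeds a handful of solved input-output pairs before the new input. Here is a minimal sketch; the reviews, labels, and model name are invented for illustration:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Few-shot: labeled examples precede the input we actually want classified.
prompt = """Classify the sentiment of each review.

Review: "Arrived quickly and works perfectly." -> positive
Review: "Broke after one use, a total waste of money." -> negative
Review: "The screen is gorgeous and setup took minutes." -> positive

Review: "The manual was confusing and the app kept crashing." ->"""

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected completion: negative
```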
Distinguishing Few-Shot Prompting from Few-Shot Learning
The terms "few-shot learning" and "zero-shot learning" are established concepts in machine learning that predate the emergence of LLMs. Within the context of LLMs, these terms are sometimes conflated with “few-shot prompting” and “zero-shot prompting,” although they are not synonymous.
Few-shot prompting is about crafting a prompt with examples of input-output pairs to help an LLM identify a pattern. Conversely, few-shot learning pertains to the model’s adaptation stemming from few-shot prompting, allowing it to transition from being unable to solve a task to successfully executing it due to the provided examples.
In LLMs, this "learning" is transient and only applicable to the specific chat session, as the model’s parameters remain unchanged and do not retain new knowledge or abilities.
Applications of Zero-Shot Prompting in LLMs
Zero-shot prompting depends on the model's existing knowledge to generate answers. Therefore, it is most effective for general requests rather than those needing specialized or proprietary information.
When to Use Zero-Shot Prompting
Zero-shot prompting can be effectively used in the following scenarios:
- Simple Tasks: For straightforward, knowledge-based tasks that are well-defined, such as word definitions or general knowledge questions.
- Tasks Requiring General Knowledge: Tasks that depend on the model’s pre-existing knowledge, like summarizing known information. These typically involve clarifying or detailing established topics rather than exploring new areas. For example, “Who was the first person to climb Mount Everest?” or “Describe the process of photosynthesis.”
- Exploratory Queries: When seeking a broad overview of a topic or a starting point for research. These inquiries aim for a general overview rather than specific answers, such as “How do different cultures celebrate New Year?” or “What are the main theories in cognitive psychology?”
- Direct Instructions: When you can provide straightforward instructions that don’t necessitate examples for comprehension.
When Not to Use Zero-Shot Prompting
Avoid using zero-shot prompting in these scenarios:
- Complex Tasks Requiring Context: If a task demands nuanced understanding or specialized knowledge that the model likely hasn’t acquired during training.
- Highly Specific Outcomes Desired: When a response needs to fit a specific format or style, the model may struggle without guidance from input-output examples.
Examples of Zero-Shot Prompting Use Cases
Zero-shot prompting is effective for numerous simple NLP tasks, including language translation or sentiment analysis.
For instance, consider translating a Polish sentence into English, which is straightforward for ChatGPT:
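Such a prompt can be as simple as the following (the Polish sentence here is illustrative, not the one from the original exchange):

> Prompt: Translate the following sentence into English: “Dziękuję bardzo za pomoc.”
>
> Expected response: “Thank you very much for your help.”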
Next, let’s examine sentiment analysis through zero-shot prompting:
Again, the model produced an accurate sentiment classification, even without explicit training for this task. In a more nuanced case, it also provided reasoning behind its classification.
Where Zero-Shot Prompting Fails
Let’s look at two instances where zero-shot prompting may fall short. Remember that these involve complex tasks requiring context and situations needing highly specific outcomes.
Consider the following two prompts:
- “Explain the implications of the latest changes in quantum computing for encryption, considering current technologies and future prospects.”
- “Write a legal brief arguing the case for a specific, hypothetical scenario where an AI created a piece of art, and there’s a copyright dispute between the AI’s developer and a gallery claiming ownership.”
Feel free to test these prompts with your preferred LLM, but you may find the responses lacking in usefulness.
The first prompt about quantum computing necessitates an understanding of current advancements in both fields. Without specific examples or context, the LLM might fail to provide accurate reflections of the latest research or nuanced implications.
The second prompt regarding a legal brief requires the LLM to adhere to legal conventions and comprehend the complexities of copyright law as it pertains to AI—many aspects of which remain debated. A zero-shot prompt doesn’t offer the necessary guidelines or examples for the model to produce an accurate response.
Applications of Few-Shot Prompting
Few-shot prompting allows the LLM to base its responses on the examples provided. This method is particularly useful when a few examples can reveal a pattern or when a specific output format is required. However, high task complexity and latency constraints often hinder its use.
When to Use Few-Shot Prompting
You may consider few-shot prompting in the following situations:
- When Zero-Shot Prompting is Insufficient: If the model struggles to perform a task without examples, but you believe a few examples may suffice.
- Limited Training Data Available: When only a few examples are available, and fine-tuning the model isn’t viable, few-shot prompting may be the only option.
- Custom Formats or Styles: To guide the model toward a specific output format, style, or structure, examples can be more effective than verbal instructions.
- Teaching the Model New Concepts: When you want the model to grasp an unfamiliar concept, a few examples can provide a quick introduction. However, remember this knowledge is only retained for the current conversation.
- Improving Accuracy: When precision is critical, and you want to ensure the model understands the task clearly.
When Not to Use Few-Shot Prompting
Consider avoiding few-shot prompting in these cases:
- General Knowledge Tasks: For straightforward tasks that don’t need specific formats or nuanced understanding, few-shot prompting may complicate the query unnecessarily.
- Speed or Efficiency is a Priority: Few-shot prompting means longer prompts, which take more time to compose and more tokens to process.
- Insufficient Examples: If the task is too complex to convey in a few examples, or if the examples you have might confuse the model due to variability.
- Complex Reasoning Tasks: If a task demands multiple reasoning steps, even several examples may not be sufficient for the LLM to identify the desired pattern.
Examples of Few-Shot Prompting Use Cases
Let’s explore scenarios where few-shot prompting proves exceptionally effective.
#### Adapting Tasks to Specific Styles
Imagine you work for a company that sells Product B. Your main rival is Product A. After collecting reviews online for both products, you want to assess user perceptions of which product is better. To achieve this, you can prompt the LLM to classify the sentiment of the reviews.
One approach is to create a few examples as follows:
- Positive reviews of your product (B) are labeled as positive.
- Negative reviews of your product (B) are labeled as negative.
- Positive reviews of the competing product (A) are labeled as negative.
- Negative reviews of the competing product (A) are labeled as positive.
With this setup, the model should ideally recognize the pattern.
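A few-shot prompt implementing this scheme might look like the following sketch (the review texts are invented for illustration):

> Review: “Product B exceeded my expectations!” → positive
> Review: “Product B stopped working within a week.” → negative
> Review: “Product A is fantastic, the best purchase I’ve made all year.” → negative
> Review: “Product A was a huge disappointment.” → positive
> Review: “I switched to Product A and couldn’t be happier.” →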
Indeed, the model identified the pattern accurately, predicting a positive review of a competitor’s product as negative and even explaining its reasoning:
> (...) positive sentiment expressions for Product A are labeled as “negative” and negative sentiment expressions are labeled as “positive” (and the conventional labeling for Product B).
This illustrates how few-shot prompting can guide the model in addressing a conventional task (sentiment classification) in a unique manner based on a specific labeling format.
#### Teaching an LLM New Concepts
Few-shot prompting excels at teaching an LLM new or imaginary concepts. This is beneficial when you need the model to discern patterns in your data that require understanding specific quirks and details where general knowledge falls short.
For example, let’s utilize few-shot prompting to teach an LLM the basic grammar of a newly invented language, Blablarian (widely spoken in the Kingdom of Blabland, if you’re curious).
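Since Blablarian is fictional, the vocabulary below is necessarily made up; a hypothetical few-shot prompt of this shape could look as follows:

> English: How are you? → Blablarian: Blim blu?
> English: How old are you? → Blablarian: Blim blarg blu?
> English: She is happy. → Blablarian: Shi blop.
> English: He is happy. → Blablarian: Hu blop.
>
> Translate into Blablarian: How old is he?

From pairs like these, a model that has picked up the pattern can substitute the pronoun and answer with something like “Blim blarg hu?”.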
As demonstrated, the model generated what can be considered a correct translation, discerning the meaning of words and distinguishing between different pronouns. This is purely an instance of in-context few-shot learning, as there’s no way Blablarian documents could have been included in the model’s pre-training datasets.
This example effectively captures the essence of few-shot learning. Had we asked the model to translate “How old is he?” from English to Blablarian without any examples (using zero-shot prompting), it wouldn’t have succeeded simply because Blablarian is a fictional language. Nevertheless, the model's general understanding of language and grammar allows it to pick up the patterns of a newly invented language.
Where Few-Shot Prompting Fails
Lastly, let’s look at a situation where few-shot prompting falls short.
Consider this well-known example circulating online:
> Prompt:
> The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
> A: The answer is False.
> The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
> A: The answer is True.
> The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
> A: The answer is True.
> The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
> A: The answer is False.
> The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
> A:

> Response: The answer is True.
This answer is incorrect. A few examples are insufficient to grasp the underlying pattern; the problem requires comprehension of several fundamental concepts and step-by-step reasoning. Even with a larger number of examples, success is unlikely.
Problems of this kind cannot be cracked by pattern recognition over a handful of examples, and no amount of plain few-shot prompt engineering will get the model there.
Interestingly, modern LLM-based chatbots can often recognize that they are facing a problem they cannot reliably solve through text generation alone. In such cases, they resort to tools better suited for the task, much as a person would reach for a calculator to multiply large numbers.
For instance, OpenAI’s ChatGPT, rather than fabricating a response, may generate a snippet of Python code to address the question. (This code is visible when you click on “Finished analyzing.”) ChatGPT will execute the code within an interpreter and provide the answer based on the code’s output. In this instance, this method yielded a correct response:
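The generated code varies from run to run, but it is typically equivalent to a sketch like this:

```python
# Sum the odd numbers in the group and check whether the total is even.
numbers = [15, 32, 5, 13, 82, 7, 1]
odd_sum = sum(n for n in numbers if n % 2 != 0)
print(odd_sum)           # 41
print(odd_sum % 2 == 0)  # False -- the odd numbers sum to an odd number
```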
However, this is no longer “few-shot learning.” The model did not utilize the provided examples and would likely produce the same answer in a zero-shot setting.
Conclusion
This article explored zero-shot and few-shot prompting in Large Language Models, emphasizing their capabilities, applications, and limitations.
Zero-shot learning empowers LLMs to tackle tasks they haven’t been explicitly trained for, relying solely on their pre-existing knowledge and general language comprehension. This approach is best suited for straightforward tasks and exploratory inquiries, as well as when clear, direct instructions are provided.
Few-shot learning allows LLMs to adapt to specific tasks, formats, or styles, enhancing accuracy for more complex queries by integrating a small number of examples into the prompt.
Nevertheless, both methods have their constraints. Zero-shot prompting may fall short for complex tasks requiring nuanced understanding or specific outcomes. Few-shot learning, while effective, may not be ideal for general knowledge tasks or when efficiency is crucial, and it can struggle with overly complex tasks that require more examples for clarity.
As users and developers, understanding the appropriate application of zero-shot and few-shot prompting can help us tap into the full potential of Large Language Models while navigating their limitations.
Thank you for reading! If you found this article helpful, consider subscribing for email updates on my upcoming posts. Need consulting? Feel free to ask me anything or book a 1:1 session here. You might also enjoy exploring my other articles. Can’t decide? Here are a few recommendations:
- Designing RAGs: A guide to Retrieval-Augmented Generation design choices.
- Evaluating Large Language Models: How to assess the performance of your LLM effectively.
- Organizing a Machine Learning Monorepo with Pants: Streamlining your machine learning workflow management.