
How I Achieved Victory in Singapore’s GPT-4 Prompt Engineering Competition

A comprehensive exploration of the techniques I employed to leverage the capabilities of Large Language Models.

Last month, I had the distinct privilege of winning Singapore's inaugural GPT-4 Prompt Engineering competition, organized by the Government Technology Agency of Singapore (GovTech). The competition attracted over 400 exceptionally talented participants.

Prompt engineering combines the artistry of creativity with scientific understanding. This article outlines the strategies I've gathered for prompting any LLM to fulfill your requirements and even exceed expectations!

This article addresses the following topics, where [Beginner] denotes beginner-friendly techniques and [Advanced] denotes advanced methods:

  1. [Beginner] Structuring prompts with the CO-STAR framework
  2. [Beginner] Utilizing delimiters for prompt segmentation
  3. [Beginner] Designing system prompts with LLM constraints
  4. [Advanced] Analyzing datasets using only LLMs, without plugins or code — featuring a practical example with a real-world Kaggle dataset using GPT-4

1. [Beginner] Structuring Prompts with the CO-STAR Framework

Proper prompt structuring is essential for obtaining optimal responses from an LLM. The CO-STAR framework, developed by GovTech Singapore’s Data Science & AI team, serves as an effective template for crafting prompts. It encompasses vital elements influencing the quality and relevance of the LLM's output, leading to superior responses.

Here’s the breakdown:

(C) Context: Share background information about the task.

This aids the LLM in grasping the specific scenario, ensuring a pertinent response.

(O) Objective: Clearly define the task you want the LLM to accomplish.

A well-articulated objective directs the LLM to align its response with your specific aim.

(S) Style: Indicate the desired writing style for the LLM's output.

This could reflect the style of a notable figure or a specific professional, guiding the LLM to choose words and tone aligned with your needs.

(T) Tone: Establish the desired attitude of the response.

This ensures the LLM’s output resonates with the intended emotional context, such as being formal, humorous, or empathetic.

(A) Audience: Specify who the response is for.

Tailoring the LLM’s output to a particular audience—be it experts, beginners, or children—ensures appropriateness and clarity.

(R) Response: Define the required output format.

This guarantees the LLM produces responses in the exact format needed for subsequent tasks, such as a list or a JSON object. For many LLM applications requiring programmatic manipulation of responses, a JSON format would be ideal.

Practical Application of CO-STAR

Here’s a practical example illustrating the utility of the CO-STAR framework. Suppose you are a social media manager looking to draft a Facebook post to promote a new product.

A quick prompt without CO-STAR might be:

> Write a Facebook post to promote my company Alpha’s new product, Beta, an ultra-fast hairdryer.

Here’s GPT-4's output:

> Introducing the Revolutionary Beta Hairdryer by Alpha!
> - Say goodbye to long, tedious hair drying sessions! Alpha is proud to unveil the Beta, our latest innovation in haircare technology.
> - Ultra-Fast Drying: The Beta is designed to cut your drying time in half, thanks to its advanced, powerful motor. Get ready faster and enjoy more time doing what you love!
>
> Shop Now: [Link to product page]

While functional, this output lacks specific nuances and the targeted appeal needed to resonate with your audience.

Utilizing the CO-STAR template leads to a more comprehensive prompt:

> # CONTEXT #
> I want to promote my new product, Beta, an ultra-fast hairdryer by Alpha.
>
> # OBJECTIVE #
> Create a Facebook post that encourages clicks on the product link for purchases.
>
> # STYLE #
> Emulate the writing style of successful brands like Dyson.
>
> # TONE #
> Persuasive
>
> # AUDIENCE #
> Target the older generation who typically engages with hair products on Facebook.
>
> # RESPONSE #
> A concise yet impactful Facebook post.

By applying CO-STAR, GPT-4’s response becomes significantly more targeted:

> Embrace Effortless Elegance with Alpha Beta — The Ultra-Fast Hairdryer for the Wise Generation.
>
> Rediscover the joy of simple, effective hair care with Alpha Beta. Our latest innovation is more than just a hairdryer; it’s a promise of swift, gentle, and reliable hair styling for those who appreciate the finer things in life.
> - Easy and Intuitive Use: Say goodbye to complicated gadgets. Alpha Beta is crafted for comfort and simplicity, perfect for those who value straightforward, hassle-free technology.
> - Time-Saving Technology: We understand your time is precious. That’s why Alpha Beta cuts down drying time significantly, giving you more moments to enjoy life’s pleasures.
>
> Make every day a good hair day with Alpha Beta. Experience the blend of sophistication and simplicity today.
>
> [Your Product Link Here]

The CO-STAR framework ensures you provide all essential elements of your task to the LLM in an organized manner, resulting in a tailored and optimized response.
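If you build prompts programmatically, the framework translates naturally into a small template function. Below is a minimal sketch in Python; the function and field names are my own illustration, not part of the framework itself.

```python
# A minimal sketch: assembling a CO-STAR prompt from its six components.
# The section headers mirror the framework; the function name is illustrative.

def costar_prompt(context: str, objective: str, style: str,
                  tone: str, audience: str, response: str) -> str:
    """Return a single prompt string organized into CO-STAR sections."""
    sections = {
        "CONTEXT": context,
        "OBJECTIVE": objective,
        "STYLE": style,
        "TONE": tone,
        "AUDIENCE": audience,
        "RESPONSE": response,
    }
    return "\n\n".join(f"# {name} #\n{text}" for name, text in sections.items())

print(costar_prompt(
    context="I want to promote my new product, Beta, an ultra-fast hairdryer by Alpha.",
    objective="Create a Facebook post that encourages clicks on the product link.",
    style="Emulate the writing style of successful brands like Dyson.",
    tone="Persuasive",
    audience="The older generation, who typically engage with hair products on Facebook.",
    response="A concise yet impactful Facebook post.",
))
```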

2. [Beginner] Utilizing Delimiters for Prompt Segmentation

Delimiters are unique tokens that enable the LLM to identify parts of your prompt as distinct units of meaning. This structuring is crucial since the complete prompt arrives at the LLM as a single long sequence of tokens. Delimiters help delineate specific sections for differential treatment.

While delimiters may not enhance response quality for straightforward tasks, their impact becomes significant for complex tasks.

Delimiters as Special Characters

A delimiter can be any unique sequence of special characters that typically wouldn’t appear together:

  • ###

  • ===

  • >>>
    

The selection of characters is less important than their uniqueness for the LLM to identify them as content separators rather than standard punctuation.

Here’s how you might implement such delimiters in a prompt:

> Classify the sentiment of each conversation in <<<CONVERSATIONS>>> as ‘Positive’ or ‘Negative’. Provide sentiment classifications without additional preamble text.
>
> ###
> EXAMPLE CONVERSATIONS
>
> [Agent]: Good morning, how can I assist you today?
> [Customer]: This product is terrible, nothing like what was advertised!
> [Customer]: I’m extremely disappointed and expect a full refund.
>
> [Agent]: Good morning, how can I help you today?
> [Customer]: Hi, I just wanted to say that I’m really impressed with your product. It exceeded my expectations!
>
> ###
> EXAMPLE OUTPUTS
>
> Negative
> Positive
>
> ###
>
> <<<
> [Agent]: Hello! Welcome to our support. How can I help you today?
> [Customer]: Hi there! I just wanted to let you know I received my order, and it’s fantastic!
> [Agent]: That’s great to hear! We’re thrilled you’re happy with your purchase. Is there anything else I can assist you with?
> [Customer]: No, that’s it. Just wanted to give some positive feedback. Thanks for your excellent service!
>
> [Agent]: Hello, thank you for reaching out. How can I assist you today?
> [Customer]: I’m very disappointed with my recent purchase. It’s not what I expected at all.
> [Agent]: I’m sorry to hear that. Could you please provide more details so I can help?
> [Customer]: The product is of poor quality and it arrived late. I’m really unhappy with this experience.
> >>>

The example above uses ### as the delimiter, with the section headers EXAMPLE CONVERSATIONS and EXAMPLE OUTPUTS in uppercase for differentiation. The conversations to be classified are fenced between <<< and >>>, matching the <<<CONVERSATIONS>>> reference in the instruction, so the LLM understands exactly what it must classify.

The output from GPT-4 provides sentiment classifications without any additional introductory text:

> Positive
> Negative
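If you are calling the model programmatically rather than through the chat interface, the delimited prompt is passed as an ordinary user message. Here is a minimal sketch using the OpenAI Python SDK (v1-style client); the model name is illustrative, and the few-shot examples are elided for brevity.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# ### fences the few-shot sections; <<< >>> fences the conversations to classify.
prompt = """Classify the sentiment of each conversation in <<<CONVERSATIONS>>> as
'Positive' or 'Negative'. Provide sentiment classifications without additional preamble text.

###
EXAMPLE CONVERSATIONS
...
###
EXAMPLE OUTPUTS
...
###

<<<
[Agent]: Hello! Welcome to our support. How can I help you today?
[Customer]: Hi there! I just wanted to let you know I received my order, and it's fantastic!
>>>"""

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```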

Delimiters as XML Tags

An alternative method is to use XML tags as delimiters, which are enclosed in angle brackets with opening and closing tags, e.g., <tag> and </tag>. This method is effective since LLMs have been trained on web content formatted in XML.

Here’s the same prompt, structured with XML tags:

> Classify the sentiment of the following conversations into one of two classes, using the examples given. Provide sentiment classifications without any introductory text.
>
> <classes>
> Positive
> Negative
> </classes>
>
> <example-conversations>
> [Agent]: Good morning, how can I assist you today?
> [Customer]: This product is terrible, nothing like what was advertised!
> [Customer]: I’m extremely disappointed and expect a full refund.
>
> [Agent]: Good morning, how can I help you today?
> [Customer]: Hi, I just wanted to say that I’m really impressed with your product. It exceeded my expectations!
> </example-conversations>
>
> <example-classes>
> Negative
> Positive
> </example-classes>
>
> <conversations>
> [Agent]: Hello! Welcome to our support. How can I help you today?
> [Customer]: Hi there! I just wanted to let you know I received my order, and it’s fantastic!
> [Agent]: That’s great to hear! We’re thrilled you’re happy with your purchase. Is there anything else I can assist you with?
> [Customer]: No, that’s it. Just wanted to give some positive feedback. Thanks for your excellent service!
>
> [Agent]: Hello, thank you for reaching out. How can I assist you today?
> [Customer]: I’m very disappointed with my recent purchase. It’s not what I expected at all.
> [Agent]: I’m sorry to hear that. Could you please provide more details so I can help?
> [Customer]: The product is of poor quality and it arrived late. I’m really unhappy with this experience.
> </conversations>

Using the same nouns for XML tags as those used in the prompt instructions helps the LLM understand the relationship between your instructions and the delimiters.

The structured instructions yield precise responses from GPT-4:

> Positive
> Negative
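When generating XML-delimited prompts in code, a small helper keeps opening and closing tags consistent. This is a sketch of the idea, not a fixed API:

```python
def xml_section(tag: str, body: str) -> str:
    """Wrap body text in matching XML-style delimiter tags."""
    return f"<{tag}>\n{body}\n</{tag}>"

classes = xml_section("classes", "Positive\nNegative")
conversations = xml_section("conversations", "[Agent]: Hello! How can I help you today?")

prompt = (
    "Classify the sentiment of the following conversations into one of two classes, "
    "using the examples given. Provide sentiment classifications without any "
    "introductory text.\n\n" + classes + "\n\n" + conversations
)
print(prompt)
```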

3. [Beginner] Designing System Prompts with LLM Constraints

Before diving deeper, it's essential to clarify that this section is particularly relevant to LLMs that feature a System Prompt capability, with ChatGPT being the most notable example.

Terminology Related to System Prompts

The terms “System Prompts,” “System Messages,” and “Custom Instructions” are often used interchangeably in discussions about ChatGPT, leading to confusion. OpenAI has even released resources to clarify these terms. Here's a brief overview:

  • “System Prompts” and “System Messages” refer to interactions with ChatGPT via its Chat Completions API.
  • “Custom Instructions” apply when interacting through the user interface at https://chat.openai.com/.

Despite the differing terminologies, all three refer to the same concept, so don’t let terminology confuse you! This section will primarily use the term “System Prompts.”

What Are System Prompts?

System Prompts are additional instructions that dictate how the LLM should respond. They are considered supplemental as they are positioned at the start of each new prompt in a chat.

When Should System Prompts Be Used?

You might wonder: why should I include instructions in the System Prompt if I can convey them in the first prompt of a new chat?

The reason lies in the LLM's limited conversational memory. As conversations progress, the LLM may “forget” the initial prompt, rendering those instructions ineffective.

Conversely, when instructions are embedded in the System Prompt, they are automatically included at the beginning of every new prompt, ensuring the LLM retains those guidelines throughout the conversation.

In summary: Utilize System Prompts for instructions you wish the LLM to remember during the entire chat.

What Should System Prompts Include?

Instructions typically encompass the following categories:

  • Task definition, ensuring the LLM consistently understands its responsibilities throughout the chat.
  • Output format, enabling the LLM to remember how to respond.
  • Guardrails, delineating how the LLM should not respond. Guardrails represent a growing area in LLM governance, outlining the boundaries within which an LLM can operate.

For instance, a System Prompt may look like this:

> You will answer questions using this text: [insert text].
> You will respond with a JSON object in this format: {'Question': 'Answer'}.
> If the text lacks sufficient information to provide an answer, respond with “NA.”
> You are only permitted to answer questions related to [insert scope]. Never respond to questions about demographic data, such as age, gender, or religion.

Each segment of this System Prompt maps to one of the categories outlined above: the first sentence defines the task, the second fixes the output format, and the remaining lines establish guardrails.
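For programmatic use, the same text goes into the `system` message of the Chat Completions API, so it is re-sent with every call. A minimal sketch (the bracketed placeholders are yours to fill in; the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()

system_prompt = (
    "You will answer questions using this text: [insert text].\n"
    "You will respond with a JSON object in this format: {'Question': 'Answer'}.\n"
    "If the text lacks sufficient information to provide an answer, respond with 'NA'.\n"
    "You are only permitted to answer questions related to [insert scope]. "
    "Never respond to questions about demographic data, such as age, gender, or religion."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What does the text say about pricing?"},
    ],
)
print(response.choices[0].message.content)
```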

Extra: Making LLM Guardrails Dynamic

In the above example, guardrails are established through a few sentences in the System Prompt. However, if you need different guardrails at various points in the conversation, you might encounter challenges.

Currently, users of the ChatGPT interface have limited options for this. However, those interacting with ChatGPT programmatically are fortunate! The focus on effective LLM guardrails has led to open-source tools, like NeMo Guardrails developed by NVIDIA, allowing for dynamic guardrails that evolve with the conversation flow.
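To make the idea concrete without reproducing NeMo Guardrails' actual configuration language, here is a toy Python sketch of dynamic guardrails: the guardrail text attached to the system message changes with the conversation stage. Everything here is a hypothetical illustration.

```python
# Toy sketch: guardrails that vary by conversation stage. In production,
# NeMo Guardrails expresses this declaratively; this only illustrates the concept.

GUARDRAILS = {
    "triage": "Only gather details about the user's issue. Do not offer refunds or discounts.",
    "resolution": "You may propose solutions, but never promise refunds above $50.",
}

def build_system_prompt(stage: str) -> str:
    """Rebuild the system prompt with the guardrail for the current stage."""
    base = "You are a customer-support assistant for a hairdryer company."
    return f"{base}\n{GUARDRAILS[stage]}"

# Each API call would use the system prompt for the conversation's current stage:
print(build_system_prompt("triage"))
print(build_system_prompt("resolution"))
```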

4. [Advanced] Analyzing Datasets Using Only LLMs, Without Plugins or Code

You may be aware of OpenAI’s Advanced Data Analysis plugin available for premium accounts in ChatGPT, which allows users to upload datasets and run code for accurate analysis.

However, did you know that analyzing datasets effectively with LLMs can be achieved without such plugins? First, let’s explore the strengths and limitations of using LLMs alone for dataset analysis.

Types of Dataset Analysis That LLMs Are *Not* Great At

LLMs have limitations in performing precise mathematical calculations, making them ill-suited for tasks requiring accurate quantitative analysis, such as:

  • Descriptive Statistics: Summarizing numerical columns through metrics like mean or variance.
  • Correlation Analysis: Obtaining the precise correlation coefficient between columns.
  • Statistical Analysis: Conducting hypothesis testing to assess statistically significant differences among data groups.
  • Machine Learning: Executing predictive modeling using techniques like linear regression, gradient boosting, or neural networks.

This is precisely why OpenAI’s Advanced Data Analysis plugin exists—to allow programming languages to manage such tasks on datasets.

Why Consider Analyzing Datasets Using Only LLMs?

Types of Dataset Analysis That LLMs Excel At

LLMs are exceptionally skilled at recognizing patterns and trends. Their training on vast and varied datasets enables them to identify intricate patterns that may not be immediately visible.

This makes them ideal for tasks centered on pattern identification within datasets, such as:

  • Anomaly Detection: Spotting unusual data points deviating from the norm based on one or more column values.
  • Clustering: Grouping data points with shared characteristics across columns.
  • Cross-Column Relationships: Identifying combined trends across multiple columns.
  • Textual Analysis (for text-based columns): Categorizing based on topic or sentiment.
  • Trend Analysis (for datasets with temporal elements): Detecting patterns, seasonal variations, or trends within time-related columns.

For these pattern-oriented tasks, using LLMs alone may yield quicker and better results than utilizing code! Let’s illustrate this with a practical example.

Analyzing a Kaggle Dataset Using Only LLMs

We’ll examine a widely-used Kaggle dataset focused on Customer Personality Analysis, which aims to segment a company’s customer base for better understanding.

To facilitate validation later, we’ll subset this dataset to 50 rows and retain the most pertinent columns, resulting in a simplified analysis dataset where each row represents a customer and each column contains customer information.
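For reference, the subsetting step might look like this in pandas. The filename and column names follow the Kaggle dataset as I recall it, but treat them as assumptions to verify against your own download:

```python
import pandas as pd

# The Kaggle "Customer Personality Analysis" file is tab-separated;
# the filename and column names below are assumptions for illustration.
df = pd.read_csv("marketing_campaign.csv", sep="\t")

columns = ["Year_Birth", "Marital_Status", "Income",
           "Kidhome", "Recency", "MntWines"]  # illustrative subset of columns
subset = df[columns].head(50)
subset.to_csv("customers_50.csv", index=False)
```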

Imagine you’re part of the marketing team and tasked with utilizing this customer information dataset to inform marketing strategies. This involves two steps: first, creating meaningful customer segments; second, formulating marketing ideas for each segment. This business problem is where LLMs excel in pattern recognition.

Let’s create a prompt for this task using four prompt engineering techniques:

  1. Breaking down a complex task into simpler steps.
  2. Referencing intermediate outputs from each step.
  3. Formatting the LLM’s response.
  4. Keeping task instructions separate from the dataset.

> System Prompt:
> I want you to act as a data scientist to analyze a dataset. Do not fabricate information that is not in the dataset. For each analysis requested, provide a precise and definitive answer without giving code or instructions for analysis on other platforms.

> Prompt:
> # CONTEXT #
> I sell wine. I have a dataset containing: [year of birth, marital status, income, number of children, days since last purchase, amount spent].
>
> #############
>
> # OBJECTIVE #
> I want you to use the dataset to cluster my customers and then suggest marketing strategies for each group. Follow this step-by-step process and avoid using code:
>
> 1. CLUSTERS: Use the dataset columns to group the rows so that customers in the same cluster share similar values, while those in different clusters show distinct differences. Ensure each row belongs to only one cluster.
>
> For each cluster identified:
> 2. CLUSTER_INFORMATION: Describe the cluster based on the dataset columns.
> 3. CLUSTER_NAME: Derive a short name for the customer group in this cluster from [CLUSTER_INFORMATION].
> 4. MARKETING_IDEAS: Suggest marketing strategies for this customer group.
> 5. RATIONALE: Justify why [MARKETING_IDEAS] is suitable and effective for this customer group.
>
> #############
>
> # STYLE #
> Business analytics report
>
> #############
>
> # TONE #
> Professional, technical
>
> #############
>
> # AUDIENCE #
> My business partners. Persuade them that your marketing strategy is well-researched and data-driven.
>
> #############
>
> # RESPONSE: MARKDOWN REPORT #
> <For each cluster in [CLUSTERS]>
> — Customer Group: [CLUSTER_NAME]
> — Profile: [CLUSTER_INFORMATION]
> — Marketing Ideas: [MARKETING_IDEAS]
> — Rationale: [RATIONALE]
> </For>
>
> <annex>
> Provide a table listing row numbers for each cluster to substantiate your analysis. Use these table headers: [[CLUSTER_NAME], List of Rows].
> </annex>
>
> #############
>
> # START ANALYSIS #
> If you comprehend, please request my dataset.

Subsequently, we present the dataset to the LLM in CSV format, and GPT-4 responds with its analysis in the requested markdown report format.

Validating the LLM’s Analysis

For brevity, let’s validate two customer groups generated by the LLM, namely Young Families and Discerning Enthusiasts.

Young Families

  • Profile generated by the LLM: Born after 1980; married or in a partnership; moderate to low income; have children; frequent small purchases.
  • Rows clustered into this category by the LLM: 3, 4, 7, 10, 16, 20

Examining these rows in the dataset confirms that they align perfectly with the LLM's profile. Notably, the LLM correctly clustered a row containing a null value without any prior preprocessing.

Discerning Enthusiasts

  • Profile generated by the LLM: Diverse age range; any marital status; high income; varied children status; significant spending on purchases.
  • Rows clustered into this group by the LLM: 2, 5, 18, 29, 34, 36

Reviewing these rows in the dataset shows that they, too, correspond well with the identified profile.
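Spot-checks like these are easy to script. A hedged pandas sketch, using the row numbers from the LLM's annex (whether they are 0- or 1-indexed depends on how you numbered the rows you sent):

```python
import pandas as pd

subset = pd.read_csv("customers_50.csv")  # the 50-row file from earlier

# Rows the LLM assigned to "Discerning Enthusiasts"; adjust if your rows are 1-indexed.
rows = [2, 5, 18, 29, 34, 36]
print(subset.iloc[rows])  # eyeball income and spending against the stated profile
```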

This example illustrates LLMs' proficiency in pattern recognition, transforming complex datasets into actionable insights while ensuring that analyses are grounded in the dataset's factual accuracy.

What If We Used ChatGPT’s Advanced Data Analysis Plugin?

To provide a comprehensive view, I attempted the same task using the prompt and requested ChatGPT to perform the analysis with code, activating its Advanced Data Analysis plugin. The goal was for the plugin to apply a clustering algorithm like K-Means directly on the dataset to obtain customer groups and synthesize their profiles for marketing strategies.

However, multiple attempts resulted in error messages and no output, even with a dataset of only 50 rows.

Currently, the Advanced Data Analysis plugin seems to handle simpler tasks like calculating descriptive statistics or creating graphs effectively, but more advanced analyses requiring algorithmic computations may sometimes encounter errors and yield no outputs, possibly due to computational limitations.

So… When Should We Analyze Datasets Using LLMs?

The answer is contingent upon the type of analysis required.

For tasks demanding precise mathematical calculations or complex, rule-based processing, traditional programming methods remain superior.

Conversely, for pattern-recognition tasks, conventional programming and algorithmic approaches can be challenging and time-consuming. In contrast, LLMs excel in such scenarios, providing supplementary outputs like annexes to support their analyses along with comprehensive reports in markdown format.

Ultimately, the decision to employ LLMs depends on the nature of the task, weighing the strengths of LLMs in pattern recognition against the precision offered by traditional programming techniques.

Returning to Prompt Engineering!

Before concluding this section, let’s revisit the prompt utilized for the dataset analysis and identify the key prompt engineering techniques employed:

> Prompt:
> # CONTEXT #
> I sell wine. I have a dataset containing: [year of birth, marital status, income, number of children, days since last purchase, amount spent].
>
> #############
>
> # OBJECTIVE #
> I want you to use the dataset to cluster my customers into groups and then suggest marketing strategies for each group. Follow this step-by-step process and avoid using code:
>
> 1. CLUSTERS: Use the dataset columns to group the rows so that customers in the same cluster share similar values, while those in different clusters show distinct differences. Ensure each row belongs to only one cluster.
>
> For each cluster identified:
> 2. CLUSTER_INFORMATION: Describe the cluster based on the dataset columns.
> 3. CLUSTER_NAME: Derive a short name for the customer group in this cluster from [CLUSTER_INFORMATION].
> 4. MARKETING_IDEAS: Suggest marketing strategies for this customer group.
> 5. RATIONALE: Justify why [MARKETING_IDEAS] is suitable and effective for this customer group.
>
> #############
>
> # STYLE #
> Business analytics report
>
> #############
>
> # TONE #
> Professional, technical
>
> #############
>
> # AUDIENCE #
> My business partners. Persuade them that your marketing strategy is well-researched and data-driven.
>
> #############
>
> # RESPONSE: MARKDOWN REPORT #
> <For each cluster in [CLUSTERS]>
> — Customer Group: [CLUSTER_NAME]
> — Profile: [CLUSTER_INFORMATION]
> — Marketing Ideas: [MARKETING_IDEAS]
> — Rationale: [RATIONALE]
> </For>
>
> <annex>
> Provide a table listing row numbers for each cluster to substantiate your analysis. Use these table headers: [[CLUSTER_NAME], List of Rows].
> </annex>
>
> #############
>
> # START ANALYSIS #
> If you comprehend, please request my dataset.

Technique 1: Breaking Down Complex Tasks into Simple Steps

LLMs excel at executing simple tasks but struggle with complex ones. Therefore, for intricate tasks, it’s crucial to decompose the task into clear, step-by-step instructions that the LLM can follow. The aim is to provide the LLM with a roadmap similar to the one a human would follow.

In this instance, the steps are outlined as:

> Use this step-by-step process and avoid using code:

> 1. CLUSTERS: Use the dataset columns to group the rows so that customers in the same cluster share similar values, while those in different clusters show distinct differences. Ensure each row belongs to only one cluster.
>
> For each cluster identified:
> 2. CLUSTER_INFORMATION: Describe the cluster based on the dataset columns.
> 3. CLUSTER_NAME: Derive a short name for the customer group in this cluster from [CLUSTER_INFORMATION].
> 4. MARKETING_IDEAS: Suggest marketing strategies for this customer group.
> 5. RATIONALE: Justify why [MARKETING_IDEAS] is suitable and effective for this customer group.

This method is far superior to simply instructing the LLM to "Cluster the customers into groups and suggest marketing strategies for each group."

By providing step-by-step guidance, LLMs are significantly more likely to yield accurate results.

Technique 2: Referencing Intermediate Outputs from Each Step

While outlining the step-by-step process for the LLM, intermediate outputs from each step are assigned capitalized variable names, namely CLUSTERS, CLUSTER_INFORMATION, CLUSTER_NAME, MARKETING_IDEAS, and RATIONALE.

Capitalization distinguishes these variable names from the accompanying instructions. Later, these outputs can be referenced using square brackets, as in [VARIABLE_NAME].

Technique 3: Formatting the LLM’s Response

In this instance, we request a markdown report format, which enhances the presentation of the LLM's response. Utilizing variable names from intermediate outputs assists in structuring the report.

> # RESPONSE: MARKDOWN REPORT #
> <For each cluster in [CLUSTERS]>
> — Customer Group: [CLUSTER_NAME]
> — Profile: [CLUSTER_INFORMATION]
> — Marketing Ideas: [MARKETING_IDEAS]
> — Rationale: [RATIONALE]
> </For>
>
> <annex>
> Provide a table listing row numbers for each cluster to substantiate your analysis. Use these table headers: [[CLUSTER_NAME], List of Rows].
> </annex>

You can also prompt the LLM to provide the report as a downloadable file, making it easier to use its response when drafting your final report.

Technique 4: Separating Task Instructions from the Dataset

Notice that the dataset was not provided to the LLM initially. Instead, the prompt contains only task instructions, followed by:

> # START ANALYSIS #
> If you comprehend, please request my dataset.

ChatGPT then confirms its understanding, and we present the dataset in CSV format in the next prompt.
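In API terms, this separation is simply two user turns in the same message list, with the model's acknowledgement in between. A minimal sketch (the file name and the replayed acknowledgement are illustrative):

```python
from openai import OpenAI

client = OpenAI()

instructions = "..."  # the full task prompt shown above
csv_data = open("customers_50.csv").read()  # illustrative file name

messages = [
    {"role": "user", "content": instructions},
    # The model's confirmation from the previous turn is replayed verbatim:
    {"role": "assistant", "content": "Understood. Please provide the dataset."},
    {"role": "user", "content": csv_data},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```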

Why Separate Instructions from the Dataset?

The straightforward answer is that LLMs have a limit on their context window, meaning they can only accept a certain number of tokens in a single prompt. A lengthy prompt combining instructions and data may exceed this limit, leading to truncation and loss of information.

A more nuanced explanation is that separating instructions from the dataset helps maintain clarity, reducing the likelihood of overlooking information. Users might have encountered situations where an LLM “forgets” an instruction due to a lengthy prompt. For example, when asking for a 100-word response, the LLM might provide a longer paragraph. By receiving the task instructions first, before the dataset they pertain to, the LLM can first digest what it must do before executing it on the provided dataset.

Note, however, that this separation is feasible with chat-based LLMs, which maintain conversational memory, unlike completion-based LLMs.

Closing Thoughts

As we conclude this article, I'd like to share some personal reflections on this incredible journey.

First, a heartfelt thank you to GovTech Singapore for orchestrating such an impressive competition.

Secondly, a big shout-out to my fellow extraordinary competitors, each bringing unique strengths, making the competition as enriching as it was challenging! The final round, with us competing on stage before a live audience, is an experience I will cherish forever.

For me, this was more than just a competition; it was a celebration of talent, creativity, and the joy of learning. I'm genuinely excited to see what lies ahead!

I enjoyed writing this, and if you enjoyed reading it, I would greatly appreciate a moment of your time to leave some claps and a follow! You can also buy me a coffee to support my next article.

See you next time! Sheila