ChatGPT doesn't know it's bad at math

April 4, 2023 - 7 mins read

ChatGPT: master paraphraser

Recently there’s been a lot of media attention around AI, and ChatGPT in particular. Friends and I have played with it and had some funny conversations, but it didn’t take us long to start poking at it and to discover some limitations (expected or not). ChatGPT is a language model, so it won’t be capable of performing many of the tasks that people imagine an AI should be able to perform. I asked ChatGPT to generate a response to the prompt:

What is ChatGPT and how does it differ from a general AI

ChatGPT is a large language model that is trained on a massive dataset of natural language text using deep learning techniques, specifically the GPT-3.5 architecture. It is designed to be able to understand and generate human-like language in response to user inputs, making it well-suited for conversational applications such as chatbots, virtual assistants, and text-based question answering systems.

In contrast to a general AI, which is designed to perform a wide range of tasks across different domains, ChatGPT is specialized in language processing tasks. While general AI systems may be able to perform tasks such as image recognition, playing games, or controlling a robot, they may not be as proficient in language tasks as a language model like ChatGPT.

Another key difference is that general AI systems are typically built to be more adaptable and flexible, able to learn and generalize across a wide range of tasks and situations, whereas ChatGPT is specifically designed to excel at language processing tasks, but may not be as effective at tasks outside of this domain.

Roughly speaking, this means that ChatGPT formulates responses from patterns in its training data using natural language processing (NLP) methods. Classical NLP pipelines use steps such as lemmatization (reducing words to a common root so that they can be processed as single items, often followed by a further reduction over sets of synonyms); GPT-style models instead split text into subword tokens and predict the most likely next token. It also means that ChatGPT is unable to perform, or is poor at, tasks that diverge too far from natural language processing or that require knowledge not contained in the training data, such as mathematics, or simply presenting opinions:

give me an opinion

ChatGPT: As an AI language model, I do not have personal opinions or beliefs since I am designed to remain impartial and objective. My purpose is to provide information and answer questions to the best of my abilities based on the data and knowledge that I have been trained on.

ChatGPT: unreliable chef

As you can see above, ChatGPT has been designed with some awareness of these limitations. However, I’ve also discovered a few interesting blind spots, instances where it is confidently incorrect. For example, I asked for a recipe that would provide certain ratios of macronutrients; this involves solving three simultaneous equations, and the results were consistently incorrect. Here are the attempts:

design a mixture of raw peanuts, raw almonds, and raisins that provides a ratio of carbohydrates to proteins to fats of 22.7 to 7 to 13.1

ChatGPT begins by stating the goal:

To design a mixture of raw peanuts, raw almonds, and raisins that provides a ratio of carbohydrates to proteins to fats of 22.7 to 7 to 13.1, we need to start by calculating the percentage of each macronutrient in the desired ratio.

Let's assume we want to make a 100-gram mixture.

The ratio of carbohydrates to proteins to fats of 22.7 to 7 to 13.1 can be simplified to 227:70:131.

and then immediately makes a small arithmetic error:

To find the percentage of carbohydrates, we divide 227 by the total ratio (227+70+131) and multiply by 100. This gives us:

227/(227+70+131) x 100 = 50.89%

To find the percentage of protein, we divide 70 by the total ratio and multiply by 100. This gives us:

70/(227+70+131) x 100 = 15.73%

To find the percentage of fats, we divide 131 by the total ratio and multiply by 100. This gives us:

131/(227+70+131) x 100 = 29.38%

Each of these results is about 4% too low, and together they sum to only 96% rather than 100%. The correct values are:

$227/428\approx 53.0\%$
$70/428\approx 16.4\%$
$131/428\approx 30.6\%$
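The check is easy to reproduce. Here is a minimal Python sketch (the variable names are my own, not anything ChatGPT produced) that converts the requested ratio into percentage shares:

```python
# Target ratio of carbohydrates : protein : fat, taken from the prompt.
ratio = {"carbohydrates": 22.7, "protein": 7.0, "fat": 13.1}

total = sum(ratio.values())

# Each macronutrient's share of the total, as a percentage.
shares = {name: 100 * part / total for name, part in ratio.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```

Computed this way the shares add up to 100% by construction, which is a quick sanity test for any answer of this kind.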

ChatGPT fails to detect or correct these errors and goes on to set up three simultaneous equations in three unknowns:

So, we need the mixture to have approximately 50.89 grams of carbohydrates, 15.73 grams of protein, and 29.38 grams of fats.

Now, we can use the nutrition information for raw peanuts, raw almonds, and raisins to calculate the amounts of each ingredient needed to achieve the desired macronutrient ratios.

Raw peanuts have approximately 16.1 grams of carbohydrates, 25.8 grams of protein, and 49.2 grams of fat per 100 grams.

Raw almonds have approximately 21.7 grams of carbohydrates, 21.2 grams of protein, and 49.9 grams of fat per 100 grams.

Raisins have approximately 79.2 grams of carbohydrates, 3.1 grams of protein, and 0.5 grams of fat per 100 grams.

To simplify the calculation, we can assume that the mixture contains only these three ingredients and no other ingredients or additives.

Let's assume that we use x grams of raw peanuts, y grams of raw almonds, and z grams of raisins in the mixture.

The amount of carbohydrates in the mixture can be calculated as:

16.1x + 21.7y + 79.2z = 50.89

The amount of protein in the mixture can be calculated as:

25.8x + 21.2y + 3.1z = 15.73

The amount of fats in the mixture can be calculated as:

49.2x + 49.9y + 0.5z = 29.38

We can solve these three equations simultaneously to find the values of x, y, and z that satisfy the macronutrient ratio requirements.

The solution to these equations is x=0.782, y=0.953, and z=1.588.

This means that we need approximately 0.782 grams of raw peanuts, 0.953 grams of raw almonds, and 1.588 grams of raisins in the mixture to achieve the desired ratio of carbohydrates to proteins to fats of 22.7 to 7 to 13.1. We can adjust the amounts of each ingredient as necessary to make a larger or smaller batch.

These solutions are incorrect: for example, substituting them into the carbohydrate equation gives $16.1\times 0.782 + 21.7\times 0.953 + 79.2\times 1.588 \approx 159.04$, not 50.89. If we rewrite the equations in general form and solve them ourselves:

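$$x_1x+y_1y+z_1z=A$$
$$x_2x+y_2y+z_2z=B$$
$$x_3x+y_3y+z_3z=C$$

Eliminating $z$ with the first equation and then $x$ with the second gives:

$$\Rightarrow \boxed{z=\left[A-x_1x-y_1y\right]\frac{1}{z_1}}$$
$$\Rightarrow\boxed{x=\left[B z_1-z_2A + y\left(z_2y_1-y_2z_1\right)\right]\frac{1}{z_1x_2-z_2x_1}}$$
$$ky=C+A\left[\frac{z_2x_3}{z_1x_2-z_2x_1}-\frac{z_3}{z_1}-\frac{z_2z_3x_1}{z_1}\frac{1}{z_1x_2-z_2x_1}\right]+B\frac{z_3x_1-z_1x_3}{z_1x_2-z_2x_1}$$
$$\Rightarrow\boxed{y = \frac{1}{k}\left[C+AA'+BB'\right]}$$

where $A'$ and $B'$ are the coefficients multiplying $A$ and $B$ in the line above, and $k$ is:

$$k=y_3-\frac{z_3y_1}{z_1}+\frac{z_2y_1-y_2z_1}{z_1x_2-z_2x_1}\left(x_3-\frac{z_3x_1}{z_1}\right)$$

Using the per-100 g values that ChatGPT quoted, we can directly calculate the coefficients:

$$A'\approx 0.07;\quad B'\approx-1.95;\quad k\approx10.07$$

and so the unknowns can be calculated:

$$y\approx 0.225;\quad x\approx 0.364;\quad z\approx0.507$$

Because the coefficients are given per 100 g of ingredient, these values are in units of 100 g: roughly 36 g of peanuts, 22 g of almonds and 51 g of raisins. Substituting back confirms the result, for example $16.1\times 0.364 + 21.7\times 0.225 + 79.2\times 0.507 \approx 50.90$.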
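Hand algebra like this is easy to get wrong, so it is worth cross-checking numerically. Below is a minimal Python sketch using Cramer's rule; the helper names `det3`, `solve3` and `residuals` are my own, and the coefficients are simply the per-100 g figures ChatGPT quoted. It solves the same system and substitutes any candidate answer back into the three equations:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m * v = rhs using Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("system is singular")
    solution = []
    for col in range(3):
        # Replace one column of m with the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


# Rows: carbohydrates, protein, fat (grams per 100 g of ingredient).
# Columns: peanuts (x), almonds (y), raisins (z), as quoted by ChatGPT.
coefficients = [
    [16.1, 21.7, 79.2],
    [25.8, 21.2, 3.1],
    [49.2, 49.9, 0.5],
]
targets = [50.89, 15.73, 29.38]  # ChatGPT's gram targets (A, B, C)


def residuals(candidate):
    """Left-hand side minus target for each equation; ~0 for a true solution."""
    return [
        sum(c * v for c, v in zip(row, candidate)) - t
        for row, t in zip(coefficients, targets)
    ]


# x, y, z come out in units of 100 g because the coefficients are per 100 g.
x, y, z = solve3(coefficients, targets)
print("solved:", x, y, z)
print("residuals of solved values:", residuals((x, y, z)))
print("residuals of ChatGPT's answer:", residuals((0.782, 0.953, 1.588)))
```

A genuine solution leaves residuals that are zero up to floating-point noise, whereas ChatGPT's answer overshoots the carbohydrate target by roughly 108 g. The same `residuals` check works for any proposed answer, including one worked out by hand.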

So we see that ChatGPT is weak at arithmetic and linear algebra. It then uses these incorrect results to try to scale the amounts and arrives at incorrect quantities of each ingredient, and this is where we start to encounter errors that halt the response:

make a recipe for 100g of that

ChatGPT: Based on the calculations above, here's a recipe for a 100-gram mixture of raw peanuts, raw almonds, and raisins that provides a ratio of carbohydrates to proteins to fats of 22.7 to 7 to 13.1:

    7.82 grams of raw peanuts
    9.53 grams of raw almonds
    15.88 grams of raisins

To make the mixture, simply combine the above amounts of each ingredient in a bowl and mix well. The mixture can be stored in an airtight container for up to several weeks.

Note that the above recipe is for a 100-gram mixture, but you can scale it up or down as needed to make more or less of the mixture.

You’ll notice that the amounts listed in the above recipe sum to only 33.23 g, not 100 g. Pointing this out generates a ‘correction’ that is still incorrect in the same way.
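For comparison, the rescaling that a genuine fix would need takes only a few lines. A minimal Python sketch (variable names are mine), which keeps ChatGPT's proportions even though those are themselves wrong:

```python
# Amounts in grams from ChatGPT's "100-gram" recipe.
recipe = {"peanuts": 7.82, "almonds": 9.53, "raisins": 15.88}

total = sum(recipe.values())
print(f"total: {total:.2f} g")

# Rescale every ingredient by the same factor so the batch sums to 100 g.
batch = 100.0
scaled = {name: grams * batch / total for name, grams in recipe.items()}

for name, grams in scaled.items():
    print(f"{name}: {grams:.2f} g")
```

This only repairs the batch size; it says nothing about whether the macronutrient ratio is right.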

The next combination offered for the recipe is (50.89, 15.73, 29.38), which simply reuses the macronutrient targets as ingredient amounts. That is then corrected to (37.02, 11.38, 21.60) and then finally to:

50g raisins
15.5g almonds
34.5g peanuts

In previous trials of this problem the third solution was never reached; instead there was an unknown error during response generation. This third combination does at least sum to 100 g, but does it solve the original problem? Using the nutritional information that ChatGPT quoted, we can express the actual amounts of macronutrients provided by this recipe and compare them with its own targets:

$$C = 0.792r +0.217a + 0.161p \stackrel{?}{=} 50.89$$
$$P = 0.031r +0.212a + 0.258p \stackrel{?}{=} 15.73$$
$$F = 0.005r +0.499a + 0.492p \stackrel{?}{=} 29.38$$

where $r=50;a=15.5;p=34.5$:

$$\Rightarrow C\approx 48.52;P\approx 13.74;F\approx 24.96$$
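The same check can be scripted, which also makes it easy to compare the recipe against the ratio that was originally requested rather than against ChatGPT's gram targets. A minimal Python sketch, with names of my own choosing and the nutrition values ChatGPT quoted:

```python
# Grams of (carbohydrates, protein, fat) per gram of ingredient,
# derived from the per-100 g values ChatGPT quoted.
per_gram = {
    "raisins": (0.792, 0.031, 0.005),
    "almonds": (0.217, 0.212, 0.499),
    "peanuts": (0.161, 0.258, 0.492),
}
recipe = {"raisins": 50.0, "almonds": 15.5, "peanuts": 34.5}  # grams

# Total grams of each macronutrient in the mixture.
macros = [
    sum(recipe[name] * per_gram[name][i] for name in recipe)
    for i in range(3)
]

target_ratio = (22.7, 7.0, 13.1)  # carbohydrates : protein : fat

# Normalise both to protein = 1 so the ratios can be compared directly.
achieved = [m / macros[1] for m in macros]
wanted = [t / target_ratio[1] for t in target_ratio]

print("macros (g):", [round(m, 2) for m in macros])
print("achieved ratio:", [round(a, 2) for a in achieved])
print("wanted ratio:  ", [round(w, 2) for w in wanted])
```

Normalised to protein, the recipe delivers about 3.53 : 1 : 1.82 against a requested 3.24 : 1 : 1.87, so it is noticeably carbohydrate-heavy.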

So the recipe generated by ChatGPT misses every one of its own targets because of arithmetic errors. What is actually interesting to me, however, is that the AI struggles to detect these errors and to correct them. Still, this was a demonstration of a well-understood limitation, and a reminder that it’s important to double-check any recipes that ChefGPT generates.