Exploring GPT-4 | The Limitations Of The OpenAI Tool

A new version of the AI chatbot that took the world by storm has just landed. It’s already impressing its users by offering expert content on various subjects, describing images, and even closing in on telling jokes that are actually funny!

While it has clearly improved on some of its predecessor’s flaws, there are still aspects that users need to be vigilant about, especially since it makes the same habitual mistakes that first gave rise to concern when it launched.

Even with its improved language skills, GPT-4 is still not on par with human intelligence, creativity, and the due diligence that goes into writing an assignment, blog, article, or any other form of content.

Here are a few reasons why GPT-4 is impressive but you’ll still need to be careful when using the AI chatbot.

#1. Improved—but not by much

There’s no denying that many are applauding OpenAI’s launch of GPT-4. However, users shouldn’t expect to be blown away by the latest version.

While OpenAI states that it’s pleased with the human-level performance the chatbot exhibits on academic and professional benchmarks, it also acknowledges that the differences between GPT-3.5 and GPT-4 are subtle.

The differences between the two versions become clear when the complexity of the task reaches a certain threshold. At that point, GPT-4’s creativity, reliability, and ability to comprehend nuanced instructions are more pronounced than GPT-3.5’s.

The San Francisco start-up is also quite anxious to address the safety concerns. The company claims that GPT-4 is 82% less likely to respond to requests for disallowed content. This may sound like a promising figure until you realise the sheer volume of requests that are going to pass through its servers every minute.
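To see why a relative reduction can still leave a large absolute number, here is a minimal arithmetic sketch. The request volume and the baseline rate below are hypothetical placeholders chosen only for illustration, not figures published by OpenAI; only the 82% reduction comes from the company's claim.

```python
def expected_disallowed_responses(requests, baseline_rate, relative_reduction=0.82):
    """Expected count of disallowed responses after a relative reduction.

    requests: number of requests for disallowed content (hypothetical input).
    baseline_rate: share the older model would have answered (hypothetical input).
    relative_reduction: 0.82 reflects the "82% less likely" claim.
    """
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    remaining_rate = baseline_rate * (1.0 - relative_reduction)
    return requests * remaining_rate


# Hypothetical inputs: 1,000,000 disallowed requests, 1% answered by the older model.
before = expected_disallowed_responses(1_000_000, 0.01, relative_reduction=0.0)
after = expected_disallowed_responses(1_000_000, 0.01)
print(f"before: {before:.0f}, after: {after:.0f}")
```

Under these made-up inputs, the count falls from 10,000 to roughly 1,800. That is a real improvement, but it is far from zero, which is the point the 82% figure can obscure at scale.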

This is perhaps why the new release also comes with a disclaimer that acknowledges that GPT-4 has a number of limitations including adversarial prompts, social biases, and ‘hallucinations’ that the company is currently working to address.

Microsoft has also confirmed that Bing has been running GPT-4 for the past five weeks, and stories are emerging about how people have been able to break through the safety barriers and produce entertaining, and even alarming, results.

#2. Better accuracy but it still makes mistakes

GPT-4 is said to be much more accurate than its predecessor, and Oren Etzioni, an AI researcher and professor, wanted to test the accuracy of that statement. So here’s what he did.

When he first tried the new version of the AI chatbot, he asked a very straightforward question: ‘What’s the relationship between Oren Etzioni and Eli Etzioni?’ While GPT-3.5 had always responded to the question inaccurately, the new version responded correctly, indicating that it had a broader range of knowledge than the earlier version.

The chatbot went on to say that Oren Etzioni is a computer scientist and was also the CEO of the Allen Institute for Artificial Intelligence, while Eli Etzioni is an entrepreneur.

Now, the majority of the answer was correct. However, GPT-4 did not know that Etzioni had only recently stepped down from his post as CEO of the Allen Institute.

While the accuracy has indeed increased, the bot still lacks knowledge of the latest changes around the world, and users should be wary of treating the information it provides as 100% accurate.

#3. It knows about what’s happened, but no theories on what can happen

GPT-4 has a reasonable handle on the things that have already happened, which is a great feature to have. However, it’s less adept at offering up any hypotheses about the future.

The new version seems to draw on hypotheses that others have already put forward and couldn’t really come up with any new guesses or theories of its own, which speaks to a lack of the creativity and critical thinking skills that generally require human intelligence.

When asked about the emerging problems in Natural Language Processing (NLP) over the next decade—referring to the research that drives the design and development of AI chatbots similar to ChatGPT—the bot couldn’t come up with any unique and new ideas.

#4. It’s hallucinating

While the improvements are stacking up and a generation of users is excited to try out the new version and see how it can benefit them, earlier problems still exist.

One of the key problems with the earlier version was that it made stuff up, a behaviour more commonly known as ‘hallucinating.’

Unfortunately, this isn’t a problem that’s localised to ChatGPT. It’s a problem that haunts all the leading chatbots, simply because they do not have the ability to discern between what’s true and what’s false.

As a result, they may generate content that’s based on false information.

Users who rely on the accuracy of their information, especially when using data in their decision-making process, should take a moment to verify any information provided by the chatbot to ensure that there’s no risk of including false information.

One documented example of the chatbot’s tendency to hallucinate came when it was asked for the addresses of websites with the latest information on cancer research. At times, it generated addresses that simply did not exist, causing concern over the reliability of the information.
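One simple way to act on this is to check whether an address supplied by a chatbot actually resolves before relying on it. The following is a minimal sketch using only Python's standard library. The helper names `looks_like_url` and `url_resolves`, and the `opener` parameter used to swap in a fake network call, are assumptions made for this illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def looks_like_url(url):
    """Return True if the text is at least a well-formed http(s) address."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def url_resolves(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to the address succeeds.

    `opener` is a hypothetical hook so the network call can be replaced
    in tests; by default it is urllib.request.urlopen.
    """
    if not looks_like_url(url):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False
```

A call such as `url_resolves("https://example.org/")` only tells you that the page exists. It says nothing about whether the content is accurate, and some servers reject HEAD requests, so a `False` result is a prompt to check by hand rather than proof that the address is fake.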

#5. It can reason but has limits

The chatbot was asked a simple question: ‘Imagine an infinitely wide doorway. What is more likely to fit through it, a car or a military tank?’ It was able to answer the question appropriately. However, the answer failed to take into consideration the height of the doorway, which could have changed the answer that the bot gave.

OpenAI’s CEO has stated that the new chatbot could reason ‘a little bit.’ However, when it comes to reasoning skills, the chatbot breaks down in many scenarios, with the previous version of GPT handling the questions a little better and taking aspects like width and height into consideration.

GPT-4 is impressive, but users should be vigilant

The upgrades and improvements that GPT-4 has undergone are impressive and address most of the issues that critics have identified over the last few months.

However, even with the assurances that GPT-4 is better than its predecessor, users must use the chatbot with caution and do their own due diligence before accepting the information it provides as accurate and reliable.

