
Everything We Know About GPT-4

Discover what we know so far about GPT-4, including our assumptions and predictions based on AI trends and info from OpenAI.
Oct 2022  · 8 min read

What We Know About GPT-4

We live in extraordinary times, where every few months a new model launches and reshapes the AI space. In July 2022, OpenAI opened the beta for DALL·E 2, its state-of-the-art text-to-image model. A few weeks later, Stability AI released Stable Diffusion, an open-source text-to-image model in the same vein. Both models have become popular and show promising results in image quality and prompt understanding. 

More recently, OpenAI introduced an Automatic Speech Recognition (ASR) model called Whisper, which has proved remarkably robust and accurate compared with earlier ASR models.

Looking at this trend, we can assume that OpenAI will launch GPT-4 in the coming months. Demand for large language models is high, and the popularity of GPT-3 shows that people expect GPT-4 to deliver better accuracy, compute efficiency, lower bias, and improved safety. 

Even though OpenAI has been quiet about the launch and features, in this post we will make some assumptions and predictions about GPT-4 based on AI trends and the information OpenAI has shared. We will also look at large language models and their applications.

What is GPT?

Generative Pre-trained Transformer (GPT) is a family of text-generation deep learning models trained on data available on the internet. These models are used for question answering, text summarization, machine translation, classification, code generation, and conversational AI. 

You can learn how to build your own deep learning models by taking the Deep Learning in Python skill track. You will explore the fundamentals of deep learning, get an introduction to the TensorFlow and Keras frameworks, and build multiple-input and multiple-output models using Keras. 

There are endless applications for GPT models, and you can even fine-tune them on domain-specific data for better results. By starting from a pre-trained transformer instead of training from scratch, you save compute, time, and other resources. 
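
As a concrete example, here is a minimal sketch of generating text with a GPT-3 model through the openai Python package (pre-1.0 interface). The API key, model name, and prompt are placeholders you would replace with your own.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: set your own OpenAI API key

# Ask a GPT-3 model to complete a prompt (text-davinci-002 was available
# at the time of writing; swap in whichever model you have access to).
response = openai.Completion.create(
    model="text-davinci-002",
    prompt="Summarize in one sentence: Generative Pre-trained Transformers are",
    max_tokens=60,
    temperature=0.2,
)

print(response["choices"][0]["text"].strip())
```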

Before GPT

Before GPT-1, most Natural Language Processing (NLP) models were trained for a particular task, such as classification or translation, using supervised learning. Supervised learning comes with two issues: it needs large amounts of annotated data, and the resulting models fail to generalize to other tasks.

GPT-1

Transformer architecture | GPT-1 Paper

The GPT-1 (117M parameters) paper, Improving Language Understanding by Generative Pre-Training, was published in 2018. It proposed a generative language model pre-trained on unlabeled data and then fine-tuned on specific downstream tasks such as classification and sentiment analysis. 

GPT-2

Model performance on various tasks | GPT-2 paper

The GPT-2 (1.5B parameters) paper, Language Models are Unsupervised Multitask Learners, was published in 2019. The model was trained on a larger dataset with more parameters to build an even more powerful language model. GPT-2 uses task conditioning, zero-shot learning, and zero-shot task transfer to improve performance.

GPT-3 

Results on three Open-Domain QA tasks | GPT-3 paper

The GPT-3 (175B parameters) paper, Language Models are Few-Shot Learners, was published in 2020. The model has roughly 100 times more parameters than GPT-2 and was trained on an even larger dataset to achieve strong results on downstream tasks. It surprised the world with human-like story writing, SQL queries and Python scripts, language translation, and summarization. It achieved state-of-the-art results through in-context learning in few-shot, one-shot, and zero-shot settings.
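
To make the few-shot idea concrete, here is a small sketch contrasting a zero-shot prompt with a few-shot prompt (the translation pairs follow the style of the examples in the GPT-3 paper). Either string could be passed as the prompt in the completion call sketched earlier.

```python
# Zero-shot: the task is described, but no examples are given.
zero_shot_prompt = "Translate English to French:\ncheese =>"

# Few-shot: a handful of worked examples are placed in the prompt, so the
# model infers the task from context alone (in-context learning), with no
# gradient updates or fine-tuning.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
```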

You can learn more about GPT-3, its uses, and how to get started using it in a separate article. 

What's New in GPT-4?

In a question-and-answer session at the AC10 online meetup, Sam Altman, the CEO of OpenAI, confirmed the rumors about the launch of GPT-4. In this section, we combine that information with current trends to make predictions about model size, optimal parameterization and compute, multimodality, sparsity, and alignment. 

Model Size

According to Altman, GPT-4 won’t be much bigger than GPT-3. So, we can assume that it will have around 175B-280B parameters, putting it in the same range as DeepMind’s language model Gopher.

Megatron-Turing NLG, at 530B parameters, is roughly three times larger than GPT-3, yet it is not clearly superior in performance, and smaller models released afterwards have reached higher performance levels. In simple words, larger size does not mean higher performance. 

Altman said that they are focusing on making smaller models perform better. Large language models require huge datasets, massive computing resources, and complex implementations, and even deploying them is cost-ineffective for many companies.  

Optimal parameterization

Large models are mostly under-optimized. Training them is expensive, so companies have to trade off accuracy against cost. For example, GPT-3 was trained only once, despite errors, because the cost made hyperparameter optimization unaffordable for the researchers. 

Microsoft and OpenAI have shown that GPT-3 could be improved by training it with better-tuned hyperparameters. They found that a 6.7B GPT-3 model with optimized hyperparameters matched the performance of the original 13B GPT-3 model. 

They also introduced a new parameterization (μP) under which the best hyperparameters for a small model are also the best for a larger model with the same architecture. This lets researchers optimize large models at a fraction of the cost. 
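
The snippet below is a deliberately simplified, hypothetical sketch of the idea behind μP-style hyperparameter transfer, not the actual recipe from the paper: tune the learning rate on a small proxy model, then reuse it on a wider model while rescaling the learning rate of width-dependent layers.

```python
import torch
import torch.nn as nn

def make_mlp(width: int) -> nn.Sequential:
    """Toy two-layer MLP whose hidden width we want to scale up."""
    return nn.Sequential(nn.Linear(64, width), nn.ReLU(), nn.Linear(width, 1))

base_width, target_width = 128, 1024
best_lr_small = 3e-3  # hypothetical value found by sweeping on the small model

# Transfer the tuned learning rate to the wider model, scaling the learning
# rate of width-dependent layers by base_width / target_width in the spirit
# of muP, instead of re-running the sweep at full size.
model = make_mlp(target_width)
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": best_lr_small},
    {"params": model[2].parameters(), "lr": best_lr_small * base_width / target_width},
])
```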

Optimal compute

DeepMind recently showed that the number of training tokens influences model performance as much as model size. They demonstrated this by training Chinchilla, a 70B model that is four times smaller than Gopher but trained on roughly four times more data than typical large language models since GPT-3.

We can reasonably assume that, for a compute-optimal model, OpenAI will increase the number of training tokens to around 5 trillion. That means training the model to minimal loss would take roughly 10-20x the FLOPs used for GPT-3. 
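
Here is a quick back-of-the-envelope check of that 10-20x figure, using the common approximation from the scaling-laws literature that training compute is roughly 6 x parameters x tokens. The 5-trillion-token count and the assumption that GPT-4 stays at GPT-3's size are this article's guesses, not confirmed numbers.

```python
def train_flops(params: float, tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

gpt3 = train_flops(175e9, 300e9)   # GPT-3: ~175B parameters, ~300B training tokens
guess = train_flops(175e9, 5e12)   # hypothetical GPT-4: same size, 5T tokens (assumed)

print(f"GPT-3 : {gpt3:.2e} FLOPs")
print(f"Guess : {guess:.2e} FLOPs (~{guess / gpt3:.0f}x GPT-3)")  # roughly 17x
```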

GPT-4 will be a text-only model

During the Q&A, Altman said that GPT-4 won’t be multimodal like DALL·E; it will be a text-only model.

Why is that? Good multimodal models are much harder to build than language-only or vision-only models, because combining textual and visual information is a challenging task. It would also mean having to outperform both GPT-3 and DALL·E 2.  

So, we shouldn’t expect anything fancy from GPT-4 on that front.

Sparsity

Sparse models use conditional computation to reduce computing costs: only a subset of the parameters is active for any given input. Such models can scale beyond 1 trillion parameters without incurring proportionally higher computing costs, which helps train large language models with fewer resources. 
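
To illustrate what conditional computation means, below is a toy mixture-of-experts layer in PyTorch in which a router sends each token to a single expert, so only a fraction of the layer's parameters runs per input. This is purely a conceptual sketch, not anything OpenAI has described.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: a router picks one expert per token,
    so only a fraction of the parameters is used for each input."""

    def __init__(self, d_model: int = 64, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        chosen = self.router(x).argmax(dim=-1)            # expert index per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = chosen == i
            if mask.any():
                out[mask] = expert(x[mask])               # run only the chosen expert
        return out

layer = TinyMoE()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```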

But GPT-4 probably won’t use sparsity. Why? In the past, OpenAI has always relied on dense language models, and since they are not planning to increase the model size dramatically, sparsity isn’t needed.   

AI alignment

GPT-4 will be more aligned than GPT-3. OpenAI has been wrestling with AI alignment: it wants language models to follow our intentions and adhere to our values. 

They took the first step by training InstructGPT, a GPT-3 model fine-tuned on human feedback to follow instructions. Human judges perceived the model as better than GPT-3, regardless of its scores on language benchmarks. 
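
For a flavor of how the human-feedback step works, here is a tiny sketch of the pairwise reward-model loss used in InstructGPT-style training: the reward model is pushed to score the completion labellers preferred higher than the rejected one. The scores below are made-up placeholders standing in for reward-model outputs.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for three (preferred, rejected) completion pairs.
reward_chosen = torch.tensor([1.3, 0.2, 2.1])
reward_rejected = torch.tensor([0.4, -0.5, 1.8])

# Pairwise logistic loss: minimized when the preferred completion scores higher.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss.item())
```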

GPT-4 release date

The GPT-4 release date is still unconfirmed, and the company currently seems to be focusing more on other technologies like text-to-image and speech recognition. So, you might see it next month or next year; we can’t be sure. What we can be reasonably sure of is that the next version will address the problems of the current one and deliver better results. 

Conclusion

GPT-4 will be a text-only large language model with better performance at a size similar to GPT-3’s. It will also be more aligned with human commands and values. 

You might hear conflicting news claiming that GPT-4 will have 100 trillion parameters or will focus only on code generation, but it is all speculation at this point. There is much we don’t know, and OpenAI has not revealed anything concrete about the launch date, model architecture, size, or dataset. 

Just like GPT-3, GPT-4 will be used for various language applications such as code generation, text summarization, language translation, classification, chatbots, and grammar correction. The new version should be safer, less biased, more accurate, and more aligned, as well as more cost-efficient and robust.  

You can read GPT-3 and the Next Generation of AI-Powered Services to learn more about GPT-3 applications. 
