Age of with GPT-3
What is GPT-3?
GPT-3 stands for Generative Pre-trained Transformer 3, and I’m going to unpick these one at a time:
- G: if you’re an English-speaking person listening to this, chances are that when I stop speaking mid-sentence, you’ll be able to predict which word comes [next]. Our brains are very good at filling in the [blanks]. I can even [leave out] multiple words mid-sentence, and you can still make a good prediction for what should fill the gap… A system which suggests new words in order to complete a prompt is performing what’s known as language generation (think of text auto-completion as an example).
- P: So let’s say I take an AI model, and I feed it real text from books, Wikipedia and the internet in general. The model practises filling in the blanks, and becomes trained to do what machine learning models do best – to recognise the correlations and patterns present across these hundreds of billions of words. Because we are going to use this model to do interesting things later, we say that the model has been pre-trained by this process.
- T: And what type of model did we train on this generation task? Put simply, a Transformer (which is the T) is the specific AI model architecture which we, or rather the OpenAI team, trained on the text data. To confirm that I know what I’m talking about, I asked GPT-3 “How does a Transformer model work in simple terms?” and it replied saying “The Transformer model identifies word dependencies, and then generates the most likely sequence of predictions”, which sounds pretty accurate to me! Interestingly, I actually asked it the same question again by accident (in the same conversation), and it replied “Look, I'll try to keep this simple. The Transformer model identifies the word dependencies, and makes the most likely sequence of predictions”. I felt like I could sense its annoyance coming through in the style in which it responded!
- 3: Finally, the number 3 represents the fact that it’s the third in the GPT series produced by OpenAI, where it’s fair to say that their experiments have essentially been to see how big they can make a language model. GPT-3 built upon GPT-2 and the original GPT, adding successive orders of magnitude of data, computation and model parameters. This is part of a broader trend in Natural Language Processing (NLP) where, generally speaking, bigger models perform better. GPT-3 has learned from so much real text data that not only can it answer questions (like ‘what is a Transformer’) but it can also hold conversations and perform lots of other interesting language-based tasks.
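The next-word prediction described under G can be sketched in miniature with a toy bigram model – a hypothetical stand-in, not how GPT-3 actually works (GPT-3 predicts with a neural network, not a frequency table), but it shows the shape of the task: given the words so far, suggest the most likely continuation.

```python
from collections import Counter, defaultdict

# A toy corpus; a real system learns from hundreds of billions of words.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

# Count which word follows each word (a bigram model).
next_counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    next_counts[a][b] += 1

def predict_next(word):
    """Suggest the continuation seen most often after `word` in the corpus."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" ("the cat" appears twice, the others once)
```

This is exactly the auto-completion framing from the text: language generation is repeated next-word prediction.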
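The fill-in-the-blanks idea under P can be illustrated the same way. This hypothetical scorer just picks the candidate word most often seen between the given neighbours; real pre-training instead nudges a neural network’s parameters to reduce its prediction error over the whole corpus.

```python
from collections import Counter

corpus = "she poured the tea into the cup and drank the tea slowly".split()

def fill_blank(left, right, candidates):
    """Score candidates for 'left ___ right' by corpus frequency."""
    counts = Counter(
        mid for l, mid, r in zip(corpus, corpus[1:], corpus[2:])
        if l == left and r == right
    )
    # Fall back to overall word frequency if this exact context was never seen.
    if not counts:
        counts = Counter(w for w in corpus if w in candidates)
    return max(candidates, key=lambda w: counts[w])

print(fill_blank("the", "into", ["tea", "cup"]))  # → "tea"
```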
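The “word dependencies” in GPT-3’s own answer under T refer to the Transformer’s attention mechanism. Here is a minimal single-head self-attention sketch in NumPy, using raw toy vectors as stand-ins for word embeddings – a real Transformer adds learned query/key/value projections, multiple heads and many stacked layers, but the core operation is this similarity-weighted mixing:

```python
import numpy as np

def self_attention(X):
    """Each word's new vector becomes a similarity-weighted mix of all words' vectors."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise similarity between word vectors
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                              # blend the vectors by attention weight

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # three "words", each a 4-dimensional vector
out = self_attention(X)
print(out.shape)              # (3, 4) – one context-mixed vector per word
```

Because every word attends to every other word, the model can capture dependencies between words that are far apart in the sentence – which is what makes the blank-filling above tractable at scale.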