The good news: GPT-4 is here! The bad news: It doesn’t quite live up to the hype.
The versions of GPT-4 currently available to the public are refined and improved versions of their predecessors, sure. But the much-touted multimodal capabilities are more limited than was widely expected. And even the ability of users to upload visuals for various reasons is not quite ready for public roll-out. Disappointingly to many as well, OpenAI is keeping mum on the specifics of GPT’s size and training data.
What is GPT-4?
GPT-4, short for Generative Pre-training Transformer 4, is the latest of OpenAI’s AI language models. (A variation, GTP-4-32K, is being rolled out separately, but for the sake of simplicity, we will refer to both as GTP-4.)
GPT-4 follows in the footsteps of GPT-3.5, the technology behind the now-famous ChatGPT.
"Generative" refers to the fact that GPT models can produce human-like text. It does this by predicting the next word in a sequence of words. "Pre-training" refers to the fact that GPT models are first trained to understand language. Afterward, they receive more specialized training on tasks like answering questions. Finally, "Transformer" refers to the type of neural network architecture used in GPT models.
How was it developed?
GPT was originally described in a research paper in 2018. GPT-2 followed in 2019, and GPT-3 in 2020. But it was with the launch of ChatGPT in late 2022 that this technology really entered the public sphere. ChatGPT is a freely-accessible chatbot with many business use cases.
The GPT-4 release date was 14 March 2023. Like previous models, GPT-4 works by predicting the next word in a sequence. It was trained on a huge amount of text data from the internet, from which it learned to recognize and emulate statistical patterns.
What do we know about the earlier GPT models?
All the models were based on the Transformer architecture. This is a type of neural network designed for natural language processing tasks. The models were trained on a large amount of text data using so-called “unsupervised” pre-training. This involves training the model on a large corpus of text to learn a general representation of language. It is “unsupervised” because the program figures out patterns and relationships in the text on its own.
Steerability
In the context of machine learning, steerability refers to the ability to control or steer the output of a model in a specific direction. This is done by manipulating certain input parameters (see section below).
OpenAI strongly emphasizes steerability in its research and development of AI models. Steerability is a key factor in creating AI systems that are flexible and adaptable to real-world applications. It enables developers or users to control and manipulate the outputs of AI models. This can enhance their interpretability, fairness, and robustness.
The development of steerability in GPT has been a gradual process that has evolved with the different versions of the model. In GPT-1, the first version, the focus was mainly on language modeling. The model did not have any explicit control mechanisms for steering the output toward a specific task.
With the release of GPT-2, there was a significant improvement in the quality of the generated text. The model introduced a few control mechanisms that allowed for some level of steerability.
GPT-3 introduced a more comprehensive set of control mechanisms. This includes, for example, the ability to control the generation length and temperature value (see section below).
Parameters
"Parameters" are numerical values that determine how the network processes and generates text. They are learned by the neural network during training.
The parameters can be thought of as the settings or knobs that control how the GPT model works. They are the numbers that the computer adjusts and fine-tunes during training to get better at understanding and generating text.
These parameters are what make the GPT models so powerful and versatile. By tweaking these numbers, we can adjust how the model generates text. For example, we can tweak the length and complexity of sentences, the style and tone of language, the topics it focuses on, and so on.
One example of a parameter in a GPT language model could be the value assigned to the "temperature" parameter. This controls the randomness and creativity of the model's text generation. A higher temperature value will result in more unpredictable and diverse output. A lower value will generate input that is more conservative and predictable.
- GPT-1: The first version of the GPT model was released by OpenAI in 2018. GPT-1 had 117 million parameters.
- GPT-2: Released in 2019, GPT-2 was a larger and more powerful version of the GPT model, with 1.5 billion parameters.
- GPT-2 "small": OpenAI also released a smaller version of GPT-2, which had only 117 million parameters, the same as GPT-1. This smaller model was designed to be more accessible and efficient than the larger version.
- GPT-3: After GPT-2 “small”, OpenAI released the GPT-3 model in 2020. It had 175 billion parameters. GPT-3 introduced several new features and capabilities, including dynamic control of context length, pattern-based sparse attention, and few-shot learning. Due to its larger size and improved architecture, GPT-3 has achieved state-of-the-art performance on several natural language processing benchmarks and tasks. OpenAI released several variations of the GPT-3 model, including GPT-3 "small" (125 million parameters), GPT-3 "medium" (350 million parameters), GPT-3 "large" (760 million parameters), and GPT-3 "extra large" (1.3 billion parameters).
Dynamic context control
While the temperature value is used to control the randomness and creativity of the generated text, dynamic context control is used to control the relevance and coherence of the generated text with respect to the previous context.
The context length refers to the number of preceding words, or “tokens”, that the model considers when generating the next word or token in the sequence.
- GPT-1: The first version of the GPT model used a fixed context length of 1024 tokens. This means it considered the previous 1024 words before generating the next word or token in the sequence.
- GPT-2: The second version of the GPT model introduced the concept of dynamic control of context length. It can adjust the context length based on the input prompt and the desired length of the generated text. This way, the model can adapt its output to better suit the prompt or task at hand.
- GPT-3: The third version of the GPT model further improved the dynamic control of context length. It introduced new sampling methods that allow for more precise control over the amount of context used during text generation. GPT-3 could generate text up to 2048 tokens long, but could also produce shorter text outputs by adjusting the context length dynamically.
So, what about GPT-4?
We know that, like GPT-3, the newest model can produce text that is indistinguishable from that produced by a human. It can also summarize, complete or translate text, and it can write poetry, prose, or lyrics.
We were told that GPT-4 would be the first model to accept both textual and visual input, although it would still only provide textual output. But users were disappointed to realize after the launch that they are not able to provide visual input just yet. This feature is currently in the research preview stage. In an update on the OpenAI website posted on 15 March, Joshua J. wrote, “We aren’t offering [this] as a service right now. We’re happy to hear that you’re excited about our services and when we have anything to release, we’ll announce this to the community.”
How is it different from the previous models?
More than 50 experts were used to test GPT-4, ensure it refuses dangerous requests, and that it handles sensitive subjects better.
As a bigger and improved model, GPT-4 is, predictably, better able to handle nuanced instructions. Some sensitive topics, such as medical advice, are handled better, too.
OpenAI also put GPT-4 through its paces by having it write several human exams. Examples were the Uniform Bar Exam and LSAT. The feedback was that it performed better than any other large language model created so far.
The standard GPT-4 model will offer 8,000 tokens for the context. GPT-4-32k, an extended 32,000 token context-length model, will be rolled out separately. Among other things, this means that GPT-4 can now generate longer responses. That said, initial user reports seem less than impressed with GPT’s attempts at producing long-form content.
Considering steerability, OpenAI explained on their website that developers will now be better able to prescribe their AI’s style and tone – if GPT-4 is used to power another chatbot. With GPT-3.5, users were already able to specify a certain style or tone, for example, ”Please respond in the way an angry human might respond if I asked them..”
What is different, is that the model can now distinguish between user and system input. What this means is that someone creating a new chatbot powered by GPT-4 would be able to specify the style and tone beforehand. Users won’t be able to override this by, for example, specifying a different tone. This adds a layer of security.
Earlier, the model did not distinguish between user and system input. It handled all text input equally. Now text messages have labels identifying who sent them. When user prompts conflict with system prompts, the model is now programmed to ignore the user prompts.
As for parameters, however, OpenAI has decided to keep the size of GPT-4 undisclosed. When asked by MIT Technology Review for details, the answer was telling: “It’s pretty competitive out there.” The new gold rush in technology is here, and it is clear that OpenAI are keeping their cards close to their chest.
They have also not revealed any information about how GPT-4 was developed, including details about the data, computing power, or training techniques used. This has led to widespread criticism that OpenAI has now become…well, closed.
A remaining concern
According to OpenAI, the newest model still sometimes fabricates information, or “hallucinates”, as they put it. Initial attempts to use ChatGPT in journalism have been surrounded by controversy due to this tendency. Checking content and code for errors remains important.
Who is using GPT-4, and how can you try it?
According to OpenAI, the following companies have already integrated GPT-4 into their products: Duolingo, Stripe, and Khan Academy. (Khan Academy has introduced Khanmigo, a chatbot “tutor” powered by GPT-4.)
It is also powering the new Bing search engine (and apparently has been doing so from the start). So, if you are simply waiting to chat with GPT-4, Bing is one way to access it for free.
As for ChatGPT, unfortunately the free version doesn’t run on this version of the model yet. It is still unclear whether this will ever be the case. However, the paid version (ChatGPT Plus) has the option to switch between GPT-4 and earlier default and legacy models.
As for the GPT-4 API, there is a waiting list.
Will there be a GPT-5?
Probably. OpenAI have not indicated that they are capping their efforts with the release of GPT-4. But they have also not released any details on a possible GPT-5 just yet. Nextbigfuture.com speculates that we could expect GPT-5 at the end of 2024 or in 2025.
In short
That concludes my AI-related post for the week. Gotta go! Overall, GPT-4 is good news for AI language models, but for many users, it's not as impressive as expected. It's better than previous versions, but its multimodal abilities are more limited than anticipated, and uploading visuals is not ready for public use. OpenAI hasn't provided specific details about GPT-4's size and training data. However, the model's development has been gradual and has improved with each version. It will be exciting to see how GPT will keep disrupting the world of content creation and customer service for the better.








Comments
Post a Comment