Choose your Champion! Task-Specific vs. General Models

Should AI models be like Swiss Army knives, versatile and handy in a variety of scenarios? Or do we prefer them as precision tools, finely tuned for specific tasks? In the world of artificial intelligence, and natural language processing specifically, this is an ongoing debate. The question boils down to whether models trained for specific tasks are more effective at these tasks than general models.

Task-specific models: specialization and customization

In my last blog post, we looked at the rise of personalized LLMs, customized for specific users. Personalized LLMs can be seen as an extreme form of task-specific model. Fans of task-specific models stress that these kinds of models are better suited for tasks involving confidential or proprietary data. This is obviously true.

But some people also believe that specialized models necessarily perform better in their specific domains. That may sound logical, but the answer is not so straightforward. After all, if a general model performs just as well, why go to the trouble of fine-tuning one for a specific use case?

‘Few-shot’ learning

In a 2020 paper titled "Language Models are Few-Shot Learners", Tom B. Brown and his co-authors explain that models trained in a way that is not specific to any particular task are still able to generalize effectively.

The paper acknowledges that a two-step process, pre-training on a large text corpus and then fine-tuning for a specific task, yields improvements on many NLP tasks. This method, however, requires task-specific training datasets with thousands of examples.

In traditional machine learning approaches, models often require large amounts of labeled data to achieve high performance. However, in practice, acquiring such data can be expensive, time-consuming, or simply impractical.

The point here is that, perhaps, there is another way – equally effective and less cumbersome. The answer might be to supersize.

Supersize me: The power of larger models

The paper’s title refers to “few-shot” learning: a setting in which a model makes accurate predictions or performs tasks using only a very small amount of labeled training data.

Few-shot learning enables models to learn and generalize from a small number of examples (the “few shots”). These may amount to just a handful of labeled data points, or even a single example per class or task. The goal is for the model to perform the task accurately despite this limited exposure to training data.
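As a concrete illustration, few-shot prompting can be as simple as concatenating a handful of labeled examples with a new query and sending the combined text to a frozen language model. The sketch below is a minimal, model-agnostic example; the sentiment-classification task, the labels, and the reviews are invented for illustration.

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Format a handful of labeled examples plus a query into one prompt.

    The model's weights are never updated; the "few shots" live entirely
    in the prompt text (in-context learning).
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

# Three "shots" -- far fewer than the thousands of labeled examples
# traditional fine-tuning typically requires.
shots = [
    ("Great plot and acting.", "positive"),
    ("A total waste of time.", "negative"),
    ("I loved every minute.", "positive"),
]
prompt = build_few_shot_prompt(shots, "The pacing was dreadful.")
print(prompt)
```

The resulting string would be passed as-is to a large language model, which is expected to continue the pattern by completing the final "Sentiment:" line.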

How can this work?

The paper notes that increasing the size of language models has been proven to significantly boost their ability to perform well with minimal task-specific guidance.

In other words, while it may seem counterintuitive, larger language models with more parameters have demonstrated the capacity to excel in tasks with very limited training data.

As an example, the paper’s authors looked at GPT-3, the precursor to the current ChatGPT. GPT-3 had 175 billion parameters.
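That 175-billion figure can be sanity-checked with a common back-of-the-envelope estimate: a decoder-only transformer has roughly 12 × n_layers × d_model² parameters (attention plus feed-forward weights, ignoring embeddings and biases). Plugging in GPT-3's published configuration of 96 layers and a hidden size of 12,288 lands close to the reported count. A minimal sketch:

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Per layer: ~4 * d_model^2 for attention (Q, K, V, and output
    projections) plus ~8 * d_model^2 for the feed-forward block (two
    d_model x 4*d_model matrices), i.e. ~12 * d_model^2 in total.
    Embeddings and biases are ignored, so this slightly undercounts.
    """
    return 12 * n_layers * d_model ** 2

# GPT-3's published configuration: 96 layers, hidden size 12,288
print(f"{approx_transformer_params(96, 12288) / 1e9:.0f}B")  # prints "174B"
```

The estimate comes out at about 174 billion, in line with the 175 billion reported in the paper.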

According to the paper, GPT-3 showed strong performance across various NLP tasks provided with only a few examples and without the need for further adjustments or training updates. These tasks ranged from translation and question-answering to word unscrambling and using new words in sentences.

Other examples: PaLM and Flamingo

Another paper, “PaLM: Scaling Language Modeling with Pathways” (2022) by Aakanksha Chowdhery et al., documents how the Pathways Language Model (PaLM), with 540 billion parameters, showed exceptional performance across diverse natural language tasks through few-shot learning.

This paper emphasizes, again, that the scale of such models appears to be a critical factor in understanding their efficacy.

Results were achieved “on hundreds of language understanding and generation benchmarks.” Specifically, PaLM excelled in complex multi-step reasoning tasks. It also performed very well in multilingual tasks and source code generation.

Similarly, the paper "Flamingo: a Visual Language Model for Few-Shot Learning" (2022) by Jean-Baptiste Alayrac et al. looked at the visual language model Flamingo.

Flamingo is a visual language model that excels at quickly learning new tasks from just a few examples. It can handle text interleaved with images or video, and learn from a mix of them. This makes it very versatile.

The paper explains that Flamingo was trained on large datasets from the web that have mixed text and images. This helped it learn how to understand things in context. It was tested on various tasks, like answering questions about pictures, describing scenes, and multiple-choice questions. In these, Flamingo was found to perform better than models that needed a lot more task-specific training data.

The researchers found that the larger the model, the better the few-shot performance, “similar to GPT-3”.

Industry concerns and challenges

One of the arguments in favor of task-specific models has nothing to do with capability. Rather, the fear is that the future AI landscape might be controlled by a small number of companies, each with a large, general AI model (much as is currently the case).

Another argument is that, while user feedback is crucial for both general and specialized AI, incorporating that feedback effectively requires a degree of control and fine-tuning that is more manageable in specialized AI systems.

However, the push for specialized AI isn't without its own concerns. For one thing, a proliferation of unrelated specialized models can be tough to deploy and manage. It's often easier and cheaper to maintain one big, general system than to integrate and orchestrate lots of specific ones.

Future perspective

Task-specific models and general models both have their respective strengths and use cases. It remains to be seen which approach will dominate our future AI landscape. Perhaps the rapid evolution of AI will lead to novel approaches that combine the best of both worlds.
