
Awaiting the Shoggoth: Why AI Emergence is Uncertain – for Now

[Image: A robot hand, reaching from the depths]

“It is absolutely necessary, for the peace and safety of mankind, that some of earth’s dark, dead corners and unplumbed depths be let alone; lest sleeping abnormalities wake to resurgent life, and blasphemously surviving nightmares squirm and splash out of their black lairs to newer and wider conquests.”
― H.P. Lovecraft, At the Mountains of Madness

Horror fans might be familiar with author H.P. Lovecraft's fictional “shoggoths”, the shape-shifting and amorphous entities that he wrote about in his Cthulhu Mythos.

In the context of AI emergence, the term "shoggoth" is sometimes used to refer to a hypothetical, highly advanced future form of artificial intelligence. It evokes an AI system that can rapidly learn, evolve, and assimilate new information and skills, much as Lovecraft's shoggoths change their forms and abilities.

Much has been made of so-called emergent abilities in AI. These are skills that are observed to arise unexpectedly and unpredictably within AI systems – like a shoggoth, rising from the depths.

Over the past year, there has been a growing focus on the idea of emergent abilities. Intelligent machines have kept acquiring new skills while, at the same time, their inner workings have become less transparent and progressively harder for us to understand.

Recently, however, a new paper by Stanford researchers challenged the interpretation of the emergent abilities observed so far in large language models (LLMs, such as GPT-3, PaLM, and LaMDA).

[Image: A small plant, growing through the cracks]

What are 'emergent abilities'?

A previous study defined emergent abilities as abilities that are absent in smaller models but present in larger ones. The implication is that a machine learning model's performance will remain essentially random until the model reaches a certain size threshold, after which it is expected to show a sudden leap in performance.

Experts have cautioned that sudden, unexpected advancements of this kind would be a matter of concern: they could mean losing control of the AI system in question. At the very least, a sudden and unpredictable emergence of capabilities could allow behaviors such as deception or malice to appear in the models without warning.

Then AI researchers and industry leaders began claiming that some current LLMs were unexpectedly exhibiting skills or knowledge beyond their intended programming…

[Image: Stanford University]

The Stanford research

But the Stanford researchers say the observed emergent abilities are not genuine (yet, perhaps). Rather, they are a product of biased testing, cherry-picked examples, a lack of sufficient data, and, in particular, the choice of metrics used to measure performance. They suggest that a "non-linear" metric, one that only rewards a model once its entire output is correct, can create the appearance of sudden changes in performance when the improvement is in fact gradual. "Linear" metrics, which award partial credit, show far more predictable progress.
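To see how a non-linear metric can manufacture an apparent leap, here is a minimal toy sketch in Python, in the spirit of the paper's argument rather than its actual data (the scales and accuracy numbers below are invented for illustration). Suppose a model's per-token accuracy improves smoothly with scale; an all-or-nothing metric such as exact match over a 10-token answer then scores roughly p^10, which sits near zero for a long time and suddenly shoots upward:

import numpy as np

# Toy sketch (invented numbers, not from the Stanford paper): per-token
# accuracy improves smoothly and gradually as models get bigger.
scales = np.logspace(7, 11, 9)     # hypothetical parameter counts, 1e7..1e11
p = np.linspace(0.50, 0.95, 9)     # per-token accuracy, rising linearly

L = 10                             # length of the target answer, in tokens
exact_match = p ** L               # non-linear metric: all L tokens must be correct

print(f"{'params':>8} | {'per-token (linear)':>18} | {'exact match (non-linear)':>24}")
for n, pt, em in zip(scales, p, exact_match):
    print(f"{n:8.0e} | {pt:18.2f} | {em:24.3f}")

Judged per token, the model improves steadily at every scale; judged by exact match, it appears to do almost nothing and then abruptly "emerge" at the largest scales. Same model, same gradual improvement, different metric.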

This study is significant because it challenges the idea that emergent abilities are necessarily an inherent characteristic of scaling AI models.

The researchers clarify that they are not saying large language models are incapable of demonstrating such emergent abilities; they emphasize, however, that the emergent abilities reported in LLMs so far "are likely to be illusory".

While the research focused on GPT-3, the authors also compared their findings with those of previous papers on the same model family.

[Image: Amorphous origin goo, such as that which may give rise to a shoggoth]

Why the expectation of emergence at all?

Where did we get this expectation of emergent properties in LLMs in the first place? The Stanford paper explains that emergence, the manifestation of new properties as a complex system becomes more intricate, has been extensively studied across various disciplines. These include physics, biology, and mathematics. The authors cite P.W. Anderson's seminal work "More Is Different" (1972). Anderson claimed that with increasing complexity, unforeseen properties may arise that cannot be easily predicted, even with a precise understanding of the system's microscopic details.

The contrasting philosophy to this is known as reductionism. According to this viewpoint, the behavior and properties of complex systems can be explained and predicted solely by understanding the interactions and behaviors of their individual components.

[Image: Ants walking along a metal wire]

Why ‘More is Different’

Anderson challenged the reductionist hypothesis, proposing that emergent properties and behaviors are qualitatively different from the properties of a system's individual components. In other words, the whole system exhibits properties that cannot be explained by simply studying its parts in isolation.

Anderson used examples from various scientific disciplines, such as solid-state physics, to illustrate his argument. He suggested that at certain levels of complexity, new phenomena arise that are not evident or predictable from the microscopic details alone. These emergent properties require a holistic perspective to be fully understood.

Many natural phenomena exhibit such emergent properties: ants collaborating to build a bridge, birds flying in synchronized patterns, electrons aligning to produce magnetism. In each case, the collective behavior cannot be deduced solely from the behavior of the individual components. In physics, emergence plays a crucial role in understanding phenomena such as phase transitions, self-organization, and the behavior of complex materials.
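A tiny simulation makes the point concrete. The sketch below is a generic illustration of emergence, not an example from Anderson's paper or the Stanford study: it runs Wolfram's Rule 30, a one-dimensional cellular automaton in which each cell updates from nothing more than its three-cell neighborhood, yet the global pattern that unfolds is intricate and famously hard to predict from the rule alone.

# Rule 30 cellular automaton: a trivially simple local rule, complex global behavior.
WIDTH, STEPS, RULE = 64, 24, 30

cells = [0] * WIDTH
cells[WIDTH // 2] = 1                  # start with a single live cell

for _ in range(STEPS):
    print("".join("#" if c else "." for c in cells))
    # Each new cell is the RULE's output bit for its 3-cell neighborhood
    # (left, center, right), read as a 3-bit number; edges wrap around.
    cells = [
        (RULE >> ((cells[(i - 1) % WIDTH] << 2)
                  | (cells[i] << 1)
                  | cells[(i + 1) % WIDTH])) & 1
        for i in range(WIDTH)
    ]

Nothing in the three-cell rule hints at the structures that appear on screen; you only discover them by running the system as a whole, which is exactly the holistic perspective Anderson argued emergence demands.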

Through the study of emergence, scientists aim to unravel the fundamental principles underlying complex behaviors and structures across various scales.

[Image: Sand blowing around, patterns emerging]

Emergence as it stands

These days, emergence is a widely recognized and studied phenomenon across various disciplines. It has gained significant attention and relevance, particularly in fields such as physics, biology, complex systems, and, of course, AI. Researchers and scientists are actively investigating emergence to better understand how complex systems exhibit novel properties and behaviors that cannot be easily predicted or explained by analyzing their individual components.

As AI systems, including LLMs, become more complex and sophisticated, researchers will keep exploring how emergent properties and capabilities may arise.

In the meantime, perhaps, the shoggoth lies waiting…
