By Mayank Chhaya
A Microsoft Research paper says a new Artificial Intelligence (AI) model, GPT-4, is “strikingly close to human-level performance”, signaling that Artificial General Intelligence (AGI) may be at hand.
The paper, dated April 13, 2023, and titled “Sparks of Artificial General Intelligence: Early experiments with GPT-4”, makes an extraordinary claim, one fraught with profound consequences for humanity, perhaps even a threat to its very existence.
“We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system,” the paper by a 14-member team led by Dr. Sébastien Bubeck says.
This is the first major claim that AI, which has so far been fairly rudimentary and nowhere close to being even remotely human-like, could be crossing over to the much-coveted and much-feared AGI.
The paper is being widely discussed in the machine learning community, many of whose members say Microsoft may be extravagant in suggesting that some form of AGI may already have been achieved.
“We have presented our initial exploration of GPT-4 across a wide range of tasks and domains, providing supporting evidence to the claim that GPT-4’s abilities are comparable to human-level for many of them. This conclusion is consistent with the findings by OpenAI presented in [Ope23]. A primary goal of our experiments is to give a preliminary assessment of GPT-4’s intelligence, which is an arduous task given the lack of formal definition for this concept, especially for artificial systems. We hope that our exploration provides a useful and necessary first step to appreciate the remarkable capabilities and challenges of GPT-4, and that it opens up new opportunities for developing more formal and comprehensive methods for testing and analyzing future AI systems with such broad intelligence,” the paper says.
It also says, “the machine learning community needs to move beyond classical benchmarking via structured datasets and tasks, and that the evaluation of the capabilities and cognitive abilities of those new models have become much closer in essence to the task of evaluating those of a human rather than those of a narrow AI model.”
This would be a startling breakthrough if the machine learning community is able not just to replicate some of these results but to go beyond what Microsoft appears to have achieved.
Clearly, Microsoft researchers expect the crossover from AI to AGI in the fairly short term. It is a measure of how seriously this is being taken that the paper suggests specific improvements to GPT-4 and to large language models (LLMs) more broadly, the latter being models trained on vast amounts of digital text, books and other text-based resources. Under the heading “Confidence calibration”, the paper says, “The model has trouble knowing when it should be confident and when it is just guessing. It both makes up facts that have not appeared in its training data, and also exhibits inconsistencies between the generated content and the prompt, which we referred to as open-domain and closed-domain hallucination.”
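What the researchers call calibration can be measured concretely. One common approach, not taken from the paper excerpt above, is to compare a model's stated confidence with how often it is actually right; the gap between the two is known as expected calibration error. The short Python sketch below computes that gap over a handful of hypothetical answers; the sample values and the binning choice are illustrative assumptions only.

```python
# Minimal sketch: measuring confidence calibration with expected calibration
# error (ECE). The (confidence, correct) pairs below are hypothetical stand-ins
# for a model's self-reported confidence and whether its answer was right.

def expected_calibration_error(samples, n_bins=10):
    """Average gap between stated confidence and observed accuracy, per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in samples:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    ece = 0.0
    total = len(samples)
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical evaluation results: a well-calibrated model's confidence tracks
# its accuracy; the "just guessing" behavior the paper describes shows up as
# high confidence paired with wrong answers.
samples = [(0.95, True), (0.9, False), (0.8, True), (0.7, False), (0.99, False)]
print(f"ECE: {expected_calibration_error(samples):.3f}")
```

A well-calibrated model would show a small gap; the guessing behavior the paper describes appears as confident answers that turn out to be wrong.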
Under “Long-term memory” the paper says, “The model’s context is very limited, it operates in a “stateless” fashion and there is no obvious way to teach the model new facts. In fact, it is not even clear whether the model is able to perform tasks which require an evolving memory and context, such as reading a book, with the task of following the plot and understanding references to prior chapters over the course of reading.”
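To see what “stateless” means in practice, consider the following minimal Python sketch, which is not drawn from the paper: because the model retains nothing between calls, the caller must resend the entire conversation on every turn, and anything that falls outside the fixed context window is simply lost. The generate function and the context limit here are hypothetical placeholders, not a real model interface.

```python
# Minimal sketch of "stateless" operation: the model keeps no memory between
# requests, so the caller resends the whole conversation each turn, truncated
# to a fixed context window. `generate` is a hypothetical stand-in for any
# LLM completion call.

CONTEXT_LIMIT_CHARS = 8000  # stand-in for the model's fixed context window

def generate(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned reply."""
    return "(model reply to: ..." + prompt[-40:] + ")"

history: list[str] = []

def chat(user_message: str) -> str:
    history.append("User: " + user_message)
    # The model only ever sees what fits in the window; older turns drop out.
    prompt = "\n".join(history)[-CONTEXT_LIMIT_CHARS:]
    reply = generate(prompt)
    history.append("Assistant: " + reply)
    return reply

print(chat("Summarize chapter one."))
print(chat("How does that connect to chapter three?"))  # depends entirely on resent history
```

Reading a book under this constraint means earlier chapters eventually fall out of the window, which is exactly the long-term memory problem the paper flags.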
It also calls for the model to engage in “continual learning”, saying, “The model lacks the ability to update itself or adapt to a changing environment. The model is fixed once it is trained, and there is no mechanism for incorporating new information or feedback from the user or the world. One can fine-tune the model on new data, but this can cause degradation of performance or overfitting. Given the potential lag between cycles of training, the system will often be out of date when it comes to events, information, and knowledge that came into being after the latest cycle of training.”
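The degradation the paper warns about is easy to demonstrate at toy scale. In the hypothetical Python sketch below, a one-parameter model is fit to “old” data and then naively fine-tuned on “new” data alone; its error on the original task balloons, a behavior usually called catastrophic forgetting. Nothing here comes from GPT-4 itself; it is only an illustration of the general phenomenon.

```python
# Toy illustration of the continual-learning problem: naively fine-tuning on
# new data alone erases what the model learned before (catastrophic forgetting).
# The "model" is a single weight fit by gradient descent; all data is made up.

def train(w, data, lr=0.01, steps=500):
    """Plain gradient descent on mean squared error for a one-weight model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Hypothetical "old" knowledge (y = 2x) and later "new" information (y = 5x).
old_task = [(x, 2.0 * x) for x in range(1, 6)]
new_task = [(x, 5.0 * x) for x in range(1, 6)]

w = train(0.0, old_task)                               # original training
print("old-task error after training:   ", round(mse(w, old_task), 4))

w = train(w, new_task)                                 # naive fine-tuning on new data only
print("old-task error after fine-tuning:", round(mse(w, old_task), 4))  # error explodes
```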
In a worrisome admission under “Transparency, interpretability and consistency”, the paper says, “Not only does the model hallucinate, make up facts and produce inconsistent content, but it seems that the model has no way of verifying whether or not the content that it produces is consistent with the training data, or whether it’s self-consistent.” It also says, “Since the model does not have a clear sense of its own limitations it makes it hard to establish trust or collaboration with the user without extensive experimentation in a narrow domain.”
One of its most important observations comes under the heading “Cognitive fallacies and irrationality”. It specifically addresses the serious challenge of human prejudices and biases that an AGI could absorb from its training data. “The model seems to exhibit some of the limitations of human knowledge and reasoning, such as cognitive biases and irrationality (such as biases of confirmation, anchoring, and base-rate neglect) and statistical fallacies. The model may inherit some of the biases, prejudices, or errors that are present in its training data, which may reflect the distribution of opinions or perspectives linked to subsets of the population or larger common views and assessments,” it says.