Computers with Common Sense – a Holy Grail

Spike Narayan

Spike Narayan is a seasoned hi-tech executive managing exploratory research in science and technology at IBM.


The world of Artificial Intelligence (AI) is upon us, and it is impossible not to hear about it every single day in a multitude of contexts, be it start-ups, medical use cases or even mobile apps. But very few of these AI applications are really “intelligent”. They are mostly applications trained on data in a narrow field of use, and they can be used effectively only within that space. So, when can we expect to see truly intelligent applications? In reality, progress will likely be a continuum, but there are some very stark differences between how today’s AI engines work and what, I believe, is needed to achieve general intelligence, or what we would call common sense, which is lacking in every compute engine and algorithm of today. Let us explore this quest for a more human-like intelligence and the challenges it presents.

Even as little as two years ago, any use case that merely massaged big data or helped visualize large amounts of data was touted as using AI. This is partly due to the history of AI. As we know, the field went through a period referred to as the “AI winter” and reawakened 15 or so years ago, fueled by relatively cheap and abundant compute power, access to massive amounts of data (big data) that machines could train on, and powerful new algorithms. Since big data, in many ways, enabled this rebirth of AI, all big data uses were loosely referred to as AI-infused. Many of the applications that sought to see patterns in data use an architecture called a neural network, which we will discuss later in this article. A couple of important things have changed in the last two years, though. AI applications have escaped from university laboratories around the globe, and industrial use cases are seeing increased adoption in the marketplace. More importantly, the many different flavors of neural network architecture that these use cases employ appear stable enough to be called hardened, or mature enough for corporate use. This will dramatically accelerate AI adoption in the near future.

According to the 2022 AI Index, an annual study of AI impact and progress at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) led by an independent and interdisciplinary group of experts from across academia and industry, the field of AI is at a critical crossroads. The report points out two trends, among many others. First, private investment in AI has more than doubled since 2020, in part due to larger funding rounds: in 2020 there were four funding rounds worth $500 million or more, and in 2021 there were 15. Second, AI has become more affordable and higher performing: the cost to train an image classification system has decreased by 63.6% and training times have improved by 94.4% since 2018.

This brings us to the technology that we call AI and what it will take to move towards a more general-purpose technology, one more akin to human capabilities. The AI Index referred to above points out that the cost of training has dropped dramatically. This notion of training is at the heart of existing AI technologies. The most common AI systems use what are called neural networks. While the name may suggest an architecture inspired by the mammalian brain, the similarity to our brain ends there. These neural nets are simply algorithms represented by nodes called neurons, which are connected to other nodes and arranged in sequential layers.
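To make the idea of neurons arranged in sequential layers concrete, here is a minimal sketch in Python. The layer sizes, weights and the choice of a sigmoid activation are illustrative assumptions on my part, not taken from any particular system.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: every neuron forms a weighted sum of all inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the inputs through each layer in sequence."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
# The numbers are arbitrary placeholders, not trained values.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = network_forward([1.0, 0.0], layers)
```

Each inner list of weights belongs to one neuron, and the output of one layer becomes the input of the next. That is all "sequential layers" means here.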

By carefully adjusting the connection strength (or weight) between neighboring neurons, a mathematically time-consuming process, the network becomes able to recognize patterns in large data sets. The reader can turn to many sources to dig a little deeper into network types. Since the architectures are composed of neurons and connections, which are sometimes called synapses, there is a tendency to call them brain-inspired, but as I said earlier, the parallels end there. These neural nets are used in what we call machine learning (ML), which takes these networks and trains them to recognize patterns in large amounts of data.
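As a rough illustration of what adjusting the weights means, the sketch below trains a single neuron to reproduce the logical OR of two inputs. The task, the learning rate and the number of passes are assumptions chosen to keep the example tiny. Real networks apply the same kind of update to millions of weights, which is where the time goes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Nudge two weights and a bias a little after every example (gradient descent)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            prediction = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
            # Gradient of the log-loss with respect to the weighted sum.
            error = prediction - target
            weights[0] -= learning_rate * error * x1
            weights[1] -= learning_rate * error * x2
            bias -= learning_rate * error
    return weights, bias

# Toy data set (assumed for illustration): the logical OR of two inputs.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(or_samples)
predictions = [
    round(sigmoid(weights[0] * x1 + weights[1] * x2 + bias))
    for (x1, x2), _ in or_samples
]
```

After training, `predictions` matches the targets in `or_samples`. The "knowledge" the neuron has acquired is nothing more than the final values of two weights and a bias.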

As a concrete example, one can train these nets by exposing the network to thousands of cat images. This establishes the connection strengths, or synaptic weights, in the neural network, and the network continues to refine these weights as it sees more cat pictures. The training is deemed complete when the network recognizes a previously unseen cat picture as a cat with an inference accuracy above a preset minimum. Inference accuracies in the high 90s (percent) are common.
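The stopping rule described above, train until accuracy on previously unseen examples clears a preset minimum, can be sketched as follows. This is a toy under stated assumptions: two made-up numeric features stand in for images, a single neuron stands in for the network, and the 95% threshold, the learning rate and the epoch cap are values I picked for illustration.

```python
import math
import random

MIN_ACCURACY = 0.95  # preset minimum inference accuracy (assumed value)
MAX_EPOCHS = 500     # safety cap so the loop always terminates

def make_samples(n, rng):
    """Synthetic stand-in for labeled images: label 1 ("cat") when x + y > 1."""
    samples = []
    while len(samples) < n:
        x, y = rng.random(), rng.random()
        if abs(x + y - 1.0) < 0.1:
            continue  # skip ambiguous points close to the boundary
        samples.append(((x, y), 1 if x + y > 1.0 else 0))
    return samples

def predict(weights, bias, features):
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(weights, bias, samples):
    correct = sum(
        (predict(weights, bias, features) > 0.5) == (label == 1)
        for features, label in samples
    )
    return correct / len(samples)

def train_until_accurate(train_set, held_out, learning_rate=0.5):
    """Refine the weights epoch by epoch; stop once held-out accuracy is high enough."""
    weights, bias = [0.0, 0.0], 0.0
    score = 0.0
    for epoch in range(1, MAX_EPOCHS + 1):
        for features, label in train_set:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * f for w, f in zip(weights, features)]
            bias -= learning_rate * error
        score = accuracy(weights, bias, held_out)  # examples never used for training
        if score >= MIN_ACCURACY:
            return epoch, score
    return MAX_EPOCHS, score

rng = random.Random(0)  # fixed seed so the run is repeatable
train_set = make_samples(200, rng)
held_out = make_samples(100, rng)
epochs_used, held_out_accuracy = train_until_accurate(train_set, held_out)
```

The important design point is that `held_out` is never used to adjust the weights. It plays the role of the previously unseen cat picture.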

Once training is complete, the network can be deployed and will be expected to identify a new pattern correctly with the required accuracy. Such ML engines trained for narrow use cases (narrow AI), be it visual images, text or voice, will drive a lot of revenue in the next few years as real industrial applications benefit greatly. However, such ML use cases are hardly what we would call intelligent when we compare them to even a child’s capability, for a number of reasons.

The biggest one is that no child needs to see thousands of cat pictures to be able to identify a cat in real life. Literally, a handful of select pictures will do the job. Let us also not forget that ML engines require kilowatts or megawatts of power to learn a task, while the human brain operates at ~20 watts. To the credit of the AI/ML community, these engines have demonstrated inference accuracies better than humans in quite a few use cases. Still, my initial comment about their intelligence stands.

So, what should we look for then in an “intelligent” machine? While we can try to define that narrowly in each field of use, there must be a broader general-purpose definition or benchmark. Therein lies the problem. Computer performance metrics have been around for decades, built on very tangible descriptors like chip frequency, transistor size, the number of circuits in a 1 cm x 1 cm chip or the very commonly used FLOPS (floating-point operations per second). With these, it is very easy to quantify and chart improvement over time. Measuring intelligence is a different beast altogether.

Society at large cannot even agree on what is a good measure of intelligence in humans, let alone machines. Let’s take reading a paragraph of text as an example. While computers have always done well at reading text for digitizing, we have never really expected that the computer has, in any way, understood the content. There has been some progress in that space more recently with Microsoft’s Machine Reading Comprehension work, which uses a novel neural network architecture called the Reasoning Network (ReasoNet) with which the team was able to mimic the inference process of human readers. Understanding context or intent is a human-like trait, and depending on the field of use we can come up with what capability would be considered “human-like”. You can begin to see the difficulty of measuring intelligence in a general-purpose “intelligent” machine.

Human common sense is still a leap beyond everything we have seen in the AI/ML space. To illustrate this, my colleague Dr. Wilcke at IBM often uses this question in his talks: do clouds pay taxes? This is a very interesting example because every adult will answer the question with absolutely no thought or hesitation. Yet, I will argue, not a single one of us has been explicitly taught in school or elsewhere that clouds do not pay taxes. How then is this question so easy to answer? If we pose this question to any ML engine, it will be unable to answer it with any degree of confidence unless someone has explicitly programmed this as a rule. The difference is that humans, starting from infancy, develop a model of the world through learning and, more fundamentally, through observation.

Another vitally important difference is the temporal (or sequential) aspect of how we learn. A child can easily recite the English alphabet from A to Z, but if we pose a question like “what comes before U?”, the child will have to repeat parts of the sequence to find the answer. The same goes for song lyrics. You will begin to see that a lot of our learning is tied to sequences in time or space, pointing to an entirely different learning process.
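A toy analogy, and only an analogy since it makes no claim about how memory actually works: if a sequence is stored purely as forward links, each item knowing only what follows it, then "what comes after U?" is a single lookup while "what comes before U?" forces a replay from the start. The function names and the `next_letter` mapping below are hypothetical.

```python
import string

# Forward-only memory: each letter knows only the letter that follows it.
next_letter = dict(zip(string.ascii_uppercase, string.ascii_uppercase[1:]))

def what_comes_after(letter):
    """A single lookup: the link is stored in this direction."""
    return next_letter.get(letter)

def what_comes_before(letter, start="A"):
    """No backward link exists, so recite from the start until the next item is `letter`."""
    current, steps = start, 0
    while current is not None:
        following = next_letter.get(current)
        steps += 1
        if following == letter:
            return current, steps
        current = following
    return None, steps

after_u = what_comes_after("U")    # "V", found in one lookup
before_u = what_comes_before("U")  # ("T", 20): twenty recitation steps
```

The asymmetry in effort mirrors the child who has to recite part of the alphabet to find what comes before U.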

Much of the work today is still with neural networks because of their immediate monetization potential. There have been advances from narrow AI towards a broader AI with the evolution of transfer learning, one-shot learning and many other techniques, but none of them is convincingly getting closer to common sense.

So, in summary: can these neural networks, over time, reach this broader inference capability, or will it require an entirely different architecture to learn and exhibit common sense? This is an ongoing debate and time will tell. The fact remains that neural nets need far too much energy to learn to be competitive with the 20-watt human brain.