April 25, 2024
A General Model and Incomplete History of AI
The discipline of AI has broadly involved two approaches — symbolic and subsymbolic. What we know as AI is largely subsymbolic AI involving machine learning and, more specifically, neural networks and deep learning. In order to regulate it, we first need a conceptual model of how it works.
Before delving into questions of how to understand and regulate AI, we must begin with what we mean when we use the term AI, both colloquially and in technical materials. AI gets defined with varying degrees of breadth and vagueness. For instance, the Singapore Model AI Governance Framework describes it as a system which, “seek[s] to simulate human traits such as knowledge, reasoning, problem solving, perception, learning and planning.” The EU’s AI Act and the OECD define it broadly as a system that is designed to operate with varying levels of autonomy and that can, for explicit or implicit objectives, generate output such as predictions, recommendations, or decisions influencing physical or virtual environments.
The Origins of AI
The term Artificial Intelligence itself goes back to the 1950s. In 1955, a twenty-eight-year-old computer scientist, John McCarthy, along with Marvin Minsky, Claude Shannon and Nathaniel Rochester, approached the Rockefeller Foundation with the aim of securing funding to arrange a summer program at Dartmouth College. McCarthy had to pick a name for what the delegates at this summer school would be pursuing. Perhaps in an attempt to distinguish the field from the then popular term, cybernetics, he picked Artificial Intelligence. McCarthy later acknowledged that the name was not particularly favoured by anyone—since the objective was authentic intelligence rather than artificial—but he felt compelled to label it something. Armed with the kind of overconfidence that perhaps only technologists can muster, McCarthy and his associates wrote that they “think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.” The proposal listed subjects to pursue such as natural-language processing, neural networks, machine learning, abstract concepts and reasoning, and creativity, which to this day remain central to the pursuit of AI.
The summer school did not produce anything of substance, but it ushered in the first AI spring, as periods marked by significant excitement about, and consequently funding for, AI research have come to be known. [1]
AI as a Family Resemblance Concept
The question of what AI means has continued to invite debate. Minsky coined the phrase “suitcase word” for terms like intelligence: they are packed with many different meanings, as are their cousin words such as thinking, cognition, consciousness, and emotion. Minsky’s categorisation harks back to Wittgenstein’s idea of a family resemblance concept.
In Philosophical Investigations, published in 1953, Ludwig Wittgenstein wrote that things which we expect to be connected by one essential common feature may instead be connected by a series of overlapping similarities, where no one feature is common to all. Instead of having one definition that works as a grand unification theory, concepts often draw from a common pool of characteristics. Drawing on the overlapping characteristics that exist between members of a family, Wittgenstein used the phrase ‘family resemblances’ to refer to such concepts.
From a regulatory theory perspective, Wittgenstein’s conception of things which elude a tightly defined formulation has immense utility when approaching the regulation of digital technologies. We already see its use in the context of privacy. In Understanding Privacy, Daniel Solove makes a case for privacy being a family resemblance concept. Responding to the discontent around conceptualising privacy, Solove attempted to ground privacy not in a tightly defined idea, but around a web of diverse yet connected ideas. Some of the diverse human experiences that we instinctively associate with privacy are bodily privacy, relationships and family, home and private spaces, sexual identity, personal communications, the ability to make decisions without intrusion, and the sharing of personal data. While these are widely diverse concepts, intrusions upon or interferences with these experiences are all understood as infringements of our privacy. Other scholars too have recognised this dynamic, evolving and difficult-to-pinpoint nature of privacy. Robert Post described privacy as a concept “engorged with various and distinct meanings.” Helen Nissenbaum advocates a dynamic idea of privacy, to be understood in terms of contextual norms.
Similarly, rather than having a strict definition of AI, it is perhaps more useful to think in terms of a set of related concepts of what constitutes AI, the thing that we are trying to regulate. Several pairs of terms exist — narrow and broad AI, general and specific AI, strong and weak AI — which try to differentiate between varying examples of how AI works, but their distinctions may be of only limited utility to a regulator. However, despite AI's definitionally elusive character, regulators and consumers alike need to form a sense of what they are dealing with.
Symbolic AI
In 1961, Simon and Newell created the ambitiously named General Problem Solver for the RAND Corporation. The program worked by having an understanding of the current state and the desired state, and a set of rules designed to go from the former to the latter. For each type of problem that the program was asked to solve, it had to call upon a set of encoded rules. This approach, called symbolic AI, relies on the idea that general intelligence can be captured entirely by the right kind of symbol-processing program, where concepts in the real world are represented by symbols in the program. If that is the case, decision-making processes in the real world can be abstracted into encoded rules which the program can implement. The most obvious uses of symbolic AI are in expert systems, where specialised human knowledge is abstracted into computationally processed rules and symbols. The kinds of processing that such systems perform are, by their nature, discursive.
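To make the idea concrete, here is a minimal, hypothetical sketch of the symbolic approach: real-world concepts are represented as symbols, and the decision follows hand-encoded if-then rules. The symbols and rules below are invented for illustration and are not drawn from any actual expert system.

```python
# A toy rule-based classifier in the symbolic style: concepts are symbols,
# knowledge is a set of hand-written rules. Purely illustrative.

RULES = [
    ({"has_fur", "says_meow"}, "cat"),
    ({"has_feathers", "can_fly"}, "bird"),
    ({"has_fur", "barks"}, "dog"),
]

def classify(facts: set) -> str:
    """Fire the first rule whose conditions are all present in the known facts."""
    for conditions, conclusion in RULES:
        if conditions <= facts:          # every condition is satisfied
            return conclusion
    return "unknown"

print(classify({"has_fur", "says_meow"}))   # -> cat
print(classify({"has_feathers"}))           # -> unknown
```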
On the face of it, such an approach makes sense for any algorithm that performs functions which persons can themselves reason with or act upon, or that attempts to answer a question or decide an action which persons commonly or habitually answer or decide themselves, and which they are commonly able to explain or account for. Thus, symbolic AI recommends itself as an approach for an algorithm that replaces or displaces human actions in an interpretative practice, let us say, where individual acts are discursively accountable to peers.
Subsymbolic AI
On the other hand, sub-symbolic AI was built on a rudimentary understanding of neuroscience and sought to capture the sometimes-unconscious thought processes underlying what the late Daniel Kahneman called System 1, or instinctive and fast thinking, such as recognizing faces or identifying spoken words. In The Organization of Behavior, Donald Hebb set out the idea now summarised by the witticism ‘Neurons that fire together, wire together’. [2] He posited that when the axon of one neuron is near enough to another neuron, and repeatedly takes part in firing it, some growth process or metabolic change takes place that strengthens the link between them. Hebb dubbed such a group of neurons a ‘cell assembly’. This reflects something psychologists have long held about humans learning by association. Thus, chess players study the best opening gambits and endgames, gradually creating a mental library of chess moves and strategies. These patterns, or cell assemblies, get solidified with time in the human brain and become second nature, allowing the chess player to almost intuitively arrive at the best possible moves while playing.
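Hebb's postulate can be read as a simple learning rule: strengthen a connection a little every time the two units on either side of it are active together. The sketch below illustrates this with made-up activity data and an arbitrary learning rate; it is an illustration of the idea, not a model taken from Hebb's book.

```python
# A minimal sketch of a Hebbian update: "fire together -> wire together".
import numpy as np

rng = np.random.default_rng(1)
pre = rng.integers(0, 2, size=(100, 5))   # 0/1 activity of 5 "input" neurons over 100 steps
post = pre[:, 0] | pre[:, 1]              # a downstream neuron driven by inputs 0 and 1
weights = np.zeros(5)
eta = 0.01                                # learning rate

for x, y in zip(pre, post):
    weights += eta * x * y                # strengthen links whose ends fired together

print(np.round(weights, 2))               # connections 0 and 1 tend to end up strongest
```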
The exploration of neural networks traces back to the research of Warren McCulloch and Walter Pitts during the 1940s. They recognized the potential of modeling neurons as electrical circuits, akin to simple logical circuits. Leveraging this concept, they formulated a basic yet highly adaptable mathematical model. Frank Rosenblatt further enhanced this model in the late 1950s, leading to the development of a neural net framework. In the case of neurons, connections with other neurons have varying strengths. Crudely put, when the cumulative sum of all the electrical input that a neuron receives exceeds a threshold, it fires. While calculating the sum, more weightage is given to stronger connections. It is widely believed that learning involves adjustments to the strengths of neural connections.
Rosenblatt’s perceptron attempted to replicate this process: a computer program has multiple numerical inputs and one output. It makes a yes-or-no (1 or 0) decision based on whether the sum of its weighted inputs meets a threshold value. Unlike rules-based systems or symbolic AI, it does not have encoded rules to make decisions. Given the correct weightages, a perceptron can perform non-discursive or perceptual tasks such as image recognition. Supervised learning is the process of training the program to adjust its weightages through a labelled dataset. So, to train an image recognition system to correctly identify cats, it would be fed a large set of both positive and negative examples, each labelled as such.
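As a rough sketch of the idea, the toy perceptron below learns from a small, made-up labelled dataset with two numerical inputs and a 1-or-0 label. The data, learning rate and number of passes are arbitrary illustrative choices.

```python
# A minimal sketch of a perceptron trained by supervised learning on made-up data.
import numpy as np

# Hypothetical labelled dataset: label is 1 when the two inputs sum to more than 1.
X = np.array([[0.1, 0.1], [0.3, 0.2], [0.2, 0.4],
              [0.9, 0.8], [0.7, 0.9], [0.8, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])

weights = np.zeros(2)
bias = 0.0
eta = 0.1                                    # learning rate

def predict(x):
    """Fire (output 1) if the weighted sum of the inputs meets the threshold."""
    return 1 if np.dot(weights, x) + bias >= 0 else 0

for _ in range(50):                          # repeatedly sweep the training examples
    for x, target in zip(X, y):
        error = target - predict(x)          # 0 if correct, +1 or -1 if wrong
        weights += eta * error * x           # nudge the weightages towards the right answer
        bias += eta * error

print([predict(x) for x in X])               # matches y for this separable toy set
```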
It is not surprising that these two approaches to developing artificial intelligence competed for funding in the 1950s and 1960s. After the Dartmouth Summer School, McCarthy and others came to dominate the discourse on AI. In response to the sub-symbolic approach, Minsky and his MIT colleague Seymour Papert published a book called Perceptrons in 1969, showing that the kinds of tasks a perceptron could solve were very limited. Rosenblatt himself recognised these limitations. To become more useful, the perceptron needed to evolve into a multi-layer neural network (which forms the basis of deep learning AI systems as we understand them today), but Minsky and Papert speculated that the virtues of the perceptron would not “carry over to the many-layered version.” This indictment by one of the biggest and most influential names in AI at the time was responsible for a decline in funding for neural networks until the approach was revived decades later.
How do Neural Networks work?
A multi-layer neural network is, in simple terms, one which has several layers between the input and output layers, often called hidden layers. In relative obscurity through the 1970s and 80s, Minsky and Papert’s assumptions were proven wrong. This was done through a learning algorithm called backpropagation. What backpropagation does is identify an error at the output stage and ‘propagate’ the blame for the error back to the hidden layers. This assignment of blame works by adjusting the weightages in the network to reduce the error. Rinse and repeat until the output error gets as close to zero as possible.
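A compact sketch of that loop appears below, training a tiny two-layer network on the XOR problem, the kind of task a single perceptron cannot solve. The network size, learning rate and iteration count are arbitrary illustrative choices, not taken from any particular system.

```python
# A minimal sketch of backpropagation: forward pass, measure the output error,
# propagate blame backwards, adjust weightages, repeat.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR labels

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros((1, 8))   # input -> hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros((1, 1))   # hidden -> output layer

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: assign blame for the output error to each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # adjust weightages to reduce the error, then repeat
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel(), 2))   # typically converges close to [0, 1, 1, 0]
```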
If one were to stretch the brain-AI parallel further: computers execute tasks incrementally, handling operations like addition or toggling switches one by one, and require numerous steps to achieve significant outcomes. What sets computers apart is the speed with which transistors can switch on and off, at billions of times per second. Human brains, on the other hand, excel at parallel processing. With billions of neurons, they can conduct many computations simultaneously. Yet the pace of each computation is comparatively slow, since neurons fire at a maximum rate of about a thousand times per second.
The count of transistors in computers is nearing that of neurons in the human brain, yet the brain surpasses it significantly in terms of connections. When we encounter a familiar face, it takes roughly a tenth of a second to recognize them. This time frame allows for just around a hundred processing steps at the neurons’ switching speed. Yet within these steps, your brain efficiently sifts through your entire memory, identifies the best match, and adjusts it to fit the new context, such as different attire or lighting. Each processing step in the brain can be remarkably intricate, incorporating a wealth of information, indicative of a distributed representation.
Proponents of symbolic AI would try to write a computer program that identifies specific features in an image. If the program spots enough of these features in an image, it would accept it as a cat. But proponents of sub-symbolic AI, building on the foundations of the perceptron, would approach the task very differently. They would begin by feeding the program a large number of digitised images, including images of cats and other things. Then they would ask the program to compress this data by looking for a set of features in many patches of many images. In this case, a feature may be a uniform colour or brightness, or a change in brightness or colour. The idea is to find a set of features which makes it possible to reconstruct a similar image. The program is then asked to abstract from those features, and to look for common features of those features. This process is repeated many times over. This conversion of raw data into a more useful set of features is called feature extraction.
If the original images contained numerous cats, there is a high likelihood that the system will develop certain cat-like features at a higher level. Crucially, these features are dictated by the images themselves rather than by predetermined notions of what an AI programmer deemed essential for cat recognition. This unsupervised learning approach has proven remarkably effective, surpassing the expectations of AI researchers from just a few decades ago. The triumph of sub-symbolic AI is due to three factors: vast volumes of data available for learning (online, in dedicated databases, or from sensors), advanced computational methods for managing this data, and exceptionally fast computing systems.
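To give a flavour of what ‘features of features’ means, the toy sketch below hand-codes two very simple patch features (average brightness and change in brightness) and then summarises them again at a coarser level. In a real system these features would be learned from the data rather than written by hand; the image, patch size and feature choices here are all illustrative assumptions.

```python
# A toy illustration of layered feature extraction on a random 8x8 "image".
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # stand-in for a digitised image

def patch_features(img, patch=2):
    """First layer: describe each non-overlapping patch by two simple features."""
    feats = []
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            block = img[i:i+patch, j:j+patch]
            brightness = block.mean()                         # uniform brightness
            contrast = np.abs(np.diff(block, axis=1)).mean()  # change in brightness
            feats.append([brightness, contrast])
    return np.array(feats)                   # shape: (number of patches, 2)

layer1 = patch_features(image)               # 16 patches x 2 features each
# Second layer, crudely: summarise groups of neighbouring patch features,
# i.e. "features of the features". Real systems learn this abstraction instead.
layer2 = layer1.reshape(4, -1).mean(axis=1)
print(layer1.shape, layer2.shape)            # (16, 2) (4,)
```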
Bayesian logic that drives AI
Now that we have a broad sense of the architecture of machine learning algorithms, let us take a brief look at the mathematical models that drive this learning. Bayes’ theorem is a fundamental concept in probability theory and statistics, and is used in building AI systems. Formulated by Thomas Bayes in the eighteenth century, it is used to determine the probability of an event when our knowledge is uncertain.
Bayesianism, as we understand it today, was pioneered by Pierre-Simon de Laplace. Laplace was born about fifty years after Bayes, and is widely considered the father of probability theory. In his pioneering work of 1814, “Essai philosophique sur les probabilités”, Laplace asked how we know that the sun will rise tomorrow. By the principle of insufficient reason, if we had no empirical evidence about the sun rising the next day and no specific rationale to predict whether it will or won't, we should regard both possibilities as equally probable and conclude that the sun will rise again with a probability of fifty percent.
But how do you factor in the evidence of the sun having risen every day so far? The seventh principle of probability outlined in Laplace’s Essai philosophique is simple: the probability of a future event is determined by adding up the probability of each potential cause multiplied by the likelihood of that cause leading to the event. If history serves as a predictor of future events, each day the sun rises should bolster our belief that it will continue to do so. After millennia of consistent behavior, the likelihood of the sun rising again tomorrow should be nearly certain, though not absolute, as total certainty is unattainable. Laplace formulated his rule of succession based on this reasoning.
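The rule of succession itself is short enough to compute directly: after n sunrises in n days, the probability of one more is (n + 1) / (n + 2). The sketch below simply evaluates that formula for a few illustrative day counts.

```python
# Laplace's rule of succession: (successes + 1) / (trials + 2).

def rule_of_succession(successes: int, trials: int) -> float:
    """Probability of success on the next trial, per Laplace's rule."""
    return (successes + 1) / (trials + 2)

for days in (0, 1, 10, 365, 365 * 5000):
    print(f"after {days:>7} sunrises: {rule_of_succession(days, days):.6f}")
# With no observations the estimate is 1/2; after millennia it is nearly,
# but never exactly, certain.
```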
In The Master Algorithm, Pedro Domingos spends a considerable amount of time explaining Bayes’ theorem as ‘the theorem that runs the world’. [3] He asks us to picture waking up in the pitch darkness of an unfamiliar planet. Even though all you can behold is a sky filled with stars, you have a logical basis to anticipate a sunrise, given what you know about how planets typically move around their suns. Consequently, your initial assessment of the likelihood of the sun rising should surpass fifty percent; let's say two-thirds. Let’s call this assessment the prior probability of the sun rising, established prior to encountering any evidence. It isn't derived from recording past instances of sunrise on this planet, as you were not present to witness them; rather, it reflects your beliefs about how the universe works, based on your general understanding. However, as the stars fade away, your assurance that the sun will rise on this planet grows, drawing on your experiences on Earth. This increased assurance constitutes a posterior probability, formulated after encountering some new evidence. As the sky begins to illuminate, the posterior probability undergoes another surge. Finally, as the sun appears on the horizon, it is certain that the sun has risen. Bayes’ theorem tells us how the posterior probability evolves as we encounter more evidence.
In essence, Bayes’ theorem serves as a straightforward guideline for adjusting one's level of belief in a hypothesis when confronted with new evidence. If the evidence aligns with the hypothesis, the probability of the hypothesis increases; otherwise, it decreases. For instance, testing positive for HIV would raise the probability of having it. The complexity arises when dealing with multiple pieces of evidence, like the outcomes of numerous tests, necessitating simplifying assumptions to avoid overwhelming complexity. It gets even more complicated when examining numerous hypotheses concurrently, such as all potential diagnoses for a patient. Several AI-based algorithms, including those for prediction, anomaly detection, diagnostics, automated insight, reasoning and logic, time series prediction, and decision-making under uncertainty, make use of Bayes’ theorem.
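As a worked illustration of this updating, the sketch below applies Bayes’ theorem to a hypothetical diagnostic test. The prior prevalence, sensitivity and false positive rate are invented numbers, chosen only to show how belief shifts with each piece of evidence.

```python
# Bayes' theorem as belief updating, with made-up numbers for a medical test.

def bayes_update(prior: float, likelihood: float, false_alarm: float) -> float:
    """Posterior P(hypothesis | evidence), given P(hypothesis),
    P(evidence | hypothesis) and P(evidence | not hypothesis)."""
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

prior = 0.01           # assumed prevalence of the condition
sensitivity = 0.99     # P(positive test | condition)
false_positive = 0.05  # P(positive test | no condition)

posterior = bayes_update(prior, sensitivity, false_positive)
print(round(posterior, 3))   # ~0.167: belief rises sharply, but is far from certainty

# A second positive test updates the belief again, using the first posterior as the new prior.
print(round(bayes_update(posterior, sensitivity, false_positive), 3))   # ~0.798
```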
Let us consider the decisions that a physician has to take about the course of treatment for her patient. The question she is trying to answer is whether the patient has the flu or not. Whether this is the case depends on many factors. For the sake of simplicity, let us assume that all factors are binary: whether the patient has a temperature, a runny nose, a sore throat, chills and so on. Domingos highlights that if the data about each symptom is a binary variable, and n symptoms must be considered, a patient could have 2 to the power n possible combinations of symptoms. If we take, for example, twenty symptoms and pair them with a database of ten thousand patients, we've only encountered a tiny fraction of the approximately one million potential combinations. Moreover, achieving precise probability estimates for a specific combination demands a significant number of observations, at least tens of instances, requiring a database encompassing tens of millions of patients. If an additional ten symptoms are introduced, the requisite number of patients would exceed the total human population on Earth. With a hundred symptoms, even if data acquisition were instantaneous, the storage capacity across all hard drives globally would be insufficient to house all the probabilities.
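The arithmetic behind that explosion is easy to check directly:

```python
# Number of possible combinations of n binary symptoms.
for n in (20, 30, 100):
    print(f"{n} symptoms -> {2 ** n:,} combinations")
# 20 -> 1,048,576 (about a million); 30 -> 1,073,741,824 (about a billion);
# 100 -> a 31-digit number, far beyond any conceivable patient database.
```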
To work around this, we make simplifying assumptions. The most popular assumption is to consider all effects independent given the cause. So, having a fever does not change the patient’s likelihood of having a runny nose, if we already know they have the flu. Likewise, if you already know that the sun will rise, seeing the sky lighten does not alter the probability of the sun rising. A machine learning system that uses Bayes’ theorem while assuming that the effects are independent given the cause is called a Naïve Bayes classifier. It is so called because making such assumptions is, in reality, naïve. Even so, it is extremely useful for many computational tasks. Its uses range from spam filters, where it helps identify words which are often seen in spam emails, to search engines, where it helps predict the relevance of web pages. Domingos very usefully draws a connection between the Naïve Bayes classifier and the way the perceptron works. In both cases, each condition can contribute to the conclusion to varying degrees rather than being strictly binary. The need to deal with uncertainty was clearly identified as a long-term problem in AI development in the 1990s, and after different approaches were trialled, Bayesian inference became the default one.
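Returning to the flu example, the sketch below shows the Naïve Bayes shortcut in action: instead of estimating probabilities for every combination of symptoms, we multiply one per-symptom likelihood at a time. Every probability in it is a made-up illustrative figure.

```python
# A minimal Naive Bayes calculation for the flu example, with invented probabilities.
# Each symptom is assumed independent given the diagnosis, so per-symptom
# likelihoods are simply multiplied together.

p_flu = 0.05                     # assumed prior probability of flu
# P(symptom present | flu) and P(symptom present | no flu), all illustrative
likelihoods = {
    "fever":       (0.90, 0.10),
    "runny_nose":  (0.80, 0.20),
    "sore_throat": (0.70, 0.15),
    "chills":      (0.60, 0.05),
}

def p_flu_given(symptoms: dict) -> float:
    """Posterior probability of flu given observed (True/False) symptoms."""
    num = p_flu
    den = 1 - p_flu
    for name, present in symptoms.items():
        p_if_flu, p_if_not = likelihoods[name]
        num *= p_if_flu if present else (1 - p_if_flu)
        den *= p_if_not if present else (1 - p_if_not)
    return num / (num + den)

print(round(p_flu_given({"fever": True, "runny_nose": True,
                         "sore_throat": False, "chills": True}), 3))
# -> ~0.889 with these illustrative numbers
```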
What, then, is AI?
When we deal with AI, we are essentially dealing with pattern recognition systems, which are designed to automate decisions that are not necessarily pre-programmed or pre-determined. In the current state of the art, when we speak of sophisticated AI, we often mean some version of machine learning and, more specifically, neural networks and deep learning algorithms. For most conversations, it is useless to talk in terms of the sentience or autonomy of AI systems. Despite all their storied sophistication and dazzling aura, AI systems can be fooled by simple things — small, irrelevant edits to text documents, or changes in lighting. Even minor ‘noise’ can disrupt state-of-the-art image recognition systems. If small modifications are made to the rules of games that AI has mastered, it can often fail to adapt. These limitations highlight AI systems' lack of understanding of the inputs they process or the outputs they produce, leaving them susceptible to unexpected errors and undetectable attacks. The impact of these machine errors can be minuscule and minor, such as extra steps to select pictures in a grid before you can log into a website, or extremely severe with high real-life costs — denial of benefits in a government programme with an automated delivery system, or being locked out of a platform that you rely on to run your business by way of automated detection of a violation of community guidelines. Why we may want to regulate AI has more to do with what it does — its purpose, its outputs and its impacts — rather than what it is. But it is important to know what it is in order to arrive at appropriate strategies for how we can regulate it. In other essays in this collection, I talk about the different kinds of transparency that AI systems need to deliver. A conceptual model of AI is needed to meaningfully make decisions about it, and to respond to the decisions it makes for or about us.
[1] The summer school established four American men as thought leaders in the nascent field of AI—McCarthy himself, who went on to set up the Stanford Artificial Intelligence Project; Marvin Minsky, who founded the MIT AI Lab; the future Nobel Laureate economist Herbert Simon; and computer scientist Allen Newell, who set up an artificial intelligence laboratory at Carnegie Mellon University. For a comprehensive history of AI, with its various springs and winters, see Wooldridge, Michael. A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going. S.l.: Flatiron Books, 2022.
[2] In the field of neuropsychology, this is known as Hebb's postulate. For an analysis of the influence of Hebb's postulate on the field, see Webster, Richard. Why Freud Was Wrong: Sin, Science and Psychoanalysis. Oxford: The Orwell Press, 2005.
[3] For a detailed and beautifully constructed narrative about Bayes' theorem and other key algorithms that drive AI, see Domingos, Pedro. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. UK: Penguin Books, 2017.