The Manifold & Mantra Hypotheses: Why AI works, and why it doesn't.

Business leaders: pay attention.

AI works because of a fact of nature that is sometimes expressed by the Manifold Hypothesis — that complex things can be explained via simpler mechanisms (that an AI might find from samples of the complex thing). However, AI often fails because of what I call the Mantra Hypothesis, which is the tendency of business leaders to declare, mantra style, that the business will “Put AI First” or some such moniker, but without the means to translate that intent into operational realities that might lead to a meaningful ROI.

In this post, I will outline for a business audience why many AI projects in enterprises either fail, or are likely to fail. Reasons for failure can be summarized as:

Blind adoption of AI without a clearly defined set of outcomes that can be explained beyond the level of pithy directives like “We need to adopt AI”.
Assignment of AI adoption to folks who lack experience in mapping advanced tech into tangible business outcomes.
Reliance on vanity success measures in the adoption of AI.
Hoping that hiring AI technical experts will translate into business outcomes via some kind of osmosis.
Failure to understand the data requirements to make AI training viable.
Underestimation of the operationalization of data and AI required to translate AI experiments (often by scientists) into repeatable and robust products.
Lack of a “Data as a Product” approach to AI, including the tendency to want to “supercharge” existing product features vs. finding new AI-first interpretations.

AI: The Hype is Real, Yet Insufficient

Right now, AI hype is effervescent. The truth is that no one knows where we are in the hype cycle, with many voices calling from the trough of disillusionment whilst others holler from the peak of Mount Olympus, where they gather with sentient-like AI gods. The naysayers aren’t necessarily theorists like Gary Marcus, who claims that Deep Learning has “hit a wall” due to its lack of compositionality, a blurry technical term that could mean many things (there has been a whole conference about it). Put simply, Marcus is saying that any AI that builds a language model merely by predicting which words come next, or thereabouts, cannot retrieve deeper semantic structures (“meaning”) in the data, a capability that he argues is required to make AI usefully “intelligent” in most real-world applications.

The meaning of “intelligent” is hotly debated, often by those who have apparently never read Turing’s original paper, in which he judged the question “Can machines think?” to be “too meaningless to deserve discussion”. He meant that any agreeable definition of “thinking”, if we might construct one, is commonly understood to be a biological capability, nothing to do with machines.

However, for this discussion, I prefer to lean upon the moniker “usefully intelligent” wherein the word “usefully” is far more interesting and ought to be related to business outcomes, not arcane debates about intelligence. And, in this regard, there are all kinds of brick walls that business leaders will face when attempting to adopt AI.

Other voices are calling from within the trough of disillusionment, namely business leaders who don’t yet see any clear relationship between “adopting AI” (whatever that means) and tangible ROI. And I don’t mean leaders who picked up the airport best-seller on AI and tasked their CIOs to come up with an “AI strategy”. I mean leaders who have already placed their bets, although probably not the whole farm, with real hard cash spent investing in AI, acquiring or acqui-hiring AI companies, yet nonetheless failing to translate such investments into meaningful outcomes.

This often stems from a kind of corporate arithmetic:

Business + AI = Better Business.

It apparently works for any business:

Air-Conditioning Vendor + AI = Better Air-Conditioning Vendor.

And so on. Belief in this formula rapidly translates into some kind of mantra for the company: “Put AI First”. However, whilst this formula has the potential to be true, it is not an operational formula: merely believing in it, and in the hype of AI, is insufficient. It tells us nothing about how the addition of AI could make a better business. Yet it is surprising how many business leaders have adopted this formula without asking probing questions. Moreover, many make the fatal mistake of acquiring technical AI expertise as the first step, as if the mere presence of that expertise will permeate the business and bring about the realization of the formula. Far from it.

This happened repeatedly with the adoption of “Big Data” and “Data Science”, where enterprises found mediocre ROI, if not a negative one. It is not uncommon for an enterprise to acqui-hire an AI start-up that is basically a team of research scientists, much closer to academia than industry, with scant knowledge of applied innovation and little to no experience of translating their various AI Python libraries into meaningful modifications of the current and future enterprise roadmap.

To dig further, we might consider certain useful generalities as to why the adoption of shiny new tech can easily fail. Any business initiative that proclaims to “Put AI First”, or some such declaration, is possibly doomed from the start. The lack of a meaningful scope and outcomes is the elephant in the room, condemning the program to inevitably proceed along predictable lines. One such line will be suffering a long lag between flipping the switch to “Put AI First” and eventually concluding, perhaps wrongly, that it maybe wasn’t worth it.

Without a practically useful interpretation of AI into provable customer benefits and business outcomes, AI can easily become just another fad, like when many a CEO bought into “Re-engineering” the company only to find that no one had ever really understood in tangible terms what “engineering” a company meant, never mind “Re-engineering”.

This blind approach can be radically worsened when organizations offer “sexy AI” roles to loyalists, often efficient program managers with a track record in other areas, along with their hired-in friends who desperately want the shiny-new-toy badge on their resumes. Yet they lack the depth of analytical expertise and product synthesis to translate shiny tech into outcomes — a task far harder than many imagine. Desperate to show progress, they create “AI departments” and the like, just like they created “Data Science” departments, and then set about constructing various vanity measures of success to present in monthly reviews. This is a common anti-pattern.

The antidote is to ensure that your product teams and key stakeholders throughout the business can articulate clear outcomes and benefits of AI adoption in their roadmaps, present and future. Indeed, I have advocated what I call the 100-of-10 rule. It states, at least as a starting point, that your entire leadership team and their direct reports ought to be able to clearly articulate how the entire 10% of effort within the 70-20-10 rubric could be devoted to AI-related programs. Given the massive disruption potential of AI-first start-ups and the emergence of large-scale models, the potential impact of AI is worthy of such attention and prioritization. But this must be translated into meaningful outcomes, backed by detailed arguments, before pulling the AI adoption ripcord.

Many readers might want to stop here and address the above issues first. However, determined readers might read on as we explore the biggest AI elephant in the room, namely the “data”, or “lack of data” challenge.

The Real Brick Wall: Data

What is the data for?

There is often a hidden field in the AI adoption equation:

Business (+ Data) + AI = Better Business.

Many business leaders have at least heard that AI needs data. And, so, with a warm feeling that the enterprise is awash with data, AI-adoption confidence goes up. However, this is often a mistake.

Let’s begin with why AI works at all. If this basic plank isn’t well understood, then everything else rapidly becomes froth. By now, we all know that AI works by digesting data. The “works” bit can be plainly stated: a working (trained) AI can take new, previously unseen, data and reliably make some prediction that coheres with a ground truth. In other words, what it learned from specific data samples can be generalized to other samples. If you trained your AI to understand how to spot sales opportunities based upon a history of sales data, then it must generalize to new sales opportunities — and it ought to generalize in reliable and actionable ways.

AI can do this thanks to a strange mathematical reality, one that is sometimes expressed by theoreticians as the Manifold Hypothesis. For those really interested, you can look up the math, but I will explain it in much simpler terms: things in the natural world tend to obey parsimonious laws. These laws are compact whilst the things they can explain are vast. Imagine, for a moment, that we study thousands of objects falling from the sky until we can reliably predict the force with which they hit the ground — i.e. we take all our measurements, perhaps via a camera, and feed them into an AI.

Now let’s say we want to predict the next hundred objects, which could be of any material and size (i.e. mass). It would be a useless system if we could only predict a particular object because we have already seen an identical one. Indeed, such a system has no predictive power at all: it is merely looking things up from history. Nor would it be much use if it could only make predictions for objects that closely resemble ones it has already seen. However, what if our system had somehow figured out Newtonian Mechanics: F=ma? This single “piece of information” (a formula) could be used to determine the behavior of an infinite number of objects. In effect, we have “compressed” the original data space from lots of examples to a single relationship among three quantities: F, m and a.

The problem is that most of these kinds of symbolic understandings of the world have been discovered by humans via mysterious creative processes that we don’t know how to program, nor explain. Hence we rely upon a different approach, which is to see if an AI can take lots of examples of a behavior and condense them down into a function that is good enough to generalize to lots of unseen examples. Our hope, which we impose via design constraints within the AI architecture, is that the parameters of that function are far fewer than the size of the input space (in totality) such that we can use a reasonably sized set of examples to estimate the parameters well enough to generalize to a class of objects without having to observe every single possible example.
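The contrast between looking things up and learning a compact rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training data is synthetic, the 5% noise level is invented, and the form F = k * m * a is imposed by hand (a stand-in for the "design constraints" mentioned above), leaving a single parameter k to estimate.

```python
import random

def force(mass, accel):
    """Ground truth the learner never sees directly: Newton's F = m * a."""
    return mass * accel

# Hypothetical training data: 200 noisy observations of (mass, acceleration, force).
random.seed(0)
train = []
for _ in range(200):
    m = random.uniform(1.0, 10.0)
    a = random.uniform(1.0, 10.0)
    noisy_f = force(m, a) * random.uniform(0.95, 1.05)  # assumed 5% measurement noise
    train.append((m, a, noisy_f))

# "Model" 1: a lookup table. It can only answer for inputs it has already seen.
lookup = {(m, a): f for m, a, f in train}

# Model 2: assume the form F = k * m * a and estimate the single parameter k
# by least squares: k = sum(x * F) / sum(x * x), where x = m * a.
num = sum((m * a) * f for m, a, f in train)
den = sum((m * a) ** 2 for m, a, f in train)
k = num / den

# An object far outside anything in the training set.
new_m, new_a = 250.0, 9.81
lookup_answer = lookup.get((new_m, new_a))  # None: this input was never seen
fitted_answer = k * new_m * new_a           # close to the true 250 * 9.81

print(f"estimated k = {k:.3f}")
print(f"lookup: {lookup_answer}, fitted: {fitted_answer:.1f}, true: {force(new_m, new_a):.1f}")
```

Two hundred examples collapse into one number, and that one number covers objects the system has never encountered. A real AI does not get handed the form of the equation, which is why it needs far more data and far more parameters, but the principle is the same.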

To put it simply, most things in the world could, in theory, be described by some underlying set of equations, if only we knew them. This is how nature is. Although there are trillions of unique leaves on all the trees, they all arise from a broadly similar set of underlying mechanisms that could be condensed into a relatively small set of information (e.g. equations) that is far, far, far smaller than trillions of unique data points. If this were not the case, no AI could learn anything because when we say that an AI is learning, we mean that it is attempting to search for that far smaller approximating function that lies beneath the data. If there were no such smaller underlying function, or principle, then the AI could not easily find it with the clues it has (some training data) and within the confines of a reasonable amount of computational effort.

What do I mean by finding? Well, that is what an AI does. It “searches” for a reduced set of values, called parameters, that not only explains the training data, but could also characterize the underlying system from which the training data comes: how leaves grow, how objects fall, or how cancer presents in a biopsy sample image.

This principle applies to digitized systems despite the fact that digitization can give rise to mind-blowingly large datasets. For example, take a tiny image sensor that is only 16 x 16 pixels, and can only encode each pixel as black or white. There are 2^256 possible images, which is roughly 1.16 x 10^77, known by numberphiles as roughly 116 quattuorvigintillion. Imagine an AI had to process each combination to characterize images. Say it could process 1 billion images per second. That would still take far longer than the age of the universe to complete.

This seems absurd, but those numbers are correct, even though our intuitions can’t relate to that much data from such a tiny image space. But here’s the really cool thing: most of those images are unrecognizable. They are just noise. Any pixel in one of those noisy images doesn’t have any meaningful relationship with any other in terms of forming a recognizable structure, and so each image, and all of them together, do not “say” much.
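The arithmetic for the 16 x 16 black-and-white sensor is easy to check for yourself. The sketch below assumes the throughput of one billion images per second used in the example, and takes the age of the universe to be roughly 13.8 billion years.

```python
# Count every possible image from a 16 x 16 sensor with 1-bit (black/white) pixels.
pixels = 16 * 16
possible_images = 2 ** pixels  # each extra pixel doubles the count

# Assumed throughput from the example: one billion images per second.
images_per_second = 10 ** 9
seconds_needed = possible_images // images_per_second

# Assumed age of the universe: roughly 13.8 billion years.
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_seconds = 13.8e9 * seconds_per_year

print(f"possible images:      {possible_images:.3e}")
print(f"seconds needed:       {seconds_needed:.3e}")
print(f"ages of the universe: {seconds_needed / age_of_universe_seconds:.3e}")
```

Even at that speed, brute-force enumeration would need vastly more than one lifetime of the universe, which is why learning has to rely on structure in the data and not on coverage of every case.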

The actual number of useful images is ridiculously smaller. Take, say, recognition of hand-written digits. It requires only a few tens of thousands of images of various styles to train an AI to recognize more or less any human-recognizable digit that it hasn’t seen before, no matter how varied the hand-writing style. This is because digits, like alphabets, have stable morphology, i.e. the underlying rule for making a number 1 is broadly a vertical line, often with a top-left diagonal stroke, no matter how a particular person writes it. What we want then is for an AI to see enough images to figure out an underlying function that ends up enacting a rule something like: pixels arranged in a vertical line with a diagonal top-left downward stroke predict a digit number 1.

What kind of data?

So this is how AI works: it exploits the natural tendency for things in the world, even when digitized, to be generated by a set of underlying parameterized principles that could be explained with relatively little data compared to all the things those principles could generate. It turns out that the so-called Neural Network is a machine that can find an approximation of those underlying parameters that is good enough to generalize usefully, recognizing classes of data of which it has seen relatively few samples during training.

So far, so good. But why does this also present a problem? The problem is still having enough data, of the right kind, to allow an AI to converge upon a useful approximating function. In many business situations, this data is really hard to come by, despite the common notion that we are all awash with “Big” data.

The “right kind” is perhaps more critical. For an AI to learn, it has to be able to observe clues about the function it’s trying to approximate. How does this happen? An easy example should suffice. Consider the following sequence:

1 → 1

2 → 4

3 → 9

4 → 16

5 → X

What is X?

Of course, you will think 25, which is correct.

You have figured out that the mapping function here is squaring. But if I gave you only the sequence 1, 2, 3, 4, or only the sequence 4, 16, 9, 1, would you have guessed that these numbers came from a squared-number generator? No. You need the inputs paired with their outputs to guide, or supervise, your guess at the function. These outputs are called labels. Together with their relevant inputs they are the ground truth of our system. The approach is called Supervised Learning.
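For readers who want to see the guessing game done by a machine, here is a minimal sketch in Python. It assumes, purely for illustration, that the hidden rule has the form y = x ** p, and it estimates the one unknown parameter p from the four labeled pairs. Real supervised learning makes far weaker assumptions about the form of the function, but the role of the labels is the same.

```python
import math

# Labeled examples from the sequence above: each input comes with its known output.
examples = [(1, 1), (2, 4), (3, 9), (4, 16)]

# Assume the hidden rule has the form y = x ** p and estimate p.
# Taking logs gives log(y) = p * log(x), a straight line through the origin,
# so least squares gives p = sum(lx * ly) / sum(lx * lx).
logs = [(math.log(x), math.log(y)) for x, y in examples]
p = sum(lx * ly for lx, ly in logs) / sum(lx * lx for lx, ly in logs)

def predict(x):
    """Apply the learned rule to an input that was not among the examples."""
    return x ** p

print(f"learned exponent p = {p:.3f}")
print(f"prediction for 5 = {predict(5):.1f}")

# Without the outputs there is nothing to fit: the inputs 1, 2, 3, 4 on their own
# are equally consistent with squaring, doubling, or any other rule.
```

Remove the right-hand column of `examples` and the calculation cannot even begin, which is the practical meaning of "no labels, no supervised learning".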

There are many challenges to finding this kind of data in a way that it tangibly supports the intended business goals. For example, a CMO might think that the existence of millions of sent emails ought to present a rich-enough dataset to deploy AI in the selection of subject lines that improve email open rates. However, it often turns out not to be so easy. In this case, there might not be enough variance in the data to be useful — i.e. many of the data points are very similar and so they don’t offer enough clues to the AI as to how to find the underlying “generator” function.

Another common problem is finding the ground truth. Lots of datasets in the enterprise might be large, but they lack the supervisory labels to be useful as training data. This might just be because the necessary observations were never recorded. For example, there are plenty of e-commerce datasets wherein the products lack sufficiently rich meta-data, such as keyword tags, to provide sufficient dimensionality in the input data.

It is important to recognize that the labels must represent the ground truth you are interested in. For example, if the true goal of the CMO is to map email behaviors to churn, then you need not just the email interaction data, but the churn data. Of course, in a typical enterprise, churn data might well be available, but you get the idea. Take medical applications as another example, such as the use of AI to analyze tissue samples (via slides). One ground truth might be annotations from a pathologist as to the presence of disease. However, what if the intended goal of the AI is to predict prognosis or the probability of a certain treatment’s effectiveness for the patient in question? This requires a different set of ground-truth data, namely historical records of treatment outcomes.

There’s also the nature of the data itself. AI works because the underlying function approximation assumes that the training samples are highly representative of all the possible samples there could be. This statistical assumption does not hold in some cases, such as the so-called “Long Tail” cases, where infrequent samples that might not appear in the training data can have overwhelming effects on the outcome.

Of course, there might be solutions to these problems. For example, in the case of finding useful email headings, data outside of the enterprise can be used to train an AI on aspects of language that evoke a certain emotional response. This kind of data might be found in places like online reviews, or public datasets. Once trained upon this data, the AI can then be tuned to solve the specific problem at hand, as seen in the Oracle email platform. Generally, this technique is called Transfer Learning.

However, the specific solution to data readiness, if a solution exists, is not as important as being able to recognize that the reality of the data part of the AI formula can be very hard to achieve. We will discuss more practical solutions below.

Data in Practice: Operationalization

I could go on and on about the various data requirements needed to make AI effective. Above I have only mentioned a few of those that are somewhat related to the theory of how AI works.

There is the much more practical concern of finding the data, preparing the data, labeling the data (if required) and so on. Both experimentally (when training the AI) and operationally (when deploying in a product), these data processing issues can represent a significant share of the overall effort in bringing an AI-enhanced solution to the market. This operationalization can include the very real challenge of updating the AI with sufficient regularity to adapt to new data samples or better approximations. The gap between the experimental AI program and an operationalized one can be large. This has given rise to a whole new field called Machine Learning Operations (MLOps).

An additional key limitation might be gaining access to data that belongs to the customer. There are all kinds of privacy issues and related audit and compliance concerns to verify what data went into which AI model in order to prove data lineage etc. This might also be important in achieving a certain quality metric from the AI, like avoiding certain kinds of negatively-impacting decisions, such as denying someone a loan because of skin color.

This is related to the wider concern of “AI safety” or “AI alignment” and data governance. Put simply, if your data governance is already lacking pre-AI-adoption, as it often is in enterprises with large technical debt, then you might be in for a rough ride.

Data as a Product and AI-First Mindset

It is almost shocking to see how often the adoption of data-related disciplines, like Data Science and AI, seems to throw out everything we have learned in the modern software era about how to translate commercial intentions into commercial realities, as if the lessons of Agile and the like were never learned. Indeed, this might well be the case, but let’s be more charitable in our estimations of baseline product operations.

There is a tendency in data-related projects to plow straight into complex costly programs with the assumption they are going to work. If we have learned anything from modern software, it is that this assumption is typically incorrect, as the product won’t quite work in the way intended or expected. If those programs are largely staffed by AI researchers and technicians, then it’s often the case that very few of them understand how to manage technical product programs.

It is not inconceivable that an AI team will go about making some kind of tool, say a method to enhance sales, only to find that the sales team doesn’t use it. No stakeholders were consulted, no user tests were done, no demos were conducted of the product in progress, no tests were conceived to assure quality of its outputs. And, often worst of all, no experiments were done to verify scope etc.

It is all too common for rogue data to upset the entire apple cart, again due to lack of data governance. Sales folks changed the way they rated sales prospects via some undocumented hack in the CRM and suddenly the reality of garbage-in, garbage-out is dramatically amplified by an AI that is ingesting inverted or incorrect data.
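A modest amount of automated checking can catch this kind of silent change before it reaches a model. The sketch below is hypothetical throughout: it assumes each CRM record reduces to a (rating, won) pair, that the documented rating scale is 1 to 5, and that higher ratings historically went with more wins. The function names are invented for illustration and do not belong to any CRM product.

```python
from statistics import mean

def win_rate_by_rating(records):
    """Average win rate (0..1) for each rating value seen in the records."""
    by_rating = {}
    for rating, won in records:
        by_rating.setdefault(rating, []).append(1 if won else 0)
    return {rating: mean(wins) for rating, wins in by_rating.items()}

def rating_direction(records):
    """Return +1 if the top rating wins more often than the bottom one, -1 if less, 0 if unclear."""
    rates = win_rate_by_rating(records)
    if not rates:
        return 0
    lowest, highest = min(rates), max(rates)
    if lowest == highest or rates[highest] == rates[lowest]:
        return 0
    return 1 if rates[highest] > rates[lowest] else -1

def check_batch(history, new_batch, valid_ratings=range(1, 6)):
    """Collect reasons to hold back a batch before it is used for training."""
    problems = []
    if any(rating not in valid_ratings for rating, _ in new_batch):
        problems.append("rating outside the documented 1-5 scale")
    old, new = rating_direction(history), rating_direction(new_batch)
    if old != 0 and new == -old:
        problems.append("rating scale appears inverted relative to history")
    return problems

# Hypothetical history: prospects rated 5 usually close, prospects rated 1 rarely do.
history = [(5, True)] * 8 + [(5, False)] * 2 + [(1, True)] * 1 + [(1, False)] * 9
# Hypothetical new batch after an undocumented change: the scale has been flipped.
inverted = [(1, True)] * 8 + [(1, False)] * 2 + [(5, True)] * 1 + [(5, False)] * 9

print(check_batch(history, history))
print(check_batch(history, inverted))
```

The specific checks matter less than the habit: someone owns the definition of each field, and data that stops matching that definition is stopped at the door.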

There is no excuse for not treating data as a product. All of what we have learned about incremental development, testing, stakeholder involvement, demoing, design-thinking, and so on, still applies, even if you’re making a dashboard.

But there is another product perspective that is often overlooked, namely the absence of critical interpretation of novel AI-first capabilities. There is a tendency for AI to be used as a means to supercharge existing product capabilities. For example, an e-commerce product recommender might be enhanced using AI, as might a website personalization module. This is the “low-hanging fruit” approach. There might not be anything wrong with such an approach, but it should be undertaken with caution.

With any such amplification, one first has to ask how far it really translates into ROI. As you might have already understood, the most precious resource in AI adoption is the AI expertise that ultimately generates the core of the innovation, notwithstanding all of the above constraints in translating that innovation into meaningful ROI. In many sectors, you will be competing, if not now, then soon, with so-called AI-first start-ups who plan to tackle your sector using a radically different approach. For example, in my own work for an e-commerce platform giant, I proposed an AI-first approach called “No Design” web authoring — the conversion of English sentences into finished web pages. The point is that this is uniquely possible because of recent AI advances (called Transformers) and potentially far more impactful within a 1-3 year ROI horizon than tinkering with, say, personalization or search.

Put simply, the “No Design” approach has the potential to radically alter how retailers think about their website strategies, opening up new ideas due to “superhuman” scaling of creative effort etc.

Summary

In going about AI adoption, you should believe the hype. AI really does have transformational potential, perhaps radical in some cases. However, as with all advanced tech, the mere adoption of it does not guarantee success. Enterprises tend to be very amnesic, so it is not unexpected that all the lessons learned from modern product development are suddenly forgotten in the face of a new paradigm like AI.

Naive approaches should be avoided. These include assigning AI as “just another program” to company loyalists who have shown proficiency with running programs. As an antidote to the mere circulation of mantras and hype, business leaders should insist upon well argued AI enhancements to the roadmap articulated by stakeholders with the backing of experiments where possible.

The greatest caution should be applied to any assumptions about data. We have explained the magic of AI (the Manifold Hypothesis) in searching for approximate functions that explain the training data, hopefully in generalizable ways, but this magic needs enough data of the right kind to be effective. It is too easy to mistake existing meta-data as a useful basis for the actual ground truth required for the magic to work in the way intended as a useful outcome.

The operationalization of AI is not to be underestimated, potentially overwhelming the scientific work to the point of rendering it ineffective in practice. Careful attention must be paid to effective data governance and AI quality assurance.

The product mindset must prevail, which could be interpreted as mapping the benefits of Agile, Lean, and so on, to all data programs. The ease with which AI researchers might write Python scripts can easily create a massive bias about how easy it is to build AI products. But these are not products. They are prototypes, or often not far from demos.

The innovation mindset must also prevail, thinking anew about product strategy in light of novel AI capabilities. This is the so-called “AI First” approach often overlooked by enterprises due to an eagerness to attain some validation of AI investments via “low hanging fruit” product enhancements. It is also overlooked when using existing program experts to manage AI, thinking as they might along familiar product lines.
