ChatGPT is just the tip of the iceberg that enterprises find themselves crashing into, even though the underlying mass – Large Language Models (LLMs), built on the Transformer architecture – dates back to 2017. In AI years, that’s a lifetime ago.
LLMs ought to have already featured in digital transformation programs, but many CIOs and technology leaders are still figuring this out.
Whilst there has been a rush of explanations of the ChatGPT phenomenon and hundreds of proposed applications, from copywriting to ideation to citizen coding to science to log-processing, very few organizations have considered the strategic use of LLMs.
Stanford declared LLMs to be Foundation Models via a rather long, yet instructive, paper. Due to their incredible – almost unbelievable – performance, LLMs have proven to be foundational to many applications using Natural Language Processing (NLP).
This power, mostly thanks to the size of the models, has taken many by surprise. It is indeed curious that the underlying training scheme of guessing missing words from large corpuses of text should turn out to be so useful in tackling almost every NLP task imaginable.
What is even more staggering is how this supercharged word-power is easily accessible via just a few lines of Python code thanks to the rise of services like Huggingface that have made these models available beyond research labs.
Such is the power at the fingertips of many a coder that plenty of pre-LLM AI tools have been made obsolete. Here’s a quick tip: for every AI solution touted, check whether LLMs could replace or improve it. The answer is increasingly yes.
Suppose you want to add keyword labels to call transcripts or contract clauses in order to classify them: this is literally a few lines of code. At the time of this writing, Huggingface has 16,757 text classification models in its library. Of these, perhaps a hundred are foundational, whilst the rest are fine-tuned riffs, like this one, tuned on US Amazon product reviews.
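To make the “few lines of code” claim concrete, here is a minimal sketch using Huggingface’s zero-shot classification pipeline; the model name, example transcript and candidate labels are illustrative assumptions, not a reference to any particular model counted above.

```python
# A minimal sketch: labeling a call-transcript snippet with candidate topics
# using Huggingface's zero-shot classification pipeline. The model, transcript
# and labels below are illustrative placeholders.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

transcript = "I'd like to dispute a charge on last month's invoice."
labels = ["billing", "technical support", "cancellation", "sales inquiry"]

result = classifier(transcript, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```

The same pattern – pick a pipeline, pass in text – covers sentiment analysis, summarization, named-entity recognition and most other everyday NLP tasks.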
LLMs for Digital Transformation
There are many frameworks of digital transformation, but let’s consider their context, namely the range of innovation types per Geoffrey Moore:
Agile value streams will span the entire gamut of innovation types, and there isn’t one that won’t be impacted by LLMs – or even by their core idea alone: Attention.
Let’s take a quick detour into Attention (if you want a longer, code-level explanation, see my repo on Github). This simple invention (a differentiable memory) gives rise to two enterprise opportunities:
- Models that can understand the language used across your enterprise
- Scalable pattern-finding in many kinds of enterprise data besides language
Attention is powerful at finding patterns in set-based data. Indeed, the invention was motivated by mapping one kind of set to another (language translation).
It’s called Attention because it figures out how members of a set “pay attention” to each other. For example, the word He in the sentence “He ran across the road” pays close attention to the word ran, but not so much to the word road.
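For the curious, here is a bare-bones sketch of the computation behind this – scaled dot-product attention – with random vectors standing in for the learned projections a trained model would use; in a real LLM, these weights are what encode patterns like He attending to ran.

```python
# Bare-bones scaled dot-product attention over the example sentence.
# In a trained model, Q, K and V come from learned projections of token
# embeddings; random vectors are used here purely to show the mechanics.
import numpy as np

tokens = ["He", "ran", "across", "the", "road"]
d = 8                                   # embedding dimension (illustrative)
rng = np.random.default_rng(0)
Q = rng.normal(size=(len(tokens), d))   # queries
K = rng.normal(size=(len(tokens), d))   # keys
V = rng.normal(size=(len(tokens), d))   # values

scores = Q @ K.T / np.sqrt(d)           # how much each token "looks at" the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                    # each token's new, context-aware representation

print(np.round(weights[0], 2))          # attention paid by "He" to every token
```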
Attention doesn’t just figure out that he attends to ran in this particular sentence. Nor does it merely learn that the first and second words of a sentence are related, as they often will be. It figures out that “He ran” is a common kind of pattern, one that we humans would recognize as subject-verb correspondence.
Given enough examples and enough layers of Attention, the Transformer architecture can find many such patterns, even over longer spans, like “He, the tall man, ran surprisingly swiftly.” Here it will also associate the pronoun (He) with the subject (man) or, more generally, subjects with predicates.
Attention doesn’t know what these patterns are, which is why an LLM doesn’t “know” the rules of grammar per se (and, arguably, never can). But grammatical structures, formal and informal, get approximately embedded into the model via this pattern-finding process, as does the knowledge encoded in the source text.
Even when only approximately embedded, these grammatical rules prove useful as building blocks for the many NLP tasks built on top of LLMs.
But what if your enterprise is full of arcane texts specific to your business – bills of materials, technical product descriptions, specialized contract legalese, and so on?
Using a mechanism called Transfer Learning, the power of a foundation LLM can be transferred to a custom one fine-tuned for your enterprise. For example, the Fairmont hotel group could fine-tune a model to become the “Fairmont-LLM” that understands “hotel speak”. Silicon Valley Bank could build a “venture debt speak” model, and so on.
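As a rough sketch of what such fine-tuning could look like in practice (the file name, model choice and hyperparameters below are placeholders, not Fairmont’s actual setup), Huggingface’s Trainer can continue a foundation model’s masked-language-model training on a corpus of domain text:

```python
# A sketch of transfer learning: continue training a foundation model on
# domain text (e.g. hotel reviews, booking notes). File names, model choice
# and hyperparameters are placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

base = "distilbert-base-uncased"                      # any suitable foundation model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Hypothetical corpus of in-house "hotel speak"
data = load_dataset("text", data_files={"train": "hotel_speak.txt"})
tokenized = data.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fairmont-llm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()   # the result: a custom LLM tuned to the enterprise's own language
```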
Let’s explore what this looks like in the context of digital transformation.
Fairmont LLM: “Hotel Speak”
Imagine a value-stream in the Fairmont hotel’s transformation programs that is linked to deeply personalized experiences across the customer journey. Here’s a journey map from GCH Hotel Group as a reference.
Source: GCH Hotel Group
What do all of these interfaces have in common? They all use language.
Given the foundational power of LLMs, it is entirely possible that the “Fairmont-LLM” could power many NLP services within myriad touch-point components:
This has obvious implications for digital transformation architecture because a unified custom LLM potentially impacts many innovation types and related value streams. Failure to notice this will stymie efforts to produce consistent customer-facing experiences. They will cost more and deliver less.
Google’s 2023 AI and Data report confirms a trend set by software experts (like Martin Fowler) who have argued for unified Data Mesh and Data Fabric architectures that put an end to data silos. This thinking applies equally to AI, in particular LLMs.
For example, fine-tuning will require constant updates and related services, such as data quality, to ensure human alignment – i.e. making sure that any generated text complies with policies, brand guidelines and ethical standards.
This will require a composable data architecture and a distributed MLOps operating model, more akin to a mesh. This is a far cry from many of today’s embedded enterprise AI solutions with opaque data models – an anti-pattern for scalable value from LLM models and services.
Transformers: LLMs in Disguise
There are many ways custom LLMs could/should be deployed in the enterprise as a strategic IT resource versus a point solution. Various architectural and ops patterns are still emerging.
With different business cases and transformation mindsets to consider, it pays to be aware of the potentially strategic role of custom LLMs before committing to yet another embedded AI point solution that lacks flexibility or is heavily tied to a vendor.
Going a step further, let’s return to our earlier claim that Transformers can pay attention to patterns in all kinds of set-based data. It turns out that many enterprise data sources are sets that contain patterns amenable to discovery via Attention: sales forecasting, actions on a website, molecules in a drug, power trends in a wind farm, and so on. (I did research applying Transformers to no-code website generation because website layouts are inherently set-based.)
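As a rough, hypothetical illustration (the vocabulary of website actions, dimensions and downstream head below are all made up), the same encoder machinery used for words can be pointed at any data you can tokenize:

```python
# Rough illustration: applying a Transformer encoder to non-text set data,
# here a made-up vocabulary of website actions. Vocabulary, dimensions and
# the downstream head are all hypothetical.
import torch
import torch.nn as nn

actions = ["view_home", "search", "view_product", "add_to_cart", "checkout"]
vocab = {a: i for i, a in enumerate(actions)}

embed = nn.Embedding(len(vocab), 32)                       # one vector per action type
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(32, 1)                                    # e.g. predict purchase propensity

session = torch.tensor([[vocab["view_home"], vocab["search"],
                         vocab["view_product"], vocab["add_to_cart"]]])
features = encoder(embed(session))                         # attention over the action set
prediction = torch.sigmoid(head(features.mean(dim=1)))     # pooled session-level score
print(prediction)
```

Swap the action vocabulary for SKUs, sensor readings, or contract clauses and the pattern is the same.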
Where it gets really interesting is when different sets get combined into novel multi-modal models capable of solving even larger classes of problems such as how language used in sales might predict revenue.
Many datasets within the enterprise are ripe for multi-modal modeling, which tends to suggest that Transformer-based modeling might become a horizontal data-science capability, maybe via new low-code tools, not just a feature of an embedded AI product. Certainly, the availability of Huggingface models makes this a realistic proposition.