SMITH and MUM: From Keywords to Conversations


Imagine, for a moment, you’re standing in the middle of your local library. You walk up to a librarian and ask, “I’ve hiked Ben Nevis in sturdy boots, but I’m planning to tackle Kilimanjaro next autumn. How should I prepare differently?”

Ten years ago, the librarian (let’s call him Google circa 2012) would have panicked. He would have frantically run around, grabbing a book about Ben Nevis, a brochure about boots, and a map of Kilimanjaro, dumping them all on your desk. “Here,” he’d say. “You figure it out.”

Today, thanks to two massive leaps in technology called SMITH and MUM, the librarian pauses, strokes his chin, and says, “Right then. Kilimanjaro is a different beast entirely. It’s not just about the boots; it’s the altitude. You’ll need different cardio training, and here’s a guide on acclimatisation.”

We are witnessing the most significant shift in the history of information retrieval. We’ve moved from a search engine that matches keywords to an answer engine that understands intent.

This is the story of Google’s SMITH and MUM—the two giant brains that took the baton from BERT and sprinted into the future.


Part 1: The Family Tree

Where We Left Off: The BERT Era

To understand where we are, we must glance in the rear-view mirror. You might remember BERT (Bidirectional Encoder Representations from Transformers). We wrote about him previously. BERT was the clever chap who realised that words change meaning based on their neighbours. He understood that “bank” means something different in “river bank” versus “Barclays Bank.”

BERT was brilliant, but he had a short attention span. He was great at reading sentences or short paragraphs, but give him a 3,000-word dissertation on the socio-economic impact of the Industrial Revolution in Manchester, and he’d get a bit lost.

Enter the new generation.

Part 2: SMITH (The Deep Reader)

When Sentences Aren’t Enough

If BERT is the friend who reads the headlines, SMITH is the academic who reads the footnotes, the appendix, and the bibliography.

SMITH stands for Siamese Multi-depth Transformer-based Hierarchical Encoder. That’s a mouthful, even for a Scrabble champion. Let’s break it down simply.

BERT had a limit. He could handle roughly 512 “tokens” (think of them as words or word-parts) at a time. If a document was longer, BERT had to chop it up, often losing the thread of the story.
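The chop-up problem is easy to sketch. Here’s a toy Python illustration (the whitespace “tokenizer” and the `chunk_document` helper are invented for this example; real systems use subword tokenizers such as WordPiece):

```python
# Toy illustration of BERT's length limit. The 512 figure matches BERT's
# published token budget; everything else here is a simplified stand-in.

def chunk_document(text: str, max_tokens: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a long document into overlapping windows of at most max_tokens tokens."""
    tokens = text.split()  # crude stand-in for a real subword tokenizer
    if len(tokens) <= max_tokens:
        return [tokens]
    chunks, start = [], 0
    step = max_tokens - overlap  # overlap softens, but doesn't solve, the lost thread
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        start += step
    return chunks

essay = "word " * 1200  # stand-in for a 1,200-token dissertation
print(len(chunk_document(essay)))  # 3 — the model sees three disconnected pieces
```

Each chunk is scored in isolation, which is exactly how the connection between page one and page ten gets lost.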

SMITH changed the game. It was designed to understand long-form documents. It doesn’t just look at words in a sentence; it looks at sentences in a block, and blocks in a document. It understands the relationship between a paragraph on page one and a conclusion on page ten.
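That words-to-sentences-to-blocks idea can be sketched in miniature. In this hypothetical Python toy, plain averaging and a made-up two-dimensional word “embedding” stand in for the learned transformer encoder used at each level:

```python
# Toy sketch of SMITH's hierarchy: sentence vectors are pooled into block
# vectors, and block vectors into one document vector. The embedding and
# pooling functions are invented purely for illustration.

def average(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def embed_word(word: str) -> list[float]:
    # Hypothetical 2-d "embedding": word length and vowel count.
    return [float(len(word)), float(sum(c in "aeiou" for c in word.lower()))]

def encode_sentence(sentence: str) -> list[float]:
    return average([embed_word(w) for w in sentence.split()])

def encode_block(sentences: list[str]) -> list[float]:
    return average([encode_sentence(s) for s in sentences])

def encode_document(blocks: list[list[str]]) -> list[float]:
    return average([encode_block(b) for b in blocks])

doc = [
    ["Tin mining shaped Cornwall.", "Miners needed portable lunches."],
    ["The crimped crust was a handle.", "Pasties became the answer."],
]
print(encode_document(doc))  # one vector summarising the whole document
```

The point of the hierarchy is that each level only has to summarise the level below it, so no single pass ever has to swallow the whole document at once.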

The “Siamese” Twin Concept

Why is it called “Siamese”? In the world of AI, a Siamese network is a pair of identical neural networks that share the same weights and work in tandem. Imagine two critics with identical training reading two different books at the exact same time to judge whether they discuss the same themes.

SMITH uses this to match long queries with long documents.

  • The Old Way: You search “history of Cornish pasties.” Google looks for the words “Cornish” and “pasty.”
  • The SMITH Way: You search “how did the tin mining industry influence the design of savoury snacks in Cornwall?” SMITH reads a comprehensive history of Cornwall, identifies the section on miners’ lunches, understands the crimped crust was a handle for dirty hands, and matches that specific deep context to your question.
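The Siamese setup can be sketched in a few lines of Python. This is a deliberate toy: the bag-of-words `encode` function is a crude stand-in for SMITH’s transformer tower, but the shape is the same — one shared encoder maps both query and document into the same space, and a similarity score compares the results:

```python
# Toy Siamese matching: the SAME encoder (i.e. "shared weights") is
# applied to both inputs, then a cosine score compares the two vectors.

import math
from collections import Counter

def encode(text: str) -> Counter:
    """Shared encoder: the identical function is applied to both inputs."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "tin mining and the design of savoury snacks in cornwall"
history = "cornish miners carried pasties the crimped crust was a handle for dirty hands in cornwall"
off_topic = "a guide to silicon chip manufacturing in taiwan"

print(cosine_similarity(encode(query), encode(history)))    # higher score
print(cosine_similarity(encode(query), encode(off_topic)))  # lower score
```

The real model replaces word counts with deep hierarchical representations, which is how it matches meaning rather than vocabulary.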

Key Takeaway: SMITH helps Google understand the entire document, not just the keywords in the title. It made long, authoritative articles (like this one!) far more rankable because the engine could finally appreciate their depth.

Part 3: MUM (The Multitasking Genius)

The 1,000x Power-Up

If SMITH was an upgrade, MUM was a revolution.

Announced in 2021, MUM stands for Multitask Unified Model. Google engineers claimed it was 1,000 times more powerful than BERT.

While SMITH is a specialist in reading long texts, MUM is a polymath. It can do everything, all at once.

1. It Removes the Language Barrier

Here’s a frustrating reality: the best answer to your question might not be in English.

If you’re looking for the authentic way to cook a Japanese okonomiyaki, the best recipe is probably written in Japanese. Previously, Google would rarely surface those pages for an English query.

MUM is trained across 75 different languages simultaneously. It doesn’t just translate; it transfers knowledge. It can learn a concept in Japanese and apply it to a query in English. It creates a global brain where knowledge isn’t siloed by borders.

2. It’s Multimodal (The Eyes and Ears)

This is the coolest bit. “Multimodal” means MUM understands information across different formats. It doesn’t just read text; it “sees” images and “watches” videos.

The Broken Hiking Boot Example: Imagine your hiking boot has split at the seam.

  • Pre-MUM: You type “how to fix split seam on leather hiking boot.” You hope the description matches.
  • With MUM: You take a photo of your boot using Google Lens and ask, “Can I use this to hike Snowdon next week?”

MUM looks at the photo, identifies the specific type of damage, understands the terrain of Snowdon (rocky, likely wet), and can answer: “Probably not. That split will let water in. However, here’s a video on how to repair it for light walking.”

3. Solving Complex “Zero-Click” Problems

MUM is designed to solve the “eight-search problem.” Google found that for complex tasks (like moving house or planning a trip), people had to do an average of eight separate searches.

  • “Weather in Edinburgh in October.”
  • “Flights to Edinburgh.”
  • “Family hotels Edinburgh.”
  • “Is Edinburgh Castle pushchair friendly?”

MUM aims to answer this in one go. “I’m taking my toddler to Edinburgh in October. What should I know?” MUM synthesises the weather (pack a mac), the flights, and the accessibility of the castle into a comprehensive answer.

Part 4: The British Context

Why This Matters in the UK

You might think algorithms are universal, but these updates have specific implications for us in Blighty.

The Nuance of the King’s English

British English is riddled with context. If you search for “chips,” are you looking for silicon chips, poker chips, or a local takeaway?

While BERT started solving this, MUM takes it further. By analysing images and wider context, if you’re searching from a mobile device in Brighton at 6 PM on a Friday, MUM knows you probably want dinner, not computer parts.

Protecting the NHS and YMYL

Google has a strict standard called YMYL (Your Money or Your Life). This refers to topics that can impact your health, finances, or safety.

In the UK, if you search “symptoms of chickenpox,” Google must prioritise the NHS or reliable medical sources over a random blog from 2005. SMITH and MUM are the gatekeepers here. Because SMITH reads the whole document, it can better determine if a medical article is truly comprehensive and backed by consensus, or if it’s just repeating a few keywords.

The End of “Pidgin English” Searching

We’ve all been trained to speak to computers like cavemen. “Weather London tomorrow rain.” MUM allows us to return to natural British civility. You can ask, “Will I need a brolly if I pop to the shops tomorrow afternoon?” and it understands the intent perfectly.

Part 5: The Technical “Under the Bonnet”

For the Geeks and Curious

Let’s quickly look at how this works without needing a degree in computer science.

Both SMITH and MUM are built on the Transformer architecture. Imagine a Transformer as a machine that pays “attention” to different words in a sentence to figure out how they relate to each other.

  • Attention Mechanism: In the sentence “The cat sat on the mat because it was soft,” the machine needs to know what “it” refers to. The mat? Or the cat? The Transformer scores the connection between “it” and every other word in the sentence, learning to weight “mat” most heavily.
  • SMITH’s Innovation: It uses a “hierarchical” approach. It first processes words into sentence blocks, then processes those sentence blocks into document blocks. It’s like building a LEGO house by first building the bricks, then the walls, then the house.
  • MUM’s Innovation (T5): MUM uses the “Text-to-Text Transfer Transformer” (T5). This means it treats every problem as a text transformation problem. Translation? Text-to-text. Classification? Text-to-text. This flexibility is what allows it to multitask so efficiently.
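The attention scoring in the “cat sat on the mat” example can be made concrete. Below is a minimal sketch of scaled dot-product attention, the scoring step inside a Transformer; the three-dimensional vectors are invented purely for illustration, whereas real models learn query and key projections over hundreds of dimensions:

```python
# Minimal scaled dot-product attention. The toy key vectors are chosen
# by hand so that the query for "it" aligns with "mat" more than "cat".

import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query: list[float], keys: dict[str, list[float]]) -> dict[str, float]:
    """Score one query against every key, then normalise into a distribution."""
    scale = math.sqrt(len(query))  # the "scaled" in scaled dot-product
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys.values()]
    return dict(zip(keys, softmax(scores)))

keys = {"cat": [1.0, 0.2, 0.0], "mat": [0.1, 1.0, 0.9], "soft": [0.0, 0.8, 1.0]}
weights = attention_weights([0.1, 0.9, 0.8], keys)  # the query vector for "it"
print(weights)  # "mat" carries the largest share of the attention mass
```

The weights always sum to one, so attention is literally a budget of focus being divided among the other words.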

Part 6: Practical Applications

What This Means for Content Creators and Readers

If you run a website, write a blog, or just want to find things online, the rules have changed.

1. Depth Over Keywords (The SMITH Effect)

Gone are the days of stuffing the phrase “Best Plumber in Bristol” fifty times into a webpage. SMITH rewards depth. It wants to see that you’ve covered the topic exhaustively. If you’re writing about making a cup of tea, you’d better mention the water temperature, the brewing time, and the milk-first vs. milk-last debate.

2. E-E-A-T is King

Google evaluates content based on E-E-A-T: Experience, Expertise, Authoritativeness, and Trustworthiness.

  • Experience: Have you actually hiked Ben Nevis? (Systems like MUM are increasingly able to distinguish genuine photographs from stock imagery.)
  • Expertise: Are you a qualified guide?
  • Authority: Do other hikers link to you?
  • Trust: Is your site secure?

3. The Visual Shift

With MUM, images are no longer just decoration. They’re information. If you’re selling a product, your images need to be clear and descriptive because Google is “reading” them.

Part 7: The Future

Beyond the Screen

So, what comes next?

We’re already seeing the children of MUM. Google Gemini and AI Overviews (formerly SGE) are the direct descendants. We’re moving towards a world where you might not “search” at all. You will simply have a conversation.

Imagine wearing smart glasses and looking at a menu in a French bistro. The AI won’t just translate the text; it will see “Escargots,” remember you have a shellfish allergy (context), and highlight it in red. That is the ultimate promise of MUM: a helpful, intelligent layer over reality.

Conclusion

The journey from Hummingbird to RankBrain, to BERT, and finally to SMITH and MUM, is a story of humanising the machine.

For the British user, it means the internet is becoming less of a digital filing cabinet and more of a knowledgeable companion. It understands our slang, respects our need for deep, accurate information, and can see the world through our eyes—literally.

So, the next time you snap a photo of a strange plant in your garden and ask Google, “Is this poisonous to my corgi?”, remember the immense, invisible machinery of MUM whirring into action, analysing pixels, translating botany papers, and delivering a life-saving “Yes” in milliseconds.

We aren’t just searching anymore. We’re finding.

Further Reading & Resources

To explore the technical and industry-standard discussions referenced in this article, we recommend the following authoritative sources:

  • Google’s The Keyword Blog: For official announcements on MUM and AI advancements.
  • Search Engine Land: The go-to publication for breaking news on algorithm updates.
  • Moz: For in-depth analysis on how these updates affect SEO strategy.
  • Google Search Central: The official documentation on E-E-A-T and search guidelines.
