How OpenAI Stole Google’s Thunder

Om Malik

Om Malik is a partner at True Ventures, a Silicon Valley-based early-stage venture capital group. Prior to joining True, he was the founder of Gigaom, a pioneering technology blog and media company.

I have been thinking about the OpenAI and Google I/O announcements, and how they represent two polar-opposite strategies, and thus two diametrically opposed perceptions of entities in the small, elite club that is setting the direction for the set of technologies known as “artificial intelligence.”

Let’s start with OpenAI’s announcement just a day before Google I/O:

  • It introduced GPT-4o, which it describes as a multimodal model that accepts any combination of text, audio, and image inputs and generates text, audio, and image outputs. By handling all of these natively, the model opens up a whole new range of possibilities for AI applications.
  • It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.
  • How does this happen? Previously, you used Voice Mode to talk to ChatGPT, with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode was a three-step pipeline: one model transcribes audio to text, GPT-3.5 or GPT-4 takes in that text and outputs text, and a third, simpler model converts the text back to audio (a minimal sketch of this pipeline follows the list). Now, OpenAI says, “a single new model works end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.”
  • In the API, it is twice as fast as GPT-4 Turbo and 50% cheaper.
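
To make the latency point concrete, here is a minimal sketch of that old three-step Voice Mode pipeline using the OpenAI Python SDK. The file names, the step-1 and step-3 model choices, and the voice are my own illustration rather than details from OpenAI’s announcement; the point is simply that three separate models and three network round-trips add up to multi-second delays:

```python
# A minimal sketch (not OpenAI's actual Voice Mode implementation) of the
# old three-model pipeline: speech-to-text -> text LLM -> text-to-speech.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: a speech-to-text model transcribes the user's audio.
with open("question.wav", "rb") as audio_file:  # hypothetical input file
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Step 2: a text-only LLM reads the transcript and drafts a reply.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: a text-to-speech model turns the reply back into audio.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
speech.write_to_file("answer.mp3")  # hypothetical output file
```

Three models and three round-trips; GPT-4o collapses the whole pipeline into a single network, which is where the speed comes from.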

What does this mean? Think of a voice assistant that not only understands you and your questions, but can also hold an empathetic two-way conversation without feeling uncanny, or stilted like, say, Siri or Alexa. It also solves the long-standing problem of real-time translation.

Their demo, which felt real, human, and full of fun and empathy, led to quick comparisons with the movie Her, nudged a bit by OpenAI’s chief honcho, Sam Altman. The buzz has been almost universally positive.

Rumors suggest a pending partnership between OpenAI and Apple, a collaboration that could give Siri, which in tech years has grown decidedly middle-aged since its 2011 debut, a new lease on life.

Not to sidetrack the discussion, but Apple’s devices have the potential to marry the efficiencies of edge computing and artificial intelligence. A revamped bot-based Siri would likely lean on OpenAI’s engine for off-device needs, while Apple remains in control of on-device voice synthesis, privacy controls, and the Siri interface personas. I am sure we will find out soon.

OpenAI’s event and announcement were a masterclass in public relations, reinforcing a point we have seen in many Apple events: short, to-the-point announcements can have a much larger impact. (I am not sure anyone noticed, but ChatGPT now has real-time search results, something OpenAI didn’t even bother to highlight during the event.)

Just a day after OpenAI’s event, Google kicked off its Google I/O event.

Google, it seemed, threw a lot of spaghetti at the wall during CEO Sundar Pichai’s keynote, which gave him a chance to share Google’s grand AI vision with the attendees (developers) and with the broader world (read: investors) at large.

  • Google’s Gemini is now a multimodal model that can reason across text, images, video, code, and more. (This is the capability OpenAI launched a day earlier to steal Google’s thunder.)
  • Project Astra is a real-time multimodal assistant, a universal agent with visual understanding and conversational capabilities.
  • The Gemini era is defined by multimodality, expanded context (1 million tokens today, with a 2-million-token target), and agents.
  • AI Overviews, a revamped search experience powered by Gemini, is launching to everyone in the U.S. this week.
  • The Google search app on iPhone now has Gemini AI chat built in; refresh your app and look for the diamond-shaped icon at the top.
  • Ask Photos uses Gemini: ask natural-language questions to surface specific memories, going beyond simple keywords to complex queries that mine context and metadata.
  • Trillium, a sixth-generation TPU chip, boasts nearly 5x the performance of the previous generation. It powers an AI Hypercomputer architecture that combines custom chips, liquid cooling, and global networking to run large language models and generative AI with extreme efficiency.
  • In Workspace, Gemini can summarize recent emails, parse hour-long recordings, and even draft replies – acting as your intern.
  • A new AI-enabled, browser-based IDE; Gemini 1.5 Flash in AI Studio; and Gemini Nano for mobile. For me, these were probably the most important announcements of all, but that’s me (a minimal sketch of calling Gemini 1.5 Flash follows this list).
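
Since I just called the developer tooling the most important news, here is a minimal sketch of what calling Gemini 1.5 Flash looks like with Google’s generative AI Python SDK. The prompt and the API-key handling are my own illustration; the model name comes from the announcement:

```python
# A minimal sketch of calling Gemini 1.5 Flash through the
# google-generativeai Python SDK; the prompt is illustrative.
import os

import google.generativeai as genai

# Assumes a Google AI Studio API key is set in the environment.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize Google I/O in one sentence.")
print(response.text)
```

AI Studio is where you generate that key and prototype prompts before touching any code, which is why the browser-based tooling matters for developers.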

I wouldn’t fault you for not grasping what the most important part of the keynote was. Google announced so much stuff that, despite my notes and despite watching the whole keynote twice, I can’t pick out that one “WOAH! that’s amazing” moment from the entire event.

This is a bit disappointing because, in my opinion, this was one of the most coherent Google I/O events in recent memory. The past three I/O events were so disjointed that I wondered if the management was mailing it in — which I think they were.

“AI” might galvanize the company and make it more focused, but one thing is becoming increasingly clear: Google’s days as a frontline innovator appear to be in the past. Had it not been for OpenAI and its assault on the idea of search, software, and machine learning, Google would still be debating all the technologies that have come out of its labs. Had it not been for Perplexity, it would still be shoveling the same shitty search experience down our throats.

The sheer amount of money they were making, the pressure of appeasing Wall Street, and a cadre of timid, visionless executives made them an also-ran and a vulnerable company. They still are, but for now they can at least take some comfort in the role of a “swift imitator.” That’s the best you can expect from a managerial team that embodies the company’s shift from product creators to managers.

It is unfair to compare the OpenAI and Google events. Why?

Because OpenAI made a product announcement. Google, on the other hand, is hosting its annual developer conference. They are not the same. The market, however, doesn’t see it that way. For most people, there is an equivalence.

This is precisely what happens when you let another company take control of the narrative. This is precisely what happens when you don’t have a clear vision or the ability to lead the industry. Ironically, there was a time when Google did what OpenAI is now doing, making everyone else a fast follower.

Of course, there are many advantages at work for OpenAI that are, in turn, challenges for Google. Since OpenAI is a privately held entity, it inherently has more flexibility and fewer of the restrictions (and less of the structure) imposed on publicly traded companies. Google can’t make offhand comments, or it will be exposed to liabilities.

OpenAI executives can theorize about everything from AGI to societal disruptions without any backlash, because they are a private company. More importantly, they are very smart: most of the commentary from OpenAI’s leadership deals in abstracts, hithertos, and what-ifs. They sketch the outline and let the rest of us fill in the blanks.

By carefully using finely tuned marketing and public relations strategies, OpenAI has set itself up as the vanguard of the new technological revolution. This must be equally galling for Facebook, which counts marketing and public relations as its core competencies.

Make no mistake: OpenAI is achieving all this success (and hype) because it has a product that, for now, is stellar. The excitement it has created reminds me of the emergence of Palm, and later of social networks; they, too, stoked the imagination and a sense of possibility.

In a few weeks, when Apple holds its developer conference, there is a good chance it will be talking about “AI.” It will talk about a new, improved Siri and, if the rumors are true, a partnership with OpenAI. But by then, we will all know: Siri got smarter, thanks to OpenAI.

 
