Revolutionizing Daily Life: The Impact of Large AI Models

The universe in a jar, highlighting AI's vast potential.

The following excerpt is taken from The Algorithmic Bridge, a newsletter designed to educate readers on the significant effects AI has on our lives and equip them with the knowledge to navigate the future effectively.

The Algorithmic Bridge: Connecting algorithms to people. A newsletter dedicated to AI that influences your life. (thealgorithmicbridge.substack.com)

Looking back at the consumer technologies that have transformed how we interact and live in the 21st century, two stand out: smartphones and social media. We are now on the brink of a third transformation.

The launch of Apple's iPhone in 2007 marked a turning point for the internet, advertising, software distribution, and mobile devices (remember BlackBerry?). Over the following years, the iPhone reshaped how we engage with the digital world, to the point that life without a smartphone is now hard to imagine. Similarly, social media platforms, from Facebook to TikTok, have reshaped our virtual connections, which now often occupy more of our time and attention than our physical interactions. The changes of the past two decades are so profound that today's world would seem almost alien to someone transported here directly from the late 20th century.

During the same period, artificial intelligence (AI) entered a new era of interest, funding, and innovation, closely intertwined with the rise of smartphones and social media. Since 2012, deep learning research has produced remarkable advances: first breakthroughs in computer vision, then large language models, spurred by the transformer architecture Google introduced in 2017. In just ten years, the field has gained a deeper understanding of neural networks and scaling laws, built larger and higher-quality datasets, and developed powerful hardware tailored to intensive AI workloads. The decade from 2012 to 2022 brought the field unparalleled growth.

Today, generative large language models, along with multimodal and AI art models, dominate the scene, as tech giants, ambitious startups, and non-profits race to harness their capabilities, either to profit from them or to democratize their benefits.

OpenAI, a prominent startup, has played a crucial role in this landscape over the past five years. While it did not start the fierce race for AI dominance, it significantly accelerated it with the launch of GPT-3 in mid-2020, arguably the most recognized AI model of the decade. At 175 billion parameters, GPT-3 was more than 100 times larger than its predecessor, GPT-2 (1.5 billion parameters), and markedly more capable, showcasing the effectiveness of scaling laws for language models: larger models yielded better results. The buzz around GPT-3's capabilities captured the attention of businesses, investors, and consumers alike, kicking off two years of extraordinary developments.

From GPT-3 to LaMDA and DALL·E to Stable Diffusion

Besides GPT-3, another prominent language model is LaMDA (137 billion parameters), which Google introduced in May 2021. It drew significant public interest earlier this year when Blake Lemoine, then a Google engineer, claimed it was sentient, a claim that is, of course, unfounded. Google's latest advance, PaLM (540B), announced in April, currently holds the title of largest dense language model and achieves the highest performance across a range of benchmarks, making it the state of the art in language AI.

Google, historically a leader in AI research, is not the only tech giant making waves in this competitive arena. Meta, formerly Facebook, has also made notable progress. It recently unveiled BlenderBot 3 (175B), a language-model-based chatbot released to the public in the US (with mixed results). In May, Meta also introduced OPT (175B), an open-source counterpart to GPT-3. Meanwhile, Microsoft (OpenAI's main backer) and Nvidia (whose GPUs are vital for training and running AI models) teamed up in October 2021 to build MT-NLG (Megatron-Turing NLG), a 530B model that, along with PaLM, dwarfs other offerings in the field.

Despite fierce competition from wealthier companies, smaller AI-focused firms like DeepMind and OpenAI have managed to stay at the forefront. DeepMind entered the race in late 2021 with its first large language model, Gopher (280B), which surpassed previous models on most benchmarks and became the state of the art. Shortly afterward, in March 2022, the company announced Chinchilla (70B), which, although smaller than its counterparts, performed better (it is now outdone only by PaLM). With Chinchilla, DeepMind showed that training data matters as much as model size: for a fixed compute budget, the amount of training data should grow along with the number of parameters. OpenAI also kept innovating, refining GPT-3 into a better-aligned version called InstructGPT, and GPT-4 is widely expected to arrive soon. Other startups, most notably AI21 Labs and Cohere, have adopted OpenAI's business model of offering large language models as pay-as-you-go services.

In the non-profit sector, collaborative initiatives focused on open science and open-source models are also carving out their space. BigScience, in partnership with Hugging Face and supported by EleutherAI and other organizations, developed BLOOM (176B), a model built on ethical and inclusive principles. Some consider BLOOM to be "the most important AI model of the decade," a bold assertion supported by compelling arguments.

What I have outlined here is an incomplete snapshot of how large language models have reshaped the AI landscape and the industry's goals. But AI companies have not limited themselves to language. Because the world is multimodal and we are multisensory beings, it makes sense to build that multidimensionality into AI systems. This has led to multimodal models (like Google's MUM), visual language models (DeepMind's Flamingo), generalist agents (DeepMind's Gato), and the current hot trend in AI: diffusion-based generative visual models (often called AI art models).

Generative visual models—many of which are partially trained on language—are significantly smaller than their language-focused counterparts but possess similar transformative potential. OpenAI was instrumental in popularizing these models with DALL·E and CLIP in early 2021, followed by GLIDE later that year, and DALL·E 2 earlier this year, which ignited the current AI art phenomenon. Meanwhile, other companies have been developing their alternatives: Microsoft released NUWA in 2021, Meta created Make-A-Scene in March 2022, and Google introduced Imagen and Parti in May and June 2022, respectively.

However, the most exciting and practical models are those we can use right away. At first, users relied on Google Colab notebooks (such as Disco Diffusion), but developers have since built no-code, user-friendly apps on top of diffusion models. Besides DALL·E, the best known are Craiyon (formerly DALL·E mini), Midjourney, and Stability.ai's hugely popular Stable Diffusion, which I recently called "the most important AI art model ever." Some require a subscription and others are free, but all of them are redefining creative processes and challenging our perceptions of artistry.

Midjourney and Stable Diffusion, showcasing collaborative AI art.

After exploring these significant developments over the past two years, consider this: these AI models, integrating language, multimodal, and artistic capabilities, are poised to become your next virtual assistant (a smarter, more conversational alternative to Siri or Alexa), your next search engine (a more intuitive and natural version of Google Search or Bing), or your next creative tool (a more versatile and imaginative Photoshop or GIMP). Research models are on the path to becoming tangible products.

Moving from research to production is a complex journey, but tech companies have done it before: computer vision had a multi-year head start over language AI. Once firms like Google, Meta, or Amazon judged the technology mature enough, they integrated vision-based AI systems into their existing products. Applications such as facial and emotion recognition, object detection, ID verification, pose estimation, and feature extraction are now commonplace across industries and markets worldwide.

Language and multimodal models are expected to follow a similar trajectory, heralding the third technological revolution of this century. The result will be a trifecta of smartphones, social media, and large AI models: an interdependent blend of technologies with profound effects on society and on individual lives. How will it shape the wider world and our personal experiences? How will it change our relationships with technology and with each other? What unforeseen effects will it have on our daily routines? We will find out sooner rather than later.

Subscribe to The Algorithmic Bridge to read the full article.