The Keyword

2024: A year of extraordinary progress and advancement in AI

Collage showing robot arms tying sneaker laces; text saying Gemini 2.0 against a blue and black background; and a dachshund wearing goggles while swimming in a Veo 2 video still
Video still with text saying: Gemini 2.0 Enabling the agentic era, against a blue and black background

Gemini 2.0 is built for the agentic era, bringing enhanced performance, more multimodality and new native tool use.

NotebookLM Audio Overview

In this Audio Overview, two AI hosts dive into the world of NotebookLM updates.

A table showing an example question and response about how to save money, drawn from the FACTS Grounding dataset.

The FACTS Grounding dataset comprises 1,719 examples, each carefully crafted to require long-form responses grounded in the context document provided.

Bar graph titled: Number of citations to GenAI scientific publications for the top 20 institutions, 2010-2023. Alphabet (US) is at the top with 65,703 citations.

This WIPO graph, based on January 2024 data from The Lens, illustrates more than a decade’s worth of Alphabet’s generative AI scientific publication efforts.

Animation with audio showing a MusicFX DJ creation in the making, combining genres such as classical ballet with minimal techno

MusicFX DJ generates brand new music by allowing players to mix musical concepts as text prompts.

Supercut of Veo 2-generated videos, including a cartoon girl in a 1980s kitchen talking excitedly to the camera and a car drifting through a cityscape

Veo represents a significant step forward in high-quality video generation.

Animation showing eight images, including an apple, polar bear and pumpkin, before and after edits to attributes like transparency and roughness.

In these examples of AI editing with synthetic data generation, the input is a novel, held-out image the model has never seen before; the output shows the model's edit, which successfully changes the material properties.

A bi-arm robot straightening shoe laces of a black and white sneaker and tying them into a bow.

With ALOHA Unleashed, our robot learned to tie a shoelace, hang a shirt, repair another robot, insert a gear and even clean a kitchen.

A chart showing robot training text and image inputs, such as text saying: Put the strawberry into the correct bowl, with a corresponding image showing the action.

Robotic Transformer 2 (RT-2) is a novel vision-language-action model that learns from both web and robotics data.

Left: Animation showing AlphaChip placing the open-source, Ariane RISC-V CPU, with no prior experience. Right: Animation showing AlphaChip placing the same block after having practiced on 20 TPU-related designs.

AlphaChip can learn the relationships between interconnected chip components and generalize across chips, improving with each layout it designs.

Animation showing the California coast behind a quantum chip and the word Willow

Willow has state-of-the-art performance across a number of metrics.

A geometric diagram featuring a triangle ABC inscribed in a larger circle, with various points, lines and another smaller circle intersecting the triangle

AlphaGeometry 2 solved Problem 4 of the July 2024 International Mathematical Olympiad within 19 seconds of receiving its formalization. Problem 4 asked for a proof that the sum of ∠KIL and ∠XPY equals 180°.

Colorful protein structure against an abstract pink and blue gradient background

AlphaFold 3’s capabilities come from its next-generation architecture and training that now covers all of life’s molecules.

Illustration of a predicted protein binder structure in blue interacting with a target protein in yellow

AlphaProteo can generate new protein binders for diverse target proteins.

Magnified animation of lime green and bright purple cells spinning together in a mirror image

In the deepest layer of the cortex, clusters of cells tend to occur in mirror-image orientation to one another, as shown in this brain mapping project.

Scatter plot showing how various models perform on the MedQA US Medical Licensing Exam (USMLE)-style question benchmark, with Med-Gemini achieving 91.1% accuracy

On the MedQA (USMLE-style) benchmark, Med-Gemini attains a new state-of-the-art score, surpassing our prior best (Med-PaLM 2) by a significant margin of 4.6 percentage points.

Animated map of the world showing GraphCast’s predictions across 10 days, with a blue/white color scheme for specific humidity; yellow/orange/purple for surface temperature; and blue/purple for surface wind speed

This selection of GraphCast’s predictions rolling across 10 days shows specific humidity at 700 hectopascals (about 3 kilometers above surface), surface temperature and surface wind speed.
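The caption's "700 hectopascals (about 3 kilometers above surface)" conversion can be sanity-checked with the isothermal barometric formula. This is a back-of-the-envelope sketch under an assumed mean atmospheric temperature of 288 K; it is not a computation GraphCast itself performs.

```python
import math

# Back-of-the-envelope check of the "700 hPa is about 3 km up" figure,
# using the isothermal barometric formula h = H * ln(p0 / p),
# where H = R*T/(g*M) is the atmospheric scale height.
# The mean temperature T below is an assumed round value.
R = 8.314       # J/(mol K), universal gas constant
g = 9.81        # m/s^2, gravitational acceleration
M = 0.02896     # kg/mol, molar mass of dry air
T = 288.0       # K, assumed mean atmospheric temperature

H = R * T / (g * M)              # scale height, roughly 8,400 m
p0, p = 1013.25, 700.0           # hPa: sea-level pressure and the 700 hPa level
altitude_m = H * math.log(p0 / p)
print(f"{altitude_m:.0f} m")     # on the order of 3,000 m, i.e. about 3 km
```

With a colder assumed mean temperature the estimate drops slightly, but it stays close to 3 km either way.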

Two maps: one showing expanded flood forecasting coverage in Google’s Flood Hub, the other showing additional shaded areas to represent virtual gauge locations on the same map

Our flood forecasting model is now available in over 100 countries (left), and we now have “virtual gauges” for experts and researchers in more than 150 countries, including countries where physical gauges are not available.

Animation on a white background showing names of new languages in Google Translate including Luo, Wolof and Veneto.

These new languages in Google Translate represent more than 614 million speakers, opening up translations for around 8% of the world’s population.

An animation showing an LLM generating a sentence reading: My favourite tropical fruits are mango and banana, with probability scores appearing for various predicted tokens.

When there's a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token in cases where doing so won't compromise the quality, accuracy and creativity of the output.
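The probability-adjustment idea the caption describes can be sketched in miniature. This is an illustrative toy only: SynthID's actual algorithm is not fully public, and the function names, parameters and hash-seeded "favored set" scheme below (which resembles published "green-list" LLM watermarking) are assumptions for demonstration, not SynthID's implementation.

```python
import hashlib
import random

# Toy sketch of probability-adjusting watermarking: pseudorandomly favor
# a subset of tokens and nudge their probabilities, so a detector that
# recomputes the same subset can later spot the statistical bias.

def favored_tokens(prev_token, vocab, fraction=0.5):
    """Pick a watermark-favored subset of the vocabulary, seeded by the
    previous token so a detector can recompute the same subset."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, max(1, int(len(vocab) * fraction))))

def adjust_probs(probs, favored, bias=1.5):
    """Scale favored tokens' probabilities by a small bias and renormalize.
    When one token already dominates, the nudge rarely changes the chosen
    token, which is how output quality is preserved."""
    scaled = {t: p * (bias if t in favored else 1.0) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: p / total for t, p in scaled.items()}

# Toy next-token distribution after "My favourite tropical fruits are mango and ..."
probs = {"banana": 0.5, "papaya": 0.25, "lychee": 0.15, "durian": 0.10}
favored = favored_tokens("and", list(probs))
adjusted = adjust_probs(probs, favored)
```

A detector replays `favored_tokens` over the generated text and checks whether favored tokens appear more often than chance would allow; because the seed depends only on the preceding token, no access to the model is needed at detection time.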
