With AI-generated content spreading rapidly and technology advancing, distinguishing between human- and machine-created material is becoming both harder and more important.
AI watermarking tools like SynthID aim to make the origins of digital content traceable and help users verify authenticity. In this article, I’ll explain what SynthID is, how it works, and how you can use it to apply watermarks to text.
What Is SynthID?
SynthID is a tool developed by Google DeepMind that embeds invisible watermarks into AI-generated content. These watermarks are designed to help identify whether a piece of media was created by artificial intelligence.
The goal of AI watermarking is to make digital content more transparent and traceable, especially as AI becomes more advanced. Without reliable markers, AI-generated material can easily blend in and spread unnoticed, leading to various problems, such as the spread of fake news and deepfakes or the unauthorized use of creative work.
SynthID supports watermarking across text, images, video, and audio. Its approach adapts to each media type, which we’ll explore in detail later. The watermark is designed to survive common edits like trimming, noise, compression, cropping, and filtering, making it impressively robust.
SynthID is already integrated into Google’s generative AI products: Gemini for text, Imagen for images, Lyria for audio, and Veo for video. This means that these models can embed imperceptible watermarks directly into their output. Combined with the SynthID Detector portal, Google offers a complete watermarking solution that enables users to quickly verify AI-generated content across all supported formats.
How Does SynthID Work?
The details of the watermarking process depend on the media format, so I’ll walk through the procedure and its resilience for each format in turn.
SynthID for images
For images, SynthID uses two neural networks. The first subtly modifies individual color values (pixel values) in the image so minimally that the human eye can’t perceive any difference.
The changes are selected in such a way that the watermark remains detectable for the second neural network even after typical image edits like cropping, compression, filtering, rotation, or even screenshots. This makes the watermark especially resilient to common manipulations that often occur when images are shared or reused.
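Google hasn’t published the internals of these networks, but the general idea of keyed, imperceptible pixel changes can be sketched with a classical least-significant-bit (LSB) scheme. To be clear, this toy example is for intuition only; unlike SynthID’s learned watermark, an LSB mark would not survive compression or cropping.
# Toy illustration of keyed, imperceptible pixel changes (NOT SynthID's method)
import numpy as np
rng = np.random.default_rng(seed=1234)  # the seed plays the role of a secret key
image = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in for a real image
pattern = rng.integers(0, 2, image.shape, dtype=np.uint8)  # keyed 0/1 pattern
watermarked = (image & 0xFE) | pattern  # overwrite each pixel's least significant bit
# Detection: LSBs match the keyed pattern almost perfectly if the mark is present
match_rate = np.mean((watermarked & 1) == pattern)
print(f"LSB match rate: {match_rate:.2f}")  # ~1.0 if watermarked, ~0.5 otherwise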
SynthID for videos
Every frame of a video gets the same treatment as an individual image: each frame is watermarked separately, so the mark remains detectable even after the video is cut down. This makes SynthID’s invisible video watermark robust against basic edits like trimming, compression, or minor cropping.
Determined users or commercial bypass services can still remove or obscure the watermark, especially when the goal is to slip past media filters on advertising platforms or content moderation systems. For example, applying aggressive filters, such as color distortion or extreme contrast changes, or re-encoding the video with major adjustments to compression, frame rate, or color profile can degrade the watermark enough to make reliable detection difficult.
SynthID for audio
For audio, SynthID converts the waveform into a spectrogram, a visual representation of the spectrum of frequencies in an audio signal as it changes over time. It then embeds the watermark into the spectrogram and finally reconstructs the audio from it. The watermark remains inaudible but is resilient to standard audio processing.
To handle conventional formats, the watermark is embedded in a way that survives lossy compression. However, extreme manipulations like pitch-shifting or time-stretching can distort the spectrogram, reducing detection accuracy. Furthermore, while being robust to MP3 compression, SynthID’s effectiveness may vary across proprietary audio formats.
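As a rough sketch of that spectrogram round trip (illustrative only, not SynthID’s actual embedding algorithm), here is how a keyed selection of time-frequency bins could be nudged by an inaudible amount using SciPy:
# Spectrogram round trip with a keyed, inaudible nudge (not SynthID's algorithm)
import numpy as np
from scipy import signal
sr = 16_000  # sample rate in Hz
t = np.linspace(0, 1.0, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
f, frames, Zxx = signal.stft(audio, fs=sr, nperseg=512)  # waveform -> spectrogram
rng = np.random.default_rng(seed=42)  # the seed plays the role of a secret key
mask = rng.random(Zxx.shape) < 0.01  # select roughly 1% of time-frequency bins
Zxx[mask] *= 1.001  # imperceptible magnitude nudge in the selected bins
_, audio_marked = signal.istft(Zxx, fs=sr, nperseg=512)  # spectrogram -> waveform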
SynthID for text
When an LLM generates text, it breaks down language into tokens—words, characters, or parts of words—and then predicts the next token based on probability scores. SynthID uses this characteristic to embed watermarks during the generation process by tweaking these probability scores. The modification is done in a controlled, pseudorandom way that makes certain word choices slightly more likely.
The resulting statistical pattern in the text is invisible to readers and does not affect the meaning, quality, or even creativity of the generated text. When detection is needed, SynthID analyzes the text for these subtle probability patterns, comparing them to what would be expected from watermarked versus unwatermarked content. The watermark generally survives light editing and paraphrasing, though it can be weakened by heavy rewriting or translation.
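To make this concrete, here is a minimal sketch of keyed probability nudging. It follows the simpler “green list” idea from the watermarking literature rather than SynthID’s tournament sampling, but it shows how a secret key plus recent context can bias token scores in a way that a detector holding the same key can later test for:
# Keyed logit nudging (a simplified "green list" sketch, not Google's algorithm)
import hashlib
import numpy as np

KEY = b"secret-watermark-key"  # hypothetical secret key

def keyed_bias(context_tokens, vocab_size, strength=0.5):
    # Derive a reproducible seed from the secret key and the recent context
    seed_material = KEY + str(context_tokens[-4:]).encode("utf8")
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    green = rng.random(vocab_size) < 0.5  # pseudorandom "green" half of the vocabulary
    return np.where(green, strength, 0.0)  # slightly boost the green tokens

# During generation, the bias is added to the model's logits before sampling
logits = np.array([1.0, 0.5, 0.2, 0.0, -0.3, -0.5, -1.0, -2.0])
biased = logits + keyed_bias(context_tokens=[3, 1, 4, 1, 5], vocab_size=len(logits))
probs = np.exp(biased) / np.exp(biased).sum()  # softmax over the biased logits
print(probs.round(3))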
SynthID Limitations
SynthID faces significant limitations in real-world deployment. While its text watermarking performs consistently across languages—unlike post-hoc detectors that fail on untrained languages—detection rates plummet under heavy paraphrasing or translation.
For non-text modalities (images, video, audio), published accuracy metrics are notably absent, preventing quantitative validation of Google’s resilience claims.
Furthermore, watermark embedding proves less effective for factual responses, where constrained generation options limit pattern insertion without compromising accuracy.
Like other AI detection tools, SynthID does not explain how it arrives at its decisions to prevent circumvention. If the exact detection criteria or algorithms were made public, it would be much easier for users to manipulate content specifically to evade detection.
Additionally, the underlying models are complex and often operate using statistical patterns or probability scores, making their decisions difficult to interpret in simple terms, even for experts.
Currently, SynthID’s most reliable detection is achieved with content generated by Google’s own models, where the watermarking is deeply integrated. While an open-source version of SynthID-Text is available and can be used with compatible language models via Hugging Face Transformers, detection rates and robustness are generally lower compared to Google’s native implementations.
As a result, SynthID’s universal applicability remains limited, and it may not consistently identify AI-generated content from other providers such as OpenAI or Meta, especially when content is edited by different AI systems.
Hands-On Implementation: SynthID For Text
Let’s get started! You don’t need any special Google API access—SynthID works locally within the transformers framework. All you need is:
- A Python environment (version 3.8 or higher) with the transformers and torch packages installed
- Access to an LLM via Hugging Face
Creating the environment
The Python environment could, for instance, be set up using Anaconda and the following commands. Please note that the version of the transformers package should be 4.46.0 or newer to support SynthID for text.
conda create -n synthid-env python=3.9
conda activate synthid-env
pip install "transformers>=4.46.0" torch
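A quick sanity check confirms that the installed version is recent enough:
# Verify the installed transformers version (should print 4.46.0 or newer)
python -c "import transformers; print(transformers.__version__)"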
Downloading the model
For the text generation, we will use the lightweight model gemma-2b, because it is specifically designed to run efficiently on consumer hardware while still delivering strong performance across a range of language tasks. Since it is a gated model, logging in to a Hugging Face account and accepting the model’s license terms in the browser is required.
The Gemma-2b model is available in two main formats:
- The safetensors variant for use with PyTorch and Hugging Face Transformers
- The gguf variant for specialized inference engines like llama.cpp
For this tutorial, you only need the safetensors files, as they are required for loading and running the model in Python with transformers. However, you also need to download the associated configuration and tokenizer files, because these files contain essential information about the model architecture and how text is processed—without them, the model cannot be properly loaded or used, regardless of which weight format you choose.
huggingface-cli download google/gemma-2b model-00001-of-00002.safetensors model-00002-of-00002.safetensors config.json tokenizer.json tokenizer.model tokenizer_config.json special_tokens_map.json
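If you prefer, you can also skip the manual download entirely: when from_pretrained() is later called with the model name, transformers fetches and caches any missing files automatically, provided you are logged in and have accepted the license terms.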
Loading the model
To use Gemma-2b or other gated models from Hugging Face, we also need to provide a Hugging Face access token. The token only needs read permissions: when creating it in the Hugging Face settings, select the “read” role; no write or admin permissions are required.
# Logging in to Hugging Face
from huggingface_hub import login
login("<your_huggingface_token>")
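Alternatively, you can authenticate once from the shell, which stores the token locally so it doesn’t have to be hard-coded in your scripts:
huggingface-cli login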
We need to import AutoTokenizer and AutoModelForCausalLM from the transformers package, and initialize both with the model of our choice.
# Loading the models
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
Creating the SynthID configuration
To apply SynthID watermarks to AI-generated text, you need to create a watermark configuration object using SynthIDTextWatermarkingConfig. This object controls how watermarks are embedded during text generation.
The keys parameter is a list of random integers (typically 20-30 values) that serve as your private digital signature. They determine how word-choice probabilities are subtly modified during text generation to create a detectable pattern. Please note that those keys are secret—if they are exposed, attackers could forge watermarks or generate undetectable AI text.
The ngram_len parameter defines the word-sequence length used for watermark pattern analysis. The lower the values are, the more likely the watermarks are to survive heavy editing, but the harder they are to detect. Conversely, higher values improve detection but break under minor edits. A value of 5 is recommended as the best trade-off for real-world use.
from transformers import SynthIDTextWatermarkingConfig
# SynthID configuration
watermark_config = SynthIDTextWatermarkingConfig(
    keys=[634, 300, 846, 15, 310, ...],  # list of 20-30 integers, keep this secret!
    ngram_len=5
)
Applying a watermark to text
The prompt for the text to be watermarked first needs to be tokenized into a PyTorch tensor, so the model gets the numeric tokens it understands. This tokenized_prompt is then passed to the model’s generate() method alongside the configuration object. The do_sample parameter has to be set to True to enable watermark-compatible sampling.
tokenized_prompt = tokenizer(
    ["Answer in two sentences: What is AI?"],
    return_tensors="pt"
)
output_sequences = model.generate(
    **tokenized_prompt,
    watermarking_config=watermark_config,
    do_sample=True
)
This modifies word-choice probabilities during generation, embedding an invisible pattern while preserving text quality. To ensure reusability, use the same keys across sessions, so watermarks can be detected consistently.
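A simple way to keep the keys stable, as a sketch rather than an official SynthID utility, is to persist them in a local file and reload them whenever the configuration is rebuilt:
# Persist the secret keys so detection stays consistent across sessions
import json, os, secrets
KEYS_PATH = "synthid_keys.json"  # hypothetical filename; keep this file private
if os.path.exists(KEYS_PATH):
    with open(KEYS_PATH) as f:
        keys = json.load(f)  # reuse the existing keys
else:
    keys = [secrets.randbelow(1000) for _ in range(25)]  # 20-30 random integers
    with open(KEYS_PATH, "w") as f:
        json.dump(keys, f)  # store them for future sessions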
In this example, we asked the model for its definition of AI. Let’s first look at the response by converting the tokens back into human-readable text using the batch_decode() method.
watermarked_text = tokenizer.batch_decode(
    output_sequences,  # Model's tokenized output
    skip_special_tokens=True  # Removes control tokens like [BOS], [EOS], etc.
)
print(watermarked_text)
['Answer in two sentences: What is AI?
Answer:
Artificial intelligence (AI) is a branch of computer science that deals with the development of intelligent machines that can perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.
AI applications include autonomous vehicles, chatbots, and virtual personal assistants.']
How to Detect AI Watermarks in Text
When it comes to detecting SynthID watermarks in AI-generated text, there are currently three practical options available.
For demonstration and experimentation, Hugging Face Transformers provides a Bayesian detector class. This class enables testing the detection workflow in Python code using an open-source dummy model. However, it’s important to note that this model is intended for demo purposes only and does not offer reliable accuracy or production-grade robustness.
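For reference, the demo workflow looks roughly like this. The class names are available in transformers 4.46 and later; the dummy detector repository below is the one referenced in the Transformers documentation, so treat it as a placeholder that may change:
# Demo-only watermark detection with the Bayesian detector (transformers >= 4.46)
from transformers import (
    AutoTokenizer,
    BayesianDetectorModel,
    SynthIDTextWatermarkDetector,
    SynthIDTextWatermarkLogitsProcessor,
)
detector_model = BayesianDetectorModel.from_pretrained("joaogante/dummy_synthid_detector")
logits_processor = SynthIDTextWatermarkLogitsProcessor(
    **detector_model.config.watermarking_config, device="cpu"
)
tokenizer = AutoTokenizer.from_pretrained(detector_model.config.model_name)
detector = SynthIDTextWatermarkDetector(detector_model, logits_processor, tokenizer)
tokenized = tokenizer(["Some text to check"], return_tensors="pt")
print(detector(tokenized.input_ids))  # probability that the text carries the watermark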
For organizations needing tailored detection, the Bayesian detector class can be used to build custom SynthID-compatible detectors. This end-to-end example shows how to train a detector on watermarked text generated with a specific configuration and a shared tokenizer, enabling consistent watermark verification across multiple internal models.
Once trained, detectors can be uploaded to a private Hugging Face Hub repository for secure organizational access, and Google’s Responsible GenAI Toolkit offers supplementary guidance for production deployment.
For authentic detection of SynthID watermarks in content generated by Google’s AI models (such as Gemini or Imagen), Google offers a cloud-based SynthID Detector Portal. The service is user-friendly and does not require coding, but it is proprietary, currently only available via a waitlist, and limited to verifying content from Google’s own ecosystem. There is no public API or local version for developers at this time.
Further Discussions on SynthID
Standardization and data privacy
SynthID has not been adopted as an industry-wide standard, as major players like Microsoft and Meta continue to use their own proprietary watermarking systems, creating a fragmented ecosystem where cross-platform detection remains ineffective. Currently, no standardized evaluation protocol exists for watermark robustness across text, image, audio, and video modalities.
Industry partnerships—like the one with Nvidia, which uses SynthID in their Cosmos platform—exist, but are rare. So, despite its promising outlook, SynthID is still quite far from being considered a universally accepted standard.
Contrary to some concerns, SynthID does not inherently enable surveillance or content tracking by Google, as the watermark is a passive identifier embedded at generation and requires voluntary uploads to the SynthID Detector portal for verification. There is no evidence that Google monitors content distribution or usage beyond the portal’s scan-and-delete workflow. Google has not disclosed whether uploaded files are stored long-term or reused, creating potential privacy gaps.
Legal and social implications
Removing watermarks from AI-generated content complicates copyright enforcement, especially since laws like the EU AI Act don’t clearly define ownership of AI outputs. While the US COPIED Act (2024) criminalizes such removal, attackers exploit loopholes—like translating text or re-editing media—to strip watermarks without legal consequences. This ambiguity leaves creators vulnerable.
Tools like SynthID, limited to Google’s ecosystem, fail to address the scale of the problem, since only a small fraction of posts carry detectable watermarks. Static watermarks alone cannot combat AI misinformation due to technical vulnerabilities, false positives, and platform incompatibility. In the worst case, tools like SynthID might risk fostering dangerous overconfidence by implying reliable detection while failing against non-Google content or edited media.
Generative watermarks like SynthID-Text are not a comprehensive answer to AI detection; they are rather a tactical tool that complements other strategies. Their effectiveness depends on collaboration among providers who actually embed the watermark during text generation. For identifying AI-generated text from sources that do not use watermarking, alternative methods—such as post hoc detection—are still necessary.
Conclusion
SynthID offers a way to watermark and verify AI-generated text, making content origins more transparent in an age of synthetic media. However, its effectiveness is limited by the lack of industry-wide standards and the ease with which watermarks can be removed or bypassed.
To learn more about AI watermarking and other responsible AI practices, feel free to check out these resources:
