Google launches SynthID Text, a tool for watermarking text generated by artificial intelligence
Google has developed SynthID Text, a new technology that uses "invisible" watermarks to identify text generated by artificial intelligence (AI) algorithms. The tool is available to developers free of charge as open-source software. The company says the initiative will encourage more ethical and responsible development of automated systems.
The Mountain View tech giant argues that this detection mechanism is needed to combat the misuse of large language models (LLMs). "AI can generate an ever-wider range of content at a scale previously unimaginable. While much of this use is for legitimate purposes, there are concerns that (AI) may contribute to problems of misinformation and copyright misattribution. Watermarking is a useful method for mitigating these potential impacts."
The company has developed several tools that help audiences distinguish synthetic material from human-made material. Last year, it released a watermarking system for images, and it later extended the tool to video. Its latest offering marks an important milestone in industry and regulatory efforts to curb the malicious use of chatbots.
Experts note that watermarking text is more difficult than watermarking an image, because in text the choice of words or characters is the only variable that can be manipulated. Google is the first to deploy a watermark for AI-generated text at large scale in a real-world setting. "The biggest news is that (the system) is actually being used in practice," says Scott Aaronson, a computer scientist at the University of Texas at Austin and a former OpenAI employee.
How does Google’s SynthID Text work?
Pushmeet Kohli, vice president of research at Google DeepMind, explains that an LLM breaks language down into tokens (characters, words, or parts of words) and predicts which token is most likely to follow another in a sentence. Each token is assigned a probability score that reflects how well it fits as the next element of the output. The higher the score, the more likely the model is to use it.
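To make the idea of token scoring concrete, here is a minimal Python sketch, assuming the model's raw scores (logits) are converted into probabilities with a softmax and the next token is then drawn at random in proportion to those probabilities. The candidate tokens and score values are hypothetical and do not come from any real model.

```python
import math
import random


def softmax(scores: dict) -> dict:
    """Convert raw token scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next(scores: dict, rng: random.Random) -> str:
    """Draw the next token; higher-scoring tokens are chosen more often."""
    probs = softmax(scores)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores a model might give candidate tokens after "The cat sat on the"
scores = {"mat": 3.0, "sofa": 2.0, "roof": 1.0, "banana": -2.0}
probs = softmax(scores)
rng = random.Random(0)
print(probs)
print(sample_next(scores, rng))
```

Most draws return "mat", but lower-scoring tokens still appear now and then. That leftover freedom in word choice is the variable a text watermark has to work with.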
SynthID Text compares the expected probability scores of words in watermarked and unwatermarked text, which makes it possible to distinguish content generated by an algorithm from content written without AI. Google's paper, published in the journal Nature, explains that "the tool's sampling algorithm uses a cryptographic key to assign random scores to each possible token. Candidate tokens are drawn from the distribution in numbers proportional to their probability and entered into a 'tournament.' The system compares their scores in a series of knockout rounds until only the token with the highest value remains, and this token is selected for use in the text." According to the authors, this method makes the watermark much harder to remove, forge, or reverse engineer.
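The tournament described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Google's implementation: the keyed random score (called a g-value here) is assumed to be a single bit derived with HMAC-SHA256 from the previous two tokens, the candidate token, and the tournament round, and `g_value`, `tournament_sample`, `generate_watermarked`, and `watermark_score` are hypothetical names. Detection averages the g-values of a text's tokens: for ordinary text the average hovers around 0.5, while tokens that won tournaments tend to score higher.

```python
import hashlib
import hmac
import random

WINDOW = 2  # assumption: the keyed score depends on the previous 2 tokens


def g_value(key: bytes, context: tuple, token: str, layer: int) -> int:
    """Keyed pseudorandom 0/1 score for a token (assumed HMAC-SHA256 construction)."""
    msg = f"{context}|{token}|{layer}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def tournament_sample(probs: dict, key: bytes, context: tuple,
                      layers: int, rng: random.Random) -> str:
    """Pick one token via a knockout tournament over 2**layers candidates."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # Candidates are drawn in numbers proportional to the model's probabilities.
    candidates = rng.choices(tokens, weights=weights, k=2 ** layers)
    for layer in range(layers):
        winners = []
        # Pair candidates up; the higher keyed score advances, ties broken at random.
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga = g_value(key, context, a, layer)
            gb = g_value(key, context, b, layer)
            if ga != gb:
                winners.append(a if ga > gb else b)
            else:
                winners.append(rng.choice((a, b)))
        candidates = winners
    return candidates[0]


def generate_watermarked(probs: dict, key: bytes, n: int,
                         layers: int, rng: random.Random) -> list:
    """Generate n tokens from a fixed toy distribution using tournament sampling."""
    out = []
    for _ in range(n):
        context = tuple(out[-WINDOW:])
        out.append(tournament_sample(probs, key, context, layers, rng))
    return out


def watermark_score(tokens: list, key: bytes, layers: int) -> float:
    """Mean g-value over all tokens and layers; about 0.5 for unwatermarked text."""
    scores = [
        g_value(key, tuple(tokens[max(0, i - WINDOW):i]), tok, layer)
        for i, tok in enumerate(tokens)
        for layer in range(layers)
    ]
    return sum(scores) / len(scores) if scores else 0.0


vocab = [f"w{i}" for i in range(50)]          # hypothetical 50-token vocabulary
probs = {t: 1 / len(vocab) for t in vocab}    # toy uniform "model"
key = b"hypothetical-secret-key"
rng = random.Random(0)

marked = generate_watermarked(probs, key, 300, layers=3, rng=rng)
plain = rng.choices(vocab, k=300)             # ordinary sampling, no watermark
print(watermark_score(marked, key, layers=3))  # expected to be well above 0.5
print(watermark_score(plain, key, layers=3))   # expected to be close to 0.5
```

Because the detection score is an average over tokens, a short passage gives only a few g-values and therefore a noisy estimate. Without the key, the g-values look like coin flips, which is what makes the watermark hard to forge.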
The findings come from a large-scale experiment on the Gemini platform, which has had the watermarking system enabled since May. Kohli and his colleagues analyzed ratings of more than 20 million chatbot responses, with and without the watermark. They found that SynthID Text did not affect the quality, accuracy, creativity, or speed of the LLM's text generation.
The tool still struggles with short, rewritten, or translated texts, as well as with answers to factual questions. "Achieving reliable and imperceptible watermarking of AI-generated text is challenging because LLM responses are nearly deterministic," Soheil Feizi, an assistant professor at the University of Maryland, told MIT Technology Review. However, he acknowledges that Google DeepMind's work "allows the community to test these detectors and evaluate their reliability in different settings, which helps to better understand the limitations of these methods."