Nightshade: Poisoning on text-to-image models

Brain Titan
2 min read · Oct 26, 2023


Nightshade: An attack on text-to-image generative models

This tool makes subtle changes to the pixels of images posted online. When those images are later scraped for model training, the poisoned data causes the resulting models to make confusing and unpredictable errors when generating images.
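To illustrate what "subtle changes to the pixels" means in practice, here is a minimal, hypothetical sketch (not Nightshade's actual code) of applying a perturbation that is bounded to a few intensity levels per pixel; the budget of 8 levels out of 255 is an assumption chosen purely for illustration.

```python
# Minimal illustration of a "subtle" pixel perturbation, not Nightshade's code.
import numpy as np

def add_subtle_perturbation(image, perturbation, budget=8):
    """`image` is a uint8 H x W x 3 array; `perturbation` is a float array of
    the same shape. The change is clipped to +/- `budget` intensity levels
    (out of 255), so the poisoned image looks essentially identical."""
    clipped = np.clip(perturbation, -budget, budget)
    poisoned = np.clip(image.astype(np.int16) + clipped.astype(np.int16), 0, 255)
    return poisoned.astype(np.uint8)
```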

The purpose of this tool is to fight back against AI companies that use other people’s works for model training without permission.

The perturbations are subtle and very hard to detect! 😂

Training data “poisoned” with this tool may damage future versions of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion. For example, a poisoned model might generate a cat when prompted for a dog, or a cow when prompted for a car.

When an attacker poisons several concepts related to a given prompt word (such as “cat”, “animal”, “pet”, and so on), the separate poisoning attacks compound: their effects accumulate in the model’s responses to that particular prompt word.

This tool can also be combined with another tool called “Glaze”, which lets artists “mask” their personal style so that models trained on scraped copies of their work cannot mimic it.

Related reports: https://t.co/IQrsepFEJl

Paper: https://t.co/KZhGv5vbQS

Attack principles, steps and key technologies:

The authors selected several popular text-to-image generation models as experimental subjects, including Stable Diffusion XL (SDXL) and DALL·E 3.

New attack method: the paper is the first to propose a prompt-specific data poisoning attack, called “Nightshade”, against text-to-image generation models. What makes it distinctive is that it does not require a large number of poisoned samples; instead, it achieves its goal with a small number of carefully crafted ones.
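The idea can be sketched roughly as follows: each poison sample pairs a prompt for the targeted concept (say, “dog”) with an image whose features have been nudged toward an unrelated anchor concept (say, “cat”), so a handful of such samples pulls the model’s notion of the target concept off course. The snippet below is a minimal, hypothetical illustration of that feature-matching step; the `feature_extractor`, the per-pixel bound, and all hyperparameters are assumptions for illustration, not the authors’ actual pipeline.

```python
# Hypothetical sketch of crafting one feature-matching poison image.
# Assumes a generic `feature_extractor` (any differentiable image encoder);
# the eps / steps / lr values are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def craft_poison(source_img, anchor_img, feature_extractor,
                 eps=8 / 255, steps=200, lr=0.01):
    """Perturb `source_img` (e.g. a dog photo) so its features resemble those
    of `anchor_img` (e.g. a cat photo), while keeping the pixel change small.
    Tensors are expected in [0, 1] with shape (1, 3, H, W)."""
    delta = torch.zeros_like(source_img, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    with torch.no_grad():
        target_feat = feature_extractor(anchor_img)

    for _ in range(steps):
        poisoned = (source_img + delta).clamp(0, 1)
        # Pull the poisoned image's features toward the anchor concept.
        loss = F.mse_loss(feature_extractor(poisoned), target_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the perturbation visually subtle (simple L-infinity bound).
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return (source_img + delta).detach().clamp(0, 1)
```

The poisoned image is then published with its original, truthful caption (“dog”), so it passes casual inspection while quietly mislabeling the concept in feature space.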
