A new way to keep content features consistent in Stable Diffusion image generation

Brain Titan
2 min read · Nov 18, 2023

Google has introduced a way to keep content features consistent in Stable Diffusion image generation.

This is a very important issue in image generation right now: story visualization, game asset design, advertising, and similar applications all need characters or other content to stay consistent across generated images.

Judging from the demonstrations, the results look very good, with character features and other content preserved well. One example shows a man’s life across more than a dozen images at different ages, and they all read as the same person.

Moreover, the method can also be combined with Stable Diffusion’s existing control techniques, such as inpainting and ControlNet. Below is a rough sketch of that combination, followed by a breakdown of how the method itself works:
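
The post does not include code for this, so the snippet below is only a sketch of the general idea: an identity learned by the method (assumed here to be a LoRA checkpoint) is loaded into an ordinary diffusers ControlNet pipeline, so pose control and identity consistency apply at the same time. The model IDs, file paths, placeholder token, and prompt are assumptions, not details from the paper.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Pose-conditioned ControlNet on top of a standard Stable Diffusion checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Hypothetical output of the identity-extraction step described below:
# low-rank weights that encode the consistent character.
pipe.load_lora_weights("identity_lora")

pose = load_image("pose_reference.png")  # an OpenPose skeleton image
result = pipe(
    "a photo of sks person walking on a beach",  # 'sks' stands in for the learned identity token
    image=pose,
    num_inference_steps=30,
).images[0]
result.save("character_with_pose_control.png")
```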

Implementation:

Identity clustering: This step involves first generating a sequence of images and then embedding these images into a semantic space. Next, a clustering algorithm is used to group these images, with each group representing a possible character identity. This process aims to identify a visually consistent set of images that will determine the character’s main visual characteristics.
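
The announcement does not ship reference code, but this step maps naturally onto off-the-shelf tools. Below is a minimal sketch that assumes a CLIP image encoder for the semantic embedding and K-means for the grouping; the encoder choice, cluster count, and file paths are placeholders rather than details from the paper.

```python
import numpy as np
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate images generated from the same prompt (paths are placeholders).
paths = [f"candidates/img_{i:03d}.png" for i in range(64)]
images = [Image.open(p).convert("RGB") for p in paths]

# Embed every candidate into a shared semantic space.
with torch.no_grad():
    inputs = processor(images=images, return_tensors="pt").to(device)
    feats = clip.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize
feats = feats.cpu().numpy()

# Group the candidates; each cluster is a possible character identity.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(feats)

# Pick the most cohesive cluster: the one whose members sit closest to their centroid.
def cohesion(label):
    members = feats[kmeans.labels_ == label]
    return np.linalg.norm(members - members.mean(axis=0), axis=1).mean()

best_label = min(range(kmeans.n_clusters), key=cohesion)
best_indices = np.where(kmeans.labels_ == best_label)[0]
print(f"Most cohesive cluster: {best_label} with {len(best_indices)} images")
```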

Identity extraction: After identifying a set of images with high cohesion, the next step is to refine a more consistent character identity by training a model on these images. This means that the model will learn the key visual characteristics of a specific character in order to more accurately reproduce these characteristics in future generations.
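
Again, this is only a sketch of what “training a model on these images” could look like in practice. It assumes the DreamBooth-LoRA example script that ships with the diffusers repository and its usual arguments; the script name, model ID, folder layout, placeholder token, and hyperparameters are assumptions, and any personalization trainer (DreamBooth, LoRA, textual inversion) could stand in here.

```python
import subprocess

# Fine-tune on the images of the most cohesive cluster so the model learns the
# character's key visual features and can reproduce them in later generations.
subprocess.run([
    "accelerate", "launch", "train_dreambooth_lora.py",
    "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
    "--instance_data_dir", "clusters/most_cohesive",  # images selected in the clustering step
    "--instance_prompt", "a photo of sks person",     # rare placeholder token for the identity
    "--output_dir", "identity_lora",
    "--resolution", "512",
    "--train_batch_size", "1",
    "--learning_rate", "1e-4",
    "--max_train_steps", "800",
], check=True)
```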

Convergence: The final step of the method is an iterative process that stops when a certain convergence criterion is reached. In each iteration, the model generates new images based on the latest training data and performs clustering and identity extraction again. This process is repeated until the model can reliably generate a character with a consistent visual identity.
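
Putting the pieces together, the outer loop is easy to express as a skeleton. The helpers passed in (generate, embed, cluster, fine_tune) are hypothetical callables standing in for the steps sketched above, and the cohesion threshold is an arbitrary example value, not the paper’s actual convergence criterion.

```python
import numpy as np

def cluster_spread(feats):
    """Mean distance of a cluster's members from their centroid (lower = tighter identity)."""
    return float(np.linalg.norm(feats - feats.mean(axis=0), axis=1).mean())

def consistent_identity_loop(model, prompt, generate, embed, cluster, fine_tune,
                             max_rounds=5, cohesion_threshold=0.15):
    """Iterate generate -> cluster -> extract until the chosen cluster is tight enough.

    Hypothetical helper signatures:
      generate(model, prompt) -> list of images
      embed(images)           -> (N, D) feature array
      cluster(feats)          -> list of index arrays, one per candidate identity
      fine_tune(model, imgs)  -> model updated on the selected images
    """
    for _ in range(max_rounds):
        images = generate(model, prompt)
        feats = embed(images)
        groups = cluster(feats)
        best = min(groups, key=lambda idx: cluster_spread(feats[idx]))  # most cohesive identity
        model = fine_tune(model, [images[i] for i in best])
        if cluster_spread(feats[best]) < cohesion_threshold:            # convergence criterion
            break
    return model
```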

Effect verification:

Qualitative and quantitative comparison: In this section, the authors compare their approach with other personalized text-to-image generation techniques. This includes assessing the consistency and quality of the generated images through visual and numerical metrics.
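
The post does not say which metrics are reported, but evaluations of this kind usually measure two things: identity consistency (how similar the generated images are to one another) and prompt fidelity (how well each image matches its text). The sketch below computes both with CLIP features; the encoder is an assumption, not necessarily what the authors used.

```python
import itertools
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def identity_consistency(image_paths):
    """Mean pairwise cosine similarity of image features: higher = more consistent identity."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    with torch.no_grad():
        feats = clip.get_image_features(**processor(images=images, return_tensors="pt").to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    sims = [float(feats[i] @ feats[j]) for i, j in itertools.combinations(range(len(feats)), 2)]
    return sum(sims) / len(sims)

def prompt_fidelity(image_path, prompt):
    """Cosine similarity between an image and its prompt in CLIP space."""
    inputs = processor(text=[prompt], images=[Image.open(image_path).convert("RGB")],
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```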

User study: The authors also conducted a user study to evaluate the effectiveness of their approach in real-world use. This includes having users rate the consistency and attractiveness of the generated images.

Ablation study: This part evaluates the effect of different components of the method. By modifying or removing parts of the method, the authors were able to understand the contribution of each component to the final result.

Paper
