UniPortrait: Maintain identity consistency and perform style transfer and free editing in single and multi-person scenes

Brain Titan
6 min read · Aug 27, 2024


UniPortrait is a unified framework for portrait image personalization. It aims to generate highly editable images while keeping identities consistent in both single-person and multi-person scenes. The framework was developed by a research team at Alibaba Group.

It offers the following capabilities:

  • Single and multi-person image personalization: Unifies the generation of personalized single-person and multi-person images and keeps identities consistent in complex scenes.
  • High-fidelity identity preservation: Accurately maintains the facial features and identity information of the reference image in generated images.
  • Extensive facial editability: Lets users flexibly edit and customize images from text descriptions without losing the original identity features.
  • Free-form input description: Accepts a variety of text prompts with no preset layout or formatting restrictions.

Key Features:

High Fidelity

UniPortrait generates lifelike portraits that clearly reproduce each person’s distinctive facial details.

Highly editable

You can modify a generated portrait to suit your preferences, for example by changing the hairstyle or expression, and the person in it stays recognizable.

Free creation

You can describe the portrait you want in your own words, and UniPortrait turns that description into an image without requiring a fixed prompt format.

What problem does UniPortrait solve?

Challenges of identity preservation

Traditional image personalization methods often struggle to preserve the facial shape and texture details of the reference image when generating new images. They usually either lose spatial information or fail to focus on the facial area, so the generated images show poor identity consistency.

By introducing an innovative ID embedding module and decoupling strategy, UniPortrait can flexibly edit and generate high-fidelity personalized images while maintaining facial shape and texture details.

Mixed identity issues

In multi-person image generation, traditional methods are prone to identity mixing: a single generated face may combine features from several identities at once. The result is ambiguous, inconsistent identities in the generated image.

Through its ID routing module, UniPortrait adopts an adaptive identity assignment strategy to ensure that each facial region only receives specific identity information, thereby avoiding identity mixing and improving the identity fidelity of the image.
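The idea of giving each facial region exactly one identity can be illustrated with a small sketch. This is not UniPortrait's actual routing code: the function name `route_identities`, the dot-product scoring, and the hard argmax selection are all assumptions chosen to show the general principle of "one identity per location".

```python
def route_identities(region_features, id_embeddings):
    """Assign each spatial location to exactly one identity.

    Illustrative only (hypothetical helper, not the UniPortrait implementation).
    region_features: list of feature vectors, one per spatial location.
    id_embeddings:   list of identity vectors, one per reference identity.
    Returns the index of the chosen identity for every location.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    assignments = []
    for feat in region_features:
        # Score every identity against this location (assumed: dot product).
        scores = [dot(feat, emb) for emb in id_embeddings]
        # Hard selection: only the best-scoring identity is injected here,
        # so no location ever receives a blend of two identities.
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


# Toy example: three locations, two identities.
locations = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0]]
identities = [[1.0, 0.0], [0.0, 1.0]]
assignments = route_identities(locations, identities)
print(assignments)
```

In this toy setup the first two locations are routed to identity 0 and the last one to identity 1. Because every location picks a single identity, features from different people cannot end up mixed on the same face.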

Layout and prompt constraints for generated images

Many existing methods require users to follow a specific format when entering prompts, and often require the layout of the generated images to be pre-set, which limits the user’s creative freedom.

UniPortrait supports free-form text description input, breaking this limitation and allowing more flexible and diverse image generation.

Key features in detail

Personalized generation of single and multi-person images

  • Single-person image personalization: UniPortrait can generate personalized images of a single person that stay highly consistent with the facial identity in the reference image. It supports extensive editing while maintaining facial details such as shape and texture, so users can generate portraits with a specific style, expression, or background from a text description. For example, you can upload a photo of yourself and then describe a different outfit or scene, and the person in the generated photo is still you.
  • Multi-person image personalization: UniPortrait can also generate personalized images of several people at once. In this case it ensures that each facial region receives only its corresponding identity information, which avoids mixing multiple identities on the same face. Users can therefore generate images containing several personalized characters whose identities are each preserved and kept distinct. For example, you can upload photos of several friends and generate a group photo of everyone together against different backgrounds, with each person’s facial features preserved and not confused with anyone else’s.

High-fidelity identity preservation

  • Identity Embedding Module: This module extracts editable high-fidelity facial features through a decoupling strategy and embeds these features into the context space of the diffusion model. It is able to capture detailed facial structure information while maintaining the ability to edit facial identities.
  • Preserved details: With this ID embedding module, UniPortrait retains key facial details such as face shape and skin texture, so generated images stay highly consistent with the original identity. This matters most for portraits that need to look realistic, such as personal photos.

Multi-reference image fusion

UniPortrait supports extracting identity features from multiple reference images and fusing them to generate more representative and high-fidelity personalized images. This is particularly useful when dealing with identities that require the integration of multiple angles or expressions. That is, you can upload multiple photos of the same person, and UniPortrait will combine the characteristics of these photos to generate more accurate and realistic personalized photos.
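One simple way to picture fusing several reference photos is to average their identity embeddings. The source does not say how UniPortrait combines references, so the following is only a sketch under that assumption: the helper names `l2_normalize` and `fuse_reference_embeddings` are hypothetical, and the vectors stand in for embeddings that a real face encoder would produce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_reference_embeddings(embeddings):
    """Fuse several embeddings of the same person into one vector.

    Assumed strategy (not confirmed by the source): normalize each
    embedding, take the element-wise mean, then normalize the result.
    """
    if not embeddings:
        raise ValueError("at least one reference embedding is required")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


# Toy example: two 2-D "embeddings" of the same person from different photos.
refs = [[3.0, 4.0], [4.0, 3.0]]
fused = fuse_reference_embeddings(refs)
print(fused)
```

Normalizing before averaging keeps one unusually large embedding from dominating the result, which is why the two toy references contribute equally here.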

High Fidelity and Editability

  • Through its unique architecture and training methods, UniPortrait is able to provide rich facial editing capabilities, such as modification of facial expressions, poses, etc., while maintaining the authenticity of facial identity.
  • Free editing: UniPortrait supports free-form text input, so users can define the content, style, and layout of generated images through descriptive text without following a specific format or preset. This increases both creative freedom and the diversity of generated images.
  • Face editability: UniPortrait lets users edit the faces in an image through text prompts, for example by changing expressions, adding accessories, or adjusting age. The identity characteristics of the people remain unchanged after these edits, which makes personalized generation very flexible.
  • Identity interpolation: UniPortrait supports linear interpolation between different identities, generating images that gradually transition from one identity to another. This is useful for creating composite images with features of several identities or for exploring how identity features change along the transition. For example, you can interpolate between yourself and a friend to get an in-between portrait that shows features of both.
  • Diverse layouts: UniPortrait can generate photos with different layouts based on your description, for both single-person and multi-person images, with varied composition and content.
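The linear interpolation between identities mentioned in the list above can be sketched in a few lines. This is a minimal illustration, not UniPortrait's code: the function name `interpolate_identities` is hypothetical, and the short vectors stand in for real identity embeddings that would then condition the image generator.

```python
def interpolate_identities(id_a, id_b, steps):
    """Return `steps` embeddings that move linearly from id_a to id_b.

    Hypothetical helper for illustration; each returned vector would be
    used as the identity condition for one generated image.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(id_a) != len(id_b):
        raise ValueError("both embeddings must have the same dimension")
    frames = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 (pure A) to 1.0 (pure B)
        frames.append([(1 - t) * a + t * b for a, b in zip(id_a, id_b)])
    return frames


# Toy example: five steps between two 2-D "identity embeddings".
frames = interpolate_identities([0.0, 2.0], [4.0, 0.0], steps=5)
for frame in frames:
    print(frame)
```

The first and last entries equal the two input identities, and the middle entry is their even blend, which corresponds to the in-between portrait described above.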

Compatibility with existing generation control tools

UniPortrait is extensible and compatible with existing generation control tools such as ControlNet and IP-Adapter. This makes controllable personalized generation more flexible: you can control details of the photo more precisely, such as the person’s pose or the background.

Multiple application scenarios

UniPortrait can not only generate high-fidelity personalized portraits, but can also be used in a variety of scenarios such as facial feature modification, identity interpolation (smooth transition between multiple identities), and stylized generation of multi-identity images.

Text To Single ID

Text To Multi-Identity

……

For more info ↓

More about AI: https://kcgod.com
