
Text-to-image generation using a StackGAN model that generates images faithful to a text description. Developed specifically for the fashion domain, it provides a wide variety of clothing options.


vedant1100/StyleSphere


Problem Statement

Fashion design is a large industry that contributes significantly to global economic growth, since most of the world's population likes to wear fashionable clothes and accessories. This makes it a demanding job for fashion designers to produce interesting and attractive designs that appeal to their audience. The solutions available today are traditional: pen-and-paper sketching or GUI design software. Both make it hard for designers to visualize and then materialize their designs, and there is no instant image-generation tool on the market to assist them. A beginner designer wants something easy to use, not a complicated GUI that must be learned before it can be used.

Solution

To solve this problem we introduce StyleSphere, a web platform integrated with an artificial-intelligence model. Using this platform, users simply enter a textual description of the clothing item they want, and our integrated AI model generates a fashion image from that description.

Generative AI Model:

Our generative AI model is StackGAN, a variant of the Generative Adversarial Network (GAN). Briefly, the model consists of two neural networks: a generator and a discriminator. The generator produces a fake image from the text-embedding vector, while the discriminator tries to distinguish the fake image from its corresponding real image. During training, the generator keeps producing fake images and tries to fool the discriminator by making them progressively more realistic. This process continues until the discriminator can no longer reliably tell the two classes of images apart, at which point the model is trained.
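To make the adversarial game concrete, here is a minimal PyTorch sketch of one text-conditioned GAN training step. The `Generator` and `Discriminator` below are simplified fully-connected placeholders with assumed sizes (`EMB_DIM`, `NOISE_DIM`, `IMG_DIM`), not the actual two-stage StackGAN architecture.

```python
# Minimal conditional-GAN training step (illustrative, not the real StackGAN).
import torch
import torch.nn as nn

EMB_DIM, NOISE_DIM, IMG_DIM = 512, 100, 64 * 64 * 3  # assumed sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps [noise ; text embedding] to a flattened fake image.
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + EMB_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, IMG_DIM), nn.Tanh(),
        )

    def forward(self, z, emb):
        return self.net(torch.cat([z, emb], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Scores an [image ; text embedding] pair as real (1) or fake (0).
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + EMB_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
        )

    def forward(self, img, emb):
        return self.net(torch.cat([img, emb], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_imgs, text_emb):
    batch = real_imgs.size(0)
    real_lbl = torch.ones(batch, 1)
    fake_lbl = torch.zeros(batch, 1)

    # Discriminator: separate real image/caption pairs from generated fakes.
    z = torch.randn(batch, NOISE_DIM)
    fake_imgs = G(z, text_emb).detach()  # detach so this step only updates D
    loss_d = bce(D(real_imgs, text_emb), real_lbl) + bce(D(fake_imgs, text_emb), fake_lbl)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: produce fakes the discriminator classifies as real.
    z = torch.randn(batch, NOISE_DIM)
    loss_g = bce(D(G(z, text_emb), text_emb), real_lbl)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Dummy batch to show the shapes involved.
imgs = torch.randn(8, IMG_DIM)
embs = torch.randn(8, EMB_DIM)
print(train_step(imgs, embs))
```

Conditioning both networks on the text embedding is what ties the generated image to its caption; StackGAN additionally splits generation into two stages, with a Stage-II network refining the low-resolution Stage-I output.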

Mid Training Images:

10th Epoch

[image: generator samples at the 10th epoch]

80th Epoch

[image: generator samples at the 80th epoch]

To help you understand the training process visually, the images above show the generator's output midway through training. At the 10th epoch the generator is trying to extract features such as facial structure, body structure, and clothing structure, but these features are still quite distorted. By the 80th epoch it has become noticeably better at capturing them, and by the end of training it produces near-realistic images like those shown below.

Final Result

[images: final generated samples]

Implementation Details:

We used the DeepFashion dataset, which contains image-caption pairs for clothing items. After data pre-processing, we took the pre-trained OpenAI CLIP model and fine-tuned it on DeepFashion to generate text embeddings for the captions. The images, together with their text embeddings, were then fed to the StackGAN model to generate clothing images. For more information on the implementation details, please refer to our research paper.
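For illustration, here is a minimal sketch of producing caption embeddings with the openai/CLIP package (installable via `pip install git+https://github.com/openai/CLIP.git`). The model variant ("ViT-B/32"), the example caption, and the normalization step are assumptions for the sketch; our actual fine-tuning procedure is described in the paper.

```python
# Sketch: encode a clothing caption into a CLIP text embedding.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption

captions = ["a sleeveless red floral summer dress"]  # hypothetical example caption
tokens = clip.tokenize(captions).to(device)

with torch.no_grad():
    text_emb = model.encode_text(tokens)                       # (1, 512) for ViT-B/32
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # unit-normalize

print(text_emb.shape)  # torch.Size([1, 512])
```

Embeddings produced this way serve as the conditioning vectors that the StackGAN generator and discriminator consume, as in the training-step sketch above.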
