February 19, 2025

Insight

Predictions & Trends: Future developments in the e-commerce industry assisted by AI - Part II

Written by Ricardo Sousa, PhD (not AI)
SeeOnMe, Chief Artificial Intelligence & Co-Founder

In our previous article, we covered the influences and developments that are shaping the fashion world. For Virtual Try-On (VTON) in particular, many of the breakthroughs stem from sweeping advances in Generative AI.

But what are these changes? How are they disrupting what previously failed to satisfy customer interest and adoption? Which technical innovations are enabling these advances? And what role does multi-modality play in these developments? In this article, we will attempt to answer some of these questions.

Generative AI at the Forefront

At the core of this transformation are groundbreaking new approaches that are paving the way forward. While not all limitations have been overcome, recent advances have significantly improved garment fidelity and fit accuracy — key issues that previously prevented this technology from meeting user needs.

Foundation Models and the Next Generation

One aspect worth noting is the remarkable evolution of the deep learning field, which will continue to be the leading force of innovation. Within this sub-domain of Machine Learning, the growth of foundation-model architectures such as the Convolutional Neural Network (CNN), coupled with the increasing computational power of GPUs and ever larger volumes of data, opened the door to a set of new possibilities, from human-like performance on specific tasks to the realistic generation of garments. The predictable returns from scale are captured by so-called scaling laws [1].
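To make that idea concrete, here is a minimal sketch of the model-size scaling law from [1]. The constants are the fitted values reported in that paper for language models and are used purely for illustration:

```python
# Minimal sketch of a scaling law in the sense of [1]: test loss falls as
# a power law in model size N. alpha_N and N_c are the values fitted in
# [1] for language models, shown here only to illustrate the shape.

def scaling_loss(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """L(N) = (N_c / N) ** alpha_N, the model-size scaling law from [1]."""
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> loss ~ {scaling_loss(n):.2f}")
```

Each tenfold increase in parameters cuts the loss by a roughly constant factor, which is why compute and data, rather than algorithmic tricks alone, have driven so much of the recent progress.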

Traditional VTON took an in-shop garment photo and a human fashion model in a single, fixed pose (a sketch of this classic pipeline appears below). That constraint was soon lifted, first by multi-pose generation through adversarial methods [5], and then by the diffusion-model breakthrough, whose early UNet-backbone models generated high-definition images [6]. Since then, development has focused on improving quality, removing explicit garment warping, and scaling existing models through pre-training and data adaptation, while the mainstream research line continues to advance through diffusion models.
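As a rough illustration of the stages involved, the sketch below outlines the classic pipeline. Every function is a simplified stand-in of ours, not the API of any real VTON system:

```python
# Hypothetical outline of the classic (pre-diffusion) VTON pipeline.
# All functions are illustrative stand-ins, not a real library's API.
import numpy as np

def parse_person(person: np.ndarray) -> np.ndarray:
    """Stand-in for person parsing: a mask of the clothing region."""
    return (person.mean(axis=-1) > 0.5).astype(np.float32)

def warp_garment(garment: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in for garment warping (real systems fit e.g. a
    thin-plate-spline transform to the detected pose)."""
    return garment * mask[..., None]

def try_on(person: np.ndarray, garment: np.ndarray) -> np.ndarray:
    """Compose the warped garment onto the person image; a GAN or
    diffusion model would then refine this crude blend."""
    mask = parse_person(person)
    warped = warp_garment(garment, mask)
    return person * (1 - mask[..., None]) + warped

person = np.random.rand(64, 64, 3)   # placeholder "photo"
garment = np.random.rand(64, 64, 3)  # placeholder in-shop garment
print(try_on(person, garment).shape)
```

Diffusion-based approaches increasingly fold the warping and parsing stages into the generative model itself, which is exactly the simplification discussed later in this article.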

This brings two key aspects of the Machine Learning field into play: training and inference.

  1. Training: during this stage, model weights are adjusted to fit the given observations, so the number of times the model processes the training data is essential. Diffusion models depend on a critical parameter here: the number of timesteps. Although newer sampling methods such as DDIM have been introduced, Denoising Diffusion Probabilistic Models (DDPM) remain the standard approach, and they are time-consuming.

  2. Inference: while training might use 1,000 timesteps, inference can often be done with far fewer (50 to 100), though this can impact generation quality. For a realistic and useful deployment, particularly from the user's perspective, the number of steps is crucial: although significantly faster than training, inference time is still considerable (see the sketch after this list).
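The toy PyTorch sketch below shows both sides: a DDPM-style training step that samples one random timestep out of 1,000, and a DDIM-style sampler that strides over far fewer steps at inference. The model and data are deliberately trivial; only the timestep mechanics mirror real systems:

```python
import torch
import torch.nn as nn

T = 1000                                  # training timesteps (the DDPM default)
betas = torch.linspace(1e-4, 0.02, T)     # standard linear noise schedule
alphas_bar = torch.cumprod(1 - betas, 0)  # cumulative signal retention per step

# A tiny denoiser: input is a noisy 2-D point plus a normalized timestep.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x0: torch.Tensor) -> float:
    """One DDPM training step: noise x0 at a random timestep, predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps          # forward diffusion
    pred = model(torch.cat([x_t, t.unsqueeze(1) / T], dim=1))
    loss = ((pred - eps) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def sample(n_steps: int = 50) -> torch.Tensor:
    """DDIM-style deterministic sampling over a coarse stride of timesteps:
    far fewer network evaluations than the 1,000 steps used in training."""
    ts = torch.linspace(T - 1, 0, n_steps).long()
    x = torch.randn(16, 2)
    for i in range(n_steps - 1):
        t, t_prev = ts[i], ts[i + 1]
        a, a_prev = alphas_bar[t], alphas_bar[t_prev]
        eps = model(torch.cat([x, torch.full((16, 1), float(t) / T)], dim=1))
        x0_hat = (x - (1 - a).sqrt() * eps) / a.sqrt()  # predicted clean sample
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM update (eta = 0)
    return x

for _ in range(200):                      # fit a toy 2-D Gaussian
    train_step(torch.randn(128, 2))
print(sample(n_steps=50).shape)           # 50 evaluations instead of 1,000
```

Cutting from 1,000 to 50 steps means 20x fewer network evaluations per generated image, which is precisely the training/inference asymmetry the two points above describe.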

Rapid model iteration demands shorter experimentation cycles to accelerate innovation, and this principle extends to the user experience. While users may tolerate brief waiting periods of around 20 seconds, the challenge deepens because initial generated images often contain imperfections. This has sparked interest in efficient search strategies at inference time, to ensure an optimal result is selected before anything is shown to the user [4].
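One simple instance of such a search, in the spirit of [4], is best-of-N sampling: generate several candidates from different noise seeds, score each with an automatic quality metric, and show only the winner. The sketch below is hypothetical; `generate` and `score` are stand-ins we define, not a real API:

```python
import random

def generate(seed: int) -> list[float]:
    """Stand-in for a diffusion sampler; the seed fixes the initial noise."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]  # pretend these are image features

def score(image: list[float]) -> float:
    """Stand-in for an automatic quality metric (e.g. an aesthetic or
    garment-fidelity scorer)."""
    return -sum((v - 0.5) ** 2 for v in image)

def best_of_n(n: int = 8) -> list[float]:
    """Spend n times the sampling compute; keep the highest-scoring candidate."""
    candidates = [generate(seed) for seed in range(n)]
    return max(candidates, key=score)

print(best_of_n(8))
```

The trade-off is explicit: N candidates cost N times the sampling compute, which ties directly into the operating-cost discussion below.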

The future of virtual try-on technology is poised for dramatic transformation, following patterns we've witnessed in Large Language Models (LLMs). We anticipate new approaches that will leverage synthetically generated data combined with few-shot or even zero-shot learning techniques. This evolution will help scale up training processes and make these technologies more accessible and affordable for a broader range of applications and startups.

The established methods won't stand still either. We'll see significant streamlining of both training and inference processes, with the elimination of time-consuming stages like garment warping and person parsing. This simplification will go hand in hand with faster dataset generation and reduced inference times, making the technology more practical for real-world applications.

Perhaps most excitingly, we're on the cusp of achieving unprecedented levels of realism in generated images. Future systems will capture the subtle interplay of fabric textures and folds as they respond to human poses, while faithfully preserving individual characteristics like body shape and distinctive features such as sun spots, scars, or tattoos, paving the way to personalized diffusion models. This leap in quality will make virtual try-on experiences increasingly indistinguishable from reality.

The ripple effects of these advances will extend far beyond static images. We'll see the emergence of sophisticated video generation and editing capabilities, opening new doors for creative and commercial applications. The impact on clothing simulation will be particularly profound, with potential to revolutionize both the textile industry and gaming sector. From virtual prototyping in fashion design to enhanced character customization in video games, these technologies will reshape how we interact with and experience clothing in digital spaces.

Operating Costs

Ongoing developments are allaying earlier concerns about the applicability of this technology, but other factors are coming into play. In spite of very promising results, training and serving these models is becoming a hurdle that only a few can overcome [2]. Diffusion models are costly and data-intensive: training on large datasets takes days to weeks, and training DDPM on eight V100 GPUs on the LSUN-Bedroom dataset, for example, can take up to two weeks. It will therefore be key to adopt techniques that save training time, such as few-shot or even zero-shot learning, which reduces both data requirements and training time [3].
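A quick back-of-envelope calculation makes the scale concrete. The GPU count and duration come from the DDPM example above; the hourly rate is our assumption, since cloud prices vary widely:

```python
# Back-of-envelope training cost for the DDPM example above; the
# hourly rate is an assumption, as cloud prices vary widely.
gpus = 8
days = 14                      # "up to two weeks" on LSUN-Bedroom
usd_per_gpu_hour = 2.50        # illustrative on-demand V100 rate

gpu_hours = gpus * days * 24
print(f"{gpu_hours} GPU-hours -> ~${gpu_hours * usd_per_gpu_hour:,.0f}")
# 2688 GPU-hours -> ~$6,720
```

And that is a single training run; hyper-parameter sweeps and failed experiments multiply the bill several times over.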

While diffusion models excel at capturing dataset patterns, their performance can be enhanced by spending additional compute at inference, particularly by optimizing the noise patterns used in the sampling process ([4], in pre-publication at the time of this article's release). However, this improvement comes at the cost of increased computational demands, creating a delicate balance between result quality and user-experience constraints.

Algorithmic innovation will evolve in parallel with computing capabilities, as new methods emerge and gain widespread adoption. While premium GPU hardware remains accessible to select users, the market will increasingly shift toward cost-effective yet capable solutions, accompanied by advances in GPU-optimized numerical techniques that maximize computational efficiency.

Success in this domain will hinge on two key factors: delivering a high-quality foundational model that sets new standards for image generation, while simultaneously developing an efficient, scalable platform that can maintain performance at scale. Companies that achieve both elements will establish a significant competitive advantage in the market.

Final Remark

The journey toward mastering realistic human image generation presents an intricate challenge, particularly in achieving precise correspondence with real-world measurements. While emerging computing capabilities will help address these challenges, they also bring increased focus on sustainability and business profitability. The substantial computing requirements and vast data needs for training competitive diffusion models have historically limited accessibility, and as we look ahead, similar constraints will emerge at inference time, representing the next frontier in democratizing this technology.

References

Sources used to support the content in this article.

[1] Kaplan, Jared, et al. "Scaling Laws for Neural Language Models." *arXiv preprint arXiv:2001.08361* (2020).

[2] Warden, Pete. "Why Nvidia's AI Supremacy Is Only Temporary." *petewarden.com*, September 10, 2023. https://petewarden.com/2023/09/10/why-nvidias-ai-supremacy-is-only-temporary/

[3] Wang, Zhendong, et al. "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models." *Advances in Neural Information Processing Systems* 36 (2024).

[4] Ma, Nanye, et al. "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps." *arXiv preprint arXiv:2501.09732* (2025).

[5] Dong, Haoye, et al. "Towards Multi-Pose Guided Virtual Try-On Network." *Proceedings of the IEEE/CVF International Conference on Computer Vision*. 2019.

[6] Choi, Seunghwan, et al. "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization." *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition*. 2021.

SeeOnMe’s AI-powered API enables customers to visualize apparel on themselves while shopping online—enhancing engagement, minimizing returns, and increasing sales.

©2024 SeeOnMe LLC. All rights reserved. SeeOnMe, the SeeOnMe logo, and “Goodbye models, hello me.” are trademarks or registered trademarks of SeeOnMe LLC in the United States and/or other countries. Unauthorized use is strictly prohibited. All other trademarks mentioned are the property of their respective owners. This website and its content—including text, graphics, logos, and software—are the property of SeeOnMe LLC and protected by United States and international copyright laws. Reproduction, distribution, or transmission of any content without prior written consent from SeeOnMe LLC is strictly prohibited and subject to prosecution to the fullest extent of the law. SeeOnMe’s proprietary AI models, developed in-house, are protected under intellectual property rights and relevant patents, ensuring the exclusivity and innovation of our technology. For investor inquiries, reach us at spam@spam.ai. SeeOnMe images are for entertainment purposes only. The fit, color, texture, and other visual elements are AI-generated approximations and may not accurately reflect actual products. These images are not endorsed by or affiliated with any retailer, nor do they imply any partnerships. SeeOnMe’s proprietary machine learning models combine your Persona with garment photos from the retail sites you browse at the time of creation. SeeOnMe does not store or reproduce this content but generates likenesses for entertainment purposes only. For accurate product details, please visit the retailer’s website.
