Imagine a world where customer support feels more human than ever before, where digital avatars respond with empathy, understanding, and a touch of personality. D-ID has turned this imaginative vision into a reality, harnessing the magic of generative AI and the capabilities of Azure OpenAI Service.
Through their innovative chat.D-ID app, built using core Azure components, D-ID lets companies combine personalized and realistic digital avatars, putting a human face on support, account management, sales enablement, agents, and more for some of today’s top companies, including MyHeritage, Homa Games, and BurdaForward.
Making this all happen instantly and seamlessly for the user isn’t simple, but thanks to easily integrated Azure components, D-ID was able to develop their platform faster, saving 42% of development time. And with Azure Cloud’s scalability, D-ID was able to handle more than 750,000 users in their first 3 months alone, with thousands of new users added daily. Let’s see how Azure has helped D-ID build their platform quickly and operate it at scale.
About D-ID: Pioneering Generative AI since 2017
As a pioneer in generative AI-based products since 2017, D-ID has been at the forefront of avatar technology long before it became known as generative AI. To accelerate development and leverage the benefits of Azure services, D-ID joined the Microsoft for Startups Founders Hub, which provides startups with free resources like Azure credits and extensive support. In September 2021, D-ID released its self-service avatar-creation platform, Creative Reality™ Studio, which quickly gained traction and reached millions of users within six months.
With early consumer-facing customers on board, meeting customer SLAs was crucial, so D-ID had to choose a powerful and reliable framework on which to build the AI portion of their platform. After considering alternatives, they chose to build D-ID’s text-to-speech capabilities using Azure Cognitive Services.
The D-ID Solution: Revolutionizing Customer Experience with Azure OpenAI
The potential uses for AI-based chat with video avatars are endless. Any customer experience interaction, such as technical support, sales calls, learning and development, entertainment, and more, can benefit from this technology—essentially providing a new way to interface with any human-facing application.
Most academic researchers agree: The most useful digital avatars for providing effective, personalized service that augments the existing workforce and reduces costs are those that capture both the look and behavior of an actual human agent. In addition, a recent McKinsey report estimates that generative AI could potentially deliver up to $1 trillion of additional value each year in global banking alone, in part, through revamped customer service; generative AI improves the customer experience, reduces costs, and increases sales—boosting value over the entire customer lifetime.
But connecting conversational AI, powered by a large language model (LLM), to human faces demands advanced image processing and deep learning algorithms to create realistic and convincing facial expressions and movement. This takes significant computing power and machine learning techniques to analyze human behaviors and facial motor movement.
To future-proof their company and ensure they were able to realize the growth they sought, D-ID needed to build their platform around two rock-solid components:
- High Availability & Low Latency: Today’s LLMs-as-a-service are often unreliable. To create a viable offering, D-ID needed an AI that was lightning fast and offered the reliability and uptime to meet their customers’ SLAs.
- Text-to-speech. D-ID also needed a broad variety of voices and language options to appeal to enterprises and end users all over the world, along with a range of options for customization and localization.
By taking advantage of Microsoft for Startups Founders Hub, D-ID was able to achieve both of their goals using Azure components.
About the Azure Services Featured
As part of the Microsoft for Startups Founders Hub, D-ID’s team received access to Azure credits, support, technical enablement, and close partnership. This allowed them to build their infrastructure around industry-leading Azure components, speeding development time while allowing them to reap the benefits of features like cutting-edge AI.
Two services from Azure Cognitive Services comprise the core of D-ID’s platform.
- Azure OpenAI Service: An Azure-managed service, this provides access to state-of-the-art machine learning tools and algorithms, including ChatGPT. It gives D-ID generative AI capabilities without the hassle of establishing infrastructure and performing maintenance along with early preview access to GPT4 to provide more accurate results based on more sophisticated reasoning and stronger safeguards. With the REST API, Azure OpenAI Service integrates easily into existing and custom components for a seamless generative AI experience. Plus, Azure OpenAI Service includes tools and services for data analysis to help develop and improve AI models.
- Azure Text-to-Speech: This service brings text to life with over 460 natural sounding neural voices available in over 140 languages. Choosing Azure TTS has given D-ID the flexibility to choose prebuilt voices or create unique custom neural voices. The TTS component was especially critical. According to Or Gorodissky, D-ID’s vice-president of research and development, “We tested a variety of TTS platforms for both quality and variety, and we chose Azure Cognitive Services, as it provided the solution we needed for both.”
The Power of Azure OpenAI Service
D-ID’s solution goes beyond simple chatbot functionality. It incorporates Azure OpenAI Service as its large language model (LLM) and Azure TTS as its speech-generation core to create a more natural conversational experience for the user.
Here are the steps involved in the conversation process:
- The user sends a chat message to the D-ID chat platform (frontend).
- The D-ID platform forwards the message to the LLM (Azure OpenAI).
- Azure OpenAI processes the request and provides the answer to the D-ID backend.
- The D-ID platform sends the answer to Azure TTS.
- Azure TTS returns the audio to the D-ID backend.
- The D-ID backend combines the text and audio into a complete animation. Proprietary animation technology matches the audio input to the corresponding facial expression and movement, creating a realistic video in real-time of a speaking avatar.
- The D-ID streaming layer then sends the animation to the user via the D-ID chat platform (frontend).
Because users are notoriously impatient, an interface designed to improve the user experience must deliver results that are both as helpful as those they’d receive from a human agent and at lightning speed to rival hyper-efficient chatbots.
Here’s a simplified diagram to demonstrate this process:
Thanks to support from Microsoft for Startups Founders Hub, the D-ID team had the support and assistance they needed to deploy this solution using cutting-edge Azure components, achieving far better results than they could have working alone.
“Azure was critical to reducing latency and for providing a variety of voices. No other provider could have enabled us to ensure the experience our customers expect.”
Or Gorodissky, Vice-President, Research and Development, D-ID
Benefits of Azure Components for D-ID
Integrating Azure components while leveraging other benefits of the Microsoft for Startups Founders Hub, such as a dedicated point person for personalized support to get up and running, has delivered a number of concrete development and business benefits to D-ID’s team so far, including:
- Plug and play components. Azure OpenAI was simple to connect using the REST API and worked seamlessly to meet expectations along with SLAs. The actual transition from the previous LLM provider to the Azure OpenAI service was accomplished in less than one day.
- 42% faster development. With ready-to-go components like Azure OpenAI and Azure TTS, D-ID was up and running with Azure Cognitive Services within seven weeks, saving months of development work.
- Scalability. Because Azure Cognitive Services was built on Microsoft Azure Cloud, D-ID was able to handle more than 750,000 users in its first 3 months alone, with thousands of new users added daily, totaling millions of chat sessions, with little extra effort or maintenance. Azure OpenAI’s scalability gives D-ID near-infinite expandability and global availability for greater efficiency to handle these extensive compute resource needs.
- High uptime. Azure Cognitive Services’ five-nines reliability provides high uptime and low latency, meaning D-ID can be confident in meeting its own customer SLAs.
- Faster AI. Up to 2.2x faster processing using Azure OpenAI as compared to the open-source OpenAI offering. And increased processing power and improved data throughput results in reduced latency.
Azure OpenAI Service – Powering the Future of Customer Engagement
D-ID’s success story exemplifies the transformative potential of Azure OpenAI Service in revolutionizing customer engagement. By combining hyper-realistic avatars with generative AI, D-ID has redefined how companies interact with their customers. With Azure OpenAI Service, startups like D-ID can build their platforms quickly, achieve scalability, and provide unparalleled customer experiences. Embracing Azure technology can empower startups to shape the future of customer engagement, delivering exceptional value and innovation to their businesses.
Microsoft for Startups Founders Hub members receive Azure cloud credits that can be used toward Azure OpenAI Service or OpenAI to help build their product. Sign up now.