Here at Wayfair, we're passionate about creating a seamless and enjoyable shopping experience for every customer. And that experience goes beyond just browsing our vast selection of home goods. This commitment to the customer experience continues even when personalized help is needed – whether they have a quick question about a specific product or want to collaborate with one of our agents to find the perfect pieces for their home. To elevate the service we provide to our customers and empower our sales agents, we've developed an innovative experience powered by cutting-edge generative AI: Wayfair's Agent Co-pilot.
Unlike rule-based robo-chats, Agent Co-pilot is a sophisticated AI system that works side-by-side with our digital sales agents. While agents are chatting with customers, Co-pilot provides live, contextually relevant chat response recommendations. To craft useful suggestions, Co-pilot considers product information, Wayfair policies, and the ongoing conversation history. This level of contextual understanding allows Co-pilot to provide truly valuable support: it helps agents address customer needs quickly and confidently, and it saves them time by immediately surfacing information they would otherwise have to hunt down.
Wayfair's Agent Co-Pilot Interface
Under the Hood of Agent Co-pilot
Let's delve into how Co-pilot works and how it's transforming the way we interact with our valued customers.
Anatomy of an Agent Co-pilot Prompt
Co-pilot begins with a carefully crafted "prompt": a set of instructions and information that guides the LLM in generating appropriate responses. The prompt consists of several crucial components:
- Task Description: This clearly defines what the LLM should do, such as "provide product information," "clarify a return policy," or "suggest alternative products."
- Guidelines: These outline internal rules and processes agents need to follow during customer interactions, ensuring consistency and adherence to the highest standards of service.
- Policies: This provides the LLM with up-to-date information about Wayfair's internal policies related to shipping, returns, assembly services, and more.
- Product Information: When a customer inquires about a specific product, the relevant product data is included in the prompt, empowering Co-pilot to answer questions accurately.
- Conversation History: Co-pilot isn't limited to single-turn interactions. It analyzes the entire conversation history to provide contextually relevant suggestions, ensuring a smooth and natural flow of communication.
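To make the structure concrete, here is a minimal sketch of how the five components above could be joined into a single prompt string. The `PromptParts` class, the `build_prompt` function, the section titles, and the example text are all hypothetical, chosen for illustration; the actual Co-pilot prompt template is not shown in this post.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the five prompt components listed above."""
    task_description: str
    guidelines: list[str]
    policies: list[str]
    product_info: dict[str, str]
    # Each turn is a (speaker, text) pair, oldest first.
    conversation: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the components into one prompt string, skipping empty sections."""
    sections = [
        ("Task", parts.task_description),
        ("Guidelines", "\n".join(f"- {g}" for g in parts.guidelines)),
        ("Policies", "\n".join(f"- {p}" for p in parts.policies)),
        ("Product Information",
         "\n".join(f"{k}: {v}" for k, v in parts.product_info.items())),
        ("Conversation History",
         "\n".join(f"{who}: {text}" for who, text in parts.conversation)),
    ]
    # The "## Title" layout is an arbitrary choice for this sketch.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


if __name__ == "__main__":
    # Placeholder content only; not real Wayfair guidelines or product data.
    example = PromptParts(
        task_description="Suggest the agent's next reply to the customer.",
        guidelines=["Keep the reply short and friendly."],
        policies=["(policy text would go here)"],
        product_info={"Name": "Example sofa", "Color": "Gray"},
        conversation=[("Customer", "Does this sofa come in gray?")],
    )
    print(build_prompt(example))
```

Keeping each component as a separate field makes it easy to update one part (for example, the policies) without touching the rest, and to leave a section out when no product is under discussion.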
The Power of Large Language Models (LLMs)
At the core of Co-pilot lies the power of Large Language Models (LLMs). These AI models have been trained on a massive amount of data, allowing them to understand and generate human-like text with impressive accuracy. Co-pilot leverages the power of LLMs to analyze customer messages, understand their intent, and provide sales agents with a range of helpful responses.
From Prompt to Response
Once the prompt is assembled, it is fed into the LLM, which generates a response one token at a time. Based on the patterns and relationships it learned during training on internet-scale text, the LLM assigns a probability to each token that could follow the sequence in the prompt. The token with the highest probability is selected and appended, and the process repeats until a complete, contextually relevant response is generated. The response is then presented to the sales agent, who can send it to the customer as is or edit it first to improve its accuracy and suitability for the conversation.
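The selection rule described above (always take the most probable next token) is commonly called greedy decoding. The sketch below illustrates the loop with a toy lookup table standing in for the LLM; `greedy_decode`, `TOY_MODEL`, and the probabilities are invented for illustration and say nothing about the model or decoding settings Co-pilot actually uses.

```python
def greedy_decode(next_token_probs, prompt_tokens, end_token="<end>", max_new_tokens=20):
    """Repeatedly append the most probable next token until the end token appears.

    `next_token_probs` is any callable mapping the tokens so far to a
    {token: probability} dict; a real LLM would fill this role.
    """
    tokens = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # highest-probability token
        if best == end_token:
            break
        tokens.append(best)
        generated.append(best)
    return generated


# Toy stand-in for an LLM: the distribution depends only on the last token.
TOY_MODEL = {
    "?": {"Yes": 0.7, "No": 0.3},
    "Yes": {",": 0.6, "<end>": 0.4},
    ",": {"it": 0.9, "<end>": 0.1},
    "it": {"ships": 0.8, "<end>": 0.2},
    "ships": {"free": 0.55, "<end>": 0.45},
    "free": {"<end>": 1.0},
}


def toy_next_token_probs(tokens):
    """Look up the next-token distribution for the last token seen."""
    return TOY_MODEL.get(tokens[-1], {"<end>": 1.0})


if __name__ == "__main__":
    prompt = ["Does", "it", "ship", "free", "?"]
    print(greedy_decode(toy_next_token_probs, prompt))
    # prints ['Yes', ',', 'it', 'ships', 'free']
```

A real model conditions on the entire token sequence rather than only the last token, and many systems sample from the distribution instead of always taking the maximum; the loop structure stays the same.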
Measuring Response Quality
We continuously monitor Co-pilot's performance, using both quantitative and qualitative data to understand its strengths and areas for improvement. Our team regularly reviews agent feedback, analyzes conversation logs, and even employs a second LLM (a "QA LLM") to assess the quality of Co-pilot's responses. We use what we learn to keep refining the system so that it provides the best possible support to our agents and customers.
Here's a breakdown of the quality metrics we use to assess Co-pilot's performance:
- Prompt Instruction Adherence: We track how closely Co-pilot’s response adheres to the specific instructions provided in its prompt. These instructions can range from specifying the desired length of a response to requiring the use of certain predefined greetings or closing templates. Looking at the fraction of rule breaks over time, we gain insight into common failure modes and the stability of our response recommendation system.
- Factuality: We evaluate the accuracy of the information provided by Co-pilot, checking for factual errors or hallucinations. This includes verifying the correctness of product information, policy details, and other relevant data. Our goal is to ensure that Co-pilot provides accurate and reliable information to agents, minimizing the need for corrections and enhancing trust in its recommendations.
- Edit Reason: We track the reasons why agents choose to edit Co-pilot's responses. This metric is crucial for understanding the specific areas where Co-pilot is falling short, allowing us to prioritize improvements. Common edit reasons include making stylistic changes, adding missing product information, bringing a response in line with policy, and correcting data.
- Message Purpose: We analyze the intended purpose of Co-pilot's responses, categorizing them based on tasks like answering customer questions, providing product information, offering alternative products, or guiding the customer along the buying path. By comparing the distribution of Co-pilot's message purposes with that of the agents' messages, we can understand changes in behavior over time and identify areas for improvement in Co-pilot's response quality.
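As an illustration of the first metric, the fraction of rule breaks can be computed by running each recommended response through a set of checks and counting failures. The sketch below uses simple deterministic checks; the two rules, the 300-character limit, and the greeting text are hypothetical examples, and the assessment described above relies on a QA LLM rather than hand-written checks like these.

```python
from collections import Counter

MAX_CHARS = 300  # hypothetical length limit
GREETING = "Thanks for reaching out"  # hypothetical required greeting

# Each rule maps a name to a check that returns True when the response complies.
RULES = {
    "max_length": lambda text: len(text) <= MAX_CHARS,
    "has_greeting": lambda text: text.startswith(GREETING),
}


def rule_break_rates(responses, rules=RULES):
    """Return, for each rule, the fraction of responses that break it."""
    breaks = Counter()
    for text in responses:
        for name, complies in rules.items():
            if not complies(text):
                breaks[name] += 1
    total = len(responses)
    return {name: (breaks[name] / total if total else 0.0) for name in rules}


if __name__ == "__main__":
    sample = [
        "Thanks for reaching out! It ships in two days.",
        "Hi there",                                # missing greeting
        "Thanks for reaching out! " + "x" * 400,   # too long
        "Thanks for reaching out! Yes, it does.",
    ]
    print(rule_break_rates(sample))
    # prints {'max_length': 0.25, 'has_greeting': 0.25}
```

Computing these rates per day or per prompt version gives the time series that reveals common failure modes and stability.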
Measuring Product Success
We aim to become more efficient in answering customer questions by reducing the average handle time (AHT) of our agents. In addition to AHT, we track order conversion rate, adoption rate (at both the contact level and the response level), and the edit distance between our recommended responses and the text the agent actually sends (for example, the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one message into the other).
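For reference, the Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below is a plain-Python version for illustration; the function name and the example strings are ours, and a production pipeline would more likely use an optimized library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string `a` into string `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    recommended = "kitten"
    sent = "sitting"
    print(levenshtein(recommended, sent))  # prints 3
```

A distance of 0 means the agent sent the recommendation unchanged; dividing the distance by the length of the longer message gives a normalized score that is comparable across short and long responses.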
Initial test results for Co-pilot are very encouraging: they show a 10% reduction in handle time, allowing our customers to be served more quickly. From these results, we have also identified several promising ways to further enrich the conversational context provided to Co-pilot. By continually monitoring these metrics and making adjustments, we will ensure Co-pilot remains valuable for both customers and sales agents, playing a crucial role in the future of Wayfair's live sales assistance.
The Future of Agent Co-pilot
We're excited about the future of Co-pilot and its potential to further enhance the customer experience at Wayfair. Here are some of the exciting developments on the horizon:
- Retrieval Augmented Generation (RAG): We're exploring ways to enhance Co-pilot's contextual understanding by leveraging RAG techniques. This involves connecting the LLM to a vast database of Wayfair data, allowing it to access real-time information about product reviews, internal policies, customer preferences, and more.
- Fine-tuning for high quality conversations: We're also working on fine-tuning Co-pilot's language model to better mimic the tone, style, and salesmanship of our top-performing agents. This will enable the system to provide even more engaging and persuasive responses, ultimately driving customer satisfaction and sales.
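To illustrate the retrieval step behind the RAG idea in the first bullet above, here is a toy sketch that ranks text snippets by bag-of-words cosine similarity to the customer's message and returns the best matches for inclusion in the prompt. Everything here is an assumption made for illustration: the function names, the similarity measure, and the placeholder snippets. Real RAG systems typically use learned embeddings and a vector index, and this post does not describe Wayfair's design.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder snippets, not real Wayfair policies.
    snippets = [
        "Returns are accepted after delivery under certain conditions.",
        "Assembly services can be added at checkout.",
        "Shipping options are shown on the product page.",
    ]
    print(retrieve("How do returns work after delivery?", snippets, k=1))
```

The retrieved snippets would then be added to the prompt alongside the conversation history, so the LLM sees only the material most relevant to the current question.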
Agent Co-pilot represents our commitment to innovation and our dedication to providing the best possible service to our customers. By harnessing the power of generative AI, we're empowering our sales agents to create truly remarkable shopping experiences. As Co-pilot continues to learn and evolve, we're confident that it will play an increasingly important role in shaping the future of live shopping assistance at Wayfair.