Show HN: Fine-Tuning AI Agents for Customer Support

Did you know that 70% of customers expect a response within five minutes when reaching out for support? As customer expectations continue to rise, businesses are turning to AI agents to bridge the gap. AI agents for customer support have the potential to transform how businesses interact with their customers by improving efficiency, reducing costs, and boosting satisfaction. This guide walks through fine-tuning AI agents so they deliver strong performance and measurable ROI.

The Current State of Customer Support: Challenges and Opportunities

Traditional customer support faces several pain points: long wait times, inconsistent responses, and a lack of personalization. With customers demanding 24/7 support and seamless experiences, existing models often fall short. Miscommunications and uninformed support agents exacerbate these issues, highlighting the need for solutions like AI agents that can meet evolving customer expectations.

Limitations of AI Agents for Customer Support

While AI agents offer numerous benefits, generic solutions often fall short of delivering exceptional customer support. These agents can struggle with complex or unusual queries that fall outside their training, leading to inaccurate or irrelevant responses.

A key limitation is the lack of human empathy and emotional intelligence, which are crucial for understanding and responding appropriately to customer sentiment. For example, a DPD chatbot swore at a customer and wrote poems criticizing the company after the customer became frustrated with its inability to provide parcel information or connect them to a human. Similarly, an Air Canada chatbot gave a customer incorrect refund information, which the airline initially refused to honor until a tribunal ruled in the customer's favor.
Generic AI agents can also make interactions feel impersonal, frustrating customers. McDonald's drive-thru AI has gotten orders comically wrong, such as offering a customer butter and ketchup when they tried to order plain vanilla ice cream. Relying on third-party AI agents can also raise data security and control concerns, especially when handling sensitive customer information. Finally, the effectiveness of AI agents depends on the quality and accuracy of their training data; biases or errors in the data can lead to unfair or incorrect responses. For instance, Cursor, an AI-powered code editor, had an AI support bot that falsely stated the company had a policy limiting users to a single device, causing confusion and subscription cancellations.

Fine-Tuning AI Agents for Customer Support

Customers expect personalized, consistent communication, so brands need their AI agents to reflect their unique voice and tone. Fine-tuning aligns automated interactions with a brand's specific communication style and personality, ensuring that AI-driven customer support resonates with the audience and stays consistent across all channels. By adapting pre-trained language models with proprietary data, such as customer service transcripts and marketing materials, businesses can create more meaningful engagements, improve customer satisfaction, and enhance brand perception.

In this article, we show step by step how to fine-tune an LLM on multi-turn conversations focused primarily on empathic interaction.

Step-by-Step Guide

1. Data Collection and Preprocessing

Successful AI agents begin with extensive data collection from sources like CRM systems, help desks, and social media. This data is then cleaned and prepared for training, giving the model a solid foundation.
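This preprocessing step can be sketched as follows: turning raw multi-turn dialogs into an instruction-tuning CSV with system prompt, user prompt, dialog-history input, and response columns. The field names and sample dialog here are illustrative, not taken from the actual dataset.

```python
import csv

# Illustrative system prompt; a real one would encode the brand voice.
SYSTEM_PROMPT = "You are an empathetic customer support agent."

# Hypothetical raw dialogs; real data would come from CRM or help-desk exports.
dialogs = [
    {
        "history": ["User: My order never arrived.",
                    "Agent: I'm sorry to hear that. Could you share your order number?"],
        "user": "It's #12345, and I'm really frustrated.",
        "response": "I completely understand your frustration. Let me look into order #12345 right away.",
    },
]

def to_rows(dialogs):
    """Flatten each dialog into one CSV row; prior turns become the Input column."""
    rows = []
    for d in dialogs:
        rows.append({
            "System prompt": SYSTEM_PROMPT,
            "User prompt": d["user"],
            "Input": "\n".join(d["history"]),  # dialog history as context
            "Response": d["response"],
        })
    return rows

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["System prompt", "User prompt", "Input", "Response"])
    writer.writeheader()
    writer.writerows(to_rows(dialogs))
```

Keeping the dialog history in its own column lets the trainer distinguish the current query from its context, which matters for multi-turn fine-tuning.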
For our tutorial, we use this publicly available dataset from Hugging Face: https://huggingface.co/datasets/Salesforce/dialogstudio/tree/main/open_domain/Empathetic

To prepare the data for instruction-tuning, the first step is to create a CSV file containing a System prompt, User prompt, Input (the dialog history, in this case), and Response.

2. Model Fine-Tuning

This process, known as supervised fine-tuning, involves training the model on a human-labeled dataset where the desired outputs are predefined. Here, the training data consists of pairs of user queries and correct answers.

During supervised fine-tuning, the model learns to map the input (the customer query) to the correct output (the response), adjusting its internal parameters to minimize errors. This ensures that the model not only understands the context of the questions but also aligns its responses with the company's specific guidelines and tone, keeping customer interactions consistent with the brand voice.

One of the most popular and efficient methods for supervised fine-tuning is LoRA (Low-Rank Adaptation). This technique fine-tunes large language models with significantly fewer trainable parameters, making the process faster and more cost-effective.

LoRA works by freezing the pre-trained model's parameters and adding small, trainable low-rank matrices to certain layers. During fine-tuning, only these low-rank matrices are updated, while the rest of the model remains unchanged. This reduces the number of parameters that need to be trained, resulting in lower memory usage and faster training times.

We use the UbiAI platform to fine-tune our LLM, since it provides a no-code solution for loading data and training the model.

3. Model Evaluation

Evaluation of LLMs typically focuses on two main aspects: accuracy and robustness.
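To make LoRA's parameter savings concrete, here is a back-of-the-envelope sketch with illustrative numbers: a frozen d × k weight matrix is adapted by two trainable low-rank matrices, B (d × r) and A (r × k), so only r·(d + k) parameters are trained instead of d·k.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on a d x k weight:
    B is d x r and A is r x k, so r * (d + k) parameters in total."""
    return r * (d + k)

# Illustrative numbers: one 4096 x 4096 projection matrix, adapter rank 8.
d = k = 4096
r = 8

full = d * k                            # parameters a full fine-tune would update
lora = lora_trainable_params(d, k, r)   # parameters LoRA actually trains

print(full, lora, f"{100 * lora / full:.2f}%")  # prints 16777216 65536 0.39%
```

Only the adapter matrices are updated during training; the product BA can later be merged back into the frozen weight, so inference cost is unchanged.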
Accuracy measures how well the model performs on specific tasks, while robustness assesses its ability to handle edge cases and adversarial inputs. Common evaluation metrics for LLMs include:

ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Widely used for summarization and translation tasks, ROUGE measures the overlap of n-grams (contiguous sequences of n items) between the model's output and a reference text. ROUGE-N calculates the recall of n-grams, while ROUGE-L considers the longest common subsequence, which helps evaluate the fluency and coherence of the generated text.

BLEU (Bilingual Evaluation Understudy): Originally developed for machine translation, BLEU measures the precision of n-grams in the model's output compared to reference translations. It uses a modified precision score with a brevity penalty that discourages overly short outputs, making it effective for tasks where exact phrasing matters, such as technical writing or formal correspondence.

Using UbiAI, we get evaluation scores immediately after training; the platform automatically splits the data into a training set and a test set.

Our model scores 0.23 on ROUGE-L, meaning that only about 23% of the reference text is covered by the longest common subsequences it shares with the generated responses. A score of 0.23 indicates some overlap, but the common sequences are relatively short and sparse, suggesting there is still room for improvement before the responses closely match the tone of the training data.

4. Deployment and Monitoring

Once the model is fine-tuned, we can deploy it as an AI agent with a simple API call using the UbiAI platform.

To build our customer support AI agent, we use the crewAI framework to integrate our fine-tuned LLM.
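As a rough illustration of what ROUGE-L measures, here is a simplified, recall-oriented sketch built on the standard longest-common-subsequence dynamic program. Real evaluations should use an established library (e.g. the rouge-score package), which also handles stemming and precision/recall/F1 variants.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l_recall(candidate: str, reference: str) -> float:
    """Recall-oriented ROUGE-L: LCS length divided by reference length."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

ref = "i completely understand how frustrating that must be"
cand = "i understand that must be annoying"
print(f"{rouge_l_recall(cand, ref):.3f}")  # prints 0.625 (LCS of 5 / 8 reference tokens)
```

Because ROUGE-L only rewards word sequences that appear in the same order as the reference, a paraphrase with the right tone but different wording can still score low, which is worth keeping in mind when interpreting a 0.23.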
For more details on creating a custom LLM in crewAI, check out the crewAI documentation.

Evaluating the Fine-Tuned AI Agent

Next, we evaluate the agent in a real multi-turn conversation with a user who is looking for a dinner recipe. We observe how the agent handles multiple turns of dialogue, responds to clarifications, and maintains context throughout the interaction. This lets us assess the agent's ability to provide relevant information, adapt to user preferences, and handle follow-up questions, all of which are crucial for a successful conversational experience.

Here is an example of a multi-turn conversation before fine-tuning:

Now, let's test the same user queries after fine-tuning:

Before fine-tuning, the model's responses were largely generic and unresponsive to the user's specific needs. It often provided cookie-cutter answers without first trying to understand the underlying problem, even after multiple interactions in which the user attempted to clarify their issue. This pattern indicated a lack of understanding on the model's part, as it failed to adapt its responses to the user's situation.

After fine-tuning, there was a noticeable shift in the model's approach. It became more inquisitive, actively seeking to understand the problem at hand. Rather than jumping to conclusions, the model asked relevant questions, focusing on resolving the issue through a more interactive, user-centered dialogue. This transformation highlights the importance of adaptability and engagement in AI systems: the ability to understand and respond to user needs is crucial for effective problem-solving.

Conclusion

Fine-tuning AI models allows for accurate interpretation of customer queries, enabling effective and personalized responses that drive customer satisfaction.
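A multi-turn evaluation like the one above boils down to threading the dialog history back into every model call. Here is a minimal sketch with the fine-tuned model stubbed out; in practice `stub_model` would be replaced by a call to the deployed UbiAI endpoint or the crewAI agent (both names here are placeholders).

```python
def stub_model(system: str, history: list[str], user: str) -> str:
    """Stand-in for the fine-tuned model; a real agent would send the same
    system prompt, history, and query to the deployed endpoint."""
    return f"(reply to: {user})"

def run_conversation(system: str, user_turns: list[str]) -> list[str]:
    history: list[str] = []   # dialog history carried across turns
    replies = []
    for user in user_turns:
        reply = stub_model(system, history, user)
        # Append both sides of the exchange so later turns keep full context.
        history.append(f"User: {user}")
        history.append(f"Agent: {reply}")
        replies.append(reply)
    return replies

turns = ["I need a dinner recipe.", "Something vegetarian.", "Under 30 minutes?"]
replies = run_conversation("You are an empathetic support agent.", turns)
```

Dropping the `history` argument is exactly what produces the context-free, cookie-cutter answers seen before fine-tuning, so the evaluation should specifically probe turns that only make sense given earlier context.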
By training AI on your company's specific communication style, you ensure a consistent voice across all touchpoints, building trust and loyalty.

This consistency not only enhances the customer experience but also reduces operational costs by minimizing the need for extensive human intervention. When AI systems are finely tuned to your business's unique needs, they handle a higher volume of inquiries with precision, allowing human agents to focus on complex issues that require a personal touch. The result is a more efficient workflow, faster response times, and ultimately a significant boost to your bottom line.

Get started today at ubiai.tools.