{"id":1002646,"date":"2021-07-30T12:35:05","date_gmt":"2021-07-30T16:35:05","guid":{"rendered":"http:\/\/875c638a55c13712f0f4f9a72b0735e1aea3d0fe"},"modified":"2021-07-30T12:35:05","modified_gmt":"2021-07-30T16:35:05","slug":"analyze-customer-churn-probability-using-call-transcription-and-customer-profiles-with-amazon-sagemaker","status":"publish","type":"station","link":"https:\/\/platodata.io\/plato-data\/analyze-customer-churn-probability-using-call-transcription-and-customer-profiles-with-amazon-sagemaker\/","title":{"rendered":"Analyze customer churn probability using call transcription and customer profiles with Amazon SageMaker"},"content":{"rendered":"\n

Regardless of the industry or product, customers are the most important component in a business\u2019s success and growth. Businesses go to great lengths to acquire new customers and, more importantly, to retain existing ones. Customer satisfaction links directly to revenue growth, business credibility, and reputation, all of which are key factors in a sustainable, long-term growth strategy.<\/p>\n

Given the marketing and operational costs of customer acquisition and satisfaction, and how costly losing a customer to a competitor can be, it\u2019s generally less costly to retain existing customers than to acquire new ones. Therefore, it\u2019s crucial for businesses to understand why and when a customer might stop using their services or switch to a competitor, so they can take proactive measures, such as providing incentives or offering upgrades to new packages, that could encourage the customer to stay with the business.<\/p>\n

Customer service interactions provide invaluable insight into customers\u2019 opinions of the business and its services. Combined with other quantitative factors, they help the business better understand the sentiment and trends of customer conversations and identify crucial company and product feedback. Customer churn prediction using machine learning (ML) techniques can be a powerful tool for customer service and care.<\/p>\n

In this post, we walk you through the process of training and deploying a churn prediction model on Amazon SageMaker<\/a> that uses Hugging Face Transformers<\/a> to find useful signals in customer-agent call transcriptions. In addition to textual inputs, we show you how to incorporate other types of data, such as numerical and categorical features, to predict customer churn.<\/p>\n

Prerequisites<\/h2>\n

To try out the solution in your own account, make sure that you have the following in place:<\/p>\n

<\/a>Launching the JumpStart solution creates and configures the resources needed to run the solution.<\/p>\n

Architecture overview<\/h2>\n

In this solution, we focus on SageMaker components. We use SageMaker training jobs to train the churn prediction model and a SageMaker endpoint to deploy the model. We use Amazon Simple Storage Service<\/a> (Amazon S3) to store the training data and model artifacts, and Amazon CloudWatch<\/a> to log training and endpoint outputs. The following figure illustrates the architecture for the solution.<\/p>\n

<\/a><\/p>\n

Exploring the data<\/h2>\n

In this post, we use a mobile operator\u2019s historical records of which customers ended up churning and which continued using the service. The data also includes transcriptions of the latest phone call conversations between the customer and the agent (which could also be the streaming transcription as the call is happening). We can use this historical information to train an ML classifier model, which we can then use to predict the probability of customer churn based on the customer\u2019s profile information and the content of the phone call transcription. We create a SageMaker endpoint to make real-time predictions using the model and provide more insight to customer service agents as they handle customer phone calls.<\/p>\n
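
One common way to let a single classifier use both the call transcript and the customer profile is to turn the transcript into a fixed-length embedding and concatenate it with the numerical features and one-hot encoded categorical features. The following is a minimal sketch of that idea, not the solution code. The concatenation approach itself is an assumption, as are the function names, the feature name plan, and the three-value embedding, which stands in for the output of a Hugging Face sentence encoder.<\/p>\n

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 list; unknown values become all zeros."""
    return [1.0 if value == category else 0.0 for category in categories]


def build_feature_vector(text_embedding, numeric, categorical, vocab):
    """Concatenate a text embedding with numeric and one-hot categorical features.

    text_embedding: list of floats (placeholder for a transformer sentence embedding)
    numeric:        list of numbers from the customer profile
    categorical:    dict mapping feature name -> observed value
    vocab:          dict mapping feature name -> list of known categories
    """
    features = list(text_embedding)
    features.extend(float(x) for x in numeric)
    # Iterate in sorted order so every row uses the same column layout.
    for name in sorted(vocab):
        features.extend(one_hot(categorical.get(name), vocab[name]))
    return features


# Hypothetical example row: the feature name and values are illustrative only.
vector = build_feature_vector(
    text_embedding=[0.12, -0.40, 0.33],
    numeric=[12.0, 3.0],
    categorical={"plan": "basic"},
    vocab={"plan": ["basic", "premium"]},
)
print(len(vector))  # 7
```

The resulting vector can be passed to any binary classifier that outputs a probability. Sorting the categorical feature names keeps the column order identical between training and inference.<\/p>\n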

The dataset we use is synthetically generated and available under the CC BY 4.0 license. The data used to generate the numerical and categorical features is based on the public dataset KDD Cup 2009: Customer relationship prediction<\/a>. We generated over 50,000 samples and randomly split the data into 45,000 samples for training and 5,000 samples for testing. In addition, the phone conversation transcripts were synthetically generated using the GPT-2 (Generative Pre-trained Transformer 2<\/a>) model. The data is hosted on Amazon S3.<\/p>\n
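
A random train/test split like the one described above can be sketched with the Python standard library alone. This is an illustration, not the code used to prepare the dataset. The function name, the seed, and the round figure of exactly 50,000 rows are assumptions.<\/p>\n

```python
import random


def split_indices(n_samples, n_test, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    if not 0 <= n_test <= n_samples:
        raise ValueError("n_test must be between 0 and n_samples")
    indices = list(range(n_samples))
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Assumed sizes for illustration: 50,000 rows, 5,000 held out for testing.
train_idx, test_idx = split_indices(n_samples=50_000, n_test=5_000, seed=42)
print(len(train_idx), len(test_idx))  # 45000 5000
```

Splitting indices rather than rows keeps the sketch independent of how the data is stored. The same index lists can then select rows from both the tabular features and the transcripts, so the two stay aligned.<\/p>\n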

For more details on customer churn classification models, including step-by-step instructions on how to build a binary classifier model using similar data, see the blog post Predicting Customer Churn with Amazon Machine Learning<\/a>. That post focuses on binary classification using tabular data. This post approaches the problem from a different perspective and brings in natural language processing (NLP) by processing the content of agent-customer phone conversations.<\/p>\n

The following are the attributes (features) of the customer profiles dataset:<\/p>\n