Last fall, a couple of my colleagues (Kristiane Skiolmen, Scott Lau) and I presented Change’s machine learning email optimization approach as a lecture in Stanford’s Human Computer Interaction Seminar for CS grad students.
The video walks through how Change.org uses email to drive petition engagement, from the business and social perspective down to the specific technical optimization we made. It starts with an overview of Change and examples of petitions that have literally improved and saved lives.
As of the date of the video, here are some stats we presented:
- 77M total users globally
- 1.2M users visiting the site daily
- 450M signatures total
- 10K declared victories in over 120 countries
Email is our most successful channel for getting users to sign petitions. It’s not an ideal channel; we know that and want to change it. Still, since it drives the most response at this time, we took steps to optimize it with machine learning. I’m sharing this video and a little about the project so you can see a real-world application of machine learning. Below are a few summary points from the video.
We have an email team that specializes in putting petitions in front of users who would connect with them. Each week, groups within Change choose certain petitions to showcase through email, and the email team was using a cause (topic) filtering model to decide which petitions to send to which users. It was a manual process: petitions were tagged with causes and then matched against our user base, which had been grouped by cause based on the petitions each user had signed.
This approach has a lot of limitations, from scaling with data size to adapting culturally and internationally. The manual approach also meant that some causes had much smaller audiences and lower response rates, so certain petitions were doomed to fall short on signatures simply because their cause had a smaller audience.
Our data team built a model to improve email targeting. We identified over 500 features (e.g. the number of petitions a user signed in the past) that were predictive of signatures, and we tried a couple of classification algorithms to find a predictive model to use. The accuracy scores of the models we investigated were pretty close, so we went with a random forest because we didn’t need to binarize our data, our data is unbalanced (which random forest handles well), and it was the most transparent about which features mattered if we wanted to dig into the results.
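To make the modeling step a little more concrete, here is a minimal sketch of training a random forest on a sample send and scoring other users. It assumes scikit-learn, which the post does not name, and it uses synthetic data with five made-up features instead of our real 500+; the variable names (`X_sample`, `signed`, `X_all`) are hypothetical. It is an illustration of the technique, not our production code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a sample send: 2,000 users x 5 numeric features.
# (Assumption: real features would be things like past signature counts.)
n_sample, n_features = 2000, 5
X_sample = rng.normal(size=(n_sample, n_features))

# Hypothetical ground truth: signing depends mostly on feature 0,
# and signers are the minority class (unbalanced labels).
logits = 2.0 * X_sample[:, 0] - 2.5
signed = rng.random(n_sample) < 1.0 / (1.0 + np.exp(-logits))

# Fit the forest on raw numeric features; no binarization is needed.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_sample, signed)

# Score a larger (synthetic) user base: column 1 is P(signed = True).
X_all = rng.normal(size=(10_000, n_features))
sign_probability = model.predict_proba(X_all)[:, 1]

# Per-feature importances are one way to dig into what the model learned.
importances = model.feature_importances_
print("signature rate in sample:", signed.mean())
print("feature importances:", importances.round(3))
```

In this toy setup the importances should point at feature 0, since that is the only feature the synthetic labels depend on.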
Here’s how it works: each time the email team gets a set of petitions to showcase, they send emails to a sample set of users. Based on the signature response to one petition, a random forest model is trained, and then all users are run through the model to predict each user’s signature response to that petition. We build one random forest model per petition the email team showcases that week and run signature predictions on all users for each showcased petition. Each model produces a probability of signing per user; our program then sorts the probabilities and identifies, for each user, the petition with the highest predicted probability (filtering out ones the user has already received by email). The email team gets back a list of users per petition to send that week’s showcased petitions to.
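The last step, picking one petition per user from the per-petition probabilities, can be sketched in plain Python. This is a simplified illustration under my own assumptions: the function name `assign_petitions`, the dictionary layout, and the tie-breaking rule are all hypothetical, and the probabilities are made up rather than model output.

```python
def assign_petitions(probabilities, already_sent):
    """Pick, for each user, the showcased petition they are most likely to sign.

    probabilities: {petition_id: {user_id: predicted probability of signing}}
    already_sent:  {user_id: set of petition_ids already emailed to that user}
    Returns {petition_id: [user_ids to email]}.
    """
    send_lists = {petition_id: [] for petition_id in probabilities}

    # Collect every user that has a prediction for at least one petition.
    users = set()
    for per_user in probabilities.values():
        users.update(per_user)

    for user_id in sorted(users):
        received = already_sent.get(user_id, set())
        # Candidate petitions: predicted for this user and not yet emailed.
        candidates = [
            (per_user[user_id], petition_id)
            for petition_id, per_user in probabilities.items()
            if user_id in per_user and petition_id not in received
        ]
        if not candidates:
            continue  # nothing new to send this user this week
        # Highest probability wins; ties fall back to the larger petition id.
        _, best_petition = max(candidates)
        send_lists[best_petition].append(user_id)

    return send_lists


# Made-up predictions for two showcased petitions and three users.
probabilities = {
    "petition_a": {"u1": 0.10, "u2": 0.40, "u3": 0.05},
    "petition_b": {"u1": 0.30, "u2": 0.20, "u3": 0.02},
}
already_sent = {"u2": {"petition_a"}}  # u2 already received petition_a

send_lists = assign_petitions(probabilities, already_sent)
print(send_lists)
# {'petition_a': ['u3'], 'petition_b': ['u1', 'u2']}
```

Here `u2` would have scored highest on `petition_a`, but since that email already went out, the user falls through to `petition_b`.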
In the video, I go into more detail around how a random forest works as well as the way it was implemented. Also, Scott provides an overview of how we used Amazon Web Services to implement this data product.
Note that there are other ways to approach this problem, but for what we needed, this solution has increased our sign-to-send rate by 30%, which is substantial. On one petition, for example, we would have had a 4% signature response out of a pool of 2M people to email, but our new approach with machine learning enabled us to target 5M users with a 16% signature response rate.
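To put that single-petition example in absolute terms, here is the arithmetic as a tiny sketch. It takes the quoted pool sizes and response rates at face value; the helper name `expected_signatures` is mine, and the results are implied totals, not figures reported in the talk.

```python
def expected_signatures(pool_size, response_percent):
    """Signatures implied by emailing pool_size users at a given response rate."""
    # Integer arithmetic keeps the result exact for whole-percent rates.
    return pool_size * response_percent // 100


before = expected_signatures(2_000_000, 4)   # cause-filtering pool
after = expected_signatures(5_000_000, 16)   # model-targeted pool

print(before)  # 80000
print(after)   # 800000
```

So for that one petition, the quoted numbers imply roughly ten times as many signatures from the model-targeted send.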
As mentioned, I don’t see email as the best communication channel, and even though we can and will improve on our current solution, we are working to incorporate more effective means of engagement.