The application let users to track their diet. With the paid model a coach is assigned to monitor paid users progress and make improvement. Within the data-driven organization, data anlysis is critical. Based on analysis result, the team modify their market rebalancing, targeted promotion to obtain or maintain VIP users, and performance analysis. We are going to start it with data processing.
The chances are we are not going to need all the columns in the data. The columns selected for this one are:
It's just an aesthetic, but we don't need gender to be capitalized. Going to lowercase it.
We can tell women are way more involved in this application then men.
This is weird result because there is no one in the world whose height is -1 cm. The chances are that they probably entered the value when the users left out the information when they created their profiles. So, the mean here is likely distorted. We are going to change the data whose value is -1 cm in Heigh to NaN.
We can see the mean value changed. Further investigate among men and women.
It's resonable to assume that no infant would join the diet service and no one in the service is age of 173 since no one has been reported to live that long. Going to clean it to nan. Also from the data it's reasonable to think most of the users are in their 20 and 30s looking at the percentile data. We are going to convert data whose age is more than 60 to NaN
We are going to identify two kind of user groups.
Among paid users, VIP users
The columns we are going to user are as following.
BMI is commonly used in masuring body composition even though there are criticism that it simplifies it too much. However for the sake of simplicity, we are going to use this to identifies those two groups that we wanted to identify.
The new columns to be added are:
Since we are only considering about paid users, we are going to exclude unpaid users which we can identify it by Status. Paid users' status is completed
The rules we are going to use to identify them are: 1. NaN in Age, Height, Initial Weight, Lowest Weight, or Target Weight 2. Too low (less than 140cm) or high (more than 200)height 3. Too low (less than 18.5) or too high (30.0) BMI 4. Weight Loss(Goal) negative (usually people want to loss weight not gain)
We can send notification to the users who are indenfitied as entered invalid data.
To find the VIP users we are going use the following rule.
Those users could be VIP users and be targeted with promotions.
At the end of the profitability is the most important as business entity. We are going to analyze who are the most profittable demographics to find out:
First we are going to age group.
The result of the analysis is:
Now we are going to see if there's difference in conversion ratio time or day. If there is we can focus more on those time frame.
There doesn't seem like any patterns with what time they purchased the service other than around midnight purchases spike up. It might be a good time to focus more on.
Again there don't seem to have any patterns. However, more people total number of purchase made Monday ~ Thursday then Friday ~ Sunday. Monday to Thursday might be a good time to do more targeted marketing.
The result shows that facebook is the number one channel where users come from.
This could be useful and very sensitive too as it shows the coaches performance directly. However it's important to know that who are doing well and who are not. If we can provide better training to underperformed one we can increase the performance or bench mark overperformed ones.
Finally conversion and cancellation rate. If a coach has high conversion rate, he or she is doing something well, if a coach has high cancellation rate, he or she is not doing so well. We will find the top performing ones and bottom performing ones, so that we can give right feedback. To be fair, who has less than 100 trainees, we will exclude from the result.
Now we know how well they are performing to give the right feedback.