Diet Coaching Application Data Analysis

#Diet-Coaching-Application-Data-Analysis

The application let users to track their diet. With the paid model a coach is assigned to monitor paid users progress and make improvement. Within the data-driven organization, data anlysis is critical. Based on analysis result, the team modify their market rebalancing, targeted promotion to obtain or maintain VIP users, and performance analysis. We are going to start it with data processing.

Data Loading

#Data-Loading
Loading output library...

Data Processing

#Data-Processing

1. We are going to start from selecting columns

#1.-We-are-going-to-start-from-selecting-columns

The chances are we are not going to need all the columns in the data. The columns selected for this one are:

  • Access Code(Index column)
  • Name
  • Gender
  • Age
  • Height
  • Initial Weight
  • Lowest Weight
  • Target Weight
  • Status
  • Price
  • Purchased At
  • Channel
Loading output library...
Loading output library...

2. lower case.

#2.-lower-case.

It's just an aesthetic, but we don't need gender to be capitalized. Going to lowercase it.

Loading output library...
Loading output library...

We can tell women are way more involved in this application then men.

3. Height

#3.-Height
Loading output library...

This is weird result because there is no one in the world whose height is -1 cm. The chances are that they probably entered the value when the users left out the information when they created their profiles. So, the mean here is likely distorted. We are going to change the data whose value is -1 cm in Heigh to NaN.

Loading output library...
Loading output library...

We can see the mean value changed. Further investigate among men and women.

Loading output library...
Loading output library...
Loading output library...

Age

#Age

For the similar reason cleaning Age column.

Loading output library...

It's resonable to assume that no infant would join the diet service and no one in the service is age of 173 since no one has been reported to live that long. Going to clean it to nan. Also from the data it's reasonable to think most of the users are in their 20 and 30s looking at the percentile data. We are going to convert data whose age is more than 60 to NaN

Loading output library...

User Identification

#User-Identification

We are going to identify two kind of user groups.

  • Among paid users, who entered their information wrong.
    • If they entered the information wrong, the coach can't provide right coaching.
  • Among paid users, VIP users

    • VIP users who have achieved their goals. In diet services, nothing is more effective than showing the before and after result. We can find them and give the promotion to use their story.

The columns we are going to user are as following.

  • Name
  • Age(clean)
  • Height(clean)
  • Initial Weight
  • Lowest Weight
  • Target Weight
  • Status
Loading output library...

6. Adding new information

#6.-Adding-new-information

BMI is commonly used in masuring body composition even though there are criticism that it simplifies it too much. However for the sake of simplicity, we are going to use this to identifies those two groups that we wanted to identify.

The new columns to be added are:

  • Weight Loss(goal): Initial Weight - Target Weight
  • Weight Loss(current): Initial Weight - Lowest Weight
  • BMI: Weight / (height(meter) / height(meter))
Loading output library...

7. Identifying users who entered invalid data

#7.-Identifying-users-who-entered-invalid-data

Since we are only considering about paid users, we are going to exclude unpaid users which we can identify it by Status. Paid users' status is completed

The rules we are going to use to identify them are: 1. NaN in Age, Height, Initial Weight, Lowest Weight, or Target Weight 2. Too low (less than 140cm) or high (more than 200)height 3. Too low (less than 18.5) or too high (30.0) BMI 4. Weight Loss(Goal) negative (usually people want to loss weight not gain)

Loading output library...
Loading output library...
Loading output library...

We can send notification to the users who are indenfitied as entered invalid data.

VIP Check

#VIP-Check

To find the VIP users we are going use the following rule.

  • Status == "completed" (paid user)
  • Weight Loss(goal), Weight Loss(current), BMI not NaN
  • Weight Loss(current) more than 10kg
  • BMI > 30
  • Weight Loss(Current) > Weight Loss(goal) (who met their diet goal)
Loading output library...
Loading output library...

Those users could be VIP users and be targeted with promotions.

Profitability Analysis

#Profitability-Analysis

At the end of the profitability is the most important as business entity. We are going to analyze who are the most profittable demographics to find out:

  • Who are easier to acquire (Customer Acquisition Cost)
  • Who are the most profitable (Customer Lifetime Value)

Paid, cancelled, refunded ratio

#Paid,-cancelled,-refunded-ratio
Loading output library...

Paid, cancelled, refunded ratio by Gender and Age

#Paid,-cancelled,-refunded-ratio-by-Gender-and-Age

First we are going to age group.

  • Under 17
  • 18 ~ 24
  • 25 ~ 35
  • 36 ~ 44
  • 45 ~ 54
  • Over 55
Loading output library...
Loading output library...
Loading output library...

The result of the analysis is:

  • women between 25~35 have most frequently paid for the service. Their conversion rate is also considerably high.
  • women between 36~54 and men between 25~44 also have high conversion rate. However the total is lower compare to women 25~35 group. Which means the marketting strategy for those groups might not be optimal.

Conversion Rate by time

#Conversion-Rate-by-time

Now we are going to see if there's difference in conversion ratio time or day. If there is we can focus more on those time frame.

Loading output library...

There doesn't seem like any patterns with what time they purchased the service other than around midnight purchases spike up. It might be a good time to focus more on.

Loading output library...

Again there don't seem to have any patterns. However, more people total number of purchase made Monday ~ Thursday then Friday ~ Sunday. Monday to Thursday might be a good time to do more targeted marketing.

Channel

#Channel

Next one is channel. Where are the gateways to the service?

Loading output library...

The result shows that facebook is the number one channel where users come from.

Conversion Rate by Coach

#Conversion-Rate-by-Coach

This could be useful and very sensitive too as it shows the coaches performance directly. However it's important to know that who are doing well and who are not. If we can provide better training to underperformed one we can increase the performance or bench mark overperformed ones.

Loading output library...
Loading output library...
Loading output library...

Conversion / Cancellation rate

#Conversion-/-Cancellation-rate

Finally conversion and cancellation rate. If a coach has high conversion rate, he or she is doing something well, if a coach has high cancellation rate, he or she is not doing so well. We will find the top performing ones and bottom performing ones, so that we can give right feedback. To be fair, who has less than 100 trainees, we will exclude from the result.

Loading output library...
Loading output library...
Loading output library...

Now we know how well they are performing to give the right feedback.