The aim of this project is to examine whether the number of rooms a property has or the overall value of the deal determines how quickly the deal is signed (velocity).
Below is a summary of the data after removing the outliers:
The average time to sign a deal is 96 days
The quickest deal was in 5 days, the longest took 912 days
There's a large variance in both velocity and number of rooms for a property
Let's look at scatter plots for each pair of parameters to get an idea of the connection between them:
Value seems to be nicely correlated with number of rooms. This makes sense: the more rooms a property has, the higher the value (more MRR) of the deal.
Number of rooms shows no clear effect on deal velocity: we see small properties going fast, small properties going slow, large properties signed in 30 days, and large properties signed in more than a year.
Same goes for value: large value deals can go fast or slow, and cheap deals can take years as well.
As velocity is so randomly distributed, we can't really conclude anything meaningful about whether property size or deal value affects deal signing time.
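The visual impression from the scatter plots can be backed up with correlation coefficients. Below is a minimal sketch, assuming the deals sit in a pandas DataFrame with columns named `rooms`, `value` and `velocity` (hypothetical names, as the original data layout isn't shown). The demo data is synthetic and only mimics the pattern described above; it is not the real dataset.

```python
import numpy as np
import pandas as pd


def correlation_table(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Pairwise correlations between rooms, value and velocity.

    Column names are assumptions; rename them to match the real data.
    """
    return df[["rooms", "value", "velocity"]].corr(method=method)


# Synthetic stand-in data: value follows rooms, velocity is unrelated noise.
rng = np.random.default_rng(0)
rooms = rng.integers(5, 150, size=200)
demo = pd.DataFrame({
    "rooms": rooms,
    "value": rooms * 10 + rng.normal(0, 50, size=200),
    "velocity": rng.integers(5, 912, size=200),
})

corr = correlation_table(demo)
print(corr.round(2))
```

A coefficient close to 1 for rooms vs. value and close to 0 for either of them vs. velocity would match what the plots suggest. Passing `method="spearman"` is a reasonable alternative when the relationship may be monotonic but not linear.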
The categories are as follows:
After categorising the velocities, let's see a color-coded scatter plot:
If the deal categories were really different, they would form clearly separate clusters.
It's very difficult to separate the categories based on deal value and number of rooms alone.
If you randomly add a property, let's say 50 rooms and 500 Euros deal value, you'd have no idea whether it's a slow, medium or fast deal based on this graph: there are all 3 colors (fast/medium/slow deals) clustered in that region.
To further confirm this, we can observe that the distributions of fast/medium/slow deals across different property sizes overlap almost entirely.
Note: The Fast (green) deal distribution sees a spike as it has a few extra properties between 20-25 rooms compared to the total properties.
However, let's see if any simple machine learning models can actually predict with decent accuracy.
Firstly, let's stay with the categorical data. We can use K-means clustering, specifying that we expect k=3 clusters (slow, medium, fast).
Below is the accuracy of the fitted model, calculated by how many of the model-predicted categories match the actual category:
Accuracy: 0.0 %
Even sklearn (the machine learning library I use) has difficulty telling them apart.
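One caveat when scoring K-means this way: it is an unsupervised method, so the cluster IDs it returns (0, 1, 2) are arbitrary and don't automatically line up with fast/medium/slow. A direct comparison can therefore understate the match, and may be one reason for a score of exactly 0%, although the original scoring code isn't shown here. The sketch below first matches each cluster to its best-fitting category before computing accuracy. The helper `aligned_cluster_accuracy`, the integer encoding of the categories and the synthetic data are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler


def aligned_cluster_accuracy(y_true, cluster_ids, n_classes=3):
    """Accuracy after matching each cluster ID to one category.

    Assumes categories are encoded as integers 0..n_classes-1
    (e.g. 0=fast, 1=medium, 2=slow).
    """
    cm = confusion_matrix(y_true, cluster_ids, labels=list(range(n_classes)))
    # Find the one-to-one cluster/category pairing with the most matches.
    rows, cols = linear_sum_assignment(-cm)
    return float(cm[rows, cols].sum() / cm.sum())


# Toy case: clusters are perfect but the IDs are permuted.
perm_true = np.array([0, 0, 1, 1, 2, 2])
perm_ids = np.array([2, 2, 0, 0, 1, 1])
perm_naive = float(np.mean(perm_true == perm_ids))
perm_aligned = aligned_cluster_accuracy(perm_true, perm_ids)
print(perm_naive, perm_aligned)

# Synthetic stand-in data: categories unrelated to the two features.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(5, 150, 300), rng.normal(500, 150, 300)])
y_true = rng.integers(0, 3, 300)
X_scaled = StandardScaler().fit_transform(X)  # put rooms and value on one scale
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
aligned = aligned_cluster_accuracy(y_true, cluster_ids)
print(round(aligned, 3))
```

With three categories, the aligned accuracy can never fall below one third, so a value close to 33% is what "no better than guessing" looks like under this scoring.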
Let's see if we have more luck sticking with numerical values and applying a linear regression model to predict the actual deal velocity.
Below I plotted how the model does at predicting the velocities vs the actual velocities for these test values:
Ideally we'd like to see the points fall along a straight diagonal line, i.e. a predicted velocity of 200 days pairs with an actual velocity of 200 days. In our graph, we have predicted a 200 day velocity for a deal that actually took 30 days, and a 100 day velocity for one that actually took 800 days. I'd hardly call it precise or usable.
In summary, our models are highly inaccurate and we just can't use number of rooms and deal value to classify or predict deal velocity in this dataset.
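The predicted-vs-actual plot can be complemented with numeric scores on a held-out test set. This is a minimal sketch, assuming two features (rooms and value) and a 75/25 train/test split; the split ratio, the helper name `fit_and_score` and the synthetic data are assumptions, not the original setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=0):
    """Fit a linear regression and score it on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "r2": float(r2_score(y_test, pred)),             # 1.0 = perfect, ~0 = no better than the mean
        "mae": float(mean_absolute_error(y_test, pred)),  # average miss, in days
    }


rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(5, 150, 300), rng.normal(500, 150, 300)])

# Synthetic velocities unrelated to the features (mimics the situation described).
y_random = rng.integers(5, 912, 300).astype(float)
scores_random = fit_and_score(X, y_random)

# Contrast: velocities that really are a linear function of the features.
y_linear = 2.0 * X[:, 0] + 0.1 * X[:, 1] + 5.0
scores_linear = fit_and_score(X, y_linear)

print(scores_random)
print(scores_linear)
```

An R² near zero (or negative) together with a mean absolute error of many days would say the same thing as a scattered predicted-vs-actual plot: the model does no better than always predicting the average velocity.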
The aim of this section is to find out whether some types of properties have better deal velocity statistics than others.
For example, looking at properties with 30 or fewer rooms, the average time to sign a deal is 69 days, compared to the 96 we've seen above.
Average velocity for smaller properties:
20 rooms or less: 54.088235294117645 days
30 rooms or less: 68.90163934426229 days
40 rooms or less: 76.7051282051282 days
50 rooms or less: 81.39226519337016 days
60 rooms or less: 84.02072538860104 days
First of all, it's important to note that this is an average over really messy data. But what we can see here is that even if we triple the room cutoff from 20 to 60, the average deal time rises by only about 55% (from roughly 54 to 84 days), far less than proportionally. There's no reason why we should only go for very small properties if the much larger ones take only moderately more time on average.
It's also worth noting that in this data, 80% of the properties have 60 rooms or fewer:
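The comparison between the 20-room and 60-room averages is simple arithmetic on the figures listed above, and the averages themselves are cumulative (the "60 rooms or less" group includes the "20 rooms or less" group). A short sketch, assuming a DataFrame with hypothetical `rooms` and `velocity` columns; the five-row demo frame is made up purely to show the mechanics.

```python
import pandas as pd


def mean_velocity_up_to(df: pd.DataFrame, thresholds) -> pd.Series:
    """Average velocity of all properties with at most `t` rooms, per threshold."""
    return pd.Series(
        {t: df.loc[df["rooms"] <= t, "velocity"].mean() for t in thresholds}
    )


# Relative increase between the reported 20-room and 60-room averages.
avg_20 = 54.088235294117645
avg_60 = 84.02072538860104
increase = avg_60 / avg_20 - 1
print(f"{increase:.1%}")

# Tiny made-up frame, only to show how the cumulative averages are built.
demo = pd.DataFrame({
    "rooms": [10, 20, 30, 60, 100],
    "velocity": [50, 60, 90, 120, 200],
})
averages = mean_velocity_up_to(demo, [20, 60])
print(averages)
```

Because each group contains the previous one, neighbouring averages are bound to be close together; comparing disjoint size bands would be a stricter check.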
percentage of <= 60 room properties to all properties: 79.75206611570248 %
Looking at the deals which were signed within 30, 60 and 90 days, we get the following average property sizes, expressed as number of rooms:
average property size for deals signed in 30 days: 34.28787878787879 rooms
average property size for deals signed in 60 days: 37.46610169491525 rooms
average property size for deals signed in 90 days: 41.4640522875817 rooms
Overall, we can't conclude that smaller properties get significantly faster deals. The average property sizes for deals signed within 30 days and within 90 days are very similar (about 34 vs. 41 rooms).
From this short analysis we've seen that:
1) the data is insufficient to base any accurate predictions on
2) the data does not suggest any insights that can be used for operational purposes or strategy making
It would be wise to wait for more datapoints to actually see what might affect deal velocity, or how we could target properties better to get faster deals.