Volleyball Position Classifier : A Machine Learning Approach

If I were asked “what is the best volleyball match ever played?”, the 2012 gold medal match between Brazil and Russia would definitely be my answer. Brazil (the team I was rooting for) had the upper hand in this game: they were up two sets to none. The Russian team's strategy was not working in the first two sets until coach Vladimir Alekno shifted Dmitriy Muserskiy's position. Come the third set, Muserskiy, a seven-foot-two giant, played as an opposite spiker.

This unexpected strategy surprised Brazil, and Russia won the Olympic medal. Volleyball fans were shocked by the come-from-behind win because Brazil's Olympic lineup was solid (LEGENDARY EVEN!!!!). It had the likes of Murilo, Giba, de Souza, Sergio and Rezende. Read more on the iconic 2012 men's Olympic volleyball match.

Russia outstrategized Brazil in this match. In volleyball, the tallest players are expected to play the middle position, mainly to strengthen the blocking. Going against this status quo helped Russia claim the first spot. Using a tall player NOT ONLY for blocking BUT ALSO for scoring surprised Brazil. This genius tweak in positioning, meant to confuse the opponent's defense, made me think about player positioning in volleyball. In this project we try to see if machines can help coaches position the players in their team. This can help teams sort players (especially versatile ones) into a position they fit.

Dataset

The dataset consists of the scores and statistics of all teams that participated in the 2018 Volleyball Nations League. It was scraped using BeautifulSoup (an HTML parser). Physical features include:

(1) Height
(2) Weight
(3) Age
(4) Block Reach
(5) Spike Reach

Different in-game statistics were also obtained, namely:

(1) Blocking
(2) Scoring
(3) Spiking
(4) Digging and Receiving
(5) Setting

(If you would like a copy of the scraping code or the dataset, just contact me and I can give it to you.)
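
For readers curious what pulling a statistics table out of a web page looks like, here is a minimal sketch. It is not the scraping code used for this project: the HTML fragment and its column names are made up, and to stay dependency-free it swaps in Python's built-in `html.parser` instead of BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment; the real page layout is not shown in this post.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Position</th><th>Height</th></tr>
  <tr><td>Player A</td><td>Middle blocker</td><td>205</td></tr>
  <tr><td>Player B</td><td>Libero</td><td>180</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)

# First row is the header; the rest become one dict per player.
header, *records = parser.rows
players = [dict(zip(header, record)) for record in records]
```

Every value comes out as a string, so numeric columns such as height still need converting before any analysis.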

Player positioning in Volleyball

Just like other sports, volleyball is composed of players specialized in different tasks. There are a total of 5 unique positions that a player can fit in.

(1) Open spiker - a versatile position. They generally have good floor defense and a decent attack percentage. This is because this position is the easiest one to set the ball to (especially when the ball was not perfectly dug/received).

(2) Opposite spiker - this position requires good blocking and spiking abilities. They are responsible for shutting down the opponent's open spiker.

(3) Middle blocker - the taller ones dominate this position. High reach and good timing are required in this position. Their main objective is to deflect the hits of the other team's spikers.

(4) Setter - they orchestrate the team's offense. This position requires precision and good reading ability. They are responsible for distributing the ball to different attackers. As much as possible, they try to fool and outspeed the blockers of the other team, making it easier for spikers to get clean hits.

(5) Libero - the player with a different colored jersey. Agility, fast reaction time and a high tolerance for pain (haha jk) make up a good libero. They are responsible for manning the floor defense of the team. They dig powerful spikes and receive high velocity serves.

The image below shows the usual arrangement of a team:

[Image: usual arrangement of a team]

Some Cool Statistics!

We try to look at different qualities of players playing different positions in volleyball. Simple statistics help us debunk/confirm accepted volleyball concepts.

Each position seems to have a different average height

It was found that the average height of volleyball players at the competitive level is ~197 cm (roughly 6 feet 5.5 inches). Plotting the distribution of height per position shows that the average height seems to differ per position.

[Figure: Average height per position]

Middle blockers were found to be the tallest. This is because height is somewhat of a requirement to block spikes efficiently. A height advantage also gives blockers room for error in case they are late to the block. Opposite spikers rank second in height. This also makes sense because the opposite directly faces the opponent's open hitter (who frequently gets set, especially if the ball was not dug or received properly). Open spikers and setters seem to be right around the average height. This may be because their positions do not demand height as much as the first two (but then again, this is volleyball, EVERY INCH COUNTS!!!). Lastly, liberos are observed to be the shortest of all positions. Click on the link for more explanation.
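
The per-position comparison above boils down to a group-by-and-average. Here is a minimal sketch of that step using only the standard library. The rows, the `position` and `height_cm` field names and the numbers are made up for illustration; they are not the league data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical player rows (not from the scraped dataset).
players = [
    {"position": "Middle blocker", "height_cm": 208},
    {"position": "Middle blocker", "height_cm": 204},
    {"position": "Open spiker", "height_cm": 198},
    {"position": "Open spiker", "height_cm": 196},
    {"position": "Libero", "height_cm": 184},
    {"position": "Libero", "height_cm": 182},
]

# Collect the heights of every player under their position.
heights_by_position = defaultdict(list)
for player in players:
    heights_by_position[player["position"]].append(player["height_cm"])

# Average height per position, then positions ordered tallest to shortest.
average_height = {pos: mean(h) for pos, h in heights_by_position.items()}
ranking = sorted(average_height, key=average_height.get, reverse=True)
```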

Shortest and Tallest Players in the League

It is also interesting to highlight that there seem to be outliers at the lower and upper ends of the distribution. Upon filtering, it was found that most of the shorter outliers belong to the Japanese team. Their coaching staff has admitted that height is one factor the team needs to improve on. Two giants were found in the roster of players, namely Muserskiy (Russia) and Lemanski (Poland). Both of them play middle (as expected).
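
The post does not say how the outliers were picked out. One common rule, assumed here purely for illustration, is to flag anything more than 1.5 times the interquartile range beyond the quartiles. The player labels and heights below are hypothetical.

```python
from statistics import quantiles

# Hypothetical (player, height in cm) pairs.
heights = [
    ("P01", 171), ("P02", 190), ("P03", 193), ("P04", 195),
    ("P05", 196), ("P06", 197), ("P07", 198), ("P08", 199),
    ("P09", 200), ("P10", 202), ("P11", 204), ("P12", 218),
]

# Quartiles of the height distribution (n=4 gives three cut points).
q1, _median, q3 = quantiles([h for _, h in heights], n=4)
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

short_outliers = [name for name, h in heights if h < low_fence]
tall_outliers = [name for name, h in heights if h > high_fence]
```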

Height and Reach

It was found that height and reach are positively correlated. This result is somewhat expected. TALLER PEOPLE HAVE BETTER REACH (lol, this is kind of obvious though).
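
For reference, "positively correlated" here can be read as a Pearson correlation coefficient above zero. The post does not state which coefficient was used, so Pearson is an assumption, and the numbers below are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations, and sums of squared deviations of each variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical heights and block reaches in cm (not the league data).
height_cm = [184, 190, 197, 203, 210]
block_reach_cm = [305, 315, 325, 330, 345]

r = pearson(height_cm, block_reach_cm)
```

A value of `r` close to 1 means taller players almost always reach higher, a value close to -1 would mean the opposite, and a value near 0 would mean no linear relationship.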

[Figure: Correlation of Height and Block Reach]

[Figure: Correlation of Height and Spike Reach]

Skills

There are two main categories of volleyball statistics, namely scoring and non-scoring statistics. We check these numbers to see if our data shows any interesting patterns.

Scoring

Hitting : Spreading attacks is the key

Hitting - this is usually the third and last contact of the ball before it is sent into the opponents' court. It is usually fast and targeted at an area where no defender (blocker or receiver) can go after it.

Using a stacked bar plot, it was observed that for every team, most hitting attempts are made by open spikers. This is a usual pattern in volleyball because setting to the open is the easiest compared to the middle and opposite. After a bad dig or reception, the ball is already expected to go to the open. Another observation is that Bulgaria did not field any of their opposite spikers. All of their attempts were made by middle and open spikers.

In terms of spread, it seems that a lot of teams focus their sets on open spikers. Teams that set to open spikers more than half of the time are:

(1) Canada
(2) Poland
(3) Italy
(4) Japan
(5) Bulgaria
(6) Germany
(7) Korea
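
The "more than half of the time" filter behind this list can be sketched as follows. The team names and attempt counts are hypothetical, and the assumption is that the share is computed from hitting attempts per position.

```python
# Hypothetical hitting attempts per position (not the league data).
attempts = {
    "Team A": {"open": 620, "opposite": 250, "middle": 180},
    "Team B": {"open": 400, "opposite": 380, "middle": 300},
}


def open_share(by_position):
    """Fraction of a team's hitting attempts taken by open spikers."""
    return by_position["open"] / sum(by_position.values())


# Teams whose open spikers take more than half of all attempts.
open_heavy = [team for team, counts in attempts.items() if open_share(counts) > 0.5]
```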

This information would be helpful in strategy making. When going against these teams, it would be smart to focus blocking and floor defense on the open spikers (no matter where they are on the court).

Looking at the standard deviation (spread) of attacks per team, it seems that teams that distributed their attacks reached higher ranks in the competition. A low attacking standard deviation means that the ball is distributed across different positions. Distributions that are skewed or concentrated on a certain position have a high standard deviation (as observed with Bulgaria). Distributing the ball is a good strategy because it makes it more difficult for opponents to set up their defense (both in blocking and floor defense). This is a usual tactic of top teams. Coach Ramil De Jesus implements this system of almost equal distribution of sets to different positions. DLSU's lineup is known to be competitive in all positions and everyone in the team is expected to contribute. This makes it difficult for opposing teams to guard the different spikers.

For coaches, this information is useful in strategizing. A high standard deviation means the opponent's ball distribution is skewed (e.g. Ateneo, whose sets are mostly concentrated on A. Valdez). Putting the best defensive players in front of the opponent's most-set position may help the team win.
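
To make the spread measure concrete, here is a minimal sketch. It assumes the standard deviation is taken over each position's share of a team's attack attempts; the post does not spell out whether shares or raw counts were used. The team labels and counts are made up.

```python
from statistics import pstdev

# Hypothetical attack attempts per position (not the league data).
attempts = {
    "Balanced": {"open": 360, "opposite": 340, "middle": 300},
    "Open-heavy": {"open": 700, "opposite": 0, "middle": 300},
}


def attack_spread(by_position):
    """Standard deviation of the per-position shares of attack attempts."""
    total = sum(by_position.values())
    shares = [count / total for count in by_position.values()]
    return pstdev(shares)


spread = {team: attack_spread(counts) for team, counts in attempts.items()}
```

A team that splits its attacks evenly gets a spread of zero, and the value grows as more of the attempts pile up on one position.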

(IMAGINE A MAJOY BARON IN THE MIDDLE, KIM KIANNA DY AT THE OPPOSITE AND A DAWN MACANDILI DEFENDING AGAINST AN OPEN SPIKER. THIS WOULD MAKE ANY SPIKER'S LIFE DIFFICULT, EVEN FOR AN ALYSSA VALDEZ.) CASE IN POINT:

[Image]

Blocking : Height is might (still.)

Blocking - this is intended as a defensive skill. The main objective is to deflect the ball but this skill can also help the team earn points by deflecting the ball straight to the floor (usually called a roof or a monster block).

[Figure: Correlation of Height and Blocking Efficiency]

[Figure: Correlation of Block Reach and Blocking Efficiency]

Blocking relies heavily on height, as evidenced by the correlation values reported in this section. The graphs also support this common knowledge. Comparing height with block reach, it was observed that block reach is less correlated with average blocking efficiency. This highlights the importance of height in blocking. A person may have a high block reach, but height is still king in the blocking department. As explained in Haikyuu (forgot what episode lol), Hinata (a really short character) has incredible block reach but finds it difficult to block compared to other middles. This is because it takes time for shorter people to reach the upper region of the net. Taller people have the natural advantage of reaching this area faster (longer limbs). HEIGHT ALLOWS FOR SMALL DELAYS IN BLOCKING: TALLER PEOPLE CAN COMPENSATE FOR A LATE JUMP BECAUSE THEY REACH THE TOP OF THE NET FASTER THAN SMALLER PEOPLE.

Non-Scoring

Floor defense : Small players' redemption

We now look at the non-offensive skills in volleyball. Floor defense is the bread and butter of any team. It is the first contact of the ball after a serve or a spike.

Receiving - the first defensive contact after a service. It allows the ball to be passed to the setter. Perfect receives make it easier for setters to distribute the ball.

Digging - the first defensive contact after an opponent hits/spikes the ball. As with receiving, setters depend heavily on good digs for better distribution of balls.

Here we show the relationship between height, receiving and digging. It was found that height has a negative correlation with both defensive skills. This is commonly observed in volleyball. Taller players are disadvantaged in digging and receiving because it takes more effort for them to bend down (it also takes time!). Also, as stated by S&P (n.d.), "Shorter players tend to have faster reaction times, greater ability to accelerate body movements, stronger muscles in proportion to body weight, and a greater ability to rotate the body faster."

This is another reason why I like volleyball. There is a place for everyone. No matter how tall or short you are, you can specialize in a position where you have an advantage.

[Figure: Correlation of Height and Digging Efficiency]

[Figure: Correlation of Height and Receiving Efficiency]

Ball distribution and Floor defense

Digging and receiving were combined and averaged to get a floor defense measure. This metric was then correlated with the standard deviation of each team's attack distribution. The two were found to be negatively correlated. This means that steady defense allows better distribution of balls to different attackers. A team with good defensive skills is a setter's dream. A setter can create plays with different tempos if first balls are received/dug accurately.
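
The two steps described above (average the two defensive efficiencies, then correlate the result with the attack spread) can be sketched as follows. The per-team numbers are invented, the plain unweighted mean is an assumption about how the two skills were combined, and Pearson correlation is assumed as the measure.

```python
from math import sqrt

# Hypothetical per-team values: (digging eff., receiving eff., attack spread).
teams = {
    "T1": (0.40, 0.50, 0.10),
    "T2": (0.35, 0.45, 0.15),
    "T3": (0.30, 0.38, 0.20),
    "T4": (0.25, 0.35, 0.28),
}

# Floor defense measure: unweighted mean of digging and receiving efficiency.
floor_defense = {t: (dig + rec) / 2 for t, (dig, rec, _) in teams.items()}
attack_spread = {t: spread for t, (_, _, spread) in teams.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


names = list(teams)
r = pearson([floor_defense[t] for t in names], [attack_spread[t] for t in names])
```

With these made-up numbers `r` comes out negative: the better the floor defense, the more evenly the attacks are spread.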

[Figure: Correlation of Floor Defense Average and Attack Distribution Standard Deviation]

Setters : Good defense makes everyone's life easy

Setting - usually, this is the second contact of the ball after receiving/digging. This allows the ball to be distributed to the hitter. The main goal is to fool the blockers of the opposing team. Delaying the blocking stance of opponents makes it easier for spikers to find an open spot to hit the ball.

To further emphasize the importance of floor defense to ball distribution, we look at the correlation of the floor defense average with the number of still sets made by different players. A still set means that the setter did not have to move to set up the ball. Precise sets and plays are more deadly when setters are not setting on the move. This allows the setter to use decoys and clinch the ball longer. True enough, it was found that floor defense is positively correlated with still sets.

[Figure: Correlation of Floor Defense Average and Still Sets]

Strategize!

Given these statistics, coaches and teams can now start strategizing. One of the most crucial parts of this phase is player positioning. Oftentimes coaches find it difficult to position players (especially players who can play any position). This is why I thought of creating a model that can help with player positioning. The models built can be applied by different teams. They can also adjust to the playstyle of different coaches (just train them with different data).

Features/Variables Used

Physical and skill-based features are used to train different machine learning models. The physical attributes are height and jumping ability (composed of blocking and spiking reach), while the skill-based features are blocking efficiency, spiking efficiency, floor defense efficiency and setting efficiency. Preprocessing and scaling were applied to the data.
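
The post does not say which scaling was applied. A common choice, assumed here for illustration only, is standardization: subtract each feature's mean and divide by its standard deviation so that centimetres and efficiency ratios end up on comparable scales. The column names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a feature column to zero mean and unit standard deviation.

    Assumes the column is not constant (a zero standard deviation would
    cause a division by zero).
    """
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]


# Hypothetical feature columns, one value per player.
features = {
    "height_cm": [184, 197, 205, 210],
    "spike_reach_cm": [320, 340, 355, 365],
    "blocking_efficiency": [0.05, 0.20, 0.45, 0.50],
}

scaled = {name: standardize(column) for name, column in features.items()}
```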

Machine Learning

Unoptimized traditional machine learning models are used in this project. Parameters were not tweaked, so better results (in terms of accuracy) can still be expected. Gradient boosting was found to have the best test accuracy on this data.
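
To show what "test accuracy" measures, here is a toy sketch of the train-then-evaluate workflow. It deliberately swaps in a nearest-centroid classifier, which is NOT gradient boosting and not one of the models from this project, because it fits in a few lines of standard-library Python. The two features and all values are made up and assumed to be already scaled.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Average the feature vectors of each position (the 'training' step)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, row):
    """Assign the position whose centroid is closest to the player."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


def accuracy(centroids, X, y):
    """Fraction of players whose predicted position matches the true one."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Hypothetical scaled features: [height, floor defense efficiency].
X_train = [[1.2, -0.8], [1.0, -1.0], [-1.3, 1.1], [-1.1, 0.9], [-0.2, 0.1], [0.0, -0.1]]
y_train = ["middle", "middle", "libero", "libero", "setter", "setter"]

# Held-out players the model never saw during training.
X_test = [[0.9, -0.7], [-1.0, 1.2], [0.2, 0.1]]
y_test = ["middle", "libero", "setter"]

centroids = fit_centroids(X_train, y_train)
test_accuracy = accuracy(centroids, X_test, y_test)
```

The important part is that accuracy is computed on players held out from training, which is what a test accuracy figure refers to.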

Future usage of the model

I can imagine tryouts where coaches ask for "resumes" from applicants. With this kind of model, a coach can immediately gauge how these potential team members would be positioned in the team. The model can also be used to see which position versatile players should take. I wonder how a model would classify Jaja Santiago. People are kind of divided on where to put the Filipina giant. She excels in almost every scoring skill and she perfectly fits the middle position; however, coaches also try to use her in the open position to maximize her hitting abilities (again, this might be because opens generally get set more).

Room for future studies

Things this project could improve on:

(1) BETTER DATA - More granular and game-specific data will allow for better analysis. Note that the data used here might have skewed the analysis because some positions do not have statistics in some areas (e.g. most liberos do not have a spiking efficiency)

(2) OPTIMIZE MODELS - The models in this project are still unoptimized (but we already got 75% test accuracy)