Oct. 24th

- Make sure you push your notebook to GitHub with all the cells already evaluated.
- Note that maps do not render in a standard GitHub environment. You should export them to HTML and link them in your notebook.
- Don't forget to add a textual description of your thought process, the assumptions you made, and the solution you implemented.
- Please write all your comments in English, and use meaningful variable names in your code.

- Are you curious to know what the political leanings of the people of Switzerland are?
- Do you wake up in a cold sweat, wondering which party won the last cantonal parliament election in Vaud?
- Are you looking to learn all sorts of visualizations, including maps, in Python?

If your answer to any of the above is yes, this assignment is just right for you. Otherwise, it's still an assignment, so we're terribly sorry.

The chief aim of this assignment is to familiarize you with visualizations in Python, particularly maps, and also to give you some insight into how visualizations are to be interpreted. The data we will use is the data on Swiss cantonal parliament elections from 2007 to 2018, which contains, for each cantonal election in this time period, the voting percentages for each party and canton.

For the visualization part, install Folium (*Hint: it is not available in your standard Anaconda environment, so search the Web for how to install it easily!*). Folium's README comes with very clear examples and links to their own IPython notebooks; make good use of this information. For your convenience, you can already find one TopoJSON file in this same directory, containing the geo-coordinates of the cantonal borders of Switzerland.

One last, general reminder: back up any hypotheses and claims with data, since this is an important aspect of the course.



**A)** Display a Swiss map that has cantonal borders as well as the national borders. We provide a TopoJSON file, `data/ch-cantons.topojson.json`, that contains the borders of the cantons.
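Folium's `folium.TopoJson` layer needs an `object_path` that points at a layer inside the topology (something like `'objects.<layer>'`). The layer name and the feature ids used in `data/ch-cantons.topojson.json` are not stated here, so a minimal standard-library sketch like the following can inspect the file before building the map. The helper names `load_topojson` and `describe_topology` are hypothetical.

```python
import json


def load_topojson(path="data/ch-cantons.topojson.json"):
    """Read a TopoJSON file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe_topology(topo):
    """Return {layer name: [feature ids]} for every object in a TopoJSON topology."""
    layers = {}
    for name, obj in topo.get("objects", {}).items():
        # GeometryCollection objects keep their features under "geometries"
        geometries = obj.get("geometries", [])
        layers[name] = [geom.get("id") for geom in geometries]
    return layers


# Usage (requires the data file):
# topo = load_topojson()
# print(describe_topology(topo))
```

If, for example, the output shows a layer called `cantons`, the path to pass to folium would be `'objects.cantons'`; that layer name is an assumption to be checked against the actual output.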


**B)** Take the spreadsheet `data/communes_pop.xls`, collected from admin.ch, containing population figures for every commune. You can use `pd.read_excel()` to read the file and to select specific sheets. Plot a histogram of the population counts and explain your observations. Do not use a log-scale plot for now. What does this histogram tell you about urban and rural communes in Switzerland? Are there any clear outliers on either side, and if so, which communes?

Let's have a look at the Excel file containing the communes.


The first column contains the names, the second the population (as of January 1st, 2017).

We are only interested in communes, i.e. rows starting with '......'. Furthermore, we only care about the total population, so we can drop the other columns.
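A minimal pandas sketch of this filtering step, assuming the sheet is read with `header=None`, that column 0 holds the dot-indented names and column 1 the total population, as described above. The helper names are hypothetical, and reading a legacy `.xls` file with `pd.read_excel` requires the `xlrd` package.

```python
import pandas as pd

COMMUNE_PREFIX = "......"  # commune rows are indented with six dots


def extract_communes(raw: pd.DataFrame) -> pd.Series:
    """Keep commune rows only and return their total population, indexed by name.

    Assumes column 0 holds the dot-indented names and column 1 the total population.
    """
    names = raw.iloc[:, 0].astype(str)
    is_commune = names.str.startswith(COMMUNE_PREFIX)
    population = pd.to_numeric(raw.loc[is_commune].iloc[:, 1])
    # strip the dot prefix so the index holds the plain commune name
    population.index = pd.Index(
        names[is_commune].str[len(COMMUNE_PREFIX):].str.strip(), name="commune"
    )
    return population.rename("population")


def load_communes(path="data/communes_pop.xls", sheet_name=0) -> pd.Series:
    # sheet_name selects the sheet; legacy .xls files need the xlrd package
    raw = pd.read_excel(path, sheet_name=sheet_name, header=None)
    return extract_communes(raw)
```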


Let's plot a histogram of the population of the communes.


This histogram is heavily skewed: most communes fall into the leftmost bins, which suggests a few extreme outliers separated from the rest by large, sparsely populated gaps. Let's investigate further:


The 75% quantile is a population of 3605, which is less than the (rounded) mean of 3759: a few very large communes pull the mean up. The standard deviation of over 12306 is also very high, more than three times the mean.
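The reasoning above (75% quantile below the mean, standard deviation several times the mean) can be reproduced with `Series.describe()`. A small sketch, assuming `population` is the Series of commune populations built earlier; `skewness_indicators` is a hypothetical helper name.

```python
import pandas as pd


def skewness_indicators(population: pd.Series) -> dict:
    """Numbers that reveal a strongly right-skewed distribution."""
    stats = population.describe()  # count, mean, std, min, 25%, 50%, 75%, max
    return {
        "mean": stats["mean"],
        "q75": stats["75%"],
        # a 75% quantile below the mean means a few huge values pull the mean up
        "q75_below_mean": bool(stats["75%"] < stats["mean"]),
        # standard deviation relative to the mean (coefficient of variation)
        "std_over_mean": stats["std"] / stats["mean"],
    }
```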

The maximum is a single commune with over 400000 inhabitants. Let's take a look at the names of the 20 largest communes.


As you may have guessed, the outliers forming the top 10 are well-known Swiss cities, led by Zurich, Geneva, Basel, Lausanne and Bern.

There is also a gap of over 200000 inhabitants between Zurich and Geneva.
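A sketch for listing the largest communes together with the gap between consecutive ranks, assuming `population` is a Series indexed by commune name. `Series.diff(-1)` subtracts each value's successor, so each entry is the gap to the next-largest commune; `largest_communes` is a hypothetical helper name.

```python
import pandas as pd


def largest_communes(population: pd.Series, n: int = 20) -> pd.DataFrame:
    """Top-n communes with the gap to the next-largest commune in the ranking."""
    top = population.nlargest(n)
    return pd.DataFrame({
        "population": top,
        # diff(-1) = value minus the following value; NaN for the last row
        "gap_to_next": top.diff(-1),
    })
```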


These 5 communes alone make up more than 12% of the Swiss population, leaving the rest to the other 2235 communes.
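The share quoted above can be computed directly, again assuming a `population` Series indexed by commune name. `top_share` is a hypothetical helper; it is a sketch, not the notebook's original cell.

```python
import pandas as pd


def top_share(population: pd.Series, n: int = 5) -> float:
    """Fraction of the total population living in the n largest communes."""
    return population.nlargest(n).sum() / population.sum()


# Usage: top_share(population, 5) gives the fraction for the five largest communes,
# and len(population) - 5 the number of remaining communes.
```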

Now let's check whether there are especially small communes, say in the bottom 20.


Again we observe an outlier: a small commune named Corippo with only 14 inhabitants, followed by Kammersrohr and Bister, which already have 30 inhabitants each.


So there are 404 communes (about 18% of all 2240 communes) with fewer than 300 inhabitants; small communes (below 10% of the mean) are therefore not uncommon.
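A sketch covering both small-commune checks (the bottom 20 and the count below a threshold), assuming the same `population` Series. The threshold of 300 matches the text, while `small_commune_summary` is a hypothetical name. Computing the fraction in code keeps the quoted percentage reproducible.

```python
import pandas as pd


def small_commune_summary(population: pd.Series, threshold: int = 300,
                          n_smallest: int = 20) -> dict:
    """Bottom-n communes plus how many fall strictly below a population threshold."""
    below = population[population < threshold]
    return {
        "smallest": population.nsmallest(n_smallest),
        "count_below": int(below.size),
        "fraction_below": below.size / population.size,
        # how small the threshold is relative to the average commune
        "threshold_over_mean": threshold / population.mean(),
    }
```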

**C)** The figure below represents 4 types of histogram. At this stage, our distribution should look like Fig.(a). A common way to represent power laws is to use a histogram on a log-log scale. Remember: the x-axis of a histogram is segmented into bins of equal size, and the y-value of each bin is the number of data points falling into it. As shown in Fig.(b), small bin sizes might introduce artifacts. Fig.(b) and Fig.(c) are examples of histograms with two different bin sizes. Another great way to visualize such distributions is to use a cumulative representation, as shown in Fig.(d), in which the y-axis represents the number of data points with values greater than or equal to x.

Create figures (b) and (d) using the data extracted for task 1B. For Fig.(b), plot two histograms with two different bin sizes and provide a brief description of the results. What does this tell you about the relationship between the two variables, namely the frequency of each bin and the value (i.e. the population, in the case of the communal data) of each bin?

The figure is extracted from [this paper](https://arxiv.org/pdf/cond-mat/0412004.pdf) that contains more information about this family of distributions.

For Fig.(b) we plot log-log histograms of the population with two different numbers of bins: 100 and 2000.
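A sketch of Fig.(b) under stated assumptions: bins are equal-width on the linear scale (which is what `np.histogram` produces for an integer `bins`), and empty bins are dropped because a count of zero cannot be drawn on a logarithmic axis. The matplotlib import sits inside the plotting function so the binning helper works without it; both function names are hypothetical.

```python
import numpy as np


def linear_bin_histogram(values, n_bins):
    """Equal-width histogram; returns bin centres and counts of non-empty bins."""
    counts, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    nonzero = counts > 0  # log(0) is undefined, so empty bins are dropped
    return centers[nonzero], counts[nonzero]


def plot_loglog_histograms(values, bin_numbers=(100, 2000)):
    """Draw one log-log histogram per number of bins, side by side."""
    import matplotlib.pyplot as plt  # only the plotting step needs matplotlib

    fig, axes = plt.subplots(1, len(bin_numbers), figsize=(10, 4),
                             sharey=True, squeeze=False)
    for ax, n_bins in zip(axes[0], bin_numbers):
        x, y = linear_bin_histogram(values, n_bins)
        ax.loglog(x, y, marker=".", linestyle="none")
        ax.set_title(f"{n_bins} bins")
        ax.set_xlabel("population")
    axes[0][0].set_ylabel("number of communes")
    return fig
```

With many narrow bins, the right tail breaks up into isolated points with a count of 1, which is the kind of artifact Fig.(b) warns about.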


TODO Explanation

Now we turn to the cumulative visualization, as in Fig.(d).
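A sketch of the complementary cumulative counts behind Fig.(d): for each distinct value x, the number of data points with value greater than or equal to x. It relies on `np.searchsorted` over the sorted values; the helper names are hypothetical.

```python
import numpy as np


def complementary_cumulative_counts(values):
    """For each distinct value x, the number of data points with value >= x."""
    sorted_values = np.sort(np.asarray(values))
    xs = np.unique(sorted_values)
    # searchsorted(..., side="left") gives how many points are strictly below x
    counts = sorted_values.size - np.searchsorted(sorted_values, xs, side="left")
    return xs, counts


def plot_cumulative(values):
    import matplotlib.pyplot as plt  # only the plotting step needs matplotlib

    xs, counts = complementary_cumulative_counts(values)
    fig, ax = plt.subplots()
    ax.loglog(xs, counts, marker=".", linestyle="none")
    ax.set_xlabel("population x")
    ax.set_ylabel("number of communes with population >= x")
    return fig
```

Unlike the binned histograms, this representation involves no bin-size choice, so every data point contributes to the curve.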


We provide a spreadsheet, `data/voters.xls`, collected (again) from admin.ch, which contains the percentage of voters for each party and for each canton. For the following task, we will focus on the period 2014-2018 (the first page of the spreadsheet). Please report any assumptions you make regarding outliers, missing values, etc. Notice that data is missing for two cantons, namely Appenzell Ausserrhoden and Graubünden, and your visualizations should include data for every other canton.
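One way to make the missing-value handling explicit is a sketch like this, assuming the 2014-2018 sheet has been reduced to a Series of one party's percentages indexed by canton, and that missing entries appear as non-numeric markers or empty cells (an assumption about the file's layout). `pd.to_numeric(..., errors="coerce")` turns anything non-numeric into NaN, so the missing cantons can be reported before they are dropped; `split_missing` is a hypothetical name.

```python
import pandas as pd


def split_missing(shares: pd.Series):
    """Return (numeric shares without gaps, list of cantons with missing data)."""
    numeric = pd.to_numeric(shares, errors="coerce")  # non-numeric markers become NaN
    missing = numeric.index[numeric.isna()].tolist()
    return numeric.dropna(), missing
```

Checking that `missing` contains exactly the two cantons named above is a cheap sanity check on the parsing.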

For part B, you can use the `data/national_council_elections.xlsx` file (guess where we got it from) to get the voting-eligible population of each canton in 2015.

**A)** For the period 2014-2018 and for each canton, visualize, on the map, **the percentage of voters** in that canton who voted for the party `UDC` (Union démocratique du centre). Does this party seem to be more popular in the German-speaking part, the French-speaking part, or the Italian-speaking part?


**B)** For the same period, now visualize **the number of residents** in each canton who voted for UDC.
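A sketch of one possible estimate, under a loudly stated assumption: the number of UDC voters in a canton is approximated as the 2015 voting-eligible population times the UDC percentage. This ignores turnout and the time difference between 2015 and the 2014-2018 elections, so it is an approximation rather than the real count. `estimated_party_voters` is a hypothetical helper.

```python
import pandas as pd


def estimated_party_voters(share_pct: pd.Series, eligible: pd.Series) -> pd.Series:
    """Approximate party voters per canton as eligible population x vote share.

    ASSUMPTION: turnout is ignored. Both Series are indexed by canton;
    cantons missing on either side are dropped.
    """
    estimate = share_pct * eligible / 100  # pandas aligns the Series on the canton index
    return estimate.dropna().round().astype(int)
```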


**C)** Which one of the two visualizations above would be more informative in case of a national election with majority voting (i.e. when a party needs to have the largest number of citizens voting for it among all parties)? Which one is more informative for the cantonal parliament elections?

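For a national election with majority voting, the second visualization is more informative, because what counts is the absolute number of citizens voting for the party. For cantonal parliament elections, the tendencies shown in the first visualization matter most, because in most cantons seats are allocated proportionally to each party's vote share within the canton.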

In this section, we focus on two parties that are representative of the left and the right on the Swiss political spectrum. You will propose a way to visualize their influence over time and for each canton.

**A)** Take the two parties `UDC` (Union démocratique du centre) and `PS` (Parti socialiste suisse). For each canton, we define 'right lean' in a certain period as follows:

@@0@@

Visualize the right lean of each canton on the map. What conclusions can you draw this time? Can you observe the Röstigraben?


**B)** For each party, devise a way to visualize the difference between its 2014-2018 vote share (i.e. percentage) and its 2010-2013 vote share for each canton. Propose a way to visualize this evolution of the party over time, and justify your choices. There's no single correct answer, but you must reasonably explain your choices.
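One possible sketch for the data side of this task: the change in percentage points per canton, assuming two Series of vote shares indexed by canton (one per period). Cantons missing in either period are dropped. A diverging color scale centred at zero is one common choice for mapping such signed differences, but that is a design choice, not the only correct one; `share_change` is a hypothetical name.

```python
import pandas as pd


def share_change(recent: pd.Series, previous: pd.Series) -> pd.Series:
    """Percentage-point change in vote share per canton (positive = the party gained)."""
    change = recent - previous  # aligned on the canton index
    return change.dropna().sort_values()
```

Using percentage points rather than relative change keeps the values on the same scale as the vote-share maps above.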
