Create a data set of N = 500 points from each of two Gaussian mixture distributions (each mixture has five bivariate Gaussian components). The component means of the first mixture lie between -5 and 0, and each component has variance 1. The component means of the second mixture lie between 0 and 5, also with variance 1. Draw the decision boundary (Bayes boundary) between the N points of the first mixture and the N points of the second mixture without using any machine learning models.

• We assume the means of the five Gaussian components are uniformly distributed over the given range.
• For simplicity, we assume the two variables of each bivariate distribution are independent of each other, so the covariance matrix is the diagonal (identity) matrix [[1, 0], [0, 1]].

A Gaussian mixture model can be defined as: @@0@@ where $\alpha_i$ is the weight of the component distribution $g_i$ and $M$ is the number of component distributions in the mixture. In this case we have $M = 5$ and $\alpha_i = 1/M = 0.2$.

To draw samples from $G$ we can simply draw samples from each $g_i$ separately, taking $n_i = \alpha_i N = 0.2 \times 500 = 100$ samples from each component, and combine them into one set. Drawing samples from a multivariate Gaussian distribution can be done with the np.random.multivariate_normal function.
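The sampling procedure described above can be sketched as follows. This is a minimal illustration under the stated assumptions (component means uniform on the given range, identity covariance, equal weights), not necessarily the original implementation. The helper name `sample_mixture` and the fixed seed are hypothetical, and the sketch uses NumPy's `Generator.multivariate_normal` in place of the legacy `np.random.multivariate_normal` so the draw is reproducible.

```python
import numpy as np


def sample_mixture(n_points, mean_low, mean_high, n_components=5, rng=None):
    """Draw n_points from an equal-weight mixture of bivariate Gaussians.

    Hypothetical helper: component means are drawn uniformly from
    [mean_low, mean_high] in each coordinate, covariance is the identity.
    """
    rng = np.random.default_rng() if rng is None else rng
    means = rng.uniform(mean_low, mean_high, size=(n_components, 2))
    cov = np.eye(2)  # independent coordinates, variance 1
    n_per_component = n_points // n_components  # n_i = alpha_i * N
    samples = np.vstack([
        rng.multivariate_normal(mean, cov, size=n_per_component)
        for mean in means
    ])
    return samples, means


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X0, means0 = sample_mixture(500, -5.0, 0.0, rng=rng)  # first mixture
X1, means1 = sample_mixture(500, 0.0, 5.0, rng=rng)   # second mixture
```

Returning the component means alongside the samples is a design choice of this sketch: the means are needed later to evaluate the mixture densities on a grid.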

We have two prior probabilities, $\pi_0 = 0.5$ and $\pi_1 = 0.5$, since the numbers of samples in the two sets are equal.

Let $y \in \{0, 1\}$ be the output and $x \in \mathbb{R}^2$ be the input. Using Bayes' theorem: @@0@@

The decision boundary is the curve in $\mathbb{R}^2$ where these two conditional probabilities are equal: @@1@@ or @@2@@

Since $\pi_0 = \pi_1 = 0.5$, we can simplify (1) to: @@3@@ Let's solve (2): @@4@@ It's hard to solve this equation by hand, but we don't have to: we just need to plot it. Fortunately, we can do so using matplotlib.
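To see why a closed-form solution is out of reach, it may help to write the boundary condition out under the assumptions stated earlier (equal weights of 0.2, identity covariance, equal priors). The symbols $\mu_i^{(0)}$ and $\mu_j^{(1)}$ for the component means of the first and second mixture are notation introduced here for illustration. With the constant factors cancelled on both sides, the condition becomes:

```latex
\sum_{i=1}^{5} \exp\!\left(-\tfrac{1}{2}\,\bigl\lVert x - \mu_i^{(0)} \bigr\rVert^{2}\right)
=
\sum_{j=1}^{5} \exp\!\left(-\tfrac{1}{2}\,\bigl\lVert x - \mu_j^{(1)} \bigr\rVert^{2}\right)
```

Each side is a sum of five exponentials in $x$, so the equation generally cannot be rearranged into an explicit expression for the boundary; evaluating both sides on a grid and locating where they agree is the practical route.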

First, we need a function to calculate either side of (3): @@5@@

The decision boundary, or Bayes boundary, is the curve that connects all points satisfying (3). We can draw this curve using the contour function provided by matplotlib:
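As a hedged sketch of this plotting step: evaluate both mixture densities on a grid, take their difference, and ask `contour` for the single level 0, which is where the two sides are equal. The function name `mixture_density`, the grid limits, the seed, and the output file name are assumptions made for illustration. The component means are redrawn here so the example is self-contained.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def mixture_density(points, means):
    """Equal-weight mixture of unit-covariance bivariate Gaussians.

    points has shape (..., 2); means has shape (M, 2).
    """
    offsets = points[..., None, :] - means       # shape (..., M, 2)
    sq_dist = np.sum(offsets ** 2, axis=-1)      # shape (..., M)
    component_pdf = np.exp(-0.5 * sq_dist) / (2.0 * np.pi)
    return component_pdf.mean(axis=-1)           # equal weights 1/M


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
means0 = rng.uniform(-5.0, 0.0, size=(5, 2))  # first mixture
means1 = rng.uniform(0.0, 5.0, size=(5, 2))   # second mixture

# Grid covering both mixtures with some margin (limits are an assumption).
xs = np.linspace(-8.0, 8.0, 300)
ys = np.linspace(-8.0, 8.0, 300)
grid_x, grid_y = np.meshgrid(xs, ys)
grid = np.stack([grid_x, grid_y], axis=-1)

# With equal priors the boundary is where the two densities coincide.
density_gap = mixture_density(grid, means0) - mixture_density(grid, means1)

fig, ax = plt.subplots()
ax.contour(grid_x, grid_y, density_gap, levels=[0.0], colors="k")
ax.scatter(means0[:, 0], means0[:, 1], marker="x", label="means, mixture 0")
ax.scatter(means1[:, 0], means1[:, 1], marker="+", label="means, mixture 1")
ax.legend()
fig.savefig("bayes_boundary.png")
```

Plotting the zero level of the difference, rather than each density separately, lets a single `contour` call trace the boundary directly.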