- sex: male or female.

- age: age of the patient.

- education: levels coded 1 for some high school, 2 for a high school diploma or GED, 3 for some college or vocational school, and 4 for a college degree.

- currentSmoker: whether or not the patient is a current smoker.

- cigsPerDay: the average number of cigarettes the patient smoked per day.

- BPMeds: whether or not the patient was on blood pressure medication.

- prevalentStroke: whether or not the patient had previously had a stroke.

- prevalentHyp: whether or not the patient was hypertensive.

- diabetes: whether or not the patient had diabetes.

- totChol: total cholesterol level.

- sysBP: systolic blood pressure.

- diaBP: diastolic blood pressure.

- BMI: Body Mass Index.

- heartRate: heart rate.

- glucose: glucose level.

- TenYearCHD: 10-year risk of coronary heart disease (CHD).

**- Anaconda 5.0.1.**

**- Programming language: Python 3.6.**

**- Working environment: Jupyter Notebook 5.**

* We use Anaconda because it includes both the environment and Python.

Read the dataset: heart_data.csv
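A minimal sketch of this step, assuming pandas is the library used (the source does not show its code). With the real file the call would be `pd.read_csv("heart_data.csv")`; here a few invented rows are parsed from an in-memory string so the snippet runs on its own.

```python
import io

import pandas as pd

# Invented placeholder rows for illustration only; NOT values from heart_data.csv.
csv_text = """male,age,cigsPerDay,BMI,glucose,totChol,diabetes,TenYearCHD
1,40,0,25.0,80,200,0,0
0,50,10,27.5,90,220,0,0
1,60,20,30.0,100,240,1,1
0,45,5,22.5,85,210,0,0
"""

# With the real file this would be: df = pd.read_csv("heart_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # a sample of the first rows
print(df.count())     # number of non-missing values in each column
print(df.describe())  # count, mean, std, min, quartiles, max per column
```

`head()`, `count()` and `describe()` correspond to the sample, per-column counts and summary statistics reviewed in the cells that follow.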

**Sample of the data in our project**

**Number of values in each column**

**Data details**

**std: standard deviation**

**mean: arithmetic mean**

**count: number of elements**

0 is the number of patients with no coronary heart disease within the ten-year period (TenYearCHD = 0).

1 is the number of patients with coronary heart disease within the ten-year period (TenYearCHD = 1).
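The two class counts can be obtained as sketched below, assuming pandas. The small `TenYearCHD` column here is invented for illustration and does not reproduce the project's data.

```python
import pandas as pd

# Illustrative target column; the values are invented, not taken from the dataset.
df = pd.DataFrame({"TenYearCHD": [0, 0, 1, 0, 1, 0]})

# Count how many rows fall into each class (0 and 1).
counts = df["TenYearCHD"].value_counts()
print(counts)
```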

Sort and review the data. **Remove all rows that contain incomplete information**

**Now every column has the same number of non-missing values**
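One way to drop incomplete rows, assuming pandas, is `DataFrame.dropna()`. The frame below is a made-up example with two missing cells.

```python
import numpy as np
import pandas as pd

# Invented example values; NaN marks a missing measurement.
df = pd.DataFrame({
    "age": [40, 50, 60, 45],
    "glucose": [80.0, np.nan, 95.0, 100.0],
    "totChol": [200.0, 220.0, np.nan, 240.0],
})

# Drop every row that has at least one missing value.
clean = df.dropna()

# After cleaning, every column reports the same count.
print(clean.count())
```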

**Sample**

Association analysis studies the relationship between two variables. The basic goal is to determine how strongly the variables are related, measured by a correlation coefficient whose magnitude ranges from 0 (no correlation) to 1 (perfect correlation).
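As a sketch of how such a correlation table could be computed, the example below uses `DataFrame.corr()` from pandas, which returns Pearson coefficients by default. That the project used this call is an assumption, and the numbers are invented.

```python
import pandas as pd

# Invented values: diaBP rises exactly linearly with sysBP, heartRate does not.
df = pd.DataFrame({
    "sysBP": [110, 120, 130, 140, 150],
    "diaBP": [70, 75, 80, 85, 90],
    "heartRate": [80, 60, 75, 65, 70],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

Pearson coefficients carry a sign: a negative value indicates that one variable tends to fall as the other rises.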

**Distribution of education level by age**

**Smoking behavior by gender**

**Number of cigarettes by age group**

**Education level by gender**

**Sample**

**Association analysis of the medical data**

**Age distribution of patients with coronary heart disease within the ten-year period**

**Age distribution of diabetes patients**

**Coronary heart disease by gender**

**To avoid prolonging the data review, we stop here and move on to machine learning**

A decision tree is a supervised learning method used for classification and regression. Its objective is to build a model that predicts the value of a target variable by learning simple decision rules extracted from the features. Classification applies a set of rules or conditions that trace a path starting at the root and ending at a leaf, which represents the class label assigned to the item. At every internal node, a decision must be made about which path to take to the next node.

**Preparation of data**

We split the data into a matrix 'x', which contains the features used for training, and a matrix 'y', which contains only the values of the target column 'diabetes'. Each row of 'x' holds the features of one person, and 'y' is a single-column matrix in which each row is 1 if the person has diabetes and 0 otherwise. The algorithm compares the feature values in each row of 'x' with the corresponding value in 'y' to find patterns that may explain why a person has diabetes.

**Features used in X**

'male' , 'age' , 'cigsPerDay' , 'BMI' , 'glucose' , 'totChol'

**Training and testing data**

We split each of the matrices 'x' and 'y' into training data and test data: 80% of 'x' and 'y' is used for training and 20% for testing.
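The feature selection and the 80/20 split could look like the sketch below. It assumes scikit-learn's `train_test_split`, which the source does not show, and uses randomly generated stand-in data rather than the heart dataset. The `random_state` value is an arbitrary choice for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data; NOT the project's dataset.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(30, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(85, 20, n),
    "totChol": rng.normal(235, 40, n),
    "diabetes": rng.integers(0, 2, n),
})

features = ['male', 'age', 'cigsPerDay', 'BMI', 'glucose', 'totChol']
x = df[features]    # feature matrix
y = df['diabetes']  # target column

# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)
```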

**Decision Tree Classifier**

We set the maximum depth of the tree to 10.

**Fit the decision tree model on the training data**

This process is called "training".

**Test the effectiveness of the model**

The decision tree is ready, so we can now test its effectiveness ('score') on the training and test data to measure its accuracy.

Results showed that the decision tree correctly predicted 98% of the data set aside for testing, which means its quality is high.
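The training and scoring steps above can be sketched as follows, assuming scikit-learn's `DecisionTreeClassifier`. Synthetic data from `make_classification` stands in for the heart dataset, so the accuracies printed here say nothing about the figures reported in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 6 features; NOT the project's dataset.
x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Limit the tree to a maximum depth of 10, as described in the text.
tree = DecisionTreeClassifier(max_depth=10, random_state=0)
tree.fit(x_train, y_train)  # "training"

# score() returns the fraction of correctly classified samples.
train_acc = tree.score(x_train, y_train)
test_acc = tree.score(x_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Comparing the two scores is a quick check for overfitting: a training accuracy far above the test accuracy suggests the tree has memorized the training rows.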

We will enter the data of a new person to predict whether they have diabetes.

Prediction result

**Add another person to make sure the model works**

The result is 0: unlike the previous person, the new person is not predicted to have diabetes.
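Predicting for a new person amounts to passing one row with the same feature columns to `predict`. The sketch below assumes scikit-learn and trains on four invented rows. The two "new persons" are hypothetical and chosen so that the outcome does not depend on which feature the tree splits on.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ['male', 'age', 'cigsPerDay', 'BMI', 'glucose', 'totChol']

# Four invented training rows; NOT values from the project's dataset.
train = pd.DataFrame(
    [[1, 45, 10, 27.0, 85, 220],
     [0, 50, 0, 24.0, 80, 200],
     [1, 60, 20, 31.0, 210, 260],
     [0, 65, 0, 33.0, 190, 250]],
    columns=features,
)
labels = [0, 0, 1, 1]  # 1 = diabetes, 0 = no diabetes

model = DecisionTreeClassifier(max_depth=10, random_state=0)
model.fit(train, labels)

# Hypothetical new persons, using the same column order as the training data.
person_a = pd.DataFrame([[1, 62, 5, 32.0, 200, 255]], columns=features)
person_b = pd.DataFrame([[0, 40, 0, 23.0, 75, 190]], columns=features)

pred_a = model.predict(person_a)
pred_b = model.predict(person_b)
print(pred_a, pred_b)
```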

**Preparation of data**

We will repeat the previous steps of our diabetes prediction model, changing the target to TenYearCHD.

**Decision Tree Classifier**

**Fit the decision tree model on the training data**

This process is called "training".

**Test the effectiveness of the model**

Results showed that the decision tree correctly predicted 83% of the data set aside for testing, which is good.

**data for new person**

**prediction result**

Coronary heart disease is predicted for this person.

**Add data for a new person to make sure the model works well:**

The result is 0: unlike the previous person, the new person is not predicted to have coronary heart disease.

The random forest algorithm is derived from the decision tree (Classification and Regression Trees) and is a machine learning method for building a predictive model of the data. Each tree is obtained by partitioning the data and fitting a simple prediction model within each partition, and the forest combines the predictions of many such trees.

We apply random forest to predict coronary heart disease.

**Random forest classifier model**
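A minimal sketch of such a model, assuming scikit-learn's `RandomForestClassifier`. The number of trees (`n_estimators=100`) is an assumption, not a setting taken from the source, and the data is a synthetic stand-in, so the printed accuracy is unrelated to the reported result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; NOT the project's dataset.
x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# An ensemble of 100 decision trees, each trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(x_train, y_train)

forest_acc = forest.score(x_test, y_test)
print(f"random forest test accuracy: {forest_acc:.2f}")
```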

**Test the effectiveness of the model:**

The random forest correctly predicted 84% of the data set aside for testing, better than the decision tree.

**data for new person**

prediction result

The "Gradient boosting" classification algorithm generates many weak prediction trees and then combines them, improving step by step, into a strong model.
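This could be sketched with scikit-learn's `GradientBoostingClassifier`, which is an assumption about the library used. The hyperparameters shown are the library defaults written out explicitly, not settings from the source, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; NOT the project's dataset.
x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# 100 shallow trees are added one after another; each new tree is fitted
# to correct the errors of the ensemble built so far.
boost = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
boost.fit(x_train, y_train)

boost_acc = boost.score(x_test, y_test)
print(f"gradient boosting test accuracy: {boost_acc:.2f}")
```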

**Test the effectiveness of the model:**

Gradient boosting correctly predicted 86% of the data set aside for testing.

**data for new person**

prediction result

The "Voting" algorithm applies voting across multiple models, such as the models we built above on the data: more than one algorithm is run, and the vote determines the final prediction.
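A sketch of this idea, assuming scikit-learn's `VotingClassifier` with majority ("hard") voting over the three model types used in this notebook. Whether the project used hard or soft voting is not stated in the source, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the project's dataset.
x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Each model casts one vote per sample; the majority class wins.
vote = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=10, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("boost", GradientBoostingClassifier(random_state=0)),
    ],
    voting="hard",
)
vote.fit(x_train, y_train)

vote_acc = vote.score(x_test, y_test)
print(f"voting test accuracy: {vote_acc:.2f}")
```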

**Test the effectiveness of the model:**

The voting algorithm correctly predicted 86% of the data set aside for testing.

**data for new person**

prediction result

In our work with these machine learning and data mining algorithms, the accuracy was 83% for the decision tree, 84% for the random forest, and 85% for the voting algorithm.