· The data set contains 150,000 observations.
· The project life cycle comprised the following steps: data processing, data cleaning, exploratory data analysis (EDA), model building, and deployment.
· The project was written in Python, and the data was loaded from a CSV file.
· Data cleaning consisted of dropping rows with NA (missing) values and resetting the index.
· In the EDA stage, we used various plots to examine the relationship between the outcome variable and the independent variables, and we checked for collinearity among the independent variables.
· In the model-building stage, we tried Logistic Regression, Decision Tree, Random Forest, and XGBoost. XGBoost gave the highest accuracy: 95% on the training set and 93% on the test set.
· We deployed the model using Python Flask and Heroku.
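
The cleaning and collinearity steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas was used, and the column names (feature_a, feature_b, feature_c, outcome), the inline CSV text, and the 0.9 correlation threshold are all hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file; the real columns are not given.
CSV_TEXT = """feature_a,feature_b,feature_c,outcome
1.0,2.0,5.0,0
2.0,4.0,,1
3.0,6.0,1.0,0
4.0,8.0,7.0,1
,10.0,2.0,1
6.0,12.0,4.0,0
"""


def load_and_clean(source):
    """Read a CSV, drop rows containing NA values, and reset the index."""
    df = pd.read_csv(source)
    return df.dropna().reset_index(drop=True)


def correlated_pairs(df, target, threshold=0.9):
    """List pairs of independent variables whose absolute Pearson
    correlation exceeds the (assumed) threshold."""
    corr = df.drop(columns=[target]).corr().abs()
    cols = list(corr.columns)
    return [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] > threshold
    ]


clean = load_and_clean(io.StringIO(CSV_TEXT))
pairs = correlated_pairs(clean, target="outcome")
```

Passing drop=True to reset_index discards the old index instead of keeping it as an extra column. In this toy data, feature_b is exactly twice feature_a, so that pair is the one flagged as collinear.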
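
The model comparison described above might look something like the sketch below, assuming scikit-learn was used. The data here is synthetic (generated with make_classification), the hyperparameters are illustrative guesses, and the numbers it produces have no relation to the 95% and 93% reported for the project. XGBoost is left out so the sketch depends on scikit-learn only; xgboost.XGBClassifier follows the same fit/predict interface and could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the project's real data set is not reproduced here.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Candidate models; hyperparameters are illustrative, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(candidates, X_tr, y_tr, X_te, y_te):
    """Fit each model and record its accuracy on the train and test splits."""
    scores = {}
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        scores[name] = {
            "train": accuracy_score(y_tr, model.predict(X_tr)),
            "test": accuracy_score(y_te, model.predict(X_te)),
        }
    return scores


results = evaluate(models, X_train, y_train, X_test, y_test)

# Pick the model with the highest test accuracy.
best = max(results, key=lambda name: results[name]["test"])
```

Reporting train and test accuracy side by side, as the summary does, makes overfitting visible: a large gap between the two suggests the model has memorized the training data.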
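
For the deployment step, a Flask app serving predictions could be sketched along these lines. The /predict route, the JSON payload shape, and the predict_label placeholder are all assumptions made for illustration; the project's real app would load its trained model and call that instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(features):
    """Placeholder for the trained model (assumption): a real app would
    load the saved model and call its predict method here."""
    return int(sum(features) > 0)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical payload shape: {"features": [number, number, ...]}
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    is_valid = (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(x, (int, float)) for x in features)
    )
    if not is_valid:
        return jsonify(error="expected a non-empty numeric 'features' list"), 400
    return jsonify(prediction=predict_label(features))

# For local testing, a development server can be started with app.run().
```

Posting a JSON body such as {"features": [0.5, 1.2]} to /predict returns a JSON object with a prediction field, while a malformed body gets a 400 response with an error message.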