Imported the dataset from local storage into Hadoop, ran SQL queries on the data in HDFS to pull records of drivers with certain risk factors, and built Tableau dashboards visualizing selected events. Identified the drivers, cities, and miles-driven levels associated with the highest risk.
Built two machine learning classification models in RStudio using KNN and logistic regression to predict loan defaulters. Analyzed historical data and predicted defaulters using decision tree and logistic regression models. Found that a one-unit increase in installment and in revolving balance raises the chance of default by 18% and 12.2%, respectively.
Cleaned and preprocessed the dataset using the Pandas library to improve data accuracy, addressing missing values and outliers.
Performed statistical analysis in Python, generated interactive visualizations, trained several machine learning models such as XGBoost, compared them using evaluation metrics, and recommended seasonality and destination as the highest-impact factors.
Studied the price elasticity of air ticket demand, using air passenger numbers and their time trend over the years leading up to the 2001 recession, to reveal how price sensitivity changes as a country approaches a recession.
Using empirical methods (OLS and PLM), showed that demand was price inelastic over time, with reduced responsiveness to changes in ticket prices.
This section includes my Professional Data Analyst certification, covering Machine Learning | SQL | Python | Data Cleaning | Validation | Visualization | Interpretation
This section includes my Associate Data Analyst certification, covering Machine Learning | SQL |