Data Analysis and Data Science Projects
2020 Kaggle Machine Learning and Data Science Survey
Exploratory Data Analysis
Since 2017, Kaggle has conducted an annual industry-wide survey to get an overview of the state of data science and machine learning worldwide. The collected (and cleaned) data then becomes the basis for a competition in which the Kaggle community explores the data and draws out stories about a segment or an aspect of the data science community. 2020 marks the third year of the Annual Data Science Survey Challenge.
The largest shares of respondents in the survey sample were from India (29%) and the USA (11%). This project was therefore started to study how the India and US trends contrast with the overall picture.
College dataset
Simple Data Analysis
This project is based on a dataset of statistics for a large number of US colleges from the 1995 issue of US News and World Report. The dataset is used in the book Introduction to Statistical Learning with R. The focus of this project was simple data analysis. The insights will later be compared with an analysis based on statistical techniques.
Customer Segmentation for Subscription Service
Segmentation: Clustering
This dataset contains a few simulated demographic features for a consumer segmentation project for a subscription-based service. The project segments the target customer base using clustering techniques.
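As a rough illustration of what clustering-based segmentation involves, here is a minimal k-means sketch in plain Python. It is not the project's actual code: the `kmeans` helper, the two features (age and monthly spend), and the six toy customers are all assumptions made up for this example, and a real project would more likely use a library and scale the features first.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm for k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return assign(points, centroids), centroids

# Made-up customers: (age, monthly spend). Not taken from the project's dataset.
customers = [(22, 15), (25, 18), (24, 16), (51, 70), (55, 75), (53, 72)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The first three toy customers end up in one segment and the last three in the other. Which segment gets label 0 depends on the random initialisation.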
Boston dataset
Multiple Linear Regression
This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It was collected in 1978, and each of the 506 entries represents aggregated data about 14 features for homes from various suburbs in Boston, Massachusetts.
The project aims to build a multiple linear regression model to predict median house prices.
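To make the idea of multiple linear regression concrete, the sketch below fits an ordinary least squares model by solving the normal equations in plain Python. It is an illustration under stated assumptions, not the project's code: `fit_linear_regression` is a hypothetical helper, and the five rows are toy values (loosely modelled on an average-rooms feature and a lower-status-percentage feature) generated from a known linear rule rather than drawn from the Boston data.

```python
def fit_linear_regression(X, y):
    """Hypothetical helper: ordinary least squares via the normal equations.

    Solves (X^T X) b = X^T y with Gauss-Jordan elimination and
    returns [intercept, coef_1, ..., coef_p].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; no unique solution")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coef, row):
    """Apply the fitted coefficients to one feature row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

# Toy data generated from price = 10 + 4 * rooms - 0.5 * lower_status (made up).
X = [(6, 5), (7, 3), (5, 12), (8, 2), (6, 9)]
y = [31.5, 36.5, 24.0, 41.0, 29.5]

coef = fit_linear_regression(X, y)
print(coef)
print(predict(coef, (7, 6)))
```

Because the toy targets follow the linear rule exactly, the fitted coefficients recover it up to floating-point error. On real data the fit would only approximate the prices, and a library such as scikit-learn would be the more usual choice.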