Google Apps Analysis

PROJECT DESCRIPTION

Project Purpose: Analyze the rating, price, reviews, and installs of Google Play Store apps against their respective categories. The categories are: art and design, auto and vehicles, beauty, books and reference, business, comics, communication, dating, education, entertainment, events, finance, food and drink, health and fitness, house and home, libraries and demo, lifestyle, game, family, medical, social, shopping, photography, sports, travel and local, tools, personalization, productivity, parenting, weather, video players, news and magazines, maps and navigation.

Data Source:

Google Play Store Apps https://www.kaggle.com/lava18/google-play-store-apps

PROJECT PROCESS

Importing & Exploring:

Import the Google Play Store apps dataset
Explore the dataset
o see the first 5 rows (head())
o see the number of columns & rows (shape)
o see the statistical summary (describe())
o check the missing values (info())
o create a boxplot & histogram to inspect the distribution of the data and the outliers

As the boxplot shows, there is an outlier with a rating score of approximately 19. This value is invalid, because rating scores in this case range from 0 to 5.
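The exploration steps above can be sketched as follows. This is a minimal sketch, assuming the Kaggle file is saved locally as googleplaystore.csv. The `explore` helper and the three-row `sample` frame are hypothetical and exist only so the snippet runs without the file; missing values are counted with isnull().sum() here, which reports the same gaps that info() prints.

```python
import pandas as pd

def explore(df: pd.DataFrame) -> dict:
    """Collect the basic exploration outputs (hypothetical helper)."""
    return {
        "head": df.head(),             # first 5 rows
        "shape": df.shape,             # (rows, columns)
        "summary": df.describe(),      # statistics for the numeric columns
        "missing": df.isnull().sum(),  # missing values per column
    }

# With the real data: google_data = pd.read_csv("googleplaystore.csv")
# Made-up sample so the sketch runs on its own.
sample = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Category": ["TOOLS", "GAME", "GAME"],
    "Rating": [4.1, None, 19.0],
})

report = explore(sample)
print(report["shape"])
print(report["missing"])

# In a notebook, sample.boxplot() and sample.hist() draw the two plots
# (both require matplotlib).
```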

Data Cleaning

1- Inspecting Outliers

Inspect the outlier values by calling google_data[google_data.Rating > 5], then drop them.
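A minimal sketch of this step is shown below. The sample rows are made up, and dropping by index label is an assumption, since the text does not say how the rows were removed.

```python
import pandas as pd

# Made-up sample; one row carries the impossible rating of 19.
google_data = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Rating": [4.1, 19.0, 3.8],
})

# Inspect the rows whose rating exceeds the maximum of 5.
outliers = google_data[google_data.Rating > 5]
print(outliers)

# Drop those rows by their index labels (assumed approach).
google_data = google_data.drop(outliers.index)
```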

2- Removing 90% Empty Rows

Create a threshold equal to 10% of the row count, then use that variable to drop the 90% empty rows.

threshold = len(google_data) * 0.1  # 10% of the 10840 rows
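One way the threshold might be used is sketched below. Because the threshold is a count of rows, this sketch assumes it is passed to dropna(thresh=..., axis=1), which keeps only the columns that have at least that many non-missing values. That reading is an assumption about the notebook, and the "Mostly Empty" column is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample with 10 rows.
google_data = pd.DataFrame({
    "App": [f"app{i}" for i in range(10)],
    "Rating": [4.0] * 9 + [np.nan],   # 10% empty: kept
    "Mostly Empty": [np.nan] * 10,    # 100% empty: dropped
})

threshold = len(google_data) * 0.1  # 10% of the rows -> 1.0

# Keep columns with at least `threshold` non-missing values (assumed usage).
google_data = google_data.dropna(thresh=int(threshold), axis=1)
print(google_data.columns.tolist())
```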

Data Imputation & Manipulation

1- Filling the Missing Values in Rating Column

Define a function that will impute median for the missing values and then apply it to Rating column

def impute_median(series):
    return series.fillna(series.median())
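Applying the function to the Rating column could look like the sketch below. The four sample ratings are made up; the median of the non-missing values (3, 4, 5) is 4, so the gap is filled with 4.

```python
import numpy as np
import pandas as pd

def impute_median(series: pd.Series) -> pd.Series:
    # Replace missing values with the median of the non-missing ones.
    return series.fillna(series.median())

# Made-up sample with one missing rating.
google_data = pd.DataFrame({"Rating": [4.0, np.nan, 3.0, 5.0]})

google_data["Rating"] = impute_median(google_data["Rating"])
print(google_data["Rating"].tolist())
```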

2- Filling the Missing Values of the Categorical Type Column

Fill the missing values in the Type, Current Ver, and Android Ver columns with the mode, because these columns are categorical.
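A minimal sketch of mode imputation for these three columns follows. The sample values are made up, and taking the first entry of mode() when several values tie is an assumption.

```python
import pandas as pd

# Made-up sample with one gap per column.
google_data = pd.DataFrame({
    "Type": ["Free", "Free", None, "Paid"],
    "Current Ver": ["1.0", None, "1.0", "2.3"],
    "Android Ver": ["4.1 and up", "4.1 and up", "5.0 and up", None],
})

for column in ["Type", "Current Ver", "Android Ver"]:
    # mode() returns a Series; its first entry is the most frequent value.
    most_common = google_data[column].mode()[0]
    google_data[column] = google_data[column].fillna(most_common)

print(google_data)
```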

3- Converting Columns into Numerical Values

Convert Price, Reviews, and Installs columns into numerical values
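The conversion might be done as in the sketch below. It assumes the raw columns are strings in which Price may carry a leading "$" and Installs carries thousands separators and a trailing "+"; the sample values are made up.

```python
import pandas as pd

# Made-up sample in the assumed raw string formats.
google_data = pd.DataFrame({
    "Price": ["0", "$4.99"],
    "Reviews": ["159", "87510"],
    "Installs": ["10,000+", "5,000,000+"],
})

# Strip the currency sign, then parse as a number.
google_data["Price"] = pd.to_numeric(
    google_data["Price"].str.replace("$", "", regex=False)
)

# Reviews only needs parsing.
google_data["Reviews"] = pd.to_numeric(google_data["Reviews"])

# Strip thousands separators and the trailing plus sign, then parse.
google_data["Installs"] = pd.to_numeric(
    google_data["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
)

print(google_data.dtypes)
```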

After finishing the imputation and manipulation, take a look at the dataset using head() & describe()

Data Visualization

The point is to visualize & analyze rating, price, reviews, and installs against their respective categories. To do so, we need to group by category and aggregate the values by mean for Rating, Reviews, & Installs and by sum for Price.

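grp = google_data.groupby('Category')
Rating = grp['Rating'].agg('mean')      # average rating per category
Price = grp['Price'].agg('sum')         # total price per category
Reviews = grp['Reviews'].agg('mean')    # average number of reviews per category
Installs = grp['Installs'].agg('mean')  # average number of installs per category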

After grouping, each variable was plotted against its respective category.
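One of these plots could be produced roughly as sketched here. The chart type (a bar chart), the axis labels, and the sample data are all assumptions; the same pattern would apply to Price, Reviews, and Installs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data.
google_data = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "DATING"],
    "Rating": [4.5, 4.1, 4.0, 3.9],
})

# Mean rating per category, as in the grouping step.
Rating = google_data.groupby("Category")["Rating"].mean()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(Rating.index, Rating.values)
ax.set_xlabel("Category")
ax.set_ylabel("Mean rating")
ax.tick_params(axis="x", labelrotation=90)  # category names are long
fig.tight_layout()
# In a notebook, plt.show() displays the figure.
```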

RESULTS

According to the analysis:

The Education and Events categories have the highest average ratings, and the Dating category has the lowest.

The Finance, Family, Lifestyle, and Medical categories have the highest total prices, in that order.

The Communication, Social, and Game categories have the highest average number of reviews.

The Communication category has the highest average number of installs.

SPECIAL THANKS TO DIVYA THAKUR FOR HER GUIDANCE

https://github.com/DivyaThakur24

https://www.youtube.com/channel/UCou9uUFltDazqO4I6Pu7Cpg
