Visualization & Analysis of Google Apps' ratings, reviews, prices, and installs against their respective categories.

Reemalraeai/Google-Apps-Analysis

Google Apps Analysis

PROJECT DESCRIPTION

Project Purpose: Analyze the ratings, prices, reviews, and installs of Google Apps against their respective categories. The categories are: art and design, auto and vehicles, beauty, books and reference, business, comics, communication, dating, education, entertainment, events, finance, food and drink, health and fitness, house and home, libraries and demo, lifestyle, game, family, medical, social, shopping, photography, sports, travel and local, tools, personalization, productivity, parenting, weather, video players, news and magazines, maps and navigation.

Data Source:

Google Play Store Apps https://www.kaggle.com/lava18/google-play-store-apps

PROJECT PROCESS

Importing & Exploring:

  • Import the Google Play Store apps dataset
  • Explore the dataset
      o See the first 5 rows (head())
      o See the number of rows & columns (shape)
      o See the statistical summary (describe())
      o Check for missing values (info())
      o Create a boxplot & histogram to inspect the distribution of the data and spot outliers
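The exploration steps above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: the inline CSV is a tiny hypothetical stand-in for the Kaggle file (its rows and app names are made up), and the plotting step is left out so the snippet only needs pandas.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; the real file has roughly 10.8k rows.
sample_csv = io.StringIO('''App,Category,Rating,Reviews,Installs,Type,Price
Sketch Pad,ART_AND_DESIGN,4.1,159,"10,000+",Free,0
Chat Now,COMMUNICATION,,8700,"1,000,000+",Free,0
Budget Pro,FINANCE,4.5,320,"50,000+",Paid,$4.99
Broken Row,LIFESTYLE,19.0,3,"1,000+",Free,0
''')
google_data = pd.read_csv(sample_csv)

print(google_data.head())       # first 5 rows
print(google_data.shape)        # (number of rows, number of columns)
print(google_data.describe())   # statistical summary of the numeric columns
google_data.info()              # dtypes and non-null counts per column

# Explicit count of missing values per column
missing_per_column = google_data.isna().sum()
print(missing_per_column)
```

With matplotlib installed, `google_data.boxplot()` and `google_data.hist()` would draw the boxplot and histogram mentioned in the last step.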

Boxplot 1

Hist 1

As the boxplot shows, there is an outlier with a rating of approximately 19. This value cannot be valid, because ratings in this dataset range from 0 to 5.

Data Cleaning

1- Inspecting Outliers

Inspect the outliers by calling google_data[google_data.Rating > 5], then drop them
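A minimal sketch of this step on a toy frame (the three rows are hypothetical). It assumes the outliers are dropped by index label, which is one common way to do it; the project may have used a different call.

```python
import pandas as pd

# Hypothetical rows; one of them carries an impossible rating of 19
google_data = pd.DataFrame({
    "App": ["Sketch Pad", "Chat Now", "Broken Row"],
    "Rating": [4.1, 3.8, 19.0],
})

# Rows whose rating falls outside the valid 0-5 range
outliers = google_data[google_data.Rating > 5]
print(outliers)

# Drop those rows using their index labels
google_data = google_data.drop(outliers.index)
```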

Boxplot 2

Hist 2 with Comment

2- Removing 90% Empty Rows

Create a threshold variable, then use it to drop the 90% empty rows

threshold = len(google_data) * 0.1  # 10% of the 10,840 rows
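The README does not show how the threshold is applied. One plausible reading, since the threshold is derived from the row count, is that it is passed to pandas' `dropna(thresh=...)` along the column axis, which keeps only columns with at least that many non-null values. The sketch below assumes that reading; the `Notes` column is hypothetical.

```python
import numpy as np
import pandas as pd

google_data = pd.DataFrame({
    "App": [f"app_{i}" for i in range(20)],
    "Rating": [4.0] * 19 + [np.nan],     # nearly complete
    "Notes": [np.nan] * 19 + ["x"],      # hypothetical column, 95% empty
})

threshold = len(google_data) * 0.1   # 10% of the rows, 2.0 for this toy frame

# Assumption: keep only columns with at least `threshold` non-null values
google_data = google_data.dropna(thresh=int(threshold), axis=1)
print(google_data.columns.tolist())
```

Passing `axis=0` instead would apply the same minimum non-null count to each row rather than each column.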

Data Imputation & Manipulation

1- Filling the Missing Values in Rating Column

Define a function that imputes the median for the missing values, then apply it to the Rating column

def impute_median(series):
    return series.fillna(series.median())
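As a small usage sketch, the function can be called directly on the column. The five ratings below are made up for illustration.

```python
import numpy as np
import pandas as pd

def impute_median(series):
    # Replace missing values with the median of the non-missing ones
    return series.fillna(series.median())

# Hypothetical ratings with two gaps; the median of 4.0, 4.5, 3.0 is 4.0
google_data = pd.DataFrame({"Rating": [4.0, np.nan, 4.5, 3.0, np.nan]})
google_data["Rating"] = impute_median(google_data["Rating"])
print(google_data["Rating"].tolist())
```

The median is less sensitive to extreme values than the mean, which matters for a skewed column such as Rating.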

2- Filling the Missing Values of the Categorical Type Column

Fill the missing values in the Type, Current Ver, and Android Ver columns with the mode, because they are categorical
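A minimal sketch of mode imputation, assuming a simple loop over the three columns; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

google_data = pd.DataFrame({
    "Type": ["Free", "Free", np.nan, "Paid"],
    "Current Ver": ["1.0", np.nan, "1.0", "2.3"],
    "Android Ver": ["4.1 and up", "4.1 and up", "5.0 and up", np.nan],
})

for column in ["Type", "Current Ver", "Android Ver"]:
    # mode() returns a Series because ties are possible; take the first value
    most_common = google_data[column].mode().iloc[0]
    google_data[column] = google_data[column].fillna(most_common)

print(google_data)
```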

3- Converting Columns into Numerical Values

Convert Price, Reviews, and Installs columns into numerical values
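The conversion could look roughly like this. It assumes Price values carry a leading `$` and Installs values carry `,` and `+` characters, as in the sample strings below; the exact cleaning calls used in the project are not shown in this README.

```python
import pandas as pd

# Hypothetical raw string values
google_data = pd.DataFrame({
    "Price": ["0", "$4.99", "$0.99"],
    "Reviews": ["159", "8700", "320"],
    "Installs": ["10,000+", "1,000,000+", "500+"],
})

# Strip the currency symbol, then parse as a number
google_data["Price"] = pd.to_numeric(
    google_data["Price"].str.replace("$", "", regex=False)
)
google_data["Reviews"] = pd.to_numeric(google_data["Reviews"])
# Remove thousands separators and the trailing "+"
google_data["Installs"] = pd.to_numeric(
    google_data["Installs"].str.replace(r"[+,]", "", regex=True)
)

print(google_data.dtypes)
```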

After finishing the imputation and manipulation, take a look at the dataset using head() & describe()

Data Visualization

The point is to visualize & analyze rating, price, reviews, and installs against their respective categories. To do so, we need to group by category and aggregate the values by mean for Rating, Reviews, & Installs and by sum for Price.

# Group by category; string aggregation names avoid the pandas warning for np.mean / np.sum
grp = google_data.groupby('Category')
Rating = grp['Rating'].agg('mean')
Price = grp['Price'].agg('sum')
Reviews = grp['Reviews'].agg('mean')
Installs = grp['Installs'].agg('mean')

After grouping, each variable was plotted against its category, as shown below

Category wise Installs

Category wise Price

Category wise Rating

Category wise Reviews

RESULTS

According to the analysis:

The Education and Events categories have the highest ratings, and the Dating category has the lowest.

The Finance, Family, Lifestyle, and Medical categories have the highest total prices, in that order.

The Communication, Social, and Game categories have the highest numbers of reviews.

The Communication category has the highest number of installs.
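Rankings like these can be read off the grouped values as well as from the plots. The sketch below shows one way to do that; the data is hypothetical and only illustrates the mechanics, and the `summary` table and variable names are assumptions rather than part of the project.

```python
import pandas as pd

# Hypothetical cleaned data; the values are illustrative, not the project's results
google_data = pd.DataFrame({
    "Category": ["EDUCATION", "EDUCATION", "DATING", "FINANCE", "COMMUNICATION"],
    "Rating": [4.5, 4.3, 3.9, 4.1, 4.2],
    "Price": [0.0, 3.99, 0.0, 399.99, 0.0],
    "Reviews": [1000, 3000, 500, 2000, 90000],
    "Installs": [100000, 500000, 10000, 50000, 10000000],
})

# Mean for Rating, Reviews, and Installs; sum for Price
summary = google_data.groupby("Category").agg(
    Rating=("Rating", "mean"),
    Price=("Price", "sum"),
    Reviews=("Reviews", "mean"),
    Installs=("Installs", "mean"),
)

highest_rated = summary["Rating"].idxmax()
lowest_rated = summary["Rating"].idxmin()
top_by_price = summary["Price"].sort_values(ascending=False).head(4)
most_installed = summary["Installs"].idxmax()

print(highest_rated, lowest_rated, most_installed)
print(top_by_price)
```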

SPECIAL THANKS TO DIVYA THAKUR FOR HER GUIDANCE

https://github.com/DivyaThakur24

https://www.youtube.com/channel/UCou9uUFltDazqO4I6Pu7Cpg
