- Handle the imbalanced dataset by applying SMOTE and removing extreme outliers
- Compare techniques to find which best uncovers feasible, resource-efficient patterns in the dataset
- Train Logistic Regression, Decision Tree, K-Nearest Neighbors, Naive Bayes, Support Vector Machine, Random Forest, and XGBoost classifiers to find the best model
Python Version: 3.9
Packages: pandas, matplotlib, seaborn, numpy, sklearn, xgboost, imblearn
- Source : Kaggle - https://www.kaggle.com/mlg-ulb/creditcardfraud
- The dataset contains transactions made with credit cards by European cardholders over two days in September 2013.
- It contains only numerical input variables which are the result of a PCA transformation. The only features which have not been transformed with PCA are 'Time' and 'Amount'.
Handling the Imbalanced Dataset: For this project we use SMOTE (Synthetic Minority Over-sampling Technique) to handle the imbalanced dataset.
Machine Learning Models: For this project I will train these models on the data and determine which one fits best:
- Logistic Regression
- Decision Tree
- Naive Bayes
- K-Nearest Neighbor
- Support Vector Machine
- Random Forest
- XGBoost
Evaluation Metrics: To determine which machine learning model fits the dataset best, I will use these metrics:
- Precision-Recall
- F1-Score
- Accuracy
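The three metrics above are all available in scikit-learn. A small sketch with toy labels (not the project's actual predictions) shows how each comparison could be computed:

```python
# Sketch: computing precision, recall, F1, and accuracy with scikit-learn.
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Toy ground truth and predictions for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

print("precision:", precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 3 TP / (3 TP + 1 FN) = 0.75
print("f1:       ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
```

On a heavily imbalanced dataset like this one, precision/recall and F1 are more informative than accuracy, which is why they head the list.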
- Scale the 'Amount' and 'Time' columns
- Use SMOTE to handle the imbalanced dataset and inspect the correlation of each column
- Determine which features have a negative or positive correlation with the class
- Remove extreme outliers from features that are highly correlated with the class
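The scaling and outlier-removal steps above can be sketched as follows. The column names match the Kaggle dataset, but the choice of `RobustScaler` and the `1.5 * IQR` cutoff are assumptions for illustration, not necessarily the project's exact settings.

```python
# Sketch: scale 'Amount'/'Time', then drop extreme outliers from a
# highly class-correlated feature using an IQR rule (assumed cutoff).
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Time": rng.uniform(0, 172_800, 500),   # two days, in seconds
    "Amount": rng.exponential(90, 500),     # right-skewed, like real amounts
    "V14": rng.normal(0, 1, 500),           # stand-in for a PCA feature
})

# RobustScaler centers on the median and scales by the IQR, so the
# scaled 'Amount'/'Time' are comparable to the PCA features.
df[["Time", "Amount"]] = RobustScaler().fit_transform(df[["Time", "Amount"]])

# IQR-based removal of extreme outliers from one correlated feature.
q1, q3 = df["V14"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_clean = df[(df["V14"] >= lower) & (df["V14"] <= upper)]
print(f"rows kept: {len(df_clean)} / {len(df)}")
```

In the real project the outlier filter would be applied only to the features identified as highly correlated with the class in the previous step.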
The data is heavily skewed, so we apply a power transformer to remove the skewness.
(Figures: dataset distribution before and after the power transformer.)
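A minimal sketch of this step using scikit-learn's `PowerTransformer` (Yeo-Johnson), applied here to a synthetic right-skewed column rather than the actual 'Amount' feature:

```python
# Sketch: removing skew from a right-skewed column with PowerTransformer.
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
amounts = rng.exponential(scale=90.0, size=(2000, 1))  # heavy right skew

# Yeo-Johnson finds a per-feature power transform that makes the
# distribution more Gaussian; standardize=True then zero-centers it.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(amounts)

print("skew before:", skew(amounts.ravel()))
print("skew after: ", skew(transformed.ravel()))
```

The skewness after the transform is much closer to zero, matching the before/after figures referenced above.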
- Use Logistic Regression, KNN, SVC, and Decision Tree as the models and compare them
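The four-model comparison above could be sketched with cross-validated F1 scores on a toy imbalanced dataset; the real project trains on the SMOTE-resampled credit card data instead, and the hyperparameters here are scikit-learn defaults, not tuned values.

```python
# Sketch: comparing four classifiers by 5-fold cross-validated F1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced stand-in for the preprocessed credit card data.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=0
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name:>18}: mean F1 = {scores.mean():.3f}")
```

Using F1 as the scoring metric keeps the comparison honest on imbalanced data, where plain accuracy would favor models that ignore the minority class.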
The dataset is imbalanced, with the class distribution shown in the figure above.
- The best technique to handle this imbalanced dataset is SMOTE
- The best-fitting model for this dataset is, surprisingly, KNN