Working with an industrial-scale data set to build a classification model that predicts credit card default and helps create a better customer experience for cardholders.

m3redithw/american-express-default-prediction

American Express Default Prediction

Team Members: Jarad Angel, Meredith Wang

Date: Aug - Sep 2022

Python Pandas NumPy seaborn sklearn SciPy

Whether out at a restaurant or buying tickets to a concert, modern life counts on the convenience of a credit card to make daily purchases. It saves us from carrying large amounts of cash and also can advance a full purchase that can be paid over time. How do card issuers know we’ll pay back what we charge? That’s a complex problem with many existing solutions—and even more potential improvements, to be explored in this competition.

Credit default prediction is central to managing risk in a consumer lending business. It allows lenders to optimize lending decisions, which leads to a better customer experience and sound business economics. Models already exist to help manage risk, but it is possible to create better ones that outperform those currently in use.

Business Goals

▪️ Apply our machine learning skills to predict credit default.

▪️ Leverage an industrial-scale data set to build a machine learning model that challenges the current model in production.


Timeline

▪️ May 25, 2022 - Start Date.

▪️ August 17, 2022 - Entry Deadline. You must accept the competition rules before this date in order to compete.

▪️ August 17, 2022 - Team Merger Deadline. This is the last day participants may join or merge teams.

▪️ August 24, 2022 - Final Submission Deadline.


Data Context

Training, validation, and testing datasets include time-series behavioral data and anonymized customer profile information.


The objective of this competition is to predict the probability that a customer will not pay back their credit card balance in the future, based on their monthly customer profile. The binary target variable is calculated by observing an 18-month performance window after the latest credit card statement; if the customer does not pay the amount due within 120 days of their latest statement date, it is considered a default event.

The dataset contains aggregated profile features for each customer at each statement date. Features are anonymized and normalized, and fall into the following general categories:

- D_* = Delinquency variables
- S_* = Spend variables
- P_* = Payment variables
- B_* = Balance variables
- R_* = Risk variables
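
As a rough illustration of how the prefixes above can be used, here is a minimal sketch that groups column names by category and keeps only the latest statement per customer. The column names `customer_ID` and `statement_date`, the helper function names, and the toy values are assumptions made for this example, not taken from the project code.

```python
import pandas as pd

# Prefix -> category mapping, copied from the list above.
PREFIX_CATEGORIES = {
    "D_": "Delinquency",
    "S_": "Spend",
    "P_": "Payment",
    "B_": "Balance",
    "R_": "Risk",
}


def group_columns_by_category(columns):
    """Return {category: [column names]} based on each column's two-character prefix."""
    groups = {name: [] for name in PREFIX_CATEGORIES.values()}
    for col in columns:
        category = PREFIX_CATEGORIES.get(col[:2])
        if category is not None:  # columns without a known prefix are skipped
            groups[category].append(col)
    return groups


def latest_statement_per_customer(df, id_col="customer_ID", date_col="statement_date"):
    """Keep only the most recent statement row for each customer.

    The id and date column names are hypothetical defaults.
    """
    ordered = df.sort_values([id_col, date_col])
    return ordered.drop_duplicates(subset=id_col, keep="last").reset_index(drop=True)


# Toy data standing in for the real statements (values are made up).
toy = pd.DataFrame(
    {
        "customer_ID": ["a", "a", "b"],
        "statement_date": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-15"]),
        "D_1": [0.1, 0.2, 0.3],
        "B_1": [1.0, 2.0, 3.0],
    }
)

groups = group_columns_by_category(toy.columns)
latest = latest_statement_per_customer(toy)
print(groups)
print(latest)
```

Keeping only the last statement is just one simple way to get one row per customer; per-customer aggregates (mean, min, max) over all statements are another option.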

Process

1️⃣ Data Acquisition

acqure.py

2️⃣ Data Preparation

Data Cleaning

3️⃣ Exploratory Analysis

4️⃣ Statistical Testing & Modeling

5️⃣ Modeling Evaluation
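
The modeling and evaluation steps could look roughly like the sketch below. It is a generic baseline on synthetic data, not the project's actual pipeline: the logistic regression model, the 60/20/20 split, the class balance, and the use of ROC AUC as the score are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real data (assumption: ~25% positives).
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.75, 0.25],
    random_state=42,
)

# Split into train (60%), validate (20%), and test (20%), stratified on the target.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=42
)

# Fit a simple baseline classifier on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score predicted default probabilities on the validation split.
val_proba = model.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_proba)
print(f"Validation ROC AUC: {val_auc:.3f}")
```

The test split is left untouched here so it can be used once, after model selection on the validation split.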


Steps to Reproduce

  • [x] Clone the repo

Key Findings

▪️

▪️

▪️

▪️


Recommendations

▪️

▪️

▪️
