Housing-ML

1. Rental Price Development

Summary: Building a pricing engine for housing market.

There are 2 approaches tried: regression and time series forecasting.

Regression

The regression predictor is meant to be used inline to predict the price of an estate, e.g. by an user introducing the required parameters (space size, type of buidling etc.)

The forectasting is done using Pycaret, for automatized model selection and feature engineering.

Potential improvements:

add more data (both samples and features), possibly from other sources
improve the current feature extraction
try more methods/models
use a gridsearch for hyperparameter tuning
determine the metric that presents the most interest to the business/user and optimize for it

Time Series Forecasting

The time series forecasting is meant to be used both in-house and by potential clients, for future insights for better decision-making.

The forectasting is done using Pycaret, for automatized model selection and feature engineering. The winning model was Auto Arima. The metrics and forecast plots are present in the price_forecast.ipynb, Time Series section.

Potential improvements:

try multivariate forecasting
try prediction LSTM/RNN for forecasting (requires more samples/features)

2. Review-based User Recommendations

Recommendations notebook viewer

The work is split in 2 parts:

Sentiment extraction from user reviews
Recommender System

Sentiment extraction from user reviews

In this part, user comments are cleaned with pandas and python functions, tagged with a language using langdetect and the sentiment is extracted using nltk.

The sentiment analyzer returns a score from -1 to 1, representing the rating of the listing by a user.

Potential improvements:

use ChatGPT/LLMs for comment tagging (example provided in the notebook ChatGPT Tagging section). More granular sentiment per topic ca be extracted (e.g. a user might like the cleaniness but no the location)
use a faster language detection algorithm and/or parallelize the run (took >1h to run)

Recommender System

The recommender system is a comparison for different implementations of model based collaborative filtering. I eneded up using the surprise package, after trying also the SVDS from scikit-learn and FUNK_SVD packages.

It compares 2 models: KNNWithMeans and SVD. Further testing is required to determined the better model (for this test set they were very similar).

Potential improvements:

try more methods for recommender systems
try classical ML for ranking
user a more powerful machine that can run on the whole dataset

Next Steps

move the code from jupyter notebooks to python files (for deployment)
build inferrence pipeline (model consumption)
build the respective dashboard/interfaces to make the results accessible by everyone

Running instructions:

Create a new and enviroment, and run:

pip install -r requirements.txt

Download and unzip the recommendations.zip and forecasting.zip inside the $project_root/data/ folder.
Then you can run the 2 jupyter notebooks with jupyter notebook command.
Run mlflow ui in the terminal to see the history of model comparisons and runs (for the price forecasting only)
To run the UI:

cd dashboard
streamlit run pice_model.py

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
dashboard		dashboard
mlruns		mlruns
screenshots		screenshots
.gitignore		.gitignore
README.md		README.md
helpers.py		helpers.py
housing-price-regression.pkl		housing-price-regression.pkl
price_forecast.html		price_forecast.html
price_forecast.ipynb		price_forecast.ipynb
recommendations.html		recommendations.html
recommendations.ipynb		recommendations.ipynb
requirements.txt		requirements.txt
ts-price-forecast.pkl		ts-price-forecast.pkl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Housing-ML

1. Rental Price Development

Regression

Time Series Forecasting

2. Review-based User Recommendations

Sentiment extraction from user reviews

Recommender System

Next Steps

Running instructions:

Streamlit app:

MLFlow UI:

About

Releases

Packages

Languages

andreicap/housing-ml

Folders and files

Latest commit

History

Repository files navigation

Housing-ML

1. Rental Price Development

Regression

Time Series Forecasting

2. Review-based User Recommendations

Sentiment extraction from user reviews

Recommender System

Next Steps

Running instructions:

Streamlit app:

MLFlow UI:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages