Price forecast notebook viewer
Summary: building a pricing engine for the housing market.
Two approaches were tried: regression and time series forecasting.
The regression predictor is meant to be used inline to predict the price of a property, e.g. by a user entering the required parameters (floor area, type of building, etc.).
The forecasting is done using PyCaret, for automated model selection and feature engineering.
Potential improvements:
- add more data (both samples and features), possibly from other sources
- improve the current feature extraction
- try more methods/models
- use a gridsearch for hyperparameter tuning
- determine the metric that presents the most interest to the business/user and optimize for it
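The grid search and metric-selection ideas in the list above can be sketched as follows. This is a minimal, standard-library-only illustration, not the notebook's code: the toy k-nearest-neighbour model, the `mape` metric, the hyperparameter grid, and the data points are all assumptions made up for the example. In practice a library routine such as scikit-learn's `GridSearchCV` would likely be used instead.

```python
from itertools import product
from statistics import mean


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (lower is better)."""
    return mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def knn_predict(train, x, k, weighted):
    """Predict a price for size x from the k nearest (size, price) rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    if not weighted:
        return mean(price for _, price in nearest)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (abs(size - x) + 1e-9) for size, _ in nearest]
    total = sum(w * price for w, (_, price) in zip(weights, nearest))
    return total / sum(weights)


def loo_score(data, k, weighted, metric):
    """Leave-one-out cross-validation score for one hyperparameter setting."""
    y_true, y_pred = [], []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        y_true.append(y)
        y_pred.append(knn_predict(train, x, k, weighted))
    return metric(y_true, y_pred)


def grid_search(data, grid, metric):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = loo_score(data, metric=metric, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic toy data: (size in square metres, price). Not taken from the project.
toy = [(30, 90), (45, 130), (50, 150), (60, 175),
       (75, 220), (80, 240), (100, 310), (120, 350)]
grid = {"k": [1, 2, 3], "weighted": [False, True]}
best_params, best_score = grid_search(toy, grid, mape)
print(best_params, round(best_score, 3))
```

Swapping `mape` for another metric function changes which setting wins, which is why the metric that matters to the business should be fixed before tuning.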
The time series forecasting is meant to be used both in-house and by potential clients, to provide forward-looking insights for better decision-making.
The forecasting is done using PyCaret, for automated model selection and feature engineering.
The winning model was Auto ARIMA. The metrics and forecast plots are in the Time Series section of price_forecast.ipynb.
Potential improvements:
- try multivariate forecasting
- try LSTM/RNN models for forecasting (requires more samples/features)
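To make the idea of automated model selection concrete, here is a hand-rolled sketch of what such a comparison does in principle: hold out the last few points, let each candidate forecast them, and rank the candidates by error. It deliberately uses three simple baseline forecasters rather than Auto ARIMA or PyCaret itself, and the function names, the season length of 4, and the series values are assumptions invented for the example.

```python
from statistics import mean


def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def seasonal_naive(train, horizon, season=4):
    """Repeat the value observed one season earlier (season length is assumed)."""
    return [train[-season + (h % season)] for h in range(horizon)]


def drift(train, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error (lower is better)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def compare_forecasters(series, horizon, candidates):
    """Score each candidate on the last `horizon` points; best model first."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mae(test, fn(train, horizon)) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


candidates = {"naive": naive, "seasonal_naive": seasonal_naive, "drift": drift}

# Synthetic quarterly price index with trend and seasonality. Not project data.
toy_series = [100, 104, 110, 103, 106, 110, 117, 109,
              112, 117, 124, 115, 119, 124, 131, 122]
ranking = compare_forecasters(toy_series, horizon=4, candidates=candidates)
for name, score in ranking:
    print(f"{name}: MAE={score:.2f}")
```

The same hold-out ranking would apply to stronger candidates such as ARIMA variants or an LSTM; only the entries in `candidates` change.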
Recommendations notebook viewer
The work is split into 2 parts:
- Sentiment extraction from user reviews
- Recommender System
In this part, user comments are cleaned with pandas and Python functions, tagged with a language using langdetect, and the sentiment is extracted using nltk.
The sentiment analyzer returns a score from -1 to 1, representing the rating of the listing by a user.
Potential improvements:
- use ChatGPT/LLMs for comment tagging (example provided in the ChatGPT Tagging section of the notebook). More granular sentiment per topic can be extracted (e.g. a user might like the cleanliness but not the location)
- use a faster language detection algorithm and/or parallelize the run (it took over 1 hour)
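One way to approach the speed-up suggested in the last item is to detect each distinct comment only once and to fan the remaining work out over an executor. The sketch below is an assumption about how this could look, not the notebook's code: `tag_languages` and `toy_detect` are hypothetical names, and `toy_detect` is only a stand-in so the example runs without extra dependencies. In real use, `langdetect.detect` could be passed as `detect`, and since that work is CPU-bound, a `ProcessPoolExecutor` would probably help more than the thread pool shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_detect(text):
    """Stand-in detector for the example only: crude keyword check."""
    spanish_words = {"el", "la", "piso", "era"}
    return "es" if spanish_words & set(text.lower().split()) else "en"


def tag_languages(comments, detect, executor=None, chunksize=256):
    """Return one language tag per comment, detecting each distinct text once."""
    unique = list(dict.fromkeys(comments))  # de-duplicate, keeping order
    if executor is None:
        tags = map(detect, unique)
    else:
        # chunksize matters for process pools; thread pools ignore it.
        tags = executor.map(detect, unique, chunksize=chunksize)
    lookup = dict(zip(unique, tags))
    return [lookup[comment] for comment in comments]


comments = ["the flat was great", "el piso era genial", "the flat was great"]
with ThreadPoolExecutor(max_workers=2) as pool:
    tags = tag_languages(comments, toy_detect, executor=pool)
print(tags)
```

De-duplication alone can help when many reviews are short and repeated (for example "Great place!"), independent of any parallelism.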
The recommender system part compares different implementations of model-based collaborative filtering.
I ended up using the surprise package, after also trying SVDS from scikit-learn and the FUNK_SVD package.
It compares 2 models: KNNWithMeans and SVD. Further testing is required to determine the better model (on this test set they performed very similarly).
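For reference, the two models predict a rating $\hat{r}_{ui}$ of item $i$ by user $u$ as follows, in the notation of the Surprise documentation. The KNNWithMeans formula is shown in its user-based form, which is the library default; whether the notebook keeps that default is an assumption.

```latex
% KNNWithMeans (user-based): user mean plus similarity-weighted
% deviations of the k nearest neighbours from their own means
\hat{r}_{ui} = \mu_u +
  \frac{\sum_{v \in N_i^k(u)} \mathrm{sim}(u, v)\,(r_{vi} - \mu_v)}
       {\sum_{v \in N_i^k(u)} \mathrm{sim}(u, v)}

% SVD: global mean, user and item biases, and latent factors
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u
```

Here $\mu_u$ is the mean rating of user $u$, $N_i^k(u)$ is the set of the $k$ users most similar to $u$ who rated item $i$, $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, and $p_u$ and $q_i$ are the learned user and item factor vectors.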
Potential improvements:
- try more methods for recommender systems
- try classical ML for ranking
- use a more powerful machine that can run on the whole dataset
- move the code from Jupyter notebooks to Python files (for deployment)
- build an inference pipeline (model consumption)
- build the respective dashboard/interfaces to make the results accessible by everyone
- Create a new environment and run:
pip install -r requirements.txt
- Download and unzip recommendations.zip and forecasting.zip inside the $project_root/data/ folder.
- Then you can run the 2 Jupyter notebooks with the jupyter notebook command.
- Run mlflow ui in the terminal to see the history of model comparisons and runs (for the price forecasting only).
- To run the UI:
cd dashboard
streamlit run pice_model.py