Skip to content

nomomon/bachelor-thesis

Repository files navigation

ML Project Template 🛠️

Welcome to Machine Learning Practical! This template is designed to streamline the development process and boost the quality of your code.

Before getting started with your projects, we encourage you to carefully read the sections below and familiarise yourselves with the proposed tools.

Prerequisites

Make sure you have the following software and tools installed:

  • PyCharm: We recommend using PyCharm as your IDE, since it offers a highly tailored experience for Python development. You can get a free student license here.

  • Pipenv: Pipenv is used for dependency management. This tools enables users to easily create and manage virtual environments. To install Pipenv, use the following command:

    $ pip install --user pipenv

    For detailed installation instructions, click here.

  • Git LFS: Instead of committing large files to your repository, you should store and manage them using Git LFS. For installation information, click here.

Getting Started

Setting up your own repository

  1. Fork this repository.

  2. Clone your fork locally.

  3. Configure a remote pointing to the upstream repository to sync changes between your fork and the original repository.

    git remote add upstream https://github.com/alina-dima/mlp-template.git

    Don't skip this step. We might update the original repository, so you should be able to easily pull our changes.

    To update your forked repo follow these steps:

    1. git fetch upstream

    2. git rebase upstream/main

    3. git push origin main

      Sometimes you may need to use git push --force origin main. Only use this flag the first time you push after you rebased, and be careful as you might overwrite your teammates' changes.

Git LFS

  1. Set it up for your user account (only once, not each time you want to use it).
    git lfs install
  2. Select the files that Git LFS should manage. To track all files of a certain type, you can use a wildcard as in the command below.
    git lfs track "*.psd"
  3. Add .gitattributes to the staging area.
    git add .gitattributes

That's all, you can commit and push as always. The tracked files will be automatically stored with Git LFS.

Pipenv

This tool is incredibly easy to use. Let's install our first package, which you will all need in your projects.

pipenv install pre-commit

After running this command, you will notice that two files were created, namely, Pipfile and Pipfile.lock. Pipfile is the configuration file that specifies all the dependencies in your virtual environment.

To uninstall a package, you can run the command:

pipenv uninstall <package-name>

To activate the virtual environment, run pipenv shell. You can now use the environment as you wish. To deactivate the environment run the command exit.

If you already have access to a Pipfile, you can install the dependencies using pipenv install.

For a comprehensive list of commands, consult the official documentation.

Unit testing

You are expected to test your code using unit testing, which is a technique where small individual components of your code are tested in isolation.

An example is given in tests/test_main.py, which uses the standard unittest Python module to test whether the function hello_world from main.py works as expected.

To run all the tests developed using unittest, simply use:

python -m unittest discover tests

If you wish to see additional details, run it in verbose mode:

python -m unittest discover -v tests

Pre-commit

Another good coding practice is using pre-commit hooks. This is used to inspect the code before committing to ensure it matches your standards.

In this course, we will be using two hooks (already configured in .pre-commit-config.yaml):

  • Unit testing
  • Flake8 (checks your code for errors, styling issues and complexity)

Since we have already configured the hooks, all you need to do is run:

pre-commit install

Now pre-commit will automatically run whenever you want to commit something to the repository.

Get Coding

You are now ready to start working on your projects.

We recommend following the same folder structure as in the original repository. This will make it easier for you to have cleaner and consistent code, and easier for us to follow your progress and help you.

Your repository should look something like this:

├───data  # Stores .csv
├───models  # Stores .pkl
├───notebooks  # Contains experimental .ipynbs
├───project_name
│   ├───data  # For data processing, not storing .csv
│   ├───features
│   └───models  # For model creation, not storing .pkl
├───reports
├───tests
│   ├───data
│   ├───features
│   └───models
├───.gitignore
├───.pre-commit-config.yaml
├───main.py
├───train_model.py
├───Pipfile
├───Pipfile.lock
├───README.md

Good luck and happy coding! 🚀

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages