Digit Recognizer

A "from-the-ground-up" solution to the Kaggle Digit Recognizer machine learning problem.

This is a Rust implementation of the initial hand-coded C# solution provided by Mathias Brandewinder in his book, Machine Learning Projects for .NET Developers.

From his description:

What we have is a dataset of 50,000 images. Each image is a single digit, written down by a human, and scanned in 28 x 28 pixels resolution, encoded in grayscale, with each pixel taking one of 256 possible shades of gray, from full white to full black. For each scan, we also know the correct answer, that is, what number the human wrote down. This dataset is known as the training set. Our goal now is to write a program that will learn from the training set and use that information to make predictions for images it has never seen before: is it a zero, a one, and so on.

Technically, this is known as a classification problem: Our goal is to separate images between known "categories," a.k.a. the classes (hence the word "classification"). In this case, we have ten classes, one for each single digit from 0 to 9.
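The dataset described above ships with this repository as trainingsample.csv and validationsample.csv. As a minimal sketch of loading it, assume the usual Kaggle layout: a header line, then one row per image with the label first and the pixel values after it. The `Observation` type and the function names below are hypothetical, chosen for illustration, and are not taken from src/main.rs.

```rust
/// One labelled image: the digit that was written, plus its grayscale
/// pixel values (784 of them for a 28 x 28 scan).
#[derive(Debug, PartialEq)]
pub struct Observation {
    pub label: u8,
    pub pixels: Vec<u8>,
}

/// Parses one CSV line of the assumed form `label,pixel0,pixel1,...`.
/// Returns `None` for the header line or for any malformed row.
pub fn parse_observation(line: &str) -> Option<Observation> {
    let mut fields = line.trim().split(',');
    // The first field is assumed to be the label (0 to 9).
    let label = fields.next()?.trim().parse::<u8>().ok()?;
    // Every remaining field must be a shade of gray in 0..=255.
    let pixels = fields
        .map(|field| field.trim().parse::<u8>().ok())
        .collect::<Option<Vec<u8>>>()?;
    Some(Observation { label, pixels })
}

/// Reads every parseable row of a CSV file, skipping the first (header) line.
pub fn read_observations(path: &str) -> std::io::Result<Vec<Observation>> {
    let contents = std::fs::read_to_string(path)?;
    Ok(contents.lines().skip(1).filter_map(parse_observation).collect())
}
```

Storing pixels as `u8` matches the 256 shades of gray mentioned in the description, so a value outside that range is rejected as a malformed row rather than silently clamped.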

The solution provided here (see src/main.rs) does not use any libraries, hence the designation "from-the-ground-up."

It makes use of cost functions. Two are implemented here: Manhattan distance and Euclidean distance.
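The two cost functions can be sketched as follows. This is an illustrative version that assumes images are held as slices of `u8` pixel values; the function names and signatures are assumptions and may differ from those in src/main.rs.

```rust
/// Manhattan distance: the sum of the absolute per-pixel differences.
/// If the slices differ in length, the extra pixels of the longer one
/// are ignored (this is how `zip` behaves).
pub fn manhattan_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Euclidean distance: the square root of the sum of the squared
/// per-pixel differences.
pub fn euclidean_distance(a: &[u8], b: &[u8]) -> f64 {
    let sum_of_squares: u64 = a
        .iter()
        .zip(b.iter())
        .map(|(&x, &y)| {
            let diff = u64::from(x.abs_diff(y));
            diff * diff
        })
        .sum();
    (sum_of_squares as f64).sqrt()
}
```

Both functions return zero for identical images and grow as the images diverge. Accumulating in a wider integer type avoids overflow: 784 pixels each differing by at most 255 fit comfortably in a `u32` for the Manhattan sum.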

Here is a formal definition of Manhattan distance.
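For two images X = (x_1, ..., x_n) and Y = (y_1, ..., y_n), each a vector of n pixel values (n = 784 for a 28 x 28 scan), the standard definition is:

```latex
d_{\text{Manhattan}}(X, Y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
```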

The cost function measures how close the model's prediction is to the actual value: the closer the distance evaluates to zero, the more accurate the prediction.
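One way to turn a cost function into a prediction is a nearest-neighbour lookup: compare the unknown image with every training image and return the label of the one with the lowest cost. The sketch below assumes that approach and uses Manhattan distance; `predict` and the `(label, pixels)` tuple layout are hypothetical names and types used for illustration only.

```rust
/// Manhattan distance between two images (sum of absolute pixel differences).
fn manhattan_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Returns the label of the training image closest to `pixels`,
/// or `None` when the training set is empty.
/// Each training entry is assumed to be a `(label, pixels)` pair.
pub fn predict(training: &[(u8, Vec<u8>)], pixels: &[u8]) -> Option<u8> {
    training
        .iter()
        // The lowest cost wins: a distance of zero would be an exact match.
        .min_by_key(|entry| manhattan_distance(&entry.1, pixels))
        .map(|entry| entry.0)
}
```

For example, with one all-white and one all-black training image, a mostly white input is assigned the label of the all-white image because its distance to that image is the smaller of the two.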

The output will look something like:

Digit: 8 - Match

Digit: 7 - Match

Digit: 2 - Match

Digit: 6 - Match

Digit: 3 - Match

Digit: 1 - Match

Digit: 2 - Match

...

Digit: 4 - Mismatch

Correctly classified: 96.30%
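Output of this shape could be produced along the following lines. This is a sketch under two assumptions: that the digit printed on each line is the actual label from the validation set, and that the final figure is the share of matches across all validation images. The helper names are hypothetical.

```rust
/// Formats one output line for a validation image.
/// Assumption: the digit shown is the actual label, not the prediction.
pub fn report_line(actual: u8, predicted: u8) -> String {
    let outcome = if actual == predicted { "Match" } else { "Mismatch" };
    format!("Digit: {} - {}", actual, outcome)
}

/// Formats the summary line from a list of `(actual, predicted)` pairs.
/// An empty list is reported as 0.00% rather than dividing by zero.
pub fn summary_line(results: &[(u8, u8)]) -> String {
    let correct = results
        .iter()
        .filter(|pair| pair.0 == pair.1)
        .count();
    let percent = if results.is_empty() {
        0.0
    } else {
        100.0 * correct as f64 / results.len() as f64
    };
    format!("Correctly classified: {:.2}%", percent)
}
```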

Here is a plot of one of the images.

Performance

How does the performance (timings) compare with the Python and C# versions? The C# version is from the Machine Learning Projects for .NET Developers book. The Python version you can find here.
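How the timings below were taken is not stated. A simple way to take a comparable wall-clock measurement in Rust is `std::time::Instant`; the `time_it` helper below is a hypothetical name used for illustration.

```rust
use std::time::{Duration, Instant};

/// Runs `work` once and returns its result together with the
/// elapsed wall-clock time.
pub fn time_it<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}
```

Wrapping the classification step in `time_it` and printing the duration with `{:.2?}` gives a figure in the same form as those listed here. A single run is noisy, so repeating the measurement a few times gives a fairer comparison.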

Python - 12.32s (3.11.1 64-bit)

C# - 1.03s (.NET 4.8)

C# - 0.52s (.NET 6.0)

Rust Debug - 5.72s

Rust Release - 0.28s

Rust Release (with parallelised 'Correctly Classified' calculation) - 0.20s
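The method used to parallelise the 'Correctly Classified' calculation is not shown here. One possible approach that needs only the standard library is to split the validation set into chunks and count matches on scoped threads. The sketch below assumes a nearest-neighbour prediction with Manhattan distance; `count_correct_parallel`, `predict` and the `workers` parameter are hypothetical names.

```rust
use std::thread;

/// Manhattan distance between two images (sum of absolute pixel differences).
fn manhattan_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Label of the closest training image, or `None` if there is no training data.
fn predict(training: &[(u8, Vec<u8>)], pixels: &[u8]) -> Option<u8> {
    training
        .iter()
        .min_by_key(|entry| manhattan_distance(&entry.1, pixels))
        .map(|entry| entry.0)
}

/// Counts correctly classified validation images, splitting the work
/// across up to `workers` scoped threads.
pub fn count_correct_parallel(
    training: &[(u8, Vec<u8>)],
    validation: &[(u8, Vec<u8>)],
    workers: usize,
) -> usize {
    if validation.is_empty() {
        return 0;
    }
    // Round up so that every image lands in exactly one chunk.
    let workers = workers.max(1);
    let chunk_size = (validation.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one thread per chunk; each returns its own match count.
        let handles: Vec<_> = validation
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter(|entry| predict(training, &entry.1) == Some(entry.0))
                        .count()
                })
            })
            .collect();

        // Add up the per-chunk counts once every thread has finished.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Scoped threads can borrow the training and validation slices directly, so nothing needs to be cloned or wrapped in `Arc`. Each image is classified independently of the others, which is why the work splits cleanly into chunks.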

Rust release builds are much faster than debug builds. Here the release build is roughly 20 times faster than the debug build (0.28s versus 5.72s), which is in line with the 10-100x speedups that are common.


kevinmcfarlane/rust-digit-recognizer
