This repository has been archived by the owner on May 2, 2022. It is now read-only.

Facenet w/ Darknet in Pytorch

Work in progress. Course project for Computer Vision.

A PyTorch implementation of the Facenet model for face recognition. A port of facenet-darknet-inference to PyTorch.


  1. Download weights and extract.
  2. Put facenet.weights, haarcascade_frontalface_alt2.xml and shape_predictor_68_face_landmarks.dat in weights/.
  3. Install dependencies using conda or pip. If you are using conda:
    conda create -n facenet python=3.6
    conda activate facenet
    conda install pytorch torchvision cuda100 -c pytorch  # I am using CUDA 10.0
    conda install opencv -c conda-forge
    conda install dlib -c menpo
    conda install scikit-image matplotlib
  4. Create an empty file named names in data/ to store the known name labels.
  5. Run the main script with python. Type a to register a new face, r to recognize a face from the camera, or q to quit. (The keys fail to respond occasionally, in fact frequently 😩; we are looking for a fix, perhaps multithreading.)
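
Before the first run, steps 2 and 4 can be checked with a small script. This is only a sketch: the helper name `check_setup` is hypothetical and not part of this repository; the file names are the ones listed in the steps above.

```python
from pathlib import Path

# File names taken from step 2 above.
REQUIRED_WEIGHTS = (
    "facenet.weights",
    "haarcascade_frontalface_alt2.xml",
    "shape_predictor_68_face_landmarks.dat",
)


def check_setup(root="."):
    """Create data/names if needed and return the weight files still missing.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    root = Path(root)
    names_file = root / "data" / "names"
    names_file.parent.mkdir(parents=True, exist_ok=True)
    names_file.touch(exist_ok=True)  # does not erase labels already stored
    weights_dir = root / "weights"
    return [f for f in REQUIRED_WEIGHTS if not (weights_dir / f).is_file()]


if __name__ == "__main__":
    missing = check_setup()
    if missing:
        print("Missing in weights/:", ", ".join(missing))
    else:
        print("Setup looks complete.")
```

Run it from the repository root; it only reports missing files and never downloads anything.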


We used a lot of code from facenet-darknet-inference and PyTorch-YOLOv3. More specifically, we took the Facenet config file (facenet.cfg) from facenet-darknet-inference and used its test.cpp and face_io.c as references for our own implementation. Most of the darknet code comes from PyTorch-YOLOv3, with slight modifications to fit the config file.

Below are the README files copied from these two original repos. Thank you!


Minimal implementation of YOLOv3 in PyTorch.



YOLOv3: An Incremental Improvement

Joseph Redmon, Ali Farhadi

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at

[Paper] [Original Implementation]


$ git clone
$ cd PyTorch-YOLOv3/
$ sudo pip3 install -r requirements.txt
Download pretrained weights
$ cd weights/
$ bash
Download COCO
$ cd data/
$ bash


Uses pretrained weights to make predictions on images. The table below shows inference times for input images scaled to 256x256. The ResNet backbone measurements are taken from the YOLOv3 paper. The Darknet-53 row marked "(this impl.)" shows the inference time of this implementation on my 1080ti card.

| Backbone                | GPU     | FPS |
| ----------------------- | ------- | --- |
| ResNet-101              | Titan X | 53  |
| ResNet-152              | Titan X | 37  |
| Darknet-53 (paper)      | Titan X | 76  |
| Darknet-53 (this impl.) | 1080ti  | 74  |

$ python3 --image_folder /data/samples


Evaluates the model on COCO test.

$ python3 --weights_path weights/yolov3.weights

| Model               | mAP (min. 50 IoU) |
| ------------------- | ----------------- |
| YOLOv3 (paper)      | 57.9              |
| YOLOv3 (this impl.) | 58.2              |
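
The "min. 50 IoU" in the metric means a detection only counts as correct when its box overlaps a ground-truth box with an intersection over union of at least 0.5. A minimal sketch of that overlap test, assuming boxes are given as (x1, y1, x2, y2) corners; the function names are illustrative and not taken from the repository's evaluation code:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the predicted box overlaps the ground truth enough to count."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))       # overlap 1, union 7
    print(is_match((0, 0, 2, 2), (0, 0, 2, 3)))  # overlap 4, union 6
```

mAP then averages precision over recall levels and classes using these match decisions; that part is omitted here.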


Data augmentation and additional training tricks remain to be implemented. PRs are welcome!

                [-h] [--epochs EPOCHS] [--image_folder IMAGE_FOLDER]
                [--batch_size BATCH_SIZE]
                [--model_config_path MODEL_CONFIG_PATH]
                [--data_config_path DATA_CONFIG_PATH]
                [--weights_path WEIGHTS_PATH] [--class_path CLASS_PATH]
                [--conf_thres CONF_THRES] [--nms_thres NMS_THRES]
                [--n_cpu N_CPU] [--img_size IMG_SIZE]
                [--checkpoint_interval CHECKPOINT_INTERVAL]
                [--checkpoint_dir CHECKPOINT_DIR]
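
For readers unfamiliar with this usage format, the options listed above could be declared with argparse roughly as follows. This is a sketch, not the original training script: the flag names come from the usage text, while the argument types are guesses from the names and the defaults are deliberately left unset because the source does not state them.

```python
import argparse


def build_parser():
    """Parser mirroring the flags in the usage text (types are assumptions)."""
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--image_folder", type=str)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--model_config_path", type=str)
    parser.add_argument("--data_config_path", type=str)
    parser.add_argument("--weights_path", type=str)
    parser.add_argument("--class_path", type=str)
    parser.add_argument("--conf_thres", type=float)
    parser.add_argument("--nms_thres", type=float)
    parser.add_argument("--n_cpu", type=int)
    parser.add_argument("--img_size", type=int)
    parser.add_argument("--checkpoint_interval", type=int)
    parser.add_argument("--checkpoint_dir", type=str)
    return parser


if __name__ == "__main__":
    # Example values only; they are not recommended settings.
    args = build_parser().parse_args(["--epochs", "5", "--conf_thres", "0.8"])
    print(args.epochs, args.conf_thres, args.weights_path)
```

Options that are not passed come back as None here, so a real script would still need to supply its own defaults.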


  title={YOLOv3: An Incremental Improvement},
  author={Redmon, Joseph and Farhadi, Ali},
  journal = {arXiv},


Face recognition using facenet

1. Intro

Facenet was developed by Google in 2015. The output of the network is a Euclidean embedding of a human face.

With a carefully defined triplet loss function, Facenet achieves high accuracy on LFW (0.9963) and YouTube Faces DB (0.9512).
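
The idea behind the triplet loss is that an anchor face should be closer to another image of the same person (the positive) than to an image of a different person (the negative) by at least some margin. A minimal plain-Python sketch on toy 2-d embeddings follows; the margin value of 0.2 and the function names are illustrative assumptions, and neither this repository nor facenet-darknet-inference trains the network, so this is not code from either project.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet.

    The margin default is an illustrative assumption.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Easy triplet: the negative is far away, so the loss is zero.
    print(triplet_loss(anchor, [1.0, 0.0], [0.0, 1.0]))
    # Hard triplet: the negative sits on the anchor, so the margin is violated.
    print(triplet_loss(anchor, [1.0, 0.0], [1.0, 0.0]))
```

Training pushes this quantity toward zero over many triplets, which is what makes plain Euclidean distance between embeddings meaningful at inference time.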

Darknet is a fast, easy-to-read deep learning framework. YOLO runs on it.

2. Dependencies

OpenCV for video i/o, face detection, image resizing, warping, and 3D pose estimation.

Dlib for facial landmark detection.

NNPACK for faster neural network computations.

Zenity for text input.

3. Installation and run

sudo apt-get install zenity
cd facenet-darknet-inference
#edit makefile
mkdir data
cd data
touch name
cd ..
mkdir model

Download the weights and extract them in the facenet-darknet-inference folder.

cd facenet-darknet-inference

4. Note

OpenCV Viola-Jones (VJ) face detection plus Dlib landmark detection is used rather than MTCNN. The VJ method is faster, but its unstable cropping may slightly affect recognition accuracy.

KNN is the final classification method, but it suffers from the open-set problem. The normalized 1792-d feature taken before the bottleneck layer is used for KNN because it gives better open-set results than the original facenet model. You can still try the original network configuration yourself by replacing facenet.cfg with facenet_full.cfg.
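
To make the open-set issue concrete, here is a minimal plain-Python sketch of KNN over L2-normalized features. Plain KNN always returns one of the registered names, even for a face it has never seen. The optional `max_distance` cutoff is a common mitigation added here purely as an assumption; it is not described in the original project, and the toy 2-d vectors stand in for the 1792-d features.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, gallery, k=3, max_distance=None):
    """Majority vote among the k nearest gallery features.

    gallery is a list of (feature, label) pairs. max_distance is a
    hypothetical rejection threshold: if even the nearest neighbour is
    farther than this, None is returned instead of a known label.
    """
    if not gallery:
        return None
    q = l2_normalize(query)
    scored = sorted((euclidean(q, l2_normalize(f)), label) for f, label in gallery)
    nearest = scored[:k]
    if max_distance is not None and nearest[0][0] > max_distance:
        return None
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    gallery = [([1.0, 0.0], "alice"), ([0.9, 0.1], "alice"), ([0.0, 1.0], "bob")]
    print(knn_predict([1.0, 0.05], gallery))                      # a known face
    print(knn_predict([-1.0, -1.0], gallery))                     # unknown, still gets a name
    print(knn_predict([-1.0, -1.0], gallery, max_distance=0.5))   # unknown, rejected
```

A suitable threshold depends on the feature space and would have to be tuned on real data.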

The facenet.weight file was converted from the facenet Inception-ResNet v1 model 20180402-114759.

5. Result

[Image: peek 2018-04-19 14-11]