
IoT firmware vulnerability analysis tool based on binary code similarity analysis (BCSA)






FirmKit is an IoT vulnerability analysis tool based on binary code similarity analysis (BCSA). FirmKit includes ground truth vulnerabilities in custom binaries, such as CGI binaries, for the top eight wireless router and IP camera vendors.

Currently, FirmKit utilizes TikNib, a simple and interpretable BCSA tool. In addition to TikNib's numeric presemantic features, FirmKit implements two additional features based on heuristic knowledge. Through empirical analysis of diverse binaries in our previous studies, we discovered that IoT binaries frequently contain function names. Thus, rather than comparing abstracted numeric features, we can directly compare caller and callee names. For example, we can use the names of internal and library functions instead of the numbers of callers and callees. Additionally, we discovered that data strings in IoT binaries often contain useful information. For instance, CGI binaries include hard-coded strings for parsing URLs, such as HTTP, POST, answer, or password, and these strings can be used to compute the similarity score. Therefore, we chose to use 1) the strings to which the function refers and 2) the names of the callee functions.

FirmKit computes the similarity score for each of these two heuristic features using the Jaccard index. Then, the similarity scores of the two heuristic features are averaged with the score obtained from the numeric presemantic features.
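The scoring described above can be sketched as follows. This is a minimal illustration, not FirmKit's actual code: the function names `jaccard` and `combined_score` are hypothetical, the equal-weight mean of the three scores is an assumption about how the averaging is done, and scoring two empty sets as 0.0 is also an assumption.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index |A & B| / |A | B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets carry no evidence, so score them 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


def combined_score(
    numeric_score: float,
    strings_a: Iterable[str],
    strings_b: Iterable[str],
    callees_a: Iterable[str],
    callees_b: Iterable[str],
) -> float:
    """Average the numeric presemantic score with the two heuristic scores.

    Assumption: an unweighted mean over the three scores.
    """
    string_score = jaccard(strings_a, strings_b)
    callee_score = jaccard(callees_a, callees_b)
    return (numeric_score + string_score + callee_score) / 3.0


if __name__ == "__main__":
    # Referenced strings and callee names of two hypothetical functions.
    score = combined_score(
        0.8,
        ["HTTP", "POST", "password"],
        ["HTTP", "POST", "answer"],
        ["strcpy", "getenv"],
        ["strcpy", "getenv", "sprintf"],
    )
    print(f"combined similarity: {score:.3f}")
```

Because both heuristic features are plain sets of names, the score stays easy to interpret: it is the fraction of strings or callee names that the two functions share.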

For more details, please check my thesis paper (Chapter 6).

Ground Truth Vulnerability Dataset

To build the ground truth vulnerability dataset, we manually marked the addresses of vulnerable functions identified in previous studies, FirmAE and BaseSpec.

For each vulnerability, we manually analyzed the binaries in the firmware images using IDA Pro and obtained the addresses of the vulnerable functions. We excluded functions that IDA Pro was unable to analyze. As a result, the final number of vulnerabilities differs from the number discovered in the previous studies.

For more information, please check the ground_truth directory or Ground Truth Results.xlsx. Baseband binary names are anonymized upon the vendor's request.

If you edit these files, please keep their format intact.

For the OpenSSL vulnerabilities, we utilized the ASE dataset of BinKit.

How to use

Extract firmware images using FirmAE (Optional)

First, target firmware images should be unpacked using FirmAE.

Prepare presemantic features using BinKit and TikNib (Optional)

Next, we train TikNib to select numeric presemantic features for the target compiler options and architectures. In our experiments, we used three architectures (arm, mips, x86 at 32 bits), two optimization levels (O2, O3), and four compilers (gcc-4.9.4, gcc-8.2.0, clang-4.0, clang-7.0). Please check /config/firmae_gcc.
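To make the size of this training matrix concrete, the sketch below enumerates every combination of the compilers, architectures, and optimization levels listed above. The dictionary keys and the `build_configs` helper are illustrative only; they are not the format used by the files under /config.

```python
from itertools import product

# Values taken from the experiment description; all targets are 32-bit.
COMPILERS = ["gcc-4.9.4", "gcc-8.2.0", "clang-4.0", "clang-7.0"]
ARCHS = ["arm", "mips", "x86"]
OPT_LEVELS = ["O2", "O3"]


def build_configs():
    """Return one dict per (compiler, architecture, optimization) combination."""
    return [
        {"compiler": compiler, "arch": arch, "bits": 32, "opt": opt}
        for compiler, arch, opt in product(COMPILERS, ARCHS, OPT_LEVELS)
    ]


if __name__ == "__main__":
    configs = build_configs()
    # 4 compilers x 3 architectures x 2 optimization levels = 24 combinations
    print(len(configs))
    for cfg in configs[:3]:
        print(cfg)
```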

For building the cross-compiling environment and dataset, we used BinKit.

The shell scripts for this step are used when running TikNib. Please set the correct paths in them before use.

Run FirmKit

We assume that the binaries in a target dataset are already unpacked.

Please set the values in the config file to match your environment. For an example, see /config/config_openssl_heartbeat.yml. For the target vulnerable functions in this configuration, we used the OpenSSL binaries in the ASE dataset of BinKit.

First, FirmKit extracts the .tar.gz files when processing the FirmAE dataset. For the BaseSpec dataset, this step is skipped.
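As a rough illustration of this extraction step, the sketch below unpacks every .tar.gz archive in a directory into its own subdirectory using Python's standard `tarfile` module. The helper name `extract_archives` and the directory layout are assumptions made for this example; FirmKit's own extraction logic may differ.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archives(src_dir: str, out_dir: str) -> List[Path]:
    """Extract each *.tar.gz in src_dir into out_dir/<archive name without suffix>."""
    extracted = []
    for archive in sorted(Path(src_dir).glob("*.tar.gz")):
        # "image1.tar.gz" -> out_dir/image1
        target = Path(out_dir) / archive.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            # Only extract archives you trust: members can carry unusual paths.
            tar.extractall(target)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    # Hypothetical directories, for illustration only.
    for path in extract_archives("images", "unpacked"):
        print("extracted to", path)
```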

Then, FirmKit processes the binaries in a target dataset using IDA Pro. For this, we slightly modified part of TikNib. We used IDA Pro v7.6.

Next, FirmKit extracts the features selected in the previous step from the binaries in a target dataset. If you skipped the previous step, you need to select your own features in the config file. Please check config_firmae_gcc.yml for an example.

Finally, FirmKit calculates the similarity score.

All of the above steps are performed by running the commands below.

# For testing firmae vulnerabilities
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_firmae_gcc.yml

# For testing openssl CVE-2014-0160 (Heartbleed) in the firmae dataset
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_openssl_heartbeat.yml

# For testing openssl CVE-2015-1791 in firmae dataset
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_openssl_vulseeker.yml

# For testing basespec vulnerabilities
$ python \
    --image_list helper/images_basespec.txt \
    --outdir output_basepsec \
    --config config/config_basespec.yml

To view the similarity scores in a readable format, run the commands below. Notably, this step averages all similarity scores and computes the results.

# For checking firmae vulnerabilities
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_firmae_gcc.yml \
    --ground ground_truth/ground_truth_firmae.csv

# For checking openssl CVE-2014-0160 (Heartbleed) in the firmae dataset
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_openssl_heartbeat.yml

# For checking openssl CVE-2015-1791 in firmae dataset
$ python \
    --image_list helper/images_firmae.txt \
    --outdir output \
    --config config/config_openssl_vulseeker.yml

# For checking basespec vulnerabilities
$ python \
    --image_list helper/images_basespec.txt \
    --outdir output_basepsec \
    --config config/config_basespec.yml \
    --ground ground_truth/ground_truth_basespec.csv


The results are stored as a comma-separated (CSV) file in the output directory. For an example of a result file, see example_104.csv.
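A result file of this kind can be inspected with a few lines of Python. The sketch below is only an illustration: the actual column layout is defined by example_104.csv and is not reproduced here, so the column names `function` and `score` and the helper `summarize_scores` are placeholders that you would need to adapt.

```python
import csv
from statistics import mean


def summarize_scores(csv_path, name_col="function", score_col="score", top_k=5):
    """Return (average score, top_k rows by score) from a result CSV.

    name_col and score_col are hypothetical column names; replace them
    with the headers actually present in your result file.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    scored = [(row[name_col], float(row[score_col])) for row in rows]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    average = mean(score for _, score in scored) if scored else 0.0
    return average, scored[:top_k]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    average, top = summarize_scores("output/example_104.csv")
    print(f"average score: {average:.3f}")
    for name, score in top:
        print(f"{score:.3f}  {name}")
```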

For the full results, please check Similarity Matching Results.xlsx.


Currently, to fully utilize FirmKit, the experimental environment must be set up first. In the next release, we plan to automate this procedure and provide more detailed instructions.


Tested environment

We ran all our experiments on a server equipped with four Intel Xeon E7-8867v4 2.40 GHz CPUs (total 144 cores), 896 GB DDR4 RAM, and 4 TB SSD. We set up Ubuntu 18.04.5 LTS with IDA Pro v7.6 and Python 3.8.9 on the server.


This project has been conducted by the below authors at KAIST.


We would appreciate it if you would consider citing the previous papers: FirmAE, BaseSpec, and BinKit & TikNib.

  author = {Mingeun Kim and Dongkwan Kim and Eunsoo Kim and Suryeon Kim and Yeongjin Jang and Yongdae Kim},
  title = {{FirmAE}: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis},
  booktitle = {Annual Computer Security Applications Conference (ACSAC)},
  year = 2020,
  month = dec,
  address = {Online}

  author = {Eunsoo Kim and Dongkwan Kim and CheolJun Park and Insu Yun and Yongdae Kim},
  title = {{BaseSpec}: Comparative Analysis of Baseband Software and Cellular Specifications for L3 Protocols},
  booktitle = {Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS)},
  year = 2021,
  month = feb,
  address = {Online}

  author = {Dongkwan Kim and Eunsoo Kim and Sang Kil Cha and Sooel Son and Yongdae Kim},
  title = {Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned},
  journal = {IEEE Transactions on Software Engineering (TSE)}

