
noahfl/motif-site_grouping


ABOUT

The premise of this project is to examine the system calls made by web browsers (as recorded by strace) and see what information can be inferred from them. Variations on this concept include classifying from limited information, such as small windows of consecutive calls.

This repo contains two programs: the standard one, which reads an entire strace file, and one that reads small chunks of strace files and determines which site they were generated from.
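The chunked program's code isn't reproduced here, so the following is only a minimal sketch of the windowing idea. It assumes each trace has already been reduced to a list of syscall names. The features (per-window syscall counts, "bag of syscalls") and the classifier (multinomial Naive Bayes) are illustrative choices, not necessarily what the repo uses. `windows`, `featurize`, `train` and `predict` are hypothetical helper names.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB


def windows(calls, size, step):
    """Split a sequence of syscall names into overlapping fixed-size windows.

    A trailing remainder shorter than `size` is dropped.
    """
    return [calls[i:i + size] for i in range(0, len(calls) - size + 1, step)]


def featurize(window):
    """Bag of syscalls: how often each call name appears in the window."""
    return Counter(window)


def train(traces, size=50, step=25):
    """traces: dict mapping a site label to the syscall names from one run."""
    features, labels = [], []
    for site, calls in traces.items():
        for w in windows(calls, size, step):
            features.append(featurize(w))
            labels.append(site)
    vec = DictVectorizer()  # maps call names to sparse feature columns
    X = vec.fit_transform(features)
    clf = MultinomialNB().fit(X, labels)
    return vec, clf


def predict(vec, clf, window):
    """Guess which site a single window of calls came from.

    Call names never seen during training are ignored by the vectorizer.
    """
    return clf.predict(vec.transform([featurize(window)]))[0]
```

Counting calls throws away their order; n-grams of consecutive calls would be the natural next step if order turns out to matter.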

#Requirements#

  • SciPy
  • NumPy
  • Scikit-learn

#Project Notes#

sites:

Command to use:

strace -o ./[etc.] wget -e robots=off --wait 1 --page-requisites [link]
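The file written by `strace -o` has one call per line, such as `openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3`. Here is a minimal sketch of turning it into a list of syscall names. It assumes the default strace output format. Lines are prefixed with a PID only when strace runs with `-f`, which the command above doesn't use, but the pattern tolerates that too. `parse_strace` is a hypothetical name.

```python
import re

# Syscall name at the start of a line, optionally preceded by a PID
# (strace adds the PID prefix when tracing child processes with -f).
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?([a-z_][a-z0-9_]*)\(")


def parse_strace(lines):
    """Return the syscall names in an strace log, in order.

    Signal lines ("--- SIG... ---"), exit lines ("+++ exited ... +++") and
    "<... resumed>" continuation lines don't match and are skipped.
    """
    calls = []
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:
            calls.append(m.group(1))
    return calls
```

Usage (the path is a placeholder): `with open("trace.strace") as f: calls = parse_strace(f)`.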

SoundCloud test links (streaming site example):

CERN test links (lightweight site example):

//TODO:

Group websites into categories, e.g. university sites, streaming sites, news sites, Wikipedia pages (lists vs. articles), etc.

Can it differentiate between website types?
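One way to test the category question is to train on category labels and evaluate on sites the model has never seen. If every test fold contains only held-out sites, a good score suggests the model learned the site *type* rather than memorising individual sites. Below is a minimal sketch using scikit-learn's `GroupKFold`. It assumes a per-window syscall-count matrix `X` has already been built. The `CATEGORY` mapping and the `category_scores` helper are hypothetical, and the labels are illustrative.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical site -> category mapping, built from the lists in these notes.
CATEGORY = {
    "wfu.edu": "university",
    "nyu.edu": "university",
    "duke.edu": "university",
    "soundcloud": "streaming",
    "cern": "lightweight",
}


def category_scores(X, sites, category=CATEGORY, n_splits=2):
    """Cross-validated accuracy of a category classifier, holding out whole sites.

    X: (n_windows, n_features) non-negative syscall-count matrix.
    sites: the site label for each row of X (used as the CV group).
    """
    y = np.array([category[s] for s in sites])
    cv = GroupKFold(n_splits=n_splits)  # a site never spans train and test
    return cross_val_score(MultinomialNB(), X, y, groups=sites, cv=cv)
```

Each category needs at least two sites for this to be meaningful. Otherwise, some folds train without ever seeing the held-out category.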

Search site terms:

  • homepage
  • wake forest
  • nyu
  • linux
  • computer science

University sites:

  • wfu.edu
  • nyu.edu
  • duke.edu
  • unc.edu
  • utexas.edu
  • berkeley.edu
  • usc.edu
  • ucla.edu
  • cornell.edu
  • uchicago.edu
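To collect traces for every site in the list, the command above can be generated per site. This sketch only builds the argument lists. The output path `traces/<site>.strace` and the `https://www.<site>/` homepage URL are assumptions standing in for the placeholders in the command, and `strace_command` is a hypothetical helper.

```python
import shlex

UNIVERSITY_SITES = [
    "wfu.edu", "nyu.edu", "duke.edu", "unc.edu", "utexas.edu",
    "berkeley.edu", "usc.edu", "ucla.edu", "cornell.edu", "uchicago.edu",
]


def strace_command(site, out_dir="traces"):
    """argv for tracing a wget fetch of one site's homepage.

    The output path and the www/https URL form are assumptions.
    """
    return [
        "strace", "-o", f"{out_dir}/{site}.strace",
        "wget", "-e", "robots=off", "--wait", "1", "--page-requisites",
        f"https://www.{site}/",
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(..., check=True)
    # to actually run them (the output directory must already exist).
    for site in UNIVERSITY_SITES:
        print(shlex.join(strace_command(site)))
```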

Valgrind on server:

  • run strace on Valgrind runs
  • check whether the traces show a difference between high and low memory usage
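A crude first signal for memory usage in a trace is the share of calls that change the address space. This is only a proxy and an assumption on our part: allocations served from heap pages that are already mapped never reach the kernel, so they don't appear in strace at all. `memory_call_share` is a hypothetical helper that takes a list of syscall names.

```python
from collections import Counter

# Calls that grow or shrink the process address space (assumed proxy).
# mmap2 is the 32-bit variant of mmap.
MEMORY_CALLS = {"brk", "mmap", "mmap2", "munmap", "mremap"}


def memory_call_share(calls):
    """Fraction of the syscalls in a trace that are memory-management calls."""
    if not calls:
        return 0.0
    counts = Counter(calls)
    return sum(counts[c] for c in MEMORY_CALLS) / len(calls)
```

Comparing this share between a high-memory and a low-memory run is a quick way to see whether the difference shows up at the syscall level before building anything more elaborate.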
