Guganesan-Ilavarasan/Econ-Faculty_Scrapper


Econ-Faculty_Scrapper

A Web Scraping Task to collect detailed information on faculty members from selected departments with a PhD in Economics.


This project collects data from the websites of selected economics departments: the Economics faculty listed in the She Research Network in India (SheRNI) database and the Economics department at the Indian Institute of Management Ahmedabad (IIM-A). Wherever available, the following academic and professional attributes are extracted for each scholar or department member:

  • Name of the faculty member
  • Department/University affiliation
  • Educational background and subject studied
  • Institution where they obtained their degree(s)
  • Year of obtaining the degree(s)
  • Current position/designation
  • Any additional information that might be relevant (like research interests, publications, etc.)

The data was collected by web scraping in R with the rvest and RSelenium libraries and cleaned with the stringr package. The choice between rvest and RSelenium depended on each webpage: whether it was static or dynamic, and whether its elements required interaction. The objective is to organize the extracted data in a tabular (matrix-like) format that statistical software can analyze easily, and to store it in a CSV file with appropriate column names.
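The static-page path described above can be sketched as follows. This is a minimal illustration, not the repository's code: the HTML snippet, the CSS selectors (`.name`, `.designation`, `.education`), the helper `field()`, and the output file name are all hypothetical, and real pages will need their own selectors. It parses markup with rvest, cleans the text with stringr, and writes a one-row table to CSV.

```r
library(rvest)
library(stringr)

# Hypothetical static profile markup; a live page would be loaded with read_html(url)
page <- minimal_html('
  <div class="profile">
    <h1 class="name">  Dr. Jane   Doe </h1>
    <p class="designation">Associate Professor</p>
    <p class="education">PhD in Economics, Example University, 2012</p>
  </div>')

# Hypothetical helper: text of the first node matching a CSS selector,
# with repeated and surrounding whitespace removed (NA if nothing matches)
field <- function(doc, css) {
  str_squish(html_text2(html_element(doc, css)))
}

education <- field(page, ".education")

# One row per faculty member, one column per attribute
profile <- data.frame(
  name        = field(page, ".name"),
  designation = field(page, ".designation"),
  education   = education,
  # First four-digit year (19xx or 20xx) found in the education text
  degree_year = as.integer(str_extract(education, "\\b(19|20)\\d{2}\\b")),
  stringsAsFactors = FALSE
)

# Store the table as CSV with explicit column names and no row names
out_file <- file.path(tempdir(), "profiles_example.csv")
write.csv(profile, out_file, row.names = FALSE)
```

For dynamic pages, the same extraction step would typically run on page source obtained through a browser session (for example via RSelenium) rather than on a direct `read_html()` call.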

High-Level Overview:

  1. SheRNI

    • SheRNI_Master_Scrapper.R

      This module is the master script. It calls the auxiliary modules SheRNI_Links_Scrapper and SheRNI_Profiles_Scrapper, first to collect the profile links of the individuals of interest and then to visit those links and retrieve their academic and professional attributes. Its input is the homepage of the SheRNI database, where researchers are filtered, and its output is a data.frame holding the educational and research attributes to be scraped.

    • SheRNI_Links_Scrapper.R

      This module opens the SheRNI webpage, applies the appropriate filters to select the target faculty members, and retrieves the link to each individual's profile page from the search results.

    • SheRNI_Profiles_Scrapper.R

      This module loads each profile link gathered by SheRNI_Links_Scrapper and retrieves the researcher's academic and research attributes of interest.

    • Output

      This folder holds the outputs of SheRNI_Master_Scrapper.R: faculty_profile_links.csv is produced by SheRNI_Links_Scrapper.R, and Researchers_Profiles.csv is the final output of SheRNI_Profiles_Scrapper.R.

    Run order:
    SheRNI_Master_Scrapper.R -> SheRNI_Links_Scrapper.R -> SheRNI_Profiles_Scrapper.R
    
  2. IIM-A

    • IIMA_Econ_Faculty_Scrapper.R

      This script loads the economics faculty webpage of the Indian Institute of Management Ahmedabad and retrieves the links to all of its members' webpages. Each individual page is then loaded to collect the member's educational and research details, which are collated into a data.frame.

    • Output

      This folder holds the output of IIMA_Econ_Faculty_Scrapper.R: IIMA_Econ_Dept_Profiles.csv.
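Both scrapers follow the same two-stage pattern: collect profile links from a listing page, then visit each link and collate the results into a data.frame. The sketch below illustrates that pattern under stated assumptions and is not the code in the scripts above. The URLs, markup, selectors, and the functions `fetch()`, `collect_links()`, and `scrape_profile()` are hypothetical, and an in-memory list stands in for the website so the example runs offline.

```r
library(rvest)
library(xml2)

listing_url <- "https://example.org/faculty/"

# Hypothetical stand-in for a website: URL -> HTML source
pages <- list(
  "https://example.org/faculty/" =
    '<ul>
       <li><a class="member" href="a.html">Faculty A</a></li>
       <li><a class="member" href="b.html">Faculty B</a></li>
     </ul>',
  "https://example.org/faculty/a.html" =
    '<h1>Faculty A</h1><p class="position">Professor</p>',
  "https://example.org/faculty/b.html" =
    '<h1>Faculty B</h1><p class="position">Assistant Professor</p>'
)

# For live pages this would be read_html(url) instead of a list lookup
fetch <- function(url) minimal_html(pages[[url]])

# Stage 1: gather absolute, de-duplicated links to the individual profiles
collect_links <- function(url) {
  hrefs <- html_attr(html_elements(fetch(url), "a.member"), "href")
  unique(url_absolute(hrefs, url))
}

# Stage 2: extract the attributes of interest from one profile page
scrape_profile <- function(url) {
  doc <- fetch(url)
  data.frame(
    name        = html_text2(html_element(doc, "h1")),
    position    = html_text2(html_element(doc, "p.position")),
    profile_url = url,
    stringsAsFactors = FALSE
  )
}

# Master step: run stage 1, then stage 2 on every link, and stack the rows
links    <- collect_links(listing_url)
profiles <- do.call(rbind, lapply(links, scrape_profile))
```

Keeping link collection and profile extraction in separate functions mirrors the split between the links and profiles modules, and lets the intermediate list of links be saved or inspected before the slower per-profile requests run.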
