A web scraping task to collect detailed information on faculty members with a PhD in Economics from selected departments.
The study collects data from the websites of selected economics departments, namely the Economics faculty listed in the She Research Network in India (SheRNI) database and the Economics department at the Indian Institute of Management Ahmedabad (IIM-A). The information to be extracted, wherever available, covers the personal academic and professional attributes of the individual scholars and members of these departments:
- Name of the faculty member
- Department/University affiliation
- Educational background and subject studied
- Institution where they obtained their degree(s)
- Year of obtaining the degree(s)
- Current position/designation
- Any additional information that might be relevant (like research interests, publications, etc.)
The data was collected by web scraping in the R programming language using the `rvest` and `RSelenium` libraries, and was cleaned using the `stringr` package. `rvest` and `RSelenium` were used as alternatives depending on the accessibility demands of the webpages to be sourced, that is, whether a page was static or dynamic and whether it required interaction with web elements. The objective is to organize the extracted data into a tabular (matrix) format that can be easily analyzed using statistical software, and to store it in a CSV file with appropriate column names.
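The choice between the two libraries could be wrapped in a single helper, roughly as sketched below. This is only an illustration: `fetch_page`, its arguments, and the `h1.name` selector are hypothetical and not taken from the project scripts, and `remDr` is assumed to be an already-open RSelenium remote driver client.

```r
library(rvest)

# Hypothetical helper: return a parsed page for a static or a dynamic source.
# `remDr` is assumed to be an already-open RSelenium remote driver client.
fetch_page <- function(url, dynamic = FALSE, remDr = NULL) {
  if (!dynamic) {
    # Static page: the HTML sent by the server already holds the content.
    return(read_html(url))
  }
  if (is.null(remDr)) {
    stop("A running RSelenium client is needed for dynamic pages.")
  }
  # Dynamic page: let the browser render it, then hand the source to rvest.
  remDr$navigate(url)
  read_html(remDr$getPageSource()[[1]])
}

# Usage on the static path; a local file stands in for a real URL.
tmp <- tempfile(fileext = ".html")
writeLines("<html><body><h1 class='name'>Faculty A</h1></body></html>", tmp)
page <- fetch_page(tmp)
faculty_name <- html_text2(html_element(page, "h1.name"))
```

Keeping the parsing step in `rvest` for both cases means the extraction code downstream does not need to know how the page was loaded.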
- `SheRNI_Master_Scrapper.R`: This module is the master script. It calls the auxiliary modules `SheRNI_Links_Scrapper` and `SheRNI_Profiles_Scrapper`, which first collect the links of the individuals of interest and then visit those links to retrieve their academic and professional attributes. It takes the homepage of SheRNI's database as input, filters the researchers, and outputs a data.frame with the educational and research attributes to be scraped.
- `SheRNI_Links_Scrapper.R`: This module opens the SheRNI webpage, applies the appropriate filters to identify the target faculty members, and retrieves the link to each individual's profile page from the generated search results.
- `SheRNI_Profiles_Scrapper.R`: This module loads all the individual profile links gathered previously by `SheRNI_Links_Scrapper` and retrieves the researchers' academic and research attributes of interest.
- This folder holds the outputs of `SheRNI_Master_Scrapper.R`. `faculty_profile_links.csv` is the result of running `SheRNI_Links_Scrapper.R`, and `Researchers_Profiles.csv` is the final output of `SheRNI_Profiles_Scrapper.R`.
SheRNI_Master_Scrapper.R -> SheRNI_Links_Scrapper.R -> SheRNI_Profiles_Scrapper.R
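The flow above (collect links first, then scrape each profile into one data.frame) can be sketched with `rvest` as follows. The markup, CSS selectors, and function names (`load_page`, `collect_links`, `scrape_profile`) are assumptions made for illustration and do not reflect the actual SheRNI pages or scripts; inline HTML strings stand in for the pages that the real scripts would fetch.

```r
library(rvest)

# Stand-in pages with hypothetical markup; real runs would fetch from the web.
pages <- list(
  "search.html" = "<ul><li><a class='profile' href='p1.html'>A</a></li>
                   <li><a class='profile' href='p2.html'>B</a></li></ul>",
  "p1.html" = "<h1 class='name'>Faculty A</h1><p class='post'>Professor</p>",
  "p2.html" = "<h1 class='name'>Faculty B</h1><p class='post'>Associate Professor</p>"
)
load_page <- function(key) read_html(pages[[key]])

# Stage 1: collect the profile links from the search results page.
collect_links <- function(results_key) {
  html_attr(html_elements(load_page(results_key), "a.profile"), "href")
}

# Stage 2: visit one link and return a one-row data.frame of attributes.
scrape_profile <- function(link) {
  page <- load_page(link)
  data.frame(
    name = html_text2(html_element(page, "h1.name")),
    position = html_text2(html_element(page, "p.post")),
    profile_link = link,
    stringsAsFactors = FALSE
  )
}

# Master step: run stage 1, then stage 2 on every link, and stack the rows.
links <- collect_links("search.html")
profiles <- do.call(rbind, lapply(links, scrape_profile))
```

Saving `links` to disk between the two stages, as the project does with `faculty_profile_links.csv`, lets the profile stage be rerun without repeating the search.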
- `IIMA_Econ_Faculty_Scrapper.R`: This script loads the Indian Institute of Management Ahmedabad economics faculty webpage and retrieves the links to all of its members' webpages. The individual pages are then loaded to collect the educational and research details of the members, which are collated into a data.frame.
- This folder holds the output of `IIMA_Econ_Faculty_Scrapper.R`: `IIMA_Econ_Dept_Profiles.csv`.
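Before a collated data.frame is written to CSV, the scraped text typically needs tidying. The sketch below shows one way this might be done with `stringr`; the sample rows, the assumed "degree (subject), institution, year" layout of the education string, the column names, and the output file name are all placeholders rather than the project's actual data or schema.

```r
library(stringr)

# Placeholder rows standing in for freshly scraped, untidy text.
raw <- data.frame(
  name = c("  Faculty   A ", "Faculty B\n"),
  education = c("PhD (Economics), Example University, 2010",
                "Ph.D. (Economics),  Sample Institute , 1998"),
  stringsAsFactors = FALSE
)

clean <- data.frame(
  # Collapse repeated whitespace and trim both ends.
  name = str_squish(raw$name),
  # Text inside the first pair of parentheses is taken as the subject.
  subject = str_match(raw$education, "\\(([^)]*)\\)")[, 2],
  # Text between the first and the last comma is taken as the institution.
  institution = str_squish(str_match(raw$education, ",(.*),")[, 2]),
  # First four-digit year beginning with 19 or 20.
  degree_year = as.integer(str_extract(raw$education, "\\b(19|20)\\d{2}\\b")),
  stringsAsFactors = FALSE
)

# Write the tidy table with explicit column names and no row-name column.
out_file <- file.path(tempdir(), "profiles_example.csv")
write.csv(clean, out_file, row.names = FALSE)
```

Real profile pages rarely follow one layout, so patterns like these would need to be checked against the scraped text and unmatched rows (returned as `NA`) reviewed by hand.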