Welcome to My First Scraper


Introduction

meme_scraping

GitHub's trending page

Technical specifications

Using the Python libraries requests and beautifulsoup4, return a CSV of the top 25 trending repositories on GitHub.

  1. Request (with requests)
  2. Extract (with beautifulsoup4)
  3. Transform
  4. Format

Part 0: Request Write a function with the prototype def request_github_trending(url). It returns the result of the request.
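A minimal sketch of Part 0, assuming "the result of the request" means the Response object returned by requests.get. The timeout value and the raise_for_status() call are additions for illustration and are not required by the specification.

```python
import requests


def request_github_trending(url):
    """Fetch the page at `url` and return the requests Response object."""
    # Assumption: a 10-second timeout is reasonable; the spec does not set one.
    response = requests.get(url, timeout=10)
    # Assumption: fail early on HTTP errors (4xx/5xx) instead of parsing an error page.
    response.raise_for_status()
    return response
```

Calling request_github_trending("https://github.com/trending") would then give a response whose .text attribute holds the page HTML.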

Part 1: Extract Write a function with the prototype def extract(page) that uses BeautifulSoup's find_all to collect the HTML of every repository row and returns those rows. :-)
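One way Part 1 could look. The selector is an assumption: it supposes each repository row on the trending page is an article element carrying the class Box-row, which should be checked against the live markup. The sketch also assumes page is either the Response from Part 0 or a plain HTML string.

```python
from bs4 import BeautifulSoup


def extract(page):
    """Return every repository row found in `page` as a list of bs4 Tag objects."""
    # Accept either a Response-like object (with .text) or a raw HTML string.
    html = page.text if hasattr(page, "text") else page
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: repository rows are <article class="Box-row"> elements.
    return soup.find_all("article", class_="Box-row")
```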

Part 2: Transform Write a function with the prototype def transform(html_repos) that takes an array containing the HTML of each repository row. It returns an array of hashes in this format: [{'developer': NAME, 'repository_name': REPOS_NAME, 'nbr_stars': NBR_STARS}, ...]
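A sketch of Part 2 under stated assumptions about the page markup: each row is taken to contain an h2 whose link points to /DEVELOPER/REPOSITORY, plus a link whose href ends in /stargazers and whose text is the star count. Removing thousands separators from the star count is also an assumption, made so the value does not introduce extra commas into the CSV later.

```python
def transform(html_repos):
    """Turn repository rows (bs4 Tag objects) into a list of dictionaries."""
    repositories = []
    for row in html_repos:
        # Assumption: the repository link lives in the row's <h2> and its
        # href has the form "/developer/repository_name".
        link = row.find("h2").find("a")
        developer, repository_name = link["href"].strip("/").split("/")[:2]

        # Assumption: the star count is the text of the ".../stargazers" link.
        stars_link = row.find("a", href=lambda h: h and h.endswith("/stargazers"))
        nbr_stars = stars_link.get_text(strip=True).replace(",", "") if stars_link else ""

        repositories.append({
            "developer": developer,
            "repository_name": repository_name,
            "nbr_stars": nbr_stars,
        })
    return repositories
```

The star count is kept as a string here so that an unexpected format on the page does not raise a conversion error.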

Part 3: Format Write a function with the prototype def format(repositories_data) that takes an array of repository hashes and returns it as a CSV string. Columns are separated by , and lines by \n. The columns are Developer,Repository Name,Number of Stars
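A minimal sketch of Part 3. It assumes the header row comes first, that no trailing newline follows the last line, and that values contain no commas (no quoting is applied). Note that the required name format shadows Python's built-in format function inside the module that defines it.

```python
def format(repositories_data):
    """Return the repositories as a CSV string with a header row."""
    lines = ["Developer,Repository Name,Number of Stars"]
    for repo in repositories_data:
        # Keys follow the hash format produced by transform in Part 2.
        lines.append(",".join([
            str(repo["developer"]),
            str(repo["repository_name"]),
            str(repo["nbr_stars"]),
        ]))
    # Assumption: lines are joined by "\n" with no trailing newline.
    return "\n".join(lines)
```

With the four parts in place, the whole pipeline reads as format(transform(extract(request_github_trending(url)))).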

Demo version