GitHub Scraper is an app, built with the Streamlit and BeautifulSoup Python packages, that scrapes data from a specific GitHub user profile.

Last update: Apr 05, 2022

Related tags

Web Crawling github-scraper-app

Overview

Github Scraper

The GitHub Scraper app scrapes data from a specific user profile.
It takes a GitHub profile name and checks whether that user name exists.
If the user name exists, the app scrapes the data from that GitHub profile.
If the user name doesn't exist, the app displays an info message.
You can download the scraped data as CSV, JSON, or a pandas-profiling HTML report.
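
The user-name check described above could look roughly like the sketch below. It is not the app's actual code: the helper name `profile_exists` and its `get` parameter are hypothetical, and it assumes GitHub answers a request for an unknown profile with a non-200 status (typically 404).

```python
from typing import Callable, Optional


def profile_exists(user_name: str, get: Optional[Callable] = None) -> bool:
    """Return True if https://github.com/<user_name> answers with HTTP 200."""
    if get is None:
        # Default to requests.get; a stub can be passed in for offline testing.
        import requests

        get = requests.get
    response = get("https://github.com/{}".format(user_name), timeout=10)
    # Assumption: any status other than 200 is treated as "profile not found".
    return response.status_code == 200
```

In the Streamlit app, a `False` result would be the natural place to show the info message (for example with `st.info`) instead of calling the scraper.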

Installation :-

To install all the packages the app requires 👇

pip install -r requirements.txt

Packages Used :-

import requests
import pandas as pd
import streamlit as st
from bs4 import BeautifulSoup
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

Function To Scrape the Data :-

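def ScrapeData(user_name):
    """Scrape profile details and the repository list for a GitHub user.

    Returns a dict of profile fields and a DataFrame with one row per repository.
    """
    url = "https://github.com/{}?tab=repositories".format(user_name)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # These lookups assume the profile exists; on a missing profile
    # soup.find() returns None and the calls below raise an error.
    info = {"name": soup.find(class_="vcard-fullname").get_text()}
    info["image_url"] = soup.find(class_="avatar-user")["src"]
    # The link text holds the count on its first line.
    info["followers"] = (
        soup.select_one("a[href*=followers]").get_text().strip().split("\n")[0]
    )
    info["following"] = (
        soup.select_one("a[href*=following]").get_text().strip().split("\n")[0]
    )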

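    # Location and website are optional; select_one() returns None when absent.
    try:
        info["location"] = soup.select_one("li[itemprop*=home]").get_text().strip()
    except AttributeError:
        info["location"] = ""

    try:
        info["url"] = soup.select_one("li[itemprop*=url]").get_text().strip()
    except AttributeError:
        info["url"] = ""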

    # Repository entries are matched by their "source" class.
    repositories = soup.find_all(class_="source")
    repo_info = []
    for repo in repositories:
        # Any field missing from an entry falls back to an empty string.
        try:
            name = repo.select_one("a[itemprop*=codeRepository]").get_text().strip()
            link = "https://github.com/{}/{}".format(user_name, name)
        except AttributeError:
            name = ""
            link = ""

        try:
            updated = repo.find("relative-time").get_text()
        except AttributeError:
            updated = ""

        try:
            language = repo.select_one("span[itemprop*=programmingLanguage]").get_text()
        except AttributeError:
            language = ""

        try:
            description = repo.select_one("p[itemprop*=description]").get_text().strip()
        except AttributeError:
            description = ""

        repo_info.append(
            {
                "name": name,
                "link": link,
                "updated": updated,
                "language": language,
                "description": description,
            }
        )
    repo_info = pd.DataFrame(repo_info)
    return info, repo_info
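
The CSV and JSON downloads mentioned in the overview could be produced along the lines of the sketch below. It is an illustration, not the app's code: the helper names `records_to_csv` and `records_to_json` and the sample row are hypothetical, and it uses only the standard library on a list of dicts (the shape `repo_info` has before it becomes a DataFrame). With the DataFrame that `ScrapeData` returns, `repo_info.to_csv(index=False)` and `repo_info.to_json(orient="records")` are the pandas equivalents.

```python
import csv
import io
import json


def records_to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def records_to_json(records):
    """Serialize a list of dicts to a JSON array, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical sample row shaped like one entry of repo_info.
sample = [
    {
        "name": "github-scraper-app",
        "link": "https://github.com/example-user/github-scraper-app",
        "updated": "Apr 5, 2022",
        "language": "Python",
        "description": "Scrapes a GitHub profile",
    }
]
csv_text = records_to_csv(sample)
json_text = records_to_json(sample)
```

In a Streamlit page, strings like these can be handed to `st.download_button` through its `data` argument, with `file_name` and `mime` set to match the format.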



Owner

Siva Prakash

Jobinja.ir jobs scraper.

Web Scraping Instagram photos with Selenium by only using a hashtag.

This scraper collects the email addresses of faculty members from a given link/page and stores them in a CSV file

TikTok Username Swapper/Claimer/etc

This is a Python API to scrape search results from a URL.

Tencent Classroom (腾讯课堂): simulated login, course information retrieval, video download, and video decryption.

FilmMikirAPI - A simple REST API for scraping the Kincir website, built with Python and the Flask package

A helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.

Simple web scraper bot to scrape web pages using Requests, html5lib and BeautifulSoup.

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

API to parse tibia.com content into python objects.

JD Cloud Wuxianbao (京东云无线宝) points push notifications, with support for viewing points usage across multiple devices

Crawler in Python 3.7, 3.8, 3.9, PyPy3

Scrapy-soccer-games - Scraping information about soccer games from a few websites

Scrapes proxies and saves them to a text file

Web and PDF Scraper Refactoring

fork huanghyw/jd_seckill

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

Scrapes all articles and their headlines from theonion.com

This repo has the source code for the crawler and data crawled from auto-data.net