Download NCERT books using Scrapy

Overview

download_ncert_books


NCERT_CLASS_1 NCERT_CLASS_2 NCERT_CLASS_3 NCERT_CLASS_4 NCERT_CLASS_5 NCERT_CLASS_6 NCERT_CLASS_7 NCERT_CLASS_8 NCERT_CLASS_9 NCERT_CLASS_10 NCERT_CLASS_11 NCERT_CLASS_12

Downloading Books:

You can either use the spider by cloning this repo and following the instructions given below,
or
you can download the books directly from the release section or by clicking on the badges above.

There are 2 different kinds of zips in the release section for every class:

  1. Book-wise NCERT_CLASS_ClassNo_Subject_BookName.zip : these zips contain the chapters of the book BookName for the subject Subject of class ClassNo
  2. Books text Class_ClassNo_Text.zip : these zips contain the text extracted from all the books of class ClassNo
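If you grab the release zips instead of running the spider, they can be unpacked programmatically. A minimal sketch using Python's standard zipfile module (the helper name and destination layout are illustrative, not part of the project):

```python
import zipfile
from pathlib import Path

def extract_book_zip(archive: str, dest: str = "ncert_books") -> list[str]:
    """Extract a release zip (e.g. a NCERT_CLASS_..._BookName.zip)
    into dest/<archive stem>/ and return the extracted file names."""
    archive_path = Path(archive)
    out_dir = Path(dest) / archive_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

For example, extract_book_zip("NCERT_CLASS_11_Economics_Indian_Economic_Development.zip") would place that book's chapter PDFs under ncert_books/NCERT_CLASS_11_Economics_Indian_Economic_Development/.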

How to use the spider

Initial Setup

git clone https://github.com/nit-in/download_ncert_books.git
cd download_ncert_books
pip install -r requirements.txt

to run the spider

scrapy crawl --nolog ncert

and follow the prompts

For example, if you want to download the Class 11 Economics book:

 scrapy crawl --nolog ncert

Enter the class:        11

Select one the subjects:
Enter 1 for Sanskrit
Enter 2 for Accountancy
Enter 3 for Chemistry
Enter 4 for Mathematics
Enter 5 for Economics
Enter 6 for Psychology
Enter 7 for Geography

and so on ...

Enter subject number:   5

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1

Downloading...  Class: Class11  Subject: Economics      Book: Indian_Economic_Development       Chapters: 10


downloading keec1ps.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec1ps.pdf
downloading keec101.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec101.pdf
downloading keec102.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec102.pdf
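The destination paths in the sample run follow a fixed root/ClassN/Subject/Book/ layout. As a sketch (a hypothetical helper, not the spider's actual code), the same layout can be rebuilt with pathlib:

```python
from pathlib import Path

def chapter_dest(root: str, class_no: int, subject: str,
                 book: str, pdf_name: str) -> Path:
    """Return <root>/Class<N>/<Subject>/<Book>/<pdf_name>,
    creating the intermediate directories if needed."""
    book_dir = Path(root) / f"Class{class_no}" / subject / book
    book_dir.mkdir(parents=True, exist_ok=True)
    return book_dir / pdf_name
```

Calling chapter_dest("/home/user/ncert", 11, "Economics", "Indian_Economic_Development", "keec101.pdf") reproduces the path shown in the log above.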

			OR 

To download multiple books, enter their numbers separated by commas, e.g.:

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1,2
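A comma-separated answer like this only needs to be split into individual book numbers. A hypothetical parser (an illustration, not the project's actual code) could look like:

```python
def parse_selection(answer: str, n_books: int) -> list[int]:
    """Turn an answer such as '1,2' into zero-based book indices,
    skipping blanks and duplicates and rejecting out-of-range numbers."""
    indices: list[int] = []
    for token in answer.split(","):
        token = token.strip()
        if not token:
            continue
        idx = int(token) - 1
        if not 0 <= idx < n_books:
            raise ValueError(f"book number {token} is out of range")
        if idx not in indices:
            indices.append(idx)
    return indices
```

Here parse_selection("1,2", 6) yields [0, 1], i.e. the first two books in the menu above.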

If you want to see the Scrapy spider log, run the crawl command without the --nolog flag:

scrapy crawl ncert