Make Selenium work on GitHub Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium? After you jump through a few hoops, it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like that, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simon Willison.
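On the scraper.py side, the commit step only has something to push if the scraper actually writes a file. Here is a minimal sketch of that, assuming you collect your results as a list of dictionaries. The function name append_rows, the file name data.csv, and the column names are all made up for illustration and are not part of this repo.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_rows(csv_path, rows, fieldnames):
    """Append scraped rows to a CSV, writing the header only for a new file."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder data standing in for whatever your scraper collects.
    scraped_at = datetime.now(timezone.utc).isoformat()
    rows = [{"scraped_at": scraped_at, "headline": "Example headline"}]
    append_rows("data.csv", rows, fieldnames=["scraped_at", "headline"])
```

Appending instead of overwriting means every run adds rows, so the git history and the file itself both show how the data changed over time.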

Scheduling

Right now the YAML is set to only scrape when you click the "Run workflow" button in the Actions tab, but under the on: key you can add something like

  schedule:
    - cron: '0 * * * *'

to make it run at minute 0 of every hour. GitHub reads cron schedules in UTC.
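If cron syntax isn't second nature, here is a small sketch of what that particular expression means, written as plain Python. The helper next_hourly_run is hypothetical and only models the '0 * * * *' pattern, not cron in general.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now):
    """Return the next time '0 * * * *' fires strictly after `now`."""
    # Round down to the top of the current hour, then step one hour ahead.
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


if __name__ == "__main__":
    # GitHub Actions schedules use UTC, so compare against UTC time.
    print(next_hourly_run(datetime.now(timezone.utc)))
```

Scheduled workflows can start later than the cron time when GitHub is busy, so treat the result as the earliest the run might start.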

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since Chrome isn't in Ubuntu's default apt repositories, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml.
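If the workflow fails and you aren't sure whether the apt step actually installed a browser, a quick check from Python can help. This is only a sketch: the helper find_browser is hypothetical, and the list of binary names is an assumption about what your runner might have installed.

```python
import shutil


def find_browser(candidates=("chromium-browser", "chromium", "google-chrome")):
    """Return the path of the first browser binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_browser() or "No Chromium/Chrome binary found on PATH")
```

Running this as the first thing in scraper.py gives a much clearer error than a Selenium stack trace when the browser is simply missing.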

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra Chrome options popped up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here, beyond all those options, is the Service thing. Apparently just giving it the path to chromedriver isn't good enough anymore: Selenium 4 wants the path wrapped in a Service object. I just do what works.
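Once you have a driver, the rest of the scraper is ordinary Selenium. One habit worth having on GitHub Actions is always quitting the browser, even when the scrape blows up. Here is a minimal sketch of that. The function name fetch_title is made up for illustration, and it assumes you pass in a driver built the way shown above.

```python
def fetch_title(driver, url):
    """Load `url` and return the page title, quitting the browser no matter what."""
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() shuts down both the browser and the chromedriver process.
        driver.quit()
```

You would call it as fetch_title(driver, "https://example.com"), with the URL swapped for whatever page you are scraping. Because quit() runs in the finally block, the driver can't be reused afterwards, so build a fresh one for each call.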

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic