List of content farm sites like g.penzai.com.

Overview

Content Farm Site List

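Google's Chinese-language search results contain a sizable share of content-farm entries, such as 「小 X 知识网」 and 「小 X 百科网」 (sites named on the pattern "Little X Knowledge Net" or "Little X Encyclopedia Net"). Such links often 302-redirect to the farm's main site. The pages are auto-generated, stuffed with keywords, and mixed with scraped content, so they have no readability or reference value.

Worse still, a single site of this kind may have thousands of alias domains indexed by Google, which seriously degrades the search experience. See the community feedback from early October 2021: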

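  1. GitHub: How can I block 「小搭百科网」?
  2. V2EX: Google searches keep returning content-farm results such as 小 X 知识网. What can I do?
  3. V2EX: Google's Chinese search results are awful. Has Google given up on Chinese search?
  4. HOSTLOC: This scraper site network is really something
  5. HOSTLOC: Which big shot is behind the 小*知识网 site network?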

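Matching result titles with regular expressions cannot block these sites completely, so this list was compiled to help users filter their search results.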
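
To illustrate why title matching alone falls short, here is a minimal Python sketch. The title pattern, the sample titles, and the domains are all made up for illustration and are not taken from this list.

```python
import re

# Hypothetical title pattern for sites named like "小 X 知识网" / "小 X 百科网".
TITLE_PATTERN = re.compile(r"小\s*\S{1,2}\s*(知识|百科)网")

# Hypothetical blocklist entries (placeholder domains, not real list content).
BLOCKED_DOMAINS = {"farm.example"}


def blocked_by_title(title: str) -> bool:
    """Return True when the result title matches the naming pattern."""
    return TITLE_PATTERN.search(title) is not None


def blocked_by_domain(host: str) -> bool:
    """Return True when host is a blocked domain or one of its subdomains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


# Two made-up search results: (title, host).
results = [
    ("小明知识网 - 如何安装 Python", "a.farm.example"),
    ("如何安装 Python 详细教程", "b.farm.example"),
]
for title, host in results:
    print(host, blocked_by_title(title), blocked_by_domain(host))
```

The second sample result carries no site name in its title, so the regular expression misses it, while the domain check still catches it. That is the gap a domain list is meant to close.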

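Because 「小搭百科网」, the site at the center of this incident, shut itself down after the fallout, the list will go on to track and include other similar content farms.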

Usage

uBlacklist

Install uBlacklist

Chrome Web Store / Firefox Add-ons / App Store (for macOS and iOS)

Then open the Options page, click Add a subscription, and enter either of the following:

  • Name: content-farm-list
  • URL: https://raw.githubusercontent.com/wdmpa/content-farm-list/main/uBlacklist.txt

  • Name: content-farm-list
  • URL: https://wdmpa.org/content-farm-list/uBlacklist.txt

Click the Add button.

Google Hit Hider

http://www.jeffersonscher.com/gm/google-hit-hider/

Install

Greasy Fork / OpenUserJS.org

Manage lists

http://www.jeffersonscher.com/gm/google-hit-hider/manage-lists.php

Subscription files

File                                              Description
uBlacklist.txt                                    All uBlacklist rules
Surge.txt                                         All Surge rules
uBlacklist/spam/g.penzai.com.txt                  小搭百科网 domains, for uBlacklist
Surge/spam/g.penzai.com.txt                       小搭百科网 domains, for Surge
uBlacklist/machine-translated/stackoverflow.txt   Machine-translated StackOverflow domains, for uBlacklist
Surge/machine-translated/stackoverflow.txt        Machine-translated StackOverflow domains, for Surge

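Search engine settings

Because results matching domains on the list are removed, a results page can be left with too few entries to browse comfortably. Consider signing in and setting the search engine to show 100 results per page.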

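What can we do?

I. Open a PR to add domains

  1. Export the domain list from your local uBlacklist extension
  2. Try long-tail keywords in a search engine to discover farm domains that still rank low

Add new category files under the domains directory, following its structure. When adding to an existing file, follow the format of the entries already there; new entries can go anywhere in the file. (Either fork this repository, edit, and push, or edit directly on the web page.)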

File                                           Description
domains/spam/g.penzai.com.txt                  小搭百科网 domains
domains/machine-translated/stackoverflow.txt   Machine-translated StackOverflow domains
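
For the export route, the rules copied out of uBlacklist still need to be reduced to plain domains. The Python sketch below is one way to do that, under two assumptions that should be checked against the repository: that the exported rules are simple match patterns such as `*://*.example.com/*`, and that the files under domains hold one domain per line. The function name and sample domains are hypothetical.

```python
import re

# Matches simple match-pattern rules such as "*://*.example.com/*" or
# "https://example.com/*" and captures the host part.
RULE_RE = re.compile(r"^(?:\*|https?)://(?:\*\.)?([A-Za-z0-9.-]+)/")


def extract_domains(exported_rules: str) -> list[str]:
    """Collect unique, sorted hostnames from exported uBlacklist rules.

    Lines that are not simple match patterns (comments, regular-expression
    rules, blank lines) are skipped rather than guessed at.
    """
    domains = set()
    for line in exported_rules.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            domains.add(match.group(1).lower())
    return sorted(domains)


# Made-up export containing a duplicate, a comment, and a regex rule.
exported = """
*://*.farm-one.example/*
*://farm-two.example/*
# a comment
/regex-rule/
*://*.farm-one.example/*
"""
print("\n".join(extract_domains(exported)))
```

Skipping anything that is not a plain match pattern keeps the sketch conservative: a regular-expression rule cannot be turned into a single domain safely, so it is left for manual review.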

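After you submit, a script automatically updates the subscription files.

II. Report abuse

Report the abuse to the cloud service provider the site uses.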
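
The repository's update script is not reproduced here. As a rough idea of what such a conversion could look like, the Python sketch below turns a domain file into uBlacklist and Surge rules. It assumes one domain per line with optional `#` comments, uBlacklist match patterns of the form `*://*.domain/*`, and Surge rule-set lines of the form `DOMAIN-SUFFIX,domain`; the function names are hypothetical and the real output format may differ.

```python
def parse_domains(text: str) -> list[str]:
    """Read one domain per line, ignoring blank lines and '#' comments."""
    domains = []
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.append(line)
    # Deduplicate and sort so the generated files are stable between runs.
    return sorted(set(domains))


def to_ublacklist(domains: list[str]) -> str:
    """Render domains as match patterns, one rule per line."""
    return "".join(f"*://*.{d}/*\n" for d in domains)


def to_surge(domains: list[str]) -> str:
    """Render domains as DOMAIN-SUFFIX rule-set lines (assumed format)."""
    return "".join(f"DOMAIN-SUFFIX,{d}\n" for d in domains)


# Made-up domain file content.
source = "# sample\nfarm-two.example\nfarm-one.example\nfarm-one.example\n"
domains = parse_domains(source)
print(to_ublacklist(domains), end="")
print(to_surge(domains), end="")
```

Keeping a single plain domain list as the source and deriving each tool's format from it means contributors only edit one file per category.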

Owner
WDMPA
World Developer Mood Protection Association