Index different CKAN entities in Solr, not just datasets

Overview

ckanext-sitesearch

Index different CKAN entities in Solr, not just datasets

Requirements

This extension requires CKAN 2.9 or higher and Python 3

Features

Search actions

ckanext-sitesearch allows Solr-powered searches on the following CKAN entities:

Entity Action Permissions Notes
Organizations organization_search Public
Groups group_search Public
Users user_search Sysadmins only
Pages page_search Public (individual page permissions apply) Requires ckanext-pages

All *_search actions support most of the same paramters that package_search, except the facet* and include_* ones. That includes q, fq, rows, start and sort.

In all actions, the output matches the one of package_search as well, an object with a count key and a results one, wich is a list of the corresponding entities dict (ie the result of organization_show, user_show etc):

, , ] } ">
{
    "count": 2,
    "results": [
        
    
     ,
        
     
      ,
    ]
}


     
    

Additionally the plugin registers a site_search action that performs a search across all entities that the user is allowed to, including datasets. Results are returned in an object including the keys for which the user has permission to search on. For instance for a sysadmin user that has access to all searches:

, "organizations": , "groups": , "users": , "pages": }">
{
    "datasets": 
       
        ,
    "organizations": 
        
         ,
    "groups": 
         
          ,
    "users": 
          
           ,
    "pages": 
           
             } 
           
          
         
        
       

For each item, the results object is the one described above (count and results keys).

Note that all parameters are passed unchanged to each of the search actions, so this site-wide search is mostly useful for free-text searches like q=flood.

CLI

The plugin inlcudes a ckan command to reindex the current entities in the database in Solr:

ckan sitesearch rebuild 
   

   

Where entity_type is one of organizations, groups, users or pages. You can also pass the id or name of a particular entity to index just that particular one:

ckan sitesearch rebuild organization department-of-transport

Check the command help for additional options:

ckan sitesearch rebuild --help

Installation

To install ckanext-sitesearch:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it on the virtualenv

    git clone https://github.com/okfn/ckanext-sitesearch.git cd ckanext-sitesearch pip install -e . pip install -r requirements.txt

  3. Add sitesearch to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).

  4. Restart CKAN

Config settings

None at present

Developer installation

To install ckanext-sitesearch for development, activate your CKAN virtualenv and do:

git clone https://github.com/okfn/ckanext-sitesearch.git
cd ckanext-sitesearch
python setup.py develop

Tests

To run the tests, do:

pytest --ckan-ini=test.ini

License

AGPL

Owner
Open Knowledge Foundation
Also find us at: @frictionlessdata @opentrials @openspending @openknowledge-archive
Open Knowledge Foundation
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Expediting Vision Transformers via Token Reorganizations This repository contain

Youwei Liang 101 Dec 26, 2022
Finally, some decent sample sentences

tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se

hecko 19 Dec 13, 2022
State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

John Snow Labs 3k Jan 05, 2023
MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
a chinese segment base on crf

Genius Genius是一个开源的python中文分词组件,采用 CRF(Conditional Random Field)条件随机场算法。 Feature 支持python2.x、python3.x以及pypy2.x。 支持简单的pinyin分词 支持用户自定义break 支持用户自定义合并词

duanhongyi 237 Nov 04, 2022
EdiTTS: Score-based Editing for Controllable Text-to-Speech

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Neosapience 99 Jan 02, 2023
Opal-lang - A WIP programming language based on Python

thanks to aphitorite for the beautiful logo! opal opal is a WIP transcompiled pr

3 Nov 04, 2022
PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

YangHeng 567 Jan 07, 2023
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi

Saarland University Spoken Language Systems Group 39 Nov 15, 2022
Grover is a model for Neural Fake News -- both generation and detectio

Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.

Rowan Zellers 856 Dec 24, 2022
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

Persian-Image-Captioning We fine-tuning the Vision Encoder Decoder Model for the task of image captioning on the coco-flickr-farsi dataset. The implem

Hamtech-ai 15 Aug 25, 2022
Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. @inproceedings{tedes

Babelscape 40 Dec 11, 2022
voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

Michael Hansen 988 Jan 04, 2023
Index different CKAN entities in Solr, not just datasets

ckanext-sitesearch Index different CKAN entities in Solr, not just datasets Requirements This extension requires CKAN 2.9 or higher and Python 3 Featu

Open Knowledge Foundation 3 Dec 02, 2022
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

QData 17 Oct 15, 2022
OceanScript is an Esoteric language used to encode and decode text into a formulation of characters

OceanScript is an Esoteric language used to encode and decode text into a formulation of characters - where the final result looks like waves in the ocean.

Transformer training code for sequential tasks

Sequential Transformer This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

Meta Research 578 Dec 13, 2022
NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

pretrain4ir_tutorial NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking 用作NLPIR实验室, Pre-training

ZYMa 12 Apr 07, 2022
基于pytorch_rnn的古诗词生成

pytorch_peot_rnn 基于pytorch_rnn的古诗词生成 说明 config.py里面含有训练、测试、预测的参数,更改后运行: python main.py 预测结果 if config.do_predict: result = trainer.generate('丽日照残春')

西西嘛呦 3 May 26, 2022