🔤 Measure edit distance based on keyboard layout

Related tags

Miscellaneousclavier
Overview

clavier

Measure edit distance based on keyboard layout.



Table of contents

Introduction

Default edit distances, such as the Levenshtein distance, don't differentiate between characters. The distance between two characters is either 0 or 1. This package allows you to measure edit distances by taking into account keyboard layouts.

The scope is purposefully limited to alphabetical, numeric, and punctuation keys. That's because this package is meant to assist in analyzing user inputs -- e.g. for spelling correction in a search engine.

The goal of this package is to be flexible. You can define any logical layout, such as QWERTY or AZERTY. You can also control the physical layout by defining where the keys are on the board.

Installation

pip install git+https://github.com/MaxHalford/clavier

User guide

Keyboard layouts

☝️ Things are a bit more complicated than QWERTY vs. AZERTY vs. XXXXXX. Each layout has many variants. I haven't yet figured out a comprehensive way to map all these out.

This package provides a list of keyboard layouts. For instance, we'll load the QWERTY keyboard layout.

>>> import clavier
>>> keyboard = clavier.load_qwerty()
>>> keyboard
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /

>>> keyboard.shape
(4, 13)

>>> len(keyboard)
46

Here is the list of currently available layouts:

>>> for layout in (member for member in dir(clavier) if member.startswith('load_')):
...     print(layout.replace('load_', ''))
...     exec(f'print(clavier.{layout}())')
...     print('---')
dvorak
` 1 2 3 4 5 6 7 8 9 0 [ ]
' , . p y f g c r l / = \
a o e u i d h t n s -
; q j k x b m w v z
---
qwerty
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
---

Distance between characters

Measure the Euclidean distance between two characters on the keyboard.

>>> keyboard.char_distance('1', '2')
1.0

>>> keyboard.char_distance('q', '2')
1.4142135623730951

>>> keyboard.char_distance('1', 'm')
6.708203932499369

Distance between words

Measure a modified version of the Levenshtein distance, where the substitution cost is the output of the char_distance method.

>>> keyboard.word_distance('apple', 'wople')
2.414213562373095

>>> keyboard.word_distance('apple', 'woplee')
3.414213562373095

You can also override the deletion cost by specifying the deletion_cost parameter, and the insertion cost via the insertion_cost parameter. Both default to 1.

Typing distance

Measure the sum of distances between each pair of consecutive characters. This can be useful for studying keystroke dynamics.

>>> keyboard.typing_distance('hello')
10.245040190466598

For sentences, you can split them up into words and sum the typing distances.

>>> sentence = 'the quick brown fox jumps over the lazy dog'
>>> sum(keyboard.typing_distance(word) for word in sentence.split(' '))
105.60457487263012

Interestingly, this can be used to compare keyboard layouts in terms of efficiency. For instance, the Dvorak keyboard layout is supposedly more efficient than the QWERTY layout. Let's compare both on the first stanza of If— by Rudyard Kipling:

>> words = list(map(str.lower, stanza.split())) >>> qwerty = clavier.load_qwerty() >>> sum(qwerty.typing_distance(word) for word in words) 740.3255229138255 >>> dvorak = clavier.load_dvorak() >>> sum(dvorak.typing_distance(word) for word in words) 923.6597116104518 ">
>>> stanza = """
... If you can keep your head when all about you
...    Are losing theirs and blaming it on you;
... If you can trust yourself when all men doubt you,
...    But make allowance for their doubting too;
... If you can wait and not be tired by waiting,
...    Or, being lied about, don't deal in lies,
... Or, being hated, don't give way to hating,
...    And yet don't look too good, nor talk too wise;
... """

>>> words = list(map(str.lower, stanza.split()))

>>> qwerty = clavier.load_qwerty()
>>> sum(qwerty.typing_distance(word) for word in words)
740.3255229138255

>>> dvorak = clavier.load_dvorak()
>>> sum(dvorak.typing_distance(word) for word in words)
923.6597116104518

It seems the Dvorak layout is in fact slower than the QWERTY layout. But of course this might not be the case in general.

Nearest neighbors

You can iterate over the k nearest neighbors of any character.

>>> qwerty = clavier.load_qwerty()
>>> for char, dist in qwerty.nearest_neighbors('s', k=8, cache=True):
...     print(char, f'{dist:.4f}')
w 1.0000
a 1.0000
d 1.0000
x 1.0000
q 1.4142
e 1.4142
z 1.4142
c 1.4142

The cache parameter determines whether or not the result should be cached for the next call.

Physical layout specification

By default, the keyboard layouts are ortholinear, meaning that the characters are physically arranged over a grid. You can customize the physical layout to make it more realistic and thus obtain distance measures which are closer to reality. This can be done by specifying parameters to the keyboards when they're loaded.

Staggering

Staggering is the amount of offset between two consecutive keyboard rows.

You can specify a constant staggering as so:

>>> keyboard = clavier.load_qwerty(staggering=0.5)

By default the keys are spaced by 1 unit. So a staggering value of 0.5 implies a 50% horizontal shift between each pair of consecutive rows. You may also specify a different amount of staggering for each pair of rows:

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])

There's 3 elements in the list because the keyboard has 4 rows.

Key pitch

Key pitch is the amount of distance between the centers of two adjacent keys. Most computer keyboards have identical horizontal and vertical pitches, because the keys are all of the same size width and height. But this isn't the case for mobile phone keyboards. For instance, iPhone keyboards have a higher vertical pitch.

Drawing a keyboard layout

>>> keyboard = clavier.load_qwerty()
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty.png', bbox_inches='tight')

qwerty

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty_staggered.png', bbox_inches='tight')

qwerty_staggered

Custom layouts

You can of course specify your own keyboard layout. There are different ways to do this. We'll use the iPhone keypad as an example.

The from_coordinates method

>>> keypad = clavier.Keyboard.from_coordinates({
...     '1': (0, 0), '2': (0, 1), '3': (0, 2),
...     '4': (1, 0), '5': (1, 1), '6': (1, 2),
...     '7': (2, 0), '8': (2, 1), '9': (2, 2),
...     '*': (3, 0), '0': (3, 1), '#': (3, 2),
...                  '☎': (4, 1)
... })
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

The from_grid method

>> keypad 1 2 3 4 5 6 7 8 9 * 0 # ☎ ">
>>> keypad = clavier.Keyboard.from_grid("""
...     1 2 3
...     4 5 6
...     7 8 9
...     * 0 #
...       ☎
... """)
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

Development

git clone https://github.com/MaxHalford/clavier
cd clavier
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.

Owner
Max Halford
Going where the wind blows 🍃 🦔
Max Halford
A Python version of Canvacord

A copy of canvacord made in python! Table of contents Installation Examples Creating Images Links Downloads Installation Run any of these commands in

10 Mar 28, 2022
WinBoost: Boost your windows system.

Winboost runs a complete checkup of your entire system locating junk files, speed-reducing issues and causes of any system or application glitches or crashes. Through a lot of research and testing, w

Smit Parmar 4 Oct 01, 2021
Chicks get hostloc points regularly

hostloc_getPoints 小鸡定时获取hostloc积分 github action大规模失效,mjj平均一人10鸡,以下可以部署到自己的小鸡上

59 Dec 28, 2022
Arabic to Roman Converter in Python

Arabic-to-Roman-Converter Made together with https://github.com/goltaraya . Arabic to Roman Converter in Python. -Instructions: 1 - Make sure you have

Pedro Lucas Tomazeti Fernandes 6 Oct 28, 2021
A small site to list shared directories

Nebula Server Directories This site can be used to list folder and subdirectories in your server : Python It's required to have Python 3.8 or more ins

Adrien J. 1 Dec 28, 2021
Here is my Senior Design Project that I implemented to graduate from Computer Engineering.

Here is my Senior Design Project that I implemented to graduate from Computer Engineering. It is a chatbot made in RASA and helps the user to plan their vacation in the Turkish language. In order to

Ezgi Subaşı 25 May 31, 2022
Capture screen and download off Roku based devices

rokuview Capture screen and download off Roku based devices Tested on Hisense TV with Roku OS built-in No guarantee this will work with all Roku model

3 May 27, 2021
Python 100daysofcode

#python #100daysofcode Python is a simple, general purpose ,high level & object-oriented programming language even it's is interpreted scripting langu

Tara 1 Feb 10, 2022
MobaXterm-GenKey

MobaXterm-GenKey 你懂的!! 本地启动 需要安装Python3!!!

malaohu 328 Dec 29, 2022
Custom python interface to xstan (a modified (cmd)stan)

Custom python interface to xstan (a modified (cmd)stan) Use at your own risk, currently everything is very brittle and will probably be changed in the

2 Dec 16, 2021
Simple plug-and-play installer for users who want to LineageOS from stock firmware, or from another custom ROM.

LineageOS for the Teracube 2e Simple plug-and-play installer for users who want to LineageOS from stock firmware, or from another custom ROM. Dependen

Gagan Malvi 5 Mar 31, 2022
An app to automatically take attendance by scanning students' bar coded ID card as they enter the classroom.

Auto Classroom Attendance This application may be run on a PC to automatically scan students' ID card using a generic bar code scanner and output the

1 Nov 10, 2021
Logging-monitoring-instrumentation - A brief repository on logging monitoring and instrumentation in Python

logging-monitoring-instrumentation A brief repository on logging monitoring and

Noah Gift 6 Feb 17, 2022
Class and mathematical functions for quaternion numbers.

Quaternions Class and mathematical functions for quaternion numbers. Installation Python This is a Python 3 module. If you don't have Python installed

3 Nov 08, 2022
NYCU(NCTU)-差勤-助教

NCTU-TA-fill 填寫 差勤-助教時數 有沒有覺得在差勤系統填助教時數有點浪費生命? 今天有個懶鬼浪費好多時間幫大家寫了code 只要填好的必要的資料,就可以讓電腦自動幫你完成差勤助教的時數填寫喔! https://pt-attendance.nctu.edu.tw/verify/userL

14 Dec 21, 2021
🥦 Send and receive nano with 2 simple functions

easy_nano Send and receive nano (without having to understand the nano protocol).

1 Feb 14, 2022
Awesome Casino is simple offline casino made on python.

Awesome-Casino Awesome Casino is simple offline casino made on python. I found bug, what can i do? If you find any bug or want to suggest any idea, al

Herman 1 Feb 04, 2022
Multifunctional Analysis of Regions through Input-Output

MARIO Multifunctional Analysis of Regions through Input-Output. (Documents) What is it MARIO is a python package for handling input-output tables and

14 Dec 25, 2022
A tool to help plan vacations with friends and family

Vacationer In Development A tool to help plan vacations with friends and family Deployment Requirements: NPM Docker Docker-Compose Deployment Instruct

JK 2 Oct 05, 2021
Boamp-extractor - Script d'extraction des AOs publiés au BOAMP

BOAMP Extractor BOAMP-Extractor permet d'extraire les offres de marchés publics publiées au bulletin officiel des annonces des marchés publics (BOAMP)

Julien 3 Dec 09, 2022