Python Eacc is a minimalist but flexible Lexer/Parser tool in Python.

Last update: Nov 16, 2022

Overview

Eacc

Python Eacc is a parsing tool. It implements a flexible lexer and a straightforward approach to analyzing documents, and it uses Python code to specify both the lexer and the grammar for a given document. Eacc can succinctly handle most of the parsing cases that existing Python parsing tools aim to address.

Documents are split into tokens, and each token has a type. When a sequence of tokens is matched, it evaluates to a specific type and the result is then rematched against the existing rules. The types can be function objects, which means patterns can be evaluated based on external conditions.
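The match-then-rematch cycle described above can be illustrated with a small standard-library sketch. This is not eacc's implementation or API: Token, TOKEN_SPECS, RULE, tokenize and reduce_all are hypothetical names made up for the illustration, and token types are plain strings here rather than classes.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type: str
    value: object

# Hypothetical token table: (type name, regex, converter).
# A converter of None means the matched text is discarded.
TOKEN_SPECS = [
    ("Num", r"[0-9]+", float),
    ("Plus", r"\+", str),
    ("Blank", r" +", None),
]

def tokenize(text):
    """Split text into typed tokens, trying each spec in order."""
    pos, out = 0, []
    while pos < len(text):
        for type_, pattern, convert in TOKEN_SPECS:
            m = re.compile(pattern).match(text, pos)
            if m:
                if convert is not None:
                    out.append(Token(type_, convert(m.group())))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
    return out

# A single rule: the type sequence Num Plus Num evaluates to a new Num.
RULE = ("Num", "Plus", "Num")

def reduce_all(tokens):
    """Replace every matched span with its result and rematch from the start."""
    tokens = list(tokens)
    i = 0
    while i + len(RULE) <= len(tokens):
        window = tokens[i:i + len(RULE)]
        if tuple(t.type for t in window) == RULE:
            total = window[0].value + window[2].value
            tokens[i:i + len(RULE)] = [Token("Num", total)]
            i = 0  # the new Num may take part in another match
        else:
            i += 1
    return tokens

result = reduce_all(tokenize("1 + 2 + 3"))
print(result)
```

Each reduction shortens the token list by two, so the loop always terminates; for the input above the list collapses to a single Num token.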

Because a grammar rule can be associated with a type, and that type can vary within the context of the program, eacc is useful for some text analysis problems.

A document grammar is mostly written in an ambiguous manner. The parser has a lookahead mechanism that expresses precedence when matching rules.

It is possible to extend the document grammar at the time it is being parsed. Such a feature is interesting to handle some edge cases.

The parser also accepts some special operators such as Except, Only and Times. These operators are used to match sequences of tokens based on their token types and length.

Features

- Fast and flexible Lexer
    - Use class inheritance to extend/modify your existing lexers.
- Handles broken documents
    - Useful in some edge cases.
- Short implementation
    - You can easily extend or modify its functionality.
- Powerful but easy to learn
    - Learn how a few classes work to implement a parser.
- Pythonic notation for grammars
    - No need to dig deep into grammar theory.

Note: For a real and more sophisticated example of eacc usage, check out Crocs.

Crocs is capable of reading a regex string and then generating possible matches for the input regex.

https://github.com/iogf/crocs

Basic Example

The code below specifies a lexer and a parsing approach for a simple expression calculator. When one of the mathematical operations +, -, * or / is executed, the result is a number.

That simple assertion is enough to implement our calculator.

from eacc.eacc import Rule, Grammar, Eacc
from eacc.lexer import Lexer, LexTok, XSpec
from eacc.token import Plus, Minus, LP, RP, Mul, Div, Num, Blank, Sof, Eof

class CalcTokens(XSpec):
    # Used to extract the tokens.
    t_plus   = LexTok(r'\+', Plus)
    t_minus  = LexTok(r'\-', Minus)

    t_lparen = LexTok(r'\(', LP)
    t_rparen = LexTok(r'\)', RP)
    t_mul    = LexTok(r'\*', Mul)
    t_div    = LexTok(r'\/', Div)

    t_num    = LexTok(r'[0-9]+', Num, float)
    t_blank  = LexTok(r' +', Blank, discard=True)

    root = [t_plus, t_minus, t_lparen, t_num, 
    t_blank, t_rparen, t_mul, t_div]

class CalcGrammar(Grammar):
    # When the token patterns are matched they become
    # ParseTree objects, which have a type.
    r_paren = Rule(LP, Num, RP, type=Num)
    r_div   = Rule(Num, Div, Num, type=Num)
    r_mul   = Rule(Num, Mul, Num, type=Num)
    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))
    r_minus = Rule(Num, Minus, Num, type=Num, up=(o_mul, o_div))

    # The final structure that is consumed. Once it is
    # consumed then the process stops.
    r_done  = Rule(Sof, Num, Eof)

    root = [r_paren, r_plus, r_minus, r_mul, r_div, r_done]

# The handles mapped to the patterns to compute the expression result.
def plus(expr, sign, term):
    return expr.val() + term.val()

def minus(expr, sign, term):
    return expr.val() - term.val()

def div(term, sign, factor):
    return term.val()/factor.val()

def mul(term, sign, factor):
    return term.val() * factor.val()

def paren(left, expression, right):
    return expression.val()

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

if __name__ == '__main__':
    data = '2 * 5 + 10 -(2 * 3 - 10 )+ 30/(1-3+ 4* 10 + (11/1))' 

    lexer  = Lexer(CalcTokens)
    tokens = lexer.feed(data)
    eacc   = Eacc(CalcGrammar)
    
    # Link the handles to the patterns.
    eacc.add_handle(CalcGrammar.r_plus, plus)
    eacc.add_handle(CalcGrammar.r_minus, minus)
    eacc.add_handle(CalcGrammar.r_div, div)
    eacc.add_handle(CalcGrammar.r_mul, mul)
    eacc.add_handle(CalcGrammar.r_paren, paren)
    eacc.add_handle(CalcGrammar.r_done, done)
    
    ptree = eacc.build(tokens)
    ptree = list(ptree)

The rule below fixes precedence in the ambiguous grammar above.

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))

The above rule will be matched only if the below rules aren't matched ahead.

    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

When the r_plus rule is matched, the result has type Num, so it is rematched against the existing rules, and so on.
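To make the lookahead idea concrete, here is a standard-library sketch of a reducer in which each rule carries a tuple of token types that block it when they appear right after the match. It only mirrors the behaviour described in this section and is not how eacc is implemented: RULES, step and evaluate are hypothetical names, tokens are plain (type, value) pairs, and the leftmost-match strategy is an assumption of the sketch.

```python
# Rule format: (pattern of types, blocking lookahead types, handler).
RULES = [
    (("LP", "Num", "RP"), (), lambda left, num, right: num),
    (("Num", "Plus", "Num"), ("Mul", "Div"), lambda a, _, b: a + b),
    (("Num", "Minus", "Num"), ("Mul", "Div"), lambda a, _, b: a - b),
    (("Num", "Mul", "Num"), (), lambda a, _, b: a * b),
    (("Num", "Div", "Num"), (), lambda a, _, b: a / b),
]

def step(tokens):
    """Apply one rule at the leftmost position where it is allowed.

    Returns the reduced token list, or None when no rule applies.
    """
    for i in range(len(tokens)):
        for pattern, blocked, handler in RULES:
            end = i + len(pattern)
            window = tokens[i:end]
            if tuple(type_ for type_, _ in window) != pattern:
                continue
            ahead = tokens[end][0] if end < len(tokens) else None
            if ahead in blocked:
                continue  # a higher-precedence operator follows
            value = handler(*(val for _, val in window))
            return tokens[:i] + [("Num", value)] + tokens[end:]
    return None

def evaluate(tokens):
    """Reduce until only Sof Num Eof is left, then return the number."""
    tokens = [("Sof", None)] + list(tokens) + [("Eof", None)]
    while True:
        reduced = step(tokens)
        if reduced is None:
            break
        tokens = reduced
    if [type_ for type_, _ in tokens] != ["Sof", "Num", "Eof"]:
        raise ValueError("not a well-formed expression")
    return tokens[1][1]

# 2 + 3 * 4: the Plus rule is skipped while Mul is ahead.
expr = [("Num", 2.0), ("Plus", "+"), ("Num", 3.0), ("Mul", "*"), ("Num", 4.0)]
result = evaluate(expr)
print(result)
```

In this sketch the addition is postponed until the multiplication has been reduced to a Num, which is the effect the up argument is described as having in the grammar above.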

When a mathematical expression is well formed, it reduces to the following structure.

Sof Num Eof

That structure is matched by the rule below.

    r_done  = Rule(Sof, Num, Eof)

That rule is mapped to the handle below. It will merely print the resulting value.

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

Sof and Eof are the start-of-file and end-of-file tokens. They are inserted automatically by the parser.

When a given document is well formed, the defined rules consume it entirely. If the input is not a valid mathematical expression, the parser raises an exception.

The lexer is really flexible: it can handle some interesting cases in a short and simple manner.

from eacc.lexer import XSpec, Lexer, SeqTok, LexTok, LexSeq
from eacc.token import Keyword, Identifier, RP, LP, Colon, Blank

class KeywordTokens(XSpec):
    t_if = LexSeq(SeqTok(r'if', type=Keyword),
    SeqTok(r'\s+', type=Blank))

    t_blank  = LexTok(r' +', type=Blank)
    t_lparen = LexTok(r'\(', type=LP)
    t_rparen = LexTok(r'\)', type=RP)
    t_colon  = LexTok(r'\:', type=Colon)

    # Match identifier only if it is not an if.
    t_identifier = LexTok(r'[a-zA-Z0-9]+', type=Identifier)

    root = [t_if, t_blank, t_lparen, 
    t_rparen, t_colon, t_identifier]

lex = Lexer(KeywordTokens)
data = 'if ifnum: foobar()'
tokens = lex.feed(data)
print('Consumed:', list(tokens))

That would output:

Consumed: [Keyword('if'), Blank(' '), Identifier('ifnum'), Colon(':'),
Blank(' '), Identifier('foobar'), LP('('), RP(')')]

The example above tokenizes keywords correctly: if is emitted as a Keyword only when whitespace follows it, so ifnum remains an Identifier. The SeqTok class works together with LexSeq to extract tokens based on a sequence of regexes, while LexTok works on its own to extract tokens that do not demand a lookahead step.
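The idea of a token that is emitted only when a whole sequence of patterns matches can be sketched with the standard re module. This is an illustration of the concept, not eacc's code: SPECS, match_sequence and tokenize are hypothetical names, and tokens are plain (type, text) pairs.

```python
import re

# Each entry is a sequence of (regex, type) parts. Every part must match
# back to back, otherwise none of the entry's tokens is emitted.
SPECS = [
    [(r"if", "Keyword"), (r"\s+", "Blank")],
    [(r" +", "Blank")],
    [(r"\(", "LP")],
    [(r"\)", "RP")],
    [(r":", "Colon")],
    [(r"[a-zA-Z0-9]+", "Identifier")],
]

def match_sequence(parts, text, pos):
    """Return (tokens, new_pos) if all parts match in a row, else None."""
    found = []
    for pattern, type_ in parts:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            return None
        found.append((type_, m.group()))
        pos = m.end()
    return found, pos

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        for parts in SPECS:
            hit = match_sequence(parts, text, pos)
            if hit is not None:
                found, pos = hit
                tokens.extend(found)
                break
        else:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
    return tokens

tokens = tokenize("if ifnum: foobar()")
print(tokens)
```

Because the keyword entry requires whitespace after if, the prefix of ifnum fails that entry and falls through to the identifier pattern, so the token types come out in the same order as in the output shown above.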

Install

Note: Works with Python 3 only.

pip install eacc

Documentation

Releases (v3.1.6)

- v3.1.6 (Jul 6, 2020): Bug fix with rule handle execution.
- v3.1.5 (Jul 4, 2020): Bug fixes, code improvements.
- v3.1.4 (Jul 3, 2020): Code improvements.
- v3.1.3 (Jun 28, 2020): Bug fixes, behavior and code improvements.
- v3.1.2 (Jun 23, 2020): Bug fixes.
- v3.1.1 (Jun 23, 2020): Bug fixes.
- v3.1.0 (Jun 17, 2020): Some optimizations, adding Only, Except, DotTok operators.
- v3.0.0 (May 6, 2020): Lexer optimization, design improvements.
- v2.0.0 (Apr 5, 2020): Design improvements.
- v1.0.0 (Apr 4, 2020): First release.

Owner

Iury de oliveira gomes figueiredo
