🔬 Fixed struct serialization system, using Python 3.9 annotated type hints

Overview

py-struct

Fixed-size struct serialization, using Python 3.9 annotated type hints

This was originally uploaded as a Gist because it's not intended as a serious project, but I decided to take it a bit further and add some features.

Features:

  • One file, zero dependencies
  • Easy to use, just annotate your fields and use the decorator
  • Overridable (just define __load__, __save__, __size__, __align__)
  • Compatible with dataclasses
  • Integer / float primitives
  • Fixed size arrays
  • Raw chunks (bytes)
  • Static checking / size calculation
  • Packed or aligned structs, with 3 padding handling modes

Getting started

from .serialization import *
from io import BytesIO
from dataclasses import dataclass

# C ALIASES (for LP64)

Bool = U8
Char = S8; UChar = U8
Short = S16; UShort = U16
Int = S32; UInt = U32
Long = S64; ULong = U64
IntPtr = S64; UIntPtr = U64
Ptr = Size = U64

# Define some structs

@dataclass
class Foo(Struct):
    yeet: Bool
    ping: Bool

@dataclass
class MyStruct(Struct):
    foo: Foo
    bar: UInt
    three_bazs: tuple[Long, Long, Long]

# Decode with __load__(), passing an IO

data = b'\x01\x00\x00\x00\x00\x05\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
parsed = MyStruct.__load__(BytesIO(data))

assert parsed == MyStruct(foo=Foo(yeet=1, ping=0), bar=1280, three_bazs=(1, 2, 3))

# Encode back with __save__()

parsed.__save__(st := BytesIO())
assert data == st.getvalue()
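As a cross-check of the byte layout above, the same 32 bytes can be reproduced with the standard library's struct module. This is only a sketch and does not use py-struct itself: the format string assumes little-endian encoding and two padding bytes after foo, which is what the example data implies.

```python
import struct

# The example payload from the snippet above, split by field.
data = (
    b'\x01\x00\x00\x00'                  # foo.yeet, foo.ping, 2 padding bytes
    b'\x00\x05\x00\x00'                  # bar = 1280 (0x0500, little-endian)
    b'\x01\x00\x00\x00\x00\x00\x00\x00'  # three_bazs[0]
    b'\x02\x00\x00\x00\x00\x00\x00\x00'  # three_bazs[1]
    b'\x03\x00\x00\x00\x00\x00\x00\x00'  # three_bazs[2]
)

# Assumed equivalent layout: two U8, 2 pad bytes, one U32, three S64.
LAYOUT = '<BB2xI3q'

fields = struct.unpack(LAYOUT, data)
assert fields == (1, 0, 1280, 1, 2, 3)
assert struct.pack(LAYOUT, *fields) == data
```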

Usage

Serializable protocol

A serializable class is one that implements the FixedSerializable protocol:

  • __load__(cls, st: BinaryIO) -> cls: class method that deserializes a readable binary stream into an instance
  • __save__(self, st: BinaryIO): instance method that serializes the instance into a writeable binary stream
  • __size__: int: class attribute that indicates total serialized size
  • __align__: int: align factor (1 if no alignment)

__size__ is expected to be a multiple of __align__ i.e. it includes any trailing alignment as needed.

from typing import NamedTuple

class Color(NamedTuple):
    r: int
    g: int
    b: int

    __size__ = 3
    __align__ = 1
    @classmethod
    def __load__(cls, st):
        return cls(*st.read(3))
    def __save__(self, st):
        st.write(bytes(self))
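A short round trip shows the protocol in action. The class is repeated so the snippet runs on its own; it needs nothing from py-struct, since the protocol is just these four attributes.

```python
from io import BytesIO
from typing import NamedTuple

class Color(NamedTuple):
    r: int
    g: int
    b: int

    __size__ = 3
    __align__ = 1

    @classmethod
    def __load__(cls, st):
        # Iterating over bytes yields ints, one per channel.
        return cls(*st.read(3))

    def __save__(self, st):
        # bytes() of a tuple of ints in range(256) gives one byte per element.
        st.write(bytes(self))

original = Color(r=0, g=21, b=10)
original.__save__(st := BytesIO())
encoded = st.getvalue()

assert len(encoded) == Color.__size__
assert Color.__size__ % Color.__align__ == 0  # size is a multiple of align
decoded = Color.__load__(BytesIO(encoded))
assert decoded == original
```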

Structs

Most times you'll just derive from Struct, which implements the serializable protocol for you, based on the property annotations of the class. As in C, fields are serialized in order of declaration.

@dataclass
class Color(Struct):
    r: U8
    g: U8
    b: U8

The implemented __load__ will construct an instance of the class, passing in keyword arguments according to the property annotations. In this example, the class will be constructed like Color(r=0, g=21, b=10). To avoid implementing the constructor and other methods yourself, Python's dataclass decorator can be used.
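To make the mechanism concrete, here is a rough sketch of how a loader could walk the annotations in declaration order and build keyword arguments. It is not the library's implementation: load_fields is a hypothetical helper, and this U8 is a stand-in that stores a struct format code as its metadata.

```python
import struct
from dataclasses import dataclass
from io import BytesIO
from typing import Annotated, get_type_hints

# Hypothetical stand-in for the library's U8: int plus a struct format code.
U8 = Annotated[int, "B"]

def load_fields(cls, st):
    """Read each annotated field in declaration order, then call cls(**kwargs)."""
    kwargs = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        fmt = "<" + hint.__metadata__[0]
        (kwargs[name],) = struct.unpack(fmt, st.read(struct.calcsize(fmt)))
    return cls(**kwargs)

@dataclass
class Color:
    r: U8
    g: U8
    b: U8

color = load_fields(Color, BytesIO(bytes([0, 21, 10])))
assert color == Color(r=0, g=21, b=10)
```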

For the annotated properties, the following types are allowed:

  • A class implementing the serializable protocol, such as another struct.
  • One of the provided integer / float primitives: U8, S8, U16, S16, U32, S32, U64, S64, F32, F64. These are aliases of int with custom metadata consumed by Struct.
  • bytes, list[T] or tuple[T, ...] (where T is itself an allowed type). However, it must be annotated with FixedSize metadata like so: Annotated[list[U8], FixedSize(20)]
  • A tuple with allowed types as elements, such as tuple[U64, U32].
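The sketch below shows how FixedSize-style metadata can be read back from an Annotated hint using only the typing module. The FixedSize class and the fixed_length helper here are hypothetical stand-ins written for this example; the library's own definitions may differ.

```python
from dataclasses import dataclass
from typing import Annotated, get_args

@dataclass(frozen=True)
class FixedSize:
    """Hypothetical stand-in: marks a variable-length type with a fixed length."""
    size: int

# Hypothetical stand-in for the library's U8 primitive.
U8 = Annotated[int, "B"]

Buffer = Annotated[list[U8], FixedSize(20)]

def fixed_length(hint):
    """Return the length declared through FixedSize metadata on an Annotated hint."""
    for meta in getattr(hint, "__metadata__", ()):
        if isinstance(meta, FixedSize):
            return meta.size
    raise TypeError(f"{hint!r} has no FixedSize metadata")

# get_args on an Annotated hint returns the wrapped type followed by the metadata.
inner, *_ = get_args(Buffer)
(element,) = get_args(inner)

assert fixed_length(Buffer) == 20
assert element == U8
```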

Struct is a metaclass, so it must be the first parent. Inheritance (subclassing the struct class, or more parents in addition to Struct) is discouraged and will probably not work correctly.

Alignment

Struct can automatically insert padding to align fields according to their __align__ attribute (or for primitives, their size). In this case, the struct itself is aligned to the LCM of the field alignments (and its __size__ is padded accordingly).
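The padding rule can be sketched as a small offset calculator. This is an illustration rather than the library's code: layout and its (size, align) pairs are hypothetical, and math.lcm requires Python 3.9.

```python
from math import lcm

def layout(fields):
    """Compute field offsets, total size and alignment for (size, align) pairs."""
    offset = 0
    align = 1
    offsets = []
    for size, field_align in fields:
        offset += -offset % field_align   # pad up to the field's alignment
        offsets.append(offset)
        offset += size
        align = lcm(align, field_align)   # struct alignment is the LCM
    offset += -offset % align             # trailing padding
    return offsets, offset, align

# MyStruct from "Getting started": Foo (2 bytes, align 1), UInt, three Longs.
assert layout([(2, 1), (4, 4), (24, 8)]) == ([0, 4, 8], 32, 8)
```

The result matches the 32-byte example payload: bar starts at offset 4 after two padding bytes, and three_bazs starts at offset 8.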

The align metaclass attribute controls how alignment is handled:

  • discard (default): When decoding, any bytes are accepted as padding (and discarded). When encoding, zeroes are inserted. (This is the only alignment mode that introduces malleability.)
  • zeros: Like discard, but only zeroes are accepted as padding when decoding.
  • explicit: Don't actually insert any padding, just check that all fields are aligned and that __size__ is aligned too. This mode expects you to explicitly declare padding as (for example) bytes.
  • no: No alignment at all. Field alignments are ignored and __align__ is set to 1. This is equivalent to a packed / unaligned struct.
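The difference between discard and zeros only shows up when decoding. A minimal sketch of that check follows; read_padding is a hypothetical helper, not part of the library.

```python
from io import BytesIO

def read_padding(st, count, mode):
    """Consume `count` padding bytes, validating them according to `mode`."""
    padding = st.read(count)
    if len(padding) != count:
        raise EOFError("stream ended inside padding")
    if mode == "zeros" and any(padding):
        raise ValueError(f"non-zero padding: {padding!r}")
    # mode == "discard": any bytes are accepted and dropped.

read_padding(BytesIO(b"\xde\xad"), 2, "discard")   # accepted
read_padding(BytesIO(b"\x00\x00"), 2, "zeros")     # accepted

try:
    read_padding(BytesIO(b"\xde\xad"), 2, "zeros")
except ValueError as error:
    rejected = str(error)
```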

For example, to create a packed struct:

@dataclass
class Address(Struct, align='no'):
    port: U16
    # without align='no', 2 bytes of padding would be inserted here
    host: U32
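For comparison, the standard library's struct module gives the same two sizes when the padding is written out by hand. The format strings below are assumed equivalents of the packed and aligned layouts, using little-endian encoding; they are not taken from the library.

```python
import struct

PACKED = "<HI"     # U16 immediately followed by U32
ALIGNED = "<H2xI"  # U16, two padding bytes, then U32 at offset 4

assert struct.calcsize(PACKED) == 6
assert struct.calcsize(ALIGNED) == 8

port, host = 8080, 0x7F000001
packed = struct.pack(PACKED, port, host)
aligned = struct.pack(ALIGNED, port, host)

# The only difference is the two zero bytes after the port.
assert aligned == packed[:2] + b"\x00\x00" + packed[2:]
```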

Caveat: tuple (last case in allowed types) will not verify or insert alignment between its elements. Its alignment will be the GCD of the alignments of its elements, and the size will be the sum of the sizes. This means tuple[U64, U64] will probably do what you want (align to 8 bytes), but tuple[U64, U32] will only align to 4 bytes. If you need alignment, use a nested Struct instead of a tuple.
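The caveat can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration of the rule as described (sum of sizes, GCD of alignments), not the library's code; passing several arguments to math.gcd requires Python 3.9.

```python
from math import gcd

def tuple_layout(elements):
    """Return (size, align) for a tuple of (size, align) elements, with no padding."""
    size = sum(element_size for element_size, _ in elements)
    align = gcd(*(element_align for _, element_align in elements))
    return size, align

U64_LAYOUT = (8, 8)
U32_LAYOUT = (4, 4)

assert tuple_layout([U64_LAYOUT, U64_LAYOUT]) == (16, 8)  # aligned to 8 bytes
assert tuple_layout([U64_LAYOUT, U32_LAYOUT]) == (12, 4)  # only 4-byte aligned
```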

Malleability

Serialization is non-malleable (that is, there's a bijection between serialized and unserialized values) if all the following conditions are met:

  • Structs use an alignment setting other than the default align='discard'.

Of course, if serialization is implemented manually at some point, malleability has to be checked there as well.
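To see why discarded padding breaks the bijection, the sketch below uses the standard library's struct module, whose x pad bytes behave like the discard mode: ignored when decoding, zero-filled when encoding. The layout is an assumed stand-in for an aligned U16 followed by a U32.

```python
import struct

LAYOUT = "<H2xI"  # U16, two discarded padding bytes, U32

canonical = struct.pack(LAYOUT, 80, 0x7F000001)
tampered = canonical[:2] + b"\xde\xad" + canonical[4:]

# Two different byte strings decode to the same value...
assert tampered != canonical
assert struct.unpack(LAYOUT, tampered) == struct.unpack(LAYOUT, canonical)

# ...so re-encoding the decoded value does not reproduce the tampered input.
reencoded = struct.pack(LAYOUT, *struct.unpack(LAYOUT, tampered))
assert reencoded == canonical
```

Rejecting non-zero padding (as in the zeros mode) or declaring the padding as an explicit field removes this ambiguity.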

Wishlist

  • Bit fields
  • Post validation / transform (enums, booleans, string buffers, sets)
  • Endianness control
    • At annotation time, or at runtime?
  • Unions (? not clear how I'd implement those)
  • Optimization and laziness

Higher level:

  • Pointer newtype / wrapper class
    • Ideally annotated with target hint
    • Call to dereference, optionally passing index
  • Generics in struct classes
Owner
Alba Mendez