Automate the case review on legal case documents and find the most critical cases using network analysis

Overview

Automation on Legal Court Cases Review

This project is to automate the case review on legal case documents and find the most critical cases using network analysis.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

  • Python : pdfminer, LexNLP, nltk sklearn
  • R: igraph

Scope:

  1. Parse court documents, extract citations from raw text.
  2. Build citation network, identify important cases in the network.
  3. Extract judge's opinion text and meta information including opinion author, court, decision.
  4. Model training to predict court decision based on opinion text.

Polit Study on 159 Legal Court Documents (in pilot_159 folder)

1. Process PDF documents using Python

Ipython Notebook Description
1.Extraction by LexNLP.ipynb Extract meta inforation use LexNLP package.
2.Layer Analysis on Sigle File. ipynb Use pdfminer to extract the raw text and the paragraph segamentation in the PDF document.
3.Patent Position by Layer.ipynb Identify the position of patent number in extracted layers from PDF.
4.Opinion and Author by Layer.ipynb Extract opinion text, author, decisions from the layers list.
5.Wrap up to Meta Data.ipynb Store extracted meta data to .json or .csv
6.Visualize citation frequency.ipynb Bar plot of the citation frequencies

2. Data: Parse PDF documents via Python

These datasets are NOT included in this public repository for intellectual property and privacy concern

File
pdf2text159.json A dictionary of 3 list: file_name, raw_text, layers.
cite_edge159.csv Edge list of citation network
cite_node159.csv Meta information of each case: case_number, court, dates
reference_extract.csv cited cases in a list for every case, untidy format for analysis
citation159.csv file citation pair, tidy format for calculation
regulation159.csv file regulation pair, tidy format for calculation

3. Analyze and Visualize using R

File
Calculate Citation Frequency.Rmd Analyze reference_extract.csv
Citation Network.Rmd Analyze cite_edge159

4. Visulization Chart Sample

Citation Frequencycase_freq

Citation Networkcitation_net

Network Visulization and Predictive Modeling on 854 Legal Court Cases (in Extraction_Modelling folder)

1. Extract opinion and meta information from raw text data

.ipynb notebook Description
Full Dataset Merge.ipynb Merge the 854 cases dataset
Edge and Node List.ipynb Create edge and node list
Full Extractions.ipynb Extract author, judge panel, opinion text
Clean Opinion Text.ipynb Remove references and special characters in opinion text

2. Datasets

These datasets are NOT included in this public repository for intellectual property and privacy concern

Dataset Description
amy_cases.json large dictionary {file name: raw text} for 854 cases, from Lilian's PDF parsing
full_name_text.json convert amy_cases.json key value pair to two list: file_name, raw_text
cite_edge.csv edge list of citation
cite_node.csv node list contains case_code, case_name, court_from, court_type
extraction854.csv full extractions include case_code, case_name, court_from, court_type, result, author, judge_panel
decision_text.json json file include author, decision(result of the case), opinion (opinion text), cleaned_text (cleaned opinion text)
cleaned_text.csv csv file contains allt the cleaned text
predict_data.csv cleaned dataset for NLP modeling predict court decision

3. Visulization using R

R markdown file
Full Network Graph.Rmd draw the full citation network
Citation Betwwen Nodes.Rmd draw citation between all the available cases
Clean Data For Predictive Modelling.rmd clean text data for predictive modeling

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

ipynb notebook
NLP Predictive Modeling.ipynb Try different preprocessing, and build a logistic regression to predict court decision.

Visulization of the Bi-gram (words) with the strongest coefficient

Bigram

Owner
Yi Yin
Tech & Business Alignment @ Wolfram Research, Social Sciences Research @ Columbia University
Yi Yin
Here I plotted data for the average test scores across schools and class sizes across school districts.

HW_02 Here I plotted data for the average test scores across schools and class sizes across school districts. Average Test Score by Race This graph re

7 Oct 27, 2021
The interactive graphing library for Python (includes Plotly Express) :sparkles:

plotly.py Latest Release User forum PyPI Downloads License Data Science Workspaces Our recommended IDE for Plotly’s Python graphing library is Dash En

Plotly 12.7k Jan 05, 2023
Some method of processing point cloud

Point-Cloud Some method of processing point cloud inversion the completion pointcloud to incomplete point cloud Some model of encoding point cloud to

Tan 1 Nov 19, 2021
在原神中使用围栏绘图

yuanshen_draw 在原神中使用围栏绘图 文件说明 toLines.py 将一张图片转换为对应的线条集合,视频可以按帧转换。 draw.py 在原神家园里绘制一张线条图。 draw_video.py 在原神家园里绘制视频(自动按帧摆放,截图(win)并回收) cat_to_video.py

14 Oct 08, 2022
A dashboard built using Plotly-Dash for interactive visualization of Dex-connected individuals across the country.

Dashboard For The DexConnect Platform of Dexterity Global Working prototype submission for internship at Dexterity Global Group. Dashboard for real ti

Yashasvi Misra 2 Jun 15, 2021
Ana's Portfolio

Ana's Portfolio ✌️ Welcome to my Portfolio! You will find here different Projects I have worked on (from scratch) 💪 Projects 💻 1️⃣ Hangman game (Mad

Ana Katherine Cortes Sobrino 9 Mar 15, 2022
Bcc2telegraf: An integration that sends ebpf-based bcc histogram metrics to telegraf daemon

bcc2telegraf bcc2telegraf is an integration that sends ebpf-based bcc histogram

Peter Bobrov 2 Feb 17, 2022
An interactive dashboard built with python that enables you to visualise how rent prices differ across Sweden.

sweden-rent-dashboard An interactive dashboard built with python that enables you to visualise how rent prices differ across Sweden. The dashboard/web

Rory Crean 5 Dec 19, 2021
PanGraphViewer -- show panenome graph in an easy way

PanGraphViewer -- show panenome graph in an easy way Table of Contents Versions and dependences Desktop-based panGraphViewer Library installation for

16 Dec 17, 2022
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization

Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization

Glumpy 1.1k Jan 05, 2023
The open-source tool for building high-quality datasets and computer vision models

The open-source tool for building high-quality datasets and computer vision models. Website • Docs • Try it Now • Tutorials • Examples • Blog • Commun

Voxel51 2.4k Jan 07, 2023
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

ForceAtlas2 for Python A port of Gephi's Force Atlas 2 layout algorithm to Python 2 and Python 3 (with a wrapper for NetworkX and igraph). This is the

Bhargav Chippada 227 Jan 05, 2023
Fast scatter density plots for Matplotlib

About Plotting millions of points can be slow. Real slow... 😴 So why not use density maps? ⚡ The mpl-scatter-density mini-package provides functional

Thomas Robitaille 473 Dec 12, 2022
An interactive UMAP visualization of the MNIST data set.

Code for an interactive UMAP visualization of the MNIST data set. Demo at https://grantcuster.github.io/umap-explorer/. You can read more about the de

grant 70 Dec 27, 2022
A programming language built on top of Python to easily allow Swahili speakers to get started with programming without ever knowing English

pyswahili A programming language built over Python to easily allow swahili speakers to get started with programming without ever knowing english pyswa

Jordan Kalebu 72 Dec 15, 2022
IPython/Jupyter notebook module for Vega and Vega-Lite

IPython Vega IPython/Jupyter notebook module for Vega 5, and Vega-Lite 4. Notebooks with embedded visualizations can be viewed on GitHub and nbviewer.

Vega 335 Nov 29, 2022
A Graph Learning library for Humans

A Graph Learning library for Humans These novel algorithms include but are not limited to: A graph construction and graph searching class can be found

Richard Tjörnhammar 1 Feb 08, 2022
a python function to plot a geopandas dataframe

Pretty GeoDataFrame A minimum python function (~60 lines) to draw pretty geodataframe. Based on matplotlib, shapely, descartes. Installation just use

haoming 27 Dec 05, 2022
:art: Diagram as Code for prototyping cloud system architectures

Diagrams Diagram as Code. Diagrams lets you draw the cloud system architecture in Python code. It was born for prototyping a new system architecture d

MinJae Kwon 27.5k Dec 30, 2022
Show Data: Show your dataset in web browser!

Show Data is to generate html tables for large scale image dataset, especially for the dataset in remote server. It provides some useful commond line tools and fully customizeble API reference to gen

Dechao Meng 83 Nov 26, 2022