About the Project

Features

  • Search through data

    • Allow users to filter the data by metadata (file type, creation date, etc.)

    • Keyword-based file search

    • Logical search queries

  • Provide an easy-to-navigate interface for journalists to view related files

    • Pull related files using sentence indexing and provide links to the files

  • Provide previews of specific files

    • Identify and display column information for Excel files

    • Identify and display relevant snippets of a file based on the search query
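
The search features above can be illustrated with a small in-memory sketch. This is not the project's implementation (the project uses Elasticsearch for search); it only shows how a metadata filter, a logical keyword query, and a result snippet fit together. The Document fields, the matches and snippet helpers, and the reading of "logical queries" as AND/OR/NOT over keywords are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical record shape; the real index fields are not documented here.
    name: str
    file_type: str
    created: date
    text: str


def matches(doc, all_of=(), any_of=(), none_of=(), file_type=None, created_after=None):
    """Return True if doc passes the metadata filters and the logical keyword query."""
    # Metadata filters: file type and creation date.
    if file_type is not None and doc.file_type != file_type:
        return False
    if created_after is not None and doc.created <= created_after:
        return False
    # Keyword query: AND over all_of, OR over any_of, NOT over none_of.
    words = set(re.findall(r"\w+", doc.text.lower()))
    if not all(k.lower() in words for k in all_of):
        return False
    if any_of and not any(k.lower() in words for k in any_of):
        return False
    return not any(k.lower() in words for k in none_of)


def snippet(doc, keyword, width=30):
    """Return the text surrounding the first occurrence of keyword, or None."""
    pos = doc.text.lower().find(keyword.lower())
    if pos < 0:
        return None
    start = max(0, pos - width)
    end = min(len(doc.text), pos + len(keyword) + width)
    return doc.text[start:end]


docs = [
    Document("a.pdf", "pdf", date(2011, 5, 1), "Draft district map for review"),
    Document("b.xlsx", "xlsx", date(2012, 1, 1), "Census population by district"),
]
# Files mentioning "district" but not "census".
hits = [d.name for d in docs if matches(d, all_of=["district"], none_of=["census"])]
```

In a production setup these checks would normally be pushed down to the search engine rather than run in application code.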

Technical Challenges

  • Parsing and cleaning over 60,000 documents down to plain text data from various file types (Word, Excel, PDF, etc.)

  • Using Elasticsearch on AWS for file search and tagging

  • Generating recommendations of similar files using Universal Sentence Encoder and nearest-neighbor search
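
As a rough sketch of the recommendation step, the example below ranks files by cosine similarity between embedding vectors and returns the closest ones. It assumes each file has already been reduced to a single vector (in the project, by Universal Sentence Encoder); the three-dimensional vectors, the file names, and the nearest_files helper are made up for illustration, and a brute-force scan stands in for whatever nearest-neighbor index the project actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_files(query_name, embeddings, k=2):
    """Return the names of the k files most similar to query_name, most similar first."""
    query = embeddings[query_name]
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in embeddings.items()
        if name != query_name  # never recommend the file to itself
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings; real sentence-encoder vectors have far more dimensions.
embeddings = {
    "memo.docx": [1.0, 0.0, 0.0],
    "map_notes.pdf": [0.9, 0.1, 0.0],
    "budget.xlsx": [0.0, 0.0, 1.0],
}
related = nearest_files("memo.docx", embeddings, k=1)
```

A linear scan like this is fine for a toy example; at tens of thousands of documents an approximate nearest-neighbor index is the usual choice.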

About the Client

The Princeton Gerrymandering Project aims to give activists and legislators the tools they need to detect offenses and craft bulletproof, bipartisan reform. Their analysis is published widely, and their work is used by legislators and reformers of all communities, without regard to partisan affiliation.

Impact

See the above description!