Many firms have a large document corpus made up of both digitized documents and raw images. Now more than ever, financial institutions are turning to unstructured data sources to capture additional attributes that can adjust or confirm their analyses and reveal new trends and insights. Yet many organizations still rely on individuals to read sections of these documents or search for relevant materials in an ad hoc manner, with no systematic way of categorizing and understanding the information and trends they contain.
Join us for this Dataiku session on interactive document intelligence, where we will showcase a modular and reusable pipeline to rapidly and automatically digitize documents, extract text, and consolidate data into a unified and searchable database. We will focus on NLP techniques applied to prepare, categorize, and analyse textual data based on themes of interest (in this project: ESG), with additional theme modules available. Lastly, we will demo a purpose-built dashboard that gives business users a simple and interactive tool to analyse high-level trends and drill down into aggregated insights.
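To give a flavour of the theme categorization step, here is a minimal sketch in plain Python. It is not the pipeline presented in the session: the keyword lists, the `tag_themes` and `build_index` helpers, and the simple keyword-counting approach are all assumptions made for illustration, standing in for whatever NLP techniques the actual theme modules use.

```python
import re

# Hypothetical keyword lists for an ESG theme module (illustrative only).
THEME_KEYWORDS = {
    "environmental": {"emissions", "carbon", "climate", "renewable"},
    "social": {"diversity", "safety", "community", "labor"},
    "governance": {"board", "audit", "compliance", "ethics"},
}


def tag_themes(text):
    """Count keyword hits per theme in one passage of extracted text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {}
    for theme, keywords in THEME_KEYWORDS.items():
        hits = sum(1 for token in tokens if token in keywords)
        if hits > 0:
            counts[theme] = hits
    return counts


def build_index(documents):
    """Map each document id to its theme counts (a tiny searchable table)."""
    return {doc_id: tag_themes(text) for doc_id, text in documents.items()}


if __name__ == "__main__":
    # Hypothetical extracted text keyed by document id.
    docs = {
        "report_a": "The board reviewed carbon emissions and climate risk.",
        "report_b": "Quarterly revenue grew in all regions.",
    }
    print(build_index(docs))
```

Swapping in a different `THEME_KEYWORDS` dictionary is one simple way to picture how an additional theme module could plug into the same step.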