Digitization Project

Project Hines - WWII Flight Logbook Digitization

An ongoing project focused on turning handwritten WWII flight logbooks into structured, searchable digital records.

Overview

Project Hines is a personal archival project built around scanned historical logbooks. The goal is to create a workflow that can take handwritten records, extract the important information, organize it into a structured format, and make the final archive easier to search, browse, and preserve.

Planned Workflow

  • Preprocess scanned pages to improve readability
  • Use OCR to extract handwritten text where possible
  • Prompt for manual review when handwriting is unclear
  • Transform extracted data into a structured spreadsheet or searchable dataset
  • Preserve metadata so records remain understandable and navigable
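
The review and structuring steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `LogEntry` fields, the `REVIEW_THRESHOLD` value, and the helper names are all hypothetical, and it uses only the standard library so it can be tried without any OCR tooling installed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Hypothetical cutoff: entries with OCR confidence below this go to manual review.
REVIEW_THRESHOLD = 60.0


@dataclass
class LogEntry:
    # Field names are illustrative; real logbook columns will differ.
    source_file: str    # metadata: which scan the entry came from
    page: int           # metadata: page number within the logbook
    row: int            # metadata: row position on the page
    raw_text: str       # text as extracted by OCR
    confidence: float   # OCR confidence on a 0-100 scale
    needs_review: bool = False


def flag_for_review(entries, threshold=REVIEW_THRESHOLD):
    """Mark entries whose text is empty or whose confidence is below the threshold."""
    for entry in entries:
        entry.needs_review = (
            entry.confidence < threshold or not entry.raw_text.strip()
        )
    return entries


def to_csv(entries):
    """Serialize entries to CSV text, keeping the metadata columns with each record."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; these are not real logbook records.
    sample = [
        LogEntry("scan_001.png", 1, 1, "placeholder text", 91.0),
        LogEntry("scan_001.png", 1, 2, "unclear", 42.5),
    ]
    print(to_csv(flag_for_review(sample)))
```

Keeping the source file, page, and row next to each extracted value means a reviewer can always trace a record back to the original scan, and the same rows could be loaded into pandas later for searching and filtering.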

Tools and Direction

The current direction is a zero-cost, Python-based workflow on Windows built on OpenCV, Tesseract, and pandas, with optional fallback OCR models for harder handwriting.

  • Python
  • OpenCV
  • Tesseract
  • pandas
  • OCR

Why This Project Matters

What makes this project meaningful to me is that it combines technical problem solving with preservation. It is not just about OCR in the abstract. It is about taking difficult, inconsistent historical material and building a workflow that makes the information more accessible without losing context.

From a technical perspective, it also pushes me to think about preprocessing, imperfect data, human review, structured output, and how to design a system that is useful even when automation is not perfect.

Current Status

This is an active project, and the write-up will grow as the pipeline becomes more complete. I am keeping it on the portfolio because it reflects the kind of long-form, real-world technical work I want to keep building: practical, iterative, and useful beyond the classroom.