Project Case Study, July 2024

PUPR - OCR System

Automated data extraction platform using OCR and image enhancement to streamline document processing for government EHRM systems.

Desktop, Node.js, ImageMagick, PostgreSQL, React.js, Sequelize, Zod, Open API, Cron Jobs

#PUPR - OCR System

##Architectural Overview

PUPR - OCR System is an internal tool built for the Ministry of Public Works and Housing (PUPR) to help digitize scattered employee documents and ID cards. As a Backend Engineer, I was responsible for connecting the optical character recognition (OCR) scripts to the main HR database, saving the administration team hundreds of hours of manual entry.

Figure: Dashboard Analytics

##Key Features

  • Dashboard Analytics: A simple overview of how many documents were processed, accepted, and rejected over time.
  • Document Processing Queue: Detailed views of each uploaded image, where admins can review the automatically extracted text.
  • Correction & Rejection Workflow: Tools to manually correct OCR mistakes or reject illegible uploads entirely.
  • User Profile Management: Basic password and role-assignment management for the review team.

##Engineering Highlights

###Automated Extraction Pipeline

To link the document scanner to the database safely:

- Image Processing Node: I used Node.js with ImageMagick child processes to deskew and crop photos before feeding them to the OCR engine.
- Validation Engine: I added Zod schemas to verify that the extracted text (such as NIK strings and birthdates) matched strict regex patterns before it was allowed to reach the database.
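
The image clean-up step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the 40% deskew threshold, and the use of `-trim` for cropping are all assumptions, and it presumes ImageMagick 7 is installed with the `magick` binary on the PATH (ImageMagick 6 ships `convert` instead).

```javascript
const { execFile } = require("child_process");

// Build the ImageMagick argument list for one scanned image.
// The threshold and the trim-based crop are illustrative defaults,
// not the settings used in the real pipeline.
function buildEnhanceArgs(inputPath, outputPath, { deskewThreshold = "40%" } = {}) {
  return [
    inputPath,
    "-deskew", deskewThreshold, // straighten a slightly rotated scan
    "-trim",                    // crop away the uniform border
    "+repage",                  // reset the canvas offset left behind by -trim
    outputPath,
  ];
}

// Run ImageMagick as a child process and resolve with the output path.
// execFile takes the arguments as an array, so file names are never
// interpreted by a shell.
function enhanceScan(inputPath, outputPath, binary = "magick") {
  return new Promise((resolve, reject) => {
    execFile(binary, buildEnhanceArgs(inputPath, outputPath), (error) => {
      if (error) reject(error);
      else resolve(outputPath);
    });
  });
}
```

Keeping the argument builder separate from the process call makes the ImageMagick options easy to unit test without having the binary installed.
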

Figure: Detail Document Validation
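
To make the validation rules concrete, here is a dependency-free sketch of the kind of pattern checks described above. It is a plain JavaScript stand-in rather than the Zod schema itself; with Zod, each pattern would typically be attached through `z.string().regex(...)` inside a `z.object` schema. The field names and both patterns are assumptions: a NIK is treated as exactly 16 digits, and the birthdate is assumed to come out of OCR as DD-MM-YYYY.

```javascript
// Hypothetical patterns for two extracted fields.
const NIK_PATTERN = /^\d{16}$/;
const BIRTHDATE_PATTERN = /^(0[1-9]|[12]\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$/;

// Check the extracted fields and list every field that fails its pattern.
// A record should only be written to the database when `ok` is true.
function validateExtraction({ nik = "", birthDate = "" }) {
  const errors = [];
  if (!NIK_PATTERN.test(nik)) errors.push("nik");
  if (!BIRTHDATE_PATTERN.test(birthDate)) errors.push("birthDate");
  return { ok: errors.length === 0, errors };
}
```

For example, `validateExtraction({ nik: "3174010101900001", birthDate: "01-01-1990" })` passes, while a NIK in which OCR misread a `0` as the letter `O` is reported in `errors` and can be routed to the manual correction workflow.
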

##My Role and Responsibilities

  • API & Webhook Design: Built the REST endpoints the frontend uses to upload large scanned image files efficiently.
  • Relational Data Modeling: Used Sequelize ORM with PostgreSQL to link employee records to their newly digitized documents without losing track of the raw image files.
  • Background Workers: Configured simple cron jobs to regularly clear out failed or orphaned temporary image-processing files.
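
The temporary-file cleanup mentioned in the last item could look something like the sketch below. The function names, the age-based rule, and the `setInterval` scheduler are assumptions made for illustration; the actual project may decide what counts as "orphaned" differently and would more likely be triggered by a cron entry or a cron library.

```javascript
const fs = require("fs");
const path = require("path");

// Delete regular files in `dir` whose last modification is older than
// `maxAgeMs`. Returns the names of the files that were removed.
function removeStaleFiles(dir, maxAgeMs, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(dir)) {
    const filePath = path.join(dir, name);
    const stats = fs.statSync(filePath);
    if (stats.isFile() && now - stats.mtimeMs > maxAgeMs) {
      fs.unlinkSync(filePath);
      removed.push(name);
    }
  }
  return removed;
}

// Stand-in scheduler: runs the cleanup at a fixed interval and returns
// the timer so the caller can stop it with clearInterval.
function scheduleCleanup(dir, maxAgeMs, intervalMs) {
  return setInterval(() => removeStaleFiles(dir, maxAgeMs), intervalMs);
}
```

Passing `now` as a parameter keeps the age check deterministic in tests, and skipping anything that is not a regular file avoids deleting subdirectories by accident.
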

Figure: History Document

##Technical Stack

- Language and Runtime: JavaScript (Node.js)
- Frontend Integration: React.js
- Database: PostgreSQL (Sequelize)
- Data Parsing Tools: Zod, ImageMagick
- Architecture: REST API, Cron Jobs