Desmond Fung

Data Engineer & MCS Student

Building production data infrastructure at State Farm while pursuing a Master of Computer Science at UIUC.

About

Background

Data Engineer at State Farm building production serverless systems processing millions of documents daily. Currently pursuing a Master of Computer Science in Data Science at UIUC (4.00 GPA) with coursework spanning deep learning, cloud computing, and statistical learning. B.S. in Computer Science and B.S. in Statistics from UW–Madison.

Languages

Python, Java, SQL, JavaScript, C, R, HTML/CSS

AI / ML

PyTorch, TensorFlow, Scikit-learn, NumPy, Pandas, Matplotlib, NetworkX

Cloud & DevOps

AWS, Terraform, GitLab CI/CD, Docker, Ansible, boto3

AWS Services

EC2, Lambda, S3, DynamoDB, SQS, SNS, API Gateway, Step Functions, CloudWatch, IAM

Databases & Tools

MySQL, DuckDB, Git, Splunk, Dynatrace, Tableau, Linux/Unix

Certifications

AWS Cloud Practitioner, Terraform Associate

Projects

Selected Work

Healthcare AI

Clinical Outcome Prediction with KEEP

Reproducing the KEEP embedding framework for in-hospital mortality prediction on MIMIC-IV: constructing a ~5,700-node medical knowledge graph from OMOP vocabularies and integrating it as a pluggable embedding layer into PyHealth 2.0.

PyTorch, node2vec, GloVe, NetworkX, DuckDB

TBD

Work in Progress

Project details coming soon.

Experience

Career & Education

Data Engineer

State Farm

Bloomington, IL

June 2022 — Present

Production serverless infrastructure and automated data pipelines at enterprise scale.

  • Built a serverless document storage API (Lambda, API Gateway, S3, DynamoDB) processing ~1.56M documents per workday (~36M/month) across 12 applications, replacing a legacy on-premises system with zero-downtime cutover
  • Architected an automated data pipeline processing 38K–124K records daily with concurrent CloudWatch Logs queries, achieving a ~4x throughput improvement through time-slicing
  • Engineered idempotent document processing to handle duplicate S3 events, eliminating ~150 daily false alerts and ~20 hours/week of manual investigation
  • Developed a self-healing error recovery pipeline auto-recovering ~60K documents daily, reducing manual reprocessing from ~10 hours/week to near-zero
  • Built multi-region DR infrastructure (us-east-1/us-west-2) for a zero-downtime platform serving 18.5B documents (~6PB) using modular Terraform with DynamoDB global tables
  • Co-led migration of 60+ production servers in 1 month (6x acceleration), owning end-to-end execution across multiple dependent teams
  • Mentored 3 engineers on infrastructure operations and co-mentored 2 summer interns on AWS, Terraform, and deployment
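
The time-slicing idea in the pipeline bullet above can be illustrated with a minimal sketch. It assumes the approach is to split one long query window into equal sub-windows and run them concurrently; the function names (`split_window`, `run_sliced`) and the stub `fake_query` are hypothetical and only stand in for whatever the production pipeline actually calls.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Window = Tuple[datetime, datetime]


def split_window(start: datetime, end: datetime, slices: int) -> List[Window]:
    """Split [start, end) into `slices` contiguous, non-overlapping sub-windows."""
    if slices < 1 or end <= start:
        raise ValueError("need slices >= 1 and end > start")
    step = (end - start) / slices
    bounds = [start + step * i for i in range(slices)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def run_sliced(query: Callable[[datetime, datetime], list],
               start: datetime, end: datetime, slices: int = 4) -> list:
    """Run `query` once per sub-window in parallel and merge results in time order."""
    windows = split_window(start, end, slices)
    with ThreadPoolExecutor(max_workers=slices) as pool:
        parts = pool.map(lambda w: query(*w), windows)  # map preserves input order
        return [row for part in parts for row in part]


def fake_query(start: datetime, end: datetime) -> list:
    # Hypothetical stand-in for a real log query: one row per whole hour in the window.
    hours = int((end - start) / timedelta(hours=1))
    return [start + timedelta(hours=h) for h in range(hours)]
```

Equal-width slices are the simplest choice here; a real pipeline might instead size slices by expected log volume so that each concurrent query does a similar amount of work.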

Master of Computer Science in Data Science

University of Illinois Urbana-Champaign

Urbana-Champaign, IL

Expected Aug 2026

GPA: 4.00

  • Deep Learning for Healthcare
  • Cloud Computing Applications
  • Applied Machine Learning
  • Practical Statistical Learning
  • Methods of Applied Statistics
  • Database Systems
  • Scientific Visualization
  • Theory and Practice of Data Cleaning

B.S. in Computer Science and B.S. in Statistics

University of Wisconsin-Madison

Madison, WI

  • Elementary Matrix and Linear Algebra
  • Multivariable Calculus
  • Applied Regression Analysis
  • Introduction to Probability and Mathematical Statistics
  • Deep Learning
  • Machine Learning
  • Introduction to Artificial Intelligence
  • Data Visualization
  • Discrete Mathematics
  • Introduction to Operating Systems
  • Introduction to Algorithms
  • Database Management Systems