About
With over 15 years of experience manipulating data in Python and a combined decade working side by side with clinical psychology researchers, I have developed collection, transformation, and storage solutions that support the complete data life cycle. I've worked through every stage, from collection to publication, and even a bit of grant writing. My approach emphasizes portability, scalability, and fault tolerance: I develop infrastructure as code, abstract configuration, and use templating, containerization, and automation (Terraform being a personal favorite) whenever possible.
I’ve simultaneously maintained a personal interest in financial market mechanics, and I am passionate about open data initiatives and data-driven reporting. I express this through my personal project, Spotlight, an aspiring bespoke data aggregation and tracking platform intended to support journalists and simplify civic engagement.
Work Experience
-
Oversee all data and related technology for 6 active clinical trials and more than 10 years of prior trials, including compute infrastructure, database management, ETL/ELT pipelines, and web service development and hosting
-
Cloud transition: built a hybrid multicloud lakehouse with MinIO, Azure, and S3 that provides remote storage for ClickHouse, ETL staging, git-annex, and a Nextcloud user access layer, modernizing data infrastructure while maintaining HIPAA compliance, high security standards, and scalability at minimal cost
-
Implement and manage a DAG scheduler platform to orchestrate Python/Bash/SQL ETL/ELT pipelines configured with S3 event triggers and Slack notifications
-
Stood up self-managed Elasticsearch, Kibana, Filebeat, and Logstash to dynamically index system metrics and files, providing efficient data aggregation, querying, and KPI visualization; this reduced the overhead of maintaining diverse data sources and formats and helped locate missing data
-
Automate fMRI processing and time-series alignment of longitudinal data
-
Model clinical assessment data for relational and graph databases (ArangoDB)
-
Develop and host full-stack JavaScript web applications that enable daily remote data collection from subjects' mobile devices, increasing data resolution and leading to the publication of novel findings in Nature Mental Health
-
Wrote cluster-provisioning IaC using Terraform, Make, Bash, and systemd templates for RHEL CoreOS to produce a faster, lighter Kubernetes alternative with a low attack surface
-
Develop and maintain a multi-tenant virtual workspace solution offering scalable, portable computing environments with RBAC data access and pre-configured analysis software
-
Create visual dashboards of data and system metrics in Kibana and Apache Superset
-
Created and managed data processing/ETL pipelines for a Docker Swarm neuro-analysis cluster using Python, JavaScript, Bash, and AWS Lambda
-
Automated retrieval and preprocessing of neuro-imaging datasets.
-
Built an automated, ML-driven fMRI QC pipeline that detects quality degradation in real time and sends Slack notifications.
-
Oversaw day-to-day technical operation of an MRI research lab environment
-
Collected and managed high-quality multimodal imaging data on human subjects for clinical research studies.
-
Debugged experimenter code and collaborated with GE engineers to solve MRI system failures; consulted with researchers to optimize data collection and storage.
-
Assisted with preprocessing and loading of data between the collection site, databases, and the analysis development cluster.
-
Devoted free time to studying JavaScript progressive web app development, cloud architecture, distributed computing, and DevOps techniques.
Projects
Data aggregation platform built on Next.js, GraphQL, ClickHouse, Superset, Kafka, NiFi, and Airflow, deployable with Terraform to a Nomad distributed environment
Assisted in optimizing the protocol for acquiring neuromelanin-differentiating MRI data on human subjects, and provided brain images for the article
Developed an imaging strategy for a post-mortem substantia nigra sample, which was used to correlate histological analysis of neuromelanin levels in brain tissue with an imaging-based biomarker
Side Projects
Translate emotional sentiment across mediums. Have a voice diary entry converted to a song, photograph, or semantic analysis and played back to you.
Tell Vapetaper whenever you get a new vape or replenish a consumable for any habit you want to track, and Vapetaper will interpolate your consumption patterns and generate visualizations and stats.
Education
Full participation audit -- finished in the top 5% of the class. Taught by Dr. Steven Shapiro.
Certifications
Awards
Granted travel expenses to attend and present at the IEEE International Power Modulator and High Voltage Conference
Speaking
Volunteering
Invented eco-friendly alternatives to popular photo-developing formulas and supplied the photo lab with house-made solutions, cutting operating costs while minimizing ecological impact