Data Engineer with extensive cloud experience (AWS, GCP) and strong proficiency in Python, SQL, and dbt
Contact
Work Experience
• Using Python (NetworkX), created graph-network visualizations and dashboards to profile
the university’s thematic research areas
• Developed tools to explore and characterize current and trending research, and visualized
research collaborations through a co-authorship network using topic modeling (LDA)
• Designed an ETL pipeline to enable university teams to gain insight into the types of research being conducted across
the university.
• Using SQL and Python, created an automated ecosystem to discover, upgrade, and track the status of agents for the
Oracle Enterprise Manager 12c Cloud IT monitoring tools ecosystem
• With the automated ecosystem, completed the upgrade and installation of server agents on more than 96%
of customer cloud servers in 60% of the time, with fewer than 1% run-time issues
• Worked with cross-functional teams on continuous process improvement, raising Right First Time
from 92% to 95% with the new ecosystem
• Recognized with the PACE-SETTER award for exceptional on-time delivery and team coordination
Education
Industrial Engineering [Data Science, Analytics & Manufacturing]
Course Work:
Engineering Data Analysis, Machine Learning, Quantitative Risk Analysis, Quality Engineering, Lean Engineering, Stochastic Dynamic Programming, Design of Experiments, Statistical Computing
Certifications
Projects
Developed a data pipeline that creates an analytical database for querying Airbnb reviews and ratings.
Built an ELT pipeline that extracts relevant data from an S3 bucket, loads it into the data warehouse, stages it in a Redshift cluster, and transforms it for use by analytics teams.
Created automated data pipelines using Apache Airflow, with thorough data quality checks and the ability to run scheduled jobs.
Interpreted and assessed the data to identify key insights and metrics of Netflix Origin to date, and developed a dashboard using Python, Plotly, and Dash
Built an ETL pipeline for a Data Lake to allow analytics teams to find insights into what songs Sparkify users are listening to.
Extracted data hosted on S3, processed it into analytics tables using Spark, and loaded them back into S3
• Developed a practical rule-based machine learning model to identify anomalies in electricity consumption.
• Using newly extracted features from the big data, built ML models (XGBoost and Isolation Forest) and predicted daily consumption values.
• Detected several anomaly patterns across 4 years of time-series data with an accuracy of 0.65.
Predicted sales of 30,490 Walmart store products 28 to 100 days into the future
Explored distinctive features through EDA and developed qualitative visualizations using Matplotlib and Plotly
Developed and experimented with various forecasting models in Python across 9 hierarchical time-series models, achieving a maximum RMSE of 2.5 with the LightGBM model
Developed an end-to-end web application for detecting 30 household amenities using a Flask API and GCP
Used the YOLOv5 model architecture on a collected dataset and achieved an mAP@0.5-0.95 of 0.4
Improved model precision to 43% and mAP@0.5-0.95 to 0.5 using transfer learning with COCO weights
Implemented the Repelling-Attracting Metropolis (RAM) algorithm to sample from the multimodal joint posterior distribution that arises when searching for unknown sensor locations within a network.
Compared RAM with other methods, such as BFGS and Metropolis-Hastings, in terms of speed and accuracy
Applied RAM to a 7D exoplanet detection problem to sample the posterior using a time-series radial velocity dataset.
Using statistical techniques and Bayesian networks, developed clusters to assess and categorize the risk exposure of stakeholders in the health insurance industry
After exploratory analysis of the Texas Health Department data, used machine learning algorithms (linear regression, K-means) to devise the risk predictions and clusters
Devised a decision-making model for health insurance companies to assess their financial risk exposure from policyholders
Developed methods in R to classify images from the CIFAR data, which consists of 10 image classes
Implemented machine learning algorithms (random forest, support vector machines, logistic regression) in R; compared their performance and analyzed the best-fit model using dimensionality reduction (PCA) and cross-validation error
Developed a convolutional neural network using Keras in R for the image classification task and achieved an accuracy of 85%
Awarded second-best innovation at the national event BAJA SAE INDIA 2013
Developed 3D models using SolidWorks and performed industry-specific crash simulations of the chassis using ANSYS
Engineered DFX for chassis design and assembly layout, leading to a 30% increase in strength-to-weight ratio and an FoS of 2