Srujan Jabbireddy

Data Engineer in USA

Data Engineer with extensive experience in cloud platforms (AWS, GCP) and high proficiency in Python, SQL, and dbt

Contact

LinkedIn
Twitter

Work Experience

2021 — Now
2020 — 2020

• Using Python (NetworkX), created graph-network information visualizations and dashboards to profile
the university’s thematic research areas
• Developed tools for exploring and characterizing current and trending research, and visualized
research collaborations through a co-authorship network using topic modeling (LDA)
• Designed an ETL pipeline to enable university teams to gain insight into the types of research being conducted across
the university.

2013 — 2015
Associate Technical Consultant at Oracle
India

• Using SQL and Python, created an automated ecosystem to discover, upgrade, and track the status of agents for the
Oracle Enterprise Manager 12c Cloud IT monitoring tools ecosystem
• With the automated ecosystem, completed the upgrade and installation of server agents on more than 96%
of customer cloud servers in 60% of the time, with less than 1% run-time issues
• Worked with cross-functional teams in the continuous improvement process and thereby increased Right First Time
from 92% to 95% with the new ecosystem
• Recognized with the PACE-SETTER award for exceptional on-time delivery and team coordination

Education

2021 — 2021
2021 — 2021
2019 — 2021
College Station, Texas

Industrial Engineering [Data Science, Analytics & Manufacturing]

Course Work:
Engineering Data Analysis, Machine Learning, Quantitative Risk Analysis, Quality Engineering, Lean Engineering, Stochastic Dynamic Programming, Design of Experiments, Statistical Computing

2009 — 2013
Bachelor of Engineering at Osmania University College of Engineering
Hyderabad, India

Certifications

Projects

2021

Developed a data pipeline that creates an analytical database for querying Airbnb reviews and ratings.
Built an ELT pipeline that extracts relevant data from an S3 bucket, loads and stages it in a Redshift data warehouse cluster, and transforms it for use by analytics teams.
Created automated data pipelines using Apache Airflow, with thorough data quality checks and the ability to run scheduled jobs.

2021

Interpreted and assessed the data to identify key insights and metrics of Netflix Origin to date, and developed a dashboard using Python, Plotly, and Dash.

2021

Built an ETL pipeline for a Data Lake to allow analytics teams to find insights into what songs Sparkify users are listening to.

Extracted data hosted on S3, processed it into analytics tables using Spark, and loaded the tables back into S3.

2021
Detecting Anomalies in Electricity Consumption

• Developed a practical rule-based machine learning model to identify anomalies in electricity consumption.
• Using new features extracted from the big data, built ML models (XGBoost and Isolation Forest) and predicted consumption values for each day.
• Detected several anomaly patterns across 4 years of time-series data with an accuracy of 0.65.

2020

Predicted the sales of 30,490 Walmart products 28 to 100 days into the future
Visualized distinctive features with EDA and developed qualitative visualizations using matplotlib and Plotly
Developed and experimented with various forecasting models in Python on 9 hierarchical time-series models, achieving a maximum RMSE of 2.5 with LightGBM

2020

Developed an end-to-end web application for detecting 30 household amenities using a Flask API and GCP
Used the YOLOv5 model architecture on a collected dataset and achieved a mAP@0.5-0.95 of 0.4
Improved model precision to 43% and mAP@0.5-0.95 to 0.5 using transfer learning with COCO weights

2020

Implemented the Repelling-Attracting Metropolis (RAM) algorithm to sample from the multimodal joint posterior distribution that arises when searching for unknown sensor locations within a network.
Compared RAM with other methods such as BFGS and Metropolis-Hastings in terms of speed and accuracy.
Applied RAM to a 7D exoplanet detection problem to sample the posterior using a time-series radial velocity dataset.

2019
Health Insurance Companies’ Risk Exposure: Assess and Categorize

Using statistical techniques and Bayesian networks, developed clusters to assess and categorize the risk exposure for stakeholders of the health insurance industry
After exploratory analysis of the Texas Health Department data, used machine learning algorithms (linear regression, K-means) to devise the risk predictions and clusters
Devised a decision-making model for health insurance companies to assess their financial risk exposure from policyholders

2019

Developed methods in R to classify images from the CIFAR dataset, which consists of 10 image classes
Implemented machine learning algorithms (random forest, support vector machines, logistic regression) in R; compared their performance and identified the best-fit model using dimensionality reduction (PCA) and cross-validation error
Developed a convolutional neural network using Keras in R for the image classification task and achieved an accuracy of 85%

2013
Design and Manufacturing of an All-Terrain Vehicle for BAJA SAE-INDIA

Awarded second-best innovation at the national event, BAJA SAE INDIA 2013
Developed 3D models using SolidWorks and performed industry-specific crash simulations of the chassis using ANSYS
Engineered DFX for chassis design and assembly layout, leading to a 30% increase in strength-to-weight ratio and a factor of safety (FoS) of 2