Kartikeya Shukla

Senior Data Scientist

Slalom | Brooklyn

My Portfolio Highlights

My New Course

Introduction to Python

Quantitative chef, mixing variables and algorithms to create delectable insights.

My Work

Take a look at my latest work.

course

Introduction to R

course

Introduction to Python

course

Intermediate Python

DataCamp Course Completion

Take a look at all the courses I’ve completed on DataCamp.

My Work Experience

Where I've interned and worked during my career.

Slalom | Dec 2022 - Present

Senior Data Scientist

• Developed yield optimization models for large-scale impression allocation & campaign scheduling problems using CPLEX solvers, AWS & Snowflake. Increased revenue by 10%.
• Built a Confluence document vectorization engine using PySpark and Qdrant, and performed prompt engineering to feed vectors to an LLM (AWS Bedrock). Reduced onboarding time by 30%.
• Led AI/ML workshops for cross-functional teams, convincing key stakeholders to double the budget for AI initiatives and achieving a 50% increase in departmental buy-in for implementing advanced ML technologies.

Slalom | Sep 2021 - Dec 2022

Data Scientist

• Optimized ETL pipelines on Snowflake that processed 10 TB of Media & Advertising data using multiprocessing, cluster keys & maximized cache usage. Used Airflow for monitoring & orchestration.
• Built near-linear-time recommender systems for audience profiling of 300 GB of high-dimensional & sparse data using PySpark & NLP on EMR. Reduced AWS costs by 18% & model latency from 9 hrs to 3 hrs.
• Built fully open-source Python web apps using Streamlit & Plotly to visualize customer segmentation results, data drift & model performance, saving $10k+ in licensing costs for BI platforms.

in4mation insights | Nov 2020 - Sep 2021

Data Engineer

• Automated data prep by building ETL pipelines using AWS Glue, Lambda, SNS, Athena & PySpark to fetch over 6 TB of mobility & geocoded data from NOAA, SafeGraph & multiple REST APIs. Normalized SQL data models & used partitioning schemes to reduce query & Glue job runtimes from 5 hrs to 1 hr.
• Built a generalized data profiler using SQL, Spark, Python & Pandas that used null analysis, regular expressions & fuzzy matching to handle outliers, merge duplicates and ensure key integrity. Reduced manual analysis hours by 2x.
• Optimized product taxonomies by leveraging NLP, NER, Topic Modeling & Clustering to identify 20,000+ unique cross-selling opportunities.

NYU Center for Urban Science + Progress | Nov 2018 - Aug 2020

Data Engineer

• Modeled an end-to-end course recommendation system with item-based collaborative filtering in Spark MLlib. Increased student retention rates by 18%.
• Built an ETL pipeline using PySpark, Airflow, and AWS, which transformed raw data for 1,000+ students from multiple sources and fed it to the course recommender on a scheduled basis.
• Performed data quality analysis of 1,900+ flat files and semi-structured datasets available on NYC Open Data and logged the results to an AWS Redshift data warehouse.
• Created an automated data cleaning flow with Tableau Prep and Spark, which used null analysis, regular expressions, fuzzy lookup, and data mining to handle outliers, merge duplicates and ensure key integrity.

University at Buffalo | Apr 2017 - Apr 2018

Computer Vision/Machine Learning Engineer

• Built an image processing system to denoise dirty historical documents and official records. Minimized RMSE to 7 while maintaining a 0.9 SSIM index.
• Applied median filter, adaptive thresholding, and morphological operations such as Dilation and Erosion along with Canny edge detection using OpenCV to reduce noise.
• Implemented different CNN architectures through a sequence of Convolutions, Pooling, and Activation functions. Conducted a comparative study of CV algorithms vs CNNs based on metrics such as PSNR, UQI, and RMSE.

My Education

Take a look at my formal education

Master of Science in Computer Science, New York University | 2020
Bachelor of Science in Computer Science, University at Buffalo | 2018

About Me

Kartikeya Shukla

Data Scientist and Data Engineer with 3+ years of experience. Proficiencies: Python, SQL, PySpark, Airflow, AWS, Snowflake, ML, NLP, etc. Deep domain knowledge of Media & Advertising (churn, media-mix models, yield optimization, etc.).
