Introduction to R


Introduction to Python


Intermediate Python

Snowflake | Oct 2020 - Present

Data Engineer

Building and maintaining data ingestion and automation pipelines for our org, and supporting Snowflake Data Platform Operations activities using SQL, JavaScript, and Python. For over a year, also pioneering the implementation of Internal Audit SOX exception-detection workflows that trigger alerts, along with analytics dashboards used by application control owners and the Internal Audit team. We are proud to use a Snowflake@Snowflake approach for all of this.

University of Waterloo | Apr 2020 - Jul 2020

Machine Learning Intern

Developed a Python-based QSAR (Quantitative Structure-Activity Relationship) model for protein thermostability engineering. Ingested data from the NCBI, UniProt, PDB, AAindex, and other databases, performed topological, structural, and physico-chemical feature extraction, and trained an ensemble model on the resulting protein feature vectors. Sped up model building with Bayesian hyperparameter tuning.

Lantern Institute | Sep 2019 - May 2020

Data Scientist/Engineer

Cleaned and analyzed Samsung Health device data using the Python libraries matplotlib, seaborn, plotly, numpy, pandas, and datetime. Then built a Dash app (Dash is built on Flask, Plotly.js, and React.js) on GCP to display interactive visualizations of the ‘effect of exercise/stress/day-time sleep’ on ‘night-time sleep’ for any selected time range and aggregation type (daily/weekly/monthly).

Parallel computation cloud project: Identified exotic particles in the UCI Higgs Dataset (11 million data points) on a GCP virtual machine with an XGBoost Deep Learning image and dask-based parallel computation optimization in code.

Kaggle Competitions:

Credit Card Fraud Detection: Labeled anonymized credit card transactions as fraudulent or genuine. Designed an autoencoder neural network to learn genuine cases, then distinguished frauds by their high error scores. Found that the best XGBoost model gave a 12% performance improvement over the neural network.

Real or Not? NLP with Disaster Tweets: Predicted which Tweets are about real disasters and which are not. Extracted hashtags, mentions, and text length, converted keywords into a probability distribution, and applied TF-IDF vectorization to lower-cased, lemmatized text to obtain 87% accuracy and an 85% F1-score.

Multiple | Mar 2017 - Apr 2019

Research And Teaching Assistant

Investigated the role of dehydrin proteins in conifer cold tolerance. Contract research assistant for Natural Resources Canada (NRC). Used Python, R, and bioinformatics tools for data analysis. For example, performed SNP quantitation/annotation and SNP-gene matching in a microbial genome dataset.

Stem Shock Inc. | Mar 2016 - Mar 2017

Biological Technician

Supported development of a ‘Programmable genetic interference’ mRNA herbicide. Received a pay raise within 3 months.

Master of Science (M.Sc.), Molecular Biology | The University of British Columbia | 2015
Bachelor of Technology (B.Tech.), Biotechnology | National Institute of Technology Warangal | 2012

Orpita Das

