Justin Bois is a lecturer in the Division of Biology and Biological Engineering at the California Institute of Technology. He teaches nine different classes there, nearly all of which heavily feature Python. He is dedicated to empowering students in the biological sciences with quantitative tools, particularly data analysis skills. Beyond biologists, he is thrilled to develop courses for DataCamp. In this DataChat, Hugo and Justin discuss different aspects of data science, including data science education, and Justin shares his advice to those getting started in the field.
Chester builds (and helps instructors build) R and SQL courses for DataCamp. Chester has experience working as an actuary, as a professor, and as a statistical/data scientist consultant in academia. In addition, he has worked as a consultant for actuarial firms and the Portland Trail Blazers NBA team. He is co-author of the fivethirtyeight R package and author of the thesisdown R package. He is also a co-author of ModernDive, an open source textbook for introductory statistics and data science students using R.
Richie chats to Oliver and Charlotte about the importance of web data, how Oliver isn’t a data scientist, how Charlotte uses data on the web for teaching, web APIs and R packages to access them, web scraping for social good, and data in the cloud vs. computing in the cloud.
Richie chats to Nina and John about their favorite types of regression, statistics vs. machine learning, running Win-Vector, interacting with data scientists vs. interacting with managers, business constraints on models, the vtreat R package, bhangra dancing, and life in San Francisco.
Katharine Jarmul runs a data analysis company called kjamistan that specializes in helping companies analyze data and training others on data analysis best practices, particularly with Python. She has been using Python for 8 years for a variety of data work -- including telling stories at major national newspapers, building large scale aggregation software, making decisions based on customer analytics and marketing spend, and advising new ventures on the competitive landscape.
Richie chats to Julia about her amazing outfit, what sentiment analysis is, data science at Stack Overflow, the importance of community in data science, transferring skills from astrophysics and improv comedy to data science, how the course came about, and text mining gender effects.
Jason Myers is a software engineer and author. His area of expertise is in developing data analytics platforms. He has also written the Essential SQLAlchemy book, co-authored with Rick Copeland, that introduces you to working with relational databases in Python.
Rob J. Hyndman is Professor of Statistics at Monash University, Australia, and Editor-in-Chief of the International Journal of Forecasting. Rob is the author of over 150 research papers and books in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. He is the author of about 20 R packages, including the popular forecast package.
Richie chats to Colin about his academic work and consultancy, high performance computing in R, the joy of Fortran 77, how the kids these days have it easy, trends in R usage, efficient programming and premature optimization, and the problem with working while connected to the internet.
Richie and Deepayan chat about R-Core and R Foundation, how the lattice project came about, how R has changed over the years, CRAN vs. Bioconductor, how he uses R, and interactive graphics in R. Watch this to find out how to become a member of R-Core, and to hear about Deepayan’s secret use of Python!
Richie chats to Barry about tropical diseases, how he became a geographer, R tools for spatial statistics, R and QGIS, the trials of converting between different data structures, the spatial stellar cluster, interactive maps, and trends in spatial stats.
Dan Becker is a Data Scientist with expertise in deep learning. He has contributed to the Keras and TensorFlow libraries for deep learning, finished 2nd (out of 1353 teams) in the $3 million Heritage Health Prize data mining competition, supervised data science consulting projects for 6 companies in the Fortune 100, and taught deep learning workshops at events and conferences such as ODSC.
Here, Daniel talks about his upcoming book, whether to start learning Python or R for data science, the best paths to becoming a data scientist, and much more. Daniel is a Software Carpentry instructor and a doctoral student in Genetics, Bioinformatics, and Computational Biology at Virginia Tech, where he works in the Social and Decision Analytics Laboratory under the Biocomplexity Institute. He received his MPH at the Mailman School of Public Health in Epidemiology and is interested in integrating hospital data in order to perform predictive health analytics and build clinical support tools for clinicians. An advocate of open science, he aspires to bridge data science with epidemiology and health care.
Andy is a lecturer at the Data Science Institute at Columbia University and author of the O'Reilly book "Introduction to Machine Learning with Python", describing a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. He is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms. Here, Andy answers questions about his work at Columbia, gives advice to people starting with data science, and explains what the most difficult part of his job is.
Peter is a co-founder of DrivenData. He earned his master's in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences. His work lies at the intersection of statistics and computer science, and he wants to help bring powerful new modeling techniques to the organizations that need them most. He previously worked as a software engineer at Microsoft and earned a BA in philosophy from Yale University. Here, Peter and Hugo discuss why to use Python for data science, the business case for data, DrivenData competitions on using Yelp data to predict restaurant sanitary ratings, and much more.
Dhavide Aruliah is Director of Training at Anaconda, the leading Open Data Science platform powered by Python. Dhavide was previously an Associate Professor at the University of Ontario Institute of Technology (UOIT). He served as Program Director for various undergraduate & postgraduate programs at UOIT. His research interests include computational inverse problems, numerical linear algebra, & high-performance computing. Together with Hugo, Dhavide goes over the process of designing a course, his work at Anaconda, his path to Python and more.
Ben is a machine learning specialist and the director of research at lateral.io. He is passionate about learning and has worked as a data scientist in real-time bidding, e-commerce, and recommendation. Ben holds a PhD in mathematics and a degree in computer science.
Ben is an Assistant Professor in the Statistical & Data Sciences Program at Smith College. He completed his Ph.D. in Mathematics at the Graduate Center of the City University of New York in 2012. He is an Accredited Professional Statistician™ by the American Statistical Association and was previously the Statistical Analyst for the Baseball Operations department of the New York Mets. Follow Ben here: https://twitter.com/BaumerBen. Together with Nick, Ben talks about his path to becoming a sabermetrician, his passion for teaching, the importance of subject-matter expertise in data science, and more.
Clifford is a Vice President at Compass Lexecon. He specializes in valuation, corporate finance, and damages, and has worked on hundreds of engagements involving companies across a broad spectrum of industries. He is the author of Analyzing Financial Data and Implementing Financial Models Using R. Together with Lore, Clifford goes over his interest in Finance, his course Bond Valuation and Analysis in R, his latest book and more.
Mine is the Director of Undergraduate Studies and an Associate Professor of the Practice in the Department of Statistical Science at Duke University. She received her Ph.D. in Statistics from the University of California, Los Angeles, and a B.S. in Actuarial Science from New York University’s Stern School of Business. Her work focuses on innovation in statistics pedagogy, with an emphasis on student-centered learning, computation, reproducible research, and open-source education. Find Mine's website here: http://www2.stat.duke.edu/~mc301/. Together with Nick, Mine talks about her path to becoming a statistician, teaching R, trends in the broader data science community, and much more.
Jo is a Professor of Mathematics at Pomona College with many years of R experience. She has a pure passion for education and has been working on the ASA’s undergraduate curriculum guidelines where she strongly advocated the infusion of data science into the undergraduate statistics curriculum. Together with Nick, Jo talks about R's place in the stats curriculum, the role of technology in education, what advice she would give to people just starting in statistics, bootstrapping, and much more.
David Stoffer is a Professor of Statistics at the University of Pittsburgh. He is a member of the editorial boards of the Journal of Time Series Analysis and the Journal of Forecasting. David is the coauthor of the book "Time Series Analysis and Its Applications: With R Examples", which is the basis of his course. Another (free) book he wrote on Time Series Analysis is available here: http://www.stat.pitt.edu/stoffer/tsa4... Together with Lore, David talks about his path to Statistics, his teaching method, his latest book, how he got into R, and much more.
Ron holds a Ph.D. in Electrical Engineering and Computer Science from M.I.T. and has written or co-written five books. Furthermore, he is the author and maintainer of the GoodmanKruskal R package, and one of the authors of the datarobot R package. Together with Nick, Ron talks about his passion for exploratory data analysis, inliers (vs outliers), why he is so excited about the evolution of machine learning models and much more.
Charlotte is an Assistant Professor in the Department of Statistics at Oregon State University and an avid R programmer with a passion for teaching. She talks us through her first exposure to R, why a tool like GitHub is fantastic, and how to use your cat and a GPS tracker to collect data for your R coding experiments :-)
In this episode, Nick interviews Garrett Grolemund. Garrett is a Data Scientist at RStudio and the author of Hands-On Programming with R and R for Data Science from O'Reilly Media. He talks us through how he discovered R, the evolution of R, what data science means to him and much more.
In this episode of DataChats, Nick talks with Max Kuhn, the creator of the caret package for R, an essential tool in every R user’s machine learning toolbox. Max is a frequent speaker at many of the main data science conferences. In this 30-minute conversation, Max talks about how he originally wanted to become a journalist, why he defines himself as a statistician rather than a data scientist, his thoughts on deep learning, strategies for breaking into the field, and much more.