Data Science Tutorials
Develop your data science skills with tutorials on our blog. We cover everything from intricate data visualizations in Tableau to version control features in Git.
Winsorized Mean: A Robust Approach to Handling Outliers
A winsorized mean reduces the influence of outliers by capping extreme values at specific percentiles, preserving the overall structure of the dataset. Read on to learn how to calculate the winsorized mean in Python, with hands-on practice.
Arunn Thevapalan
September 10, 2024
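As a rough illustration of the capping idea in the winsorized mean entry above, here is a minimal sketch using only the Python standard library. The function name `winsorized_mean`, the `proportion` parameter, and the sample data are assumptions made for this example and are not taken from the linked tutorial, which may use a different approach or library.

```python
from statistics import mean


def winsorized_mean(values, proportion=0.1):
    """Mean after capping the lowest and highest `proportion` of values.

    Hypothetical helper for illustration: the k smallest values are raised
    to the (k+1)-th smallest, and the k largest are lowered to the
    (k+1)-th largest, where k = int(len(values) * proportion).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")

    data = sorted(values)
    n = len(data)
    k = int(n * proportion)  # number of values to cap at each end
    if k > 0:
        low, high = data[k], data[n - k - 1]
        # Clamp every value into the [low, high] range.
        data = [min(max(x, low), high) for x in data]
    return mean(data)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one outlier
print(mean(sample))                  # 14.5
print(winsorized_mean(sample, 0.1))  # 5.5
```

With 10% capping at each end, the single outlier (100) is replaced by 9 and the smallest value (1) by 2, so the mean drops from 14.5 to 5.5 while all ten observations remain in the dataset.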
AUC and the ROC Curve in Machine Learning
Learn how the AUC-ROC curve assesses binary classification models, focusing on performance across thresholds, particularly in imbalanced datasets. Use Python’s libraries to compute AUC values and compare classifiers in one workflow.
Vidhi Chugh
September 10, 2024
Getting Started with AWS Glue: A Step-by-Step Guide
Learn how to set up AWS Glue, create a crawler, catalog your data, and run jobs to convert CSV files into Parquet format, optimizing your ETL processes.
Zoumana Keita
September 9, 2024
ARIMA for Time Series Forecasting: A Complete Guide
Learn the key components of the ARIMA model, how to build and optimize it for accurate forecasts in Python, and explore its applications across industries.
Zaina Saadeddin
January 7, 2025
Reflection Llama-3.1 70B: Testing & Summary of What We Know
Reflection Llama-3.1 70B, trained with Reflection-Tuning, claims to surpass GPT-4o and Claude 3.5 Sonnet but has faced reproducibility and verification issues so far.
Ryan Ong
September 8, 2024
VBA Excel: How to Get Started and Make Your Work Easier
Learn how to effectively use VBA in Excel to automate tasks, create macros, and enhance your data processing skills with practical examples and best practices.
Laiba Siddiqui
September 7, 2024
CatBoost in Machine Learning: A Detailed Guide
Discover how CatBoost simplifies the handling of categorical data with the CatBoostClassifier() function. Understand the key differences between CatBoost and XGBoost to make informed choices in your machine learning projects.
Oluseye Jeremiah
September 6, 2024
How to Use a SQL Alias to Simplify Your Queries
Explore how a SQL alias simplifies both column and table names, and learn why aliases are key for improving readability and managing complex joins.
Allan Ouko
September 6, 2024
RAG With Llama 3.1 8B, Ollama, and Langchain: Tutorial
Learn to build a RAG application with Llama 3.1 8B using Ollama and Langchain by setting up the environment, processing documents, creating embeddings, and integrating a retriever.
Ryan Ong
September 5, 2024
How to Use Goal Seek in Excel: A Guide with Real Examples
Simplify your data models with Excel’s Goal Seek, one of its powerful What-If Analysis tools. Solve real-world problems like loan payments and revenue targets.
Arunn Thevapalan
September 5, 2024
Python Tabulate: A Full Guide
Use the tabulate library in Python to create well-formatted tables. Learn about its advanced features and options for customizing table appearance.
Allan Ouko
September 5, 2024
Understanding Chebyshev Distance: A Comprehensive Guide
Learn how Chebyshev distance offers a unique approach to spatial problems. Uncover its applications in robotics, GIS, and game development with coding examples in Python and R.
Vinod Chugani
September 5, 2024
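For readers new to the metric named in the Chebyshev distance entry above: the Chebyshev distance between two points is the largest absolute difference along any single coordinate. The sketch below shows that definition in plain Python. The function name `chebyshev_distance` and the sample points are assumptions made for this example and are not taken from the linked tutorial.

```python
def chebyshev_distance(a, b):
    """Largest absolute coordinate difference between points a and b.

    Hypothetical helper for illustration; a and b are equal-length,
    non-empty sequences of numbers.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    if len(a) == 0:
        raise ValueError("points must have at least one dimension")
    return max(abs(x - y) for x, y in zip(a, b))


# Differences are |1 - 4| = 3 and |2 - 6| = 4, so the distance is 4.
print(chebyshev_distance((1, 2), (4, 6)))  # 4
```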