
Kaggle Datasets Tutorial: Kaggle Notebooks

Learn about Kaggle datasets and notebooks and get a head start on creating your Kaggle profile.

Updated Mar 2022 · 7 min read

A "Kaggle Notebook" is a free, hosted Jupyter notebook environment with optional GPU support. Like DataCamp Workspace notebooks, it lets you run machine learning workloads on cloud machines instead of your own computer. You edit and run each notebook in the browser, so there is no Jupyter environment to set up: sign in to Kaggle, create a notebook, and start working. The notebooks you have created are listed on the Kaggle Notebook page, where you can also review other people's notebooks.

To create a notebook, go to the Kaggle Code page and click "New Notebook" (Figure 3.1). Kaggle then allocates cloud resources and creates the notebook right away. To name your notebook, click the notebook title in the upper left corner. As you will see at first glance, many of the options and features found in Jupyter notebooks are available here as well.

The most frequent questions about Kaggle Notebooks are how to share a notebook publicly, how to add another person as a collaborator, how to import a dataset to a notebook, and how to use the GPU. You can see the buttons required to perform each of these operations in Figure 3.2.

Figure 3.1: Kaggle Code

To make a notebook public, you first have to save a version of it by clicking the "Save Version" button at the top right. You can then make the notebook accessible to everyone by clicking the "Share" button to the left of "Save Version", or add collaborators from the same menu. To import a dataset, click the "Add data" button below "Save Version" in the right-hand menu and select the dataset you want to add. To use a GPU, select the GPU option in the accelerator section of the right-hand menu. Kaggle caps GPU time at 30 hours per week.
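Once a dataset has been added through "Add data", its files can be read from inside the notebook. The sketch below assumes the usual Kaggle layout, in which attached datasets are mounted under `/kaggle/input`; the helper name `list_input_files` is illustrative and not part of any Kaggle API.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the sorted paths of every file found under `root`."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


if __name__ == "__main__":
    # Assumption: inside a Kaggle Notebook, attached datasets live under /kaggle/input.
    # Outside Kaggle the folder does not exist and nothing is printed.
    for path in list_input_files():
        print(path)
```

Running this in the first cell is a quick way to confirm that the dataset was attached and to copy the exact file paths for later cells.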

Figure 3.2: Kaggle Notebook

All other features of notebooks are explained in detail in the Kaggle documentation.

KAGGLE DATASETS

WHAT ARE KAGGLE DATASETS?

Kaggle is a data science platform, and it also hosts datasets. "Kaggle Datasets" lets you create your own custom datasets, share them with others, and import them into your notebooks. You can also add private datasets that are visible only to you.

What makes this one of Kaggle's most important features is the access it gives you to a wide variety of high-quality datasets shared by other users. A keyword search and a few filters are usually enough to find the datasets you want.

To search for a dataset, type your keywords into the search field, as shown in Figure 4.1. For example, typing "Covid" in the search bar returns several datasets about the pandemic.

If you click on "Filters" on the right side of the search bar, more filtering options will appear (Figure 4.2). With these, you can narrow your search by entering dataset tags, file type, and other values like the minimum or maximum size of the dataset (Figure 4.3).
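The same kind of search can also be scripted. The sketch below assumes the official `kaggle` command-line client is installed and authenticated, and that its `datasets list` command accepts the `--search`, `--file-type`, `--tags`, `--min-size`, and `--max-size` options; confirm them with `kaggle datasets list -h` before relying on this. The function names are illustrative.

```python
import subprocess


def build_search_command(keyword, file_type=None, tags=None,
                         min_size=None, max_size=None):
    """Build the argument list for a `kaggle datasets list` search."""
    cmd = ["kaggle", "datasets", "list", "--search", keyword]
    if file_type is not None:
        cmd += ["--file-type", file_type]       # for example "csv"
    if tags:
        cmd += ["--tags", ",".join(tags)]       # comma-separated tag list
    if min_size is not None:
        cmd += ["--min-size", str(min_size)]    # assumed to be in bytes
    if max_size is not None:
        cmd += ["--max-size", str(max_size)]    # assumed to be in bytes
    return cmd


def search_datasets(keyword, **filters):
    """Run the search and return the client's text output.

    Requires the `kaggle` client on PATH; raises if the command fails.
    """
    result = subprocess.run(build_search_command(keyword, **filters),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, `search_datasets("covid", file_type="csv")` would mirror typing "Covid" in the search bar and filtering by file type.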

Figure 4.1: Dataset Search Filters

Kaggle lets you download any dataset for free, but depending on how you plan to use it, you may need to pay attention to its license type. For example, using a dataset in an academic paper or for commercial purposes may require additional permission from its owner.

Figure 4.2: Dataset Search Filters by Tags

There are three main license types on Kaggle:

Creative Commons: There are several kinds of Creative Commons licenses:

CC0, which dedicates the dataset to the public domain and means that it is available to everyone for any purpose.
CC-BY, which requires the dataset user to credit its owner.
CC-BY-SA, which also requires the owner to be credited and adds the condition that modified versions of the dataset keep the same kind of license.

GPL: This license grants four main freedoms:

Firstly, you get unlimited use of a dataset.
You also have the possibility to examine how the dataset works, and modify it.
Additionally, you are entitled to the unlimited distribution of copies of the dataset.
And lastly, you can distribute the modified version of the dataset as well.

Open Database: This allows users to share, modify, and use the dataset, but modified versions must be released under the same kind of license.

Figure 4.3: Dataset Search Filters

DATA EXPLORER

The Data Explorer section lets you quickly browse the content and structure of a dataset. It gives you an overview of the files and the columns in the data, along with a histogram for each column (Figure 4.4).
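A similar overview can be produced inside a notebook. The following is a minimal sketch using only the Python standard library, not a description of how the Data Explorer itself works; the sample rows are made up for illustration and `numeric_histogram` is a hypothetical helper.

```python
import csv
import io


def numeric_histogram(rows, column, bins=5):
    """Split the values of `column` into `bins` equal-width buckets.

    Returns a list of (bucket_start, bucket_end, count) tuples.
    """
    values = [float(row[column]) for row in rows if row.get(column)]
    if not values:
        return []
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid a zero width when all values match
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it to the last bucket.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(bins)]


# Made-up sample rows standing in for a dataset file.
SAMPLE_CSV = "name,calories\nA,50\nB,70\nC,70\nD,110\nE,120\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print("columns:", list(rows[0].keys()))
for start, end, count in numeric_histogram(rows, "calories", bins=2):
    print(f"{start:6.1f} to {end:6.1f} | {'#' * count}")
```

In practice you would read the rows from one of the dataset's CSV files rather than from an inline string.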

Figure 4.4: Data Explorer

DATASETS FOR BEGINNERS

The following datasets are fun and easy to work with as a beginner. You can add them to your notebooks and start getting your hands dirty by visualizing the data. Once you come up with an idea, you can even build machine learning models with some of them.

Digimon Database: A database of Digimon and their moves from Digimon Story CyberSleuth.
Animal Bites: Data on over 9,000 bites, including rabies tests.
80 Cereals: Nutrition data on 80 cereal products.
Women's Shoe Prices: A list of 10,000 women's shoes and the prices at which they are sold.
Groundhog Day Forecasts and Temperatures: How accurate is Punxsutawney Phil's winter weather forecast?
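A first step with any of these datasets is to look at the column names and a few rows. The sketch below does this with the standard library; `preview_csv` is an illustrative helper, and it assumes the dataset file is a UTF-8 encoded CSV.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no lines
        rows = list(islice(reader, n_rows))
    return header, rows
```

With the 80 Cereals dataset attached, the call might look like `preview_csv("/kaggle/input/80-cereals/cereal.csv")`. That path is an assumption about how the dataset is mounted, so check the actual path in the notebook's data panel.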

CUSTOM DATASETS

You can also upload and use your own datasets in Kaggle. This comes in handy when you have your own data or when you have modified a dataset and want to use the result in your notebook. To upload a dataset, first zip your main dataset file. Then click "New Dataset" in the Datasets section, give your dataset a name, and upload your zip file (Figure 4.5).
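The zipping step can be done with any archive tool. As one option, here is a minimal sketch using Python's standard `zipfile` module; the function name `zip_dataset` and the file names in the usage note are illustrative.

```python
import os
import zipfile


def zip_dataset(source_path, zip_path):
    """Compress a single file or a whole folder into `zip_path`.

    Keep `zip_path` outside `source_path` so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        if os.path.isfile(source_path):
            # Store the file at the top level of the archive.
            archive.write(source_path, arcname=os.path.basename(source_path))
        else:
            # Store every file with a path relative to the folder root.
            for dirpath, _dirnames, filenames in os.walk(source_path):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    archive.write(full, arcname=os.path.relpath(full, source_path))
    return zip_path
```

For example, `zip_dataset("my_data.csv", "my_data.zip")` produces an archive that can be uploaded on the "New Dataset" screen.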

Figure 4.5: Importing Custom Datasets

And that's it. You can now add the uploaded dataset to your notebook and start using it, as shown in section 2. If you want to keep the dataset private, make sure that the label in the bottom right corner of the upload screen reads "Private". To make it public instead, click the label and change it to "Public".
