
Kaggle Datasets Tutorial: Kaggle Notebooks

Learn about Kaggle datasets and notebooks and get a head start on creating your Kaggle profile.
Mar 2, 2022  · 7 min read

A "Kaggle Notebook" is a free, hosted Jupyter notebook environment with optional GPU support. Like DataCamp Workspace notebooks, it lets you run machine learning workloads on cloud machines instead of your own computer. You edit and run each Kaggle Notebook directly in the browser, so there is no need to set up a local Jupyter environment: sign in to Kaggle, create a notebook, and start working. The notebooks you have created are listed on the Kaggle Notebook page, where you can also browse other people's notebooks.

To create a notebook, go to the Kaggle Code page and click "New Notebook" (Figure 3.1). Kaggle then allocates cloud resources for you and creates the notebook right away. To rename it, click the notebook title in the upper-left corner. As you can see at first glance, many of the options and features found in Jupyter notebooks are available here as well.

The most frequent questions about Kaggle Notebooks are how to share a notebook publicly, how to add another person as a collaborator, how to import a dataset to a notebook, and how to use the GPU. You can see the buttons required to perform each of these operations in Figure 3.2.

Figure 3.1: Kaggle Code


To make a notebook public, you first have to commit a version of it: click the "Save Version" button at the top right. Then click the "Share" button to the left of "Save Version" to make the notebook accessible to everyone, or use the same menu to add collaborators. To import a dataset, click "Add data" below "Save Version" in the right-hand menu and select the dataset you want. To enable the GPU, choose the GPU option in the accelerator section of the same right-hand menu. Kaggle caps GPU usage at 30 hours per week.
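Once a dataset has been added, its files appear inside the notebook's file system. The following is a minimal sketch for listing them, assuming attached datasets are mounted under `/kaggle/input` (the usual location in Kaggle Notebooks); the helper name `list_input_files` is made up for this example.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the sorted paths of every file found under `root`."""
    paths = []
    # os.walk yields nothing if `root` does not exist, so this is safe
    # to run outside Kaggle as well (it simply returns an empty list).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


if __name__ == "__main__":
    for path in list_input_files():
        print(path)
```

Running this in a fresh notebook cell is a quick way to find the exact path to pass to your file-reading code.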

Figure 3.2: Kaggle Notebook


All other features of notebooks are explained in detail in the Kaggle documentation.

KAGGLE DATASETS

WHAT ARE KAGGLE DATASETS?

Kaggle is a data science platform that also hosts datasets. "Kaggle Datasets" lets you create your own custom datasets, share them with others, and easily import them into your notebooks. You can also add private datasets that are visible only to you.

What makes this feature one of the most important ones in Kaggle is that it gives you access to a wide variety of top-quality datasets shared by other users. You can easily find the datasets you want with just a few search and filtering methods.

DATASET SEARCH FILTERS

To search for a dataset, type your keywords in the search field, as shown in Figure 4.1. Here, typing "Covid" in the search bar is enough to bring up several datasets about the pandemic.

If you click on "Filters" on the right side of the search bar, more filtering options will appear (Figure 4.2). With these, you can narrow your search by entering dataset tags, file type, and other values like the minimum or maximum size of the dataset (Figure 4.3).
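The same kind of filtered search can also be run from a terminal with the official `kaggle` command-line tool. The sketch below only builds the command as a list of arguments; it assumes the tool is installed and configured with an API token, and the flag names (`--search`, `--file-type`, `--tags`, `--min-size`, `--max-size`) are given as recalled from the tool's help text, so check them with `kaggle datasets list -h`. The function name `build_dataset_search_command` is made up for this example.

```python
def build_dataset_search_command(search, file_type=None, tags=None,
                                 min_size=None, max_size=None):
    """Build the argument list for a `kaggle datasets list` call."""
    cmd = ["kaggle", "datasets", "list", "--search", search]
    if file_type is not None:
        cmd += ["--file-type", file_type]
    if tags:
        # Several tags are passed as one comma-separated value.
        cmd += ["--tags", ",".join(tags)]
    if min_size is not None:
        cmd += ["--min-size", str(min_size)]  # size in bytes
    if max_size is not None:
        cmd += ["--max-size", str(max_size)]  # size in bytes
    return cmd


if __name__ == "__main__":
    # Search for "covid" CSV datasets no larger than about 10 MB.
    print(" ".join(build_dataset_search_command(
        "covid", file_type="csv", max_size=10_000_000)))
```

The resulting list can be passed to `subprocess.run` if you want to execute the search from Python.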


Figure 4.1: Dataset Search Filters


Kaggle allows you to download any dataset for free, but depending on how you plan to use it, you may need to pay attention to its license type. For example, you may need additional permission from the owner to use a dataset in an academic paper or for commercial purposes.

Figure 4.2: Dataset Search Filters by Tags


There are three main license types on Kaggle:

  1. Creative Commons: There are several kinds of Creative Commons licenses:
    1. CC0, a public domain dedication, which means that anyone can use the dataset for any purpose.
    2. CC-BY, which requires the dataset user to credit its owner.
    3. CC-BY-SA, which also requires the owner to be credited and adds the condition that modified versions of the dataset keep the same kind of license.
  2. GPL: This license grants four main usage rights:
    1. You get unlimited use of the dataset.
    2. You can examine the dataset and modify it.
    3. You are entitled to distribute unlimited copies of the dataset.
    4. You can distribute modified versions of the dataset as well.
  3. Open Database: This allows users to share, modify, and use the dataset, but modified versions must be released under the same kind of license.
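To make the differences between the Creative Commons variants concrete, here is a small sketch that encodes the two conditions described above (crediting the owner and keeping the same license) as a lookup table. It is a simplification for illustration only; the names `LicenseTerms`, `CC_TERMS`, and `obligations` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution: bool  # the dataset owner must be credited
    share_alike: bool  # modified versions must keep the same license


# Only the three Creative Commons variants listed above are encoded.
CC_TERMS = {
    "CC0": LicenseTerms(attribution=False, share_alike=False),
    "CC-BY": LicenseTerms(attribution=True, share_alike=False),
    "CC-BY-SA": LicenseTerms(attribution=True, share_alike=True),
}


def obligations(license_name):
    """Return the conditions a user must meet under `license_name`."""
    terms = CC_TERMS[license_name]
    result = []
    if terms.attribution:
        result.append("credit the dataset owner")
    if terms.share_alike:
        result.append("release modified versions under the same license")
    return result


if __name__ == "__main__":
    for name in CC_TERMS:
        print(name, "->", obligations(name) or "no conditions")
```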

Figure 4.3: Dataset Search Filters


DATA EXPLORER

The Data Explorer section lets you quickly browse the content and structure of a dataset. It gives you an overview of the files and their columns, along with a histogram for each column (Figure 4.4).
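If you want a similar per-column overview inside a notebook, a rough equivalent can be put together with Python's standard library. The sketch below counts how often each value occurs in every column of a CSV file; it is not how Data Explorer itself works, and the function name `column_value_counts` and the sample data are made up for this example.

```python
import csv
import io
from collections import Counter


def column_value_counts(csv_text):
    """Map each column name to a Counter of the values it contains."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            counts[name][row[name]] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real dataset file.
    sample = "country,status\nTR,confirmed\nTR,recovered\nDE,confirmed\n"
    for column, counter in column_value_counts(sample).items():
        print(column, dict(counter))
```

For a real file, read its contents with `open(path).read()` and pass the text to the function.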

Figure 4.4: Data Explorer


DATASETS FOR BEGINNERS

The following datasets are fun and easy to play with as a beginner. You can fetch these to your notebooks and start getting your hands dirty by visualizing the data. Once you come up with an idea, you can even build machine learning models with some of these datasets.

CUSTOM DATASETS

You can also upload and use your own datasets in Kaggle. This comes in handy when you have your own data or when you've modified an existing dataset and want to use it in your notebook. To upload a dataset, first zip your main dataset file. Then click "New Dataset" in the Datasets section, give your dataset a name, and upload your zip file (Figure 4.5).
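The zipping step can be done with any archive tool. As one option, here is a minimal sketch using Python's standard `zipfile` module; the function name `zip_dataset` and the example paths are made up for this example.

```python
import zipfile
from pathlib import Path


def zip_dataset(source_dir, zip_path):
    """Write every file under `source_dir` into a zip archive at `zip_path`."""
    source_dir = Path(source_dir)
    # Collect the files before creating the archive so that the archive
    # does not pick itself up if it is written inside `source_dir`.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to the dataset folder, not absolute ones.
            archive.write(path, arcname=path.relative_to(source_dir).as_posix())
    return zip_path


if __name__ == "__main__":
    # Hypothetical folder name; replace it with your own dataset folder.
    if Path("my_dataset").is_dir():
        print(zip_dataset("my_dataset", "my_dataset.zip"))
```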

Figure 4.5: Importing Custom Datasets


And that's it. You can now fetch the uploaded dataset to your notebook and start using it, as shown in section 2. If you want to keep the dataset private, make sure that the label in the bottom-right corner of the upload screen reads "Private". To make it public instead, click the label and change it to "Public".
