Datasets from Images

This tutorial will demonstrate how you can make datasets in CSV format from images and use them for Data Science, on your laptop.
Aug 13, 2018  · 4 min read

In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV. Here we will learn about CSV and use it to make a dataset. CSV stands for Comma Separated Values. A CSV file is a plain text export of tabular data, such as a database table: each record sits on its own line, and commas separate the fields within a record. Because files with the .csv extension are plain text, people who do not run the same database applications can still share data with one another.
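
To make the record-per-line, comma-per-field layout concrete, here is a minimal sketch that parses a small piece of CSV text with Python's standard csv module. The column names and file paths in the sample are made up for illustration.

```python
import csv
import io


def parse_csv(text):
    """Split CSV text into records (one per line) and fields (comma-separated)."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical sample: a header record followed by two data records.
sample = "path,label\n./india/1.jpg,india\n./usa/2.jpg,usa\n"

for record in parse_csv(sample):
    print(record)
```

Each printed list is one record, and each element of the list is one field.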

For this tutorial we have a few requirements:

1->Microsoft Excel
2->A few images
3->Notepad++ (I suggest it because it is easy to use; you can also try WordPad, Notepad, or any other text editor)

With this, let’s get started.

Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the version that works well on your device.

Installers are available for both 32-bit and 64-bit systems.

Now download a few images

Make a new folder (I named it dataset), create a few folders inside it, and fill those folders with images. I downloaded car number plates from a few parts of the world and stored them in folders.

Open a terminal/Command Prompt in the current directory, i.e., in the folder dataset, and run the commands given below. First, the commands for Windows users:

Command to get a list of folders and files in your directory: dir /b /s

Command to get file names and save them to a text file: dir /b /s *.jpg > "filename.txt"

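For Linux users (Ubuntu):

Command to get a list of folders and files in your directory: ls -LR

Command to get file names and save them to a text file: find . -name "*.jpg" > files.txt

Here find is used because it searches the subfolders too, whereas a *.jpg pattern passed to ls only matches files in the current folder. The paths it writes already start with ./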

For macOS: macOS is POSIX compliant, so it provides the usual command line utilities found in Unix environments, and the Linux commands work unchanged.

Command to get a list of folders and files in your directory: ls -LR

Command to get file names and save them to a text file: find . -name "*.jpg" > filename.txt

I named the file 'filename'; you can name it anything you wish, such as 'names.txt'. It will be stored in the directory where you ran the command. I wanted only images stored with the .jpg extension, so I used *.jpg to select them; you can use *.jpeg, *.xml, or any other pattern depending on the extension of your files.
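
If you would rather not depend on a particular shell, the same listing can be produced from Python. The following is a minimal sketch and not part of the original workflow: the function names list_images and save_listing are made up for illustration, and unlike the dir command it writes paths relative to the dataset folder (already prefixed with ./) rather than absolute ones.

```python
from pathlib import Path


def list_images(root, extensions=(".jpg",)):
    """Return sorted './sub/file.jpg' style paths for matching files under root."""
    root = Path(root)
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        "./" + p.relative_to(root).as_posix()
        for p in root.rglob("*")  # rglob walks all subfolders
        if p.is_file() and p.suffix.lower() in wanted
    )


def save_listing(root, out_name="filename.txt", extensions=(".jpg",)):
    """Write one path per line to out_name inside root and return its location."""
    paths = list_images(root, extensions)
    out_path = Path(root) / out_name
    out_path.write_text("".join(line + "\n" for line in paths), encoding="utf-8")
    return out_path
```

Calling save_listing(".") from inside the dataset folder would create filename.txt there. The extension check ignores case, so files ending in .JPG are also included.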

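Now open the text file in Notepad++, press Ctrl+F, and use the Replace tab to remove the main directory details. For Excel to pull images into it, each entry must give the subdirectory and file name starting with " ./ ", so replace the first backslash \ with ./ and the second one with /. For example, once the main directory is removed, an entry such as \india\1.jpg (a hypothetical file) becomes ./india/1.jpg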
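
The same find-and-replace can be sketched in Python for cases where the listing is long. This is an illustrative alternative, not the tutorial's method: to_relative is a made-up helper name, and the folder C:\data\dataset in the usage note is a hypothetical location.

```python
from pathlib import PureWindowsPath


def to_relative(line, dataset_root):
    """Turn an absolute Windows path into a './sub/file.jpg' style path."""
    absolute = PureWindowsPath(line.strip())
    # relative_to drops the main directory; as_posix swaps \ for /
    relative = absolute.relative_to(PureWindowsPath(dataset_root))
    return "./" + relative.as_posix()
```

For instance, to_relative(r"C:\data\dataset\india\1.jpg", r"C:\data\dataset") gives ./india/1.jpg. PureWindowsPath only manipulates text, so the sketch also runs on Linux and macOS.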

After making changes to the file, save the text file.

Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.

If you want to label the images, add another column named label and fill it in according to how you want to label them. Here I labeled them by country.
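
If each subfolder happens to be named after its label (for example, one folder per country, which is an assumption and not something the tutorial requires), the label column could be derived from the path instead of typed by hand. A minimal sketch, with label_from_path as a made-up helper name:

```python
from pathlib import PurePosixPath


def label_from_path(relative_path):
    """Return the first folder of './label/file.jpg', or '' if there is none."""
    # pathlib drops the leading '.', so parts is ('label', 'file.jpg')
    parts = PurePosixPath(relative_path).parts
    return parts[0] if len(parts) > 1 else ""
```

With this layout, label_from_path("./india/1.jpg") gives india, while a file that sits directly in the dataset folder gets an empty label.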

Now save the sheet as CSV (Comma delimited) in the folder (dataset) that holds the folders containing the images.
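
For readers who want to skip the spreadsheet step, the two columns can also be written with Python's csv module. This is a sketch under assumptions: write_dataset_csv is a made-up name, and the header name path is my choice (only the label column is named in the tutorial).

```python
import csv


def write_dataset_csv(rows, csv_path):
    """Write (path, label) pairs to csv_path under a header row."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
```

A call such as write_dataset_csv([("./india/1.jpg", "india")], "dataset.csv") produces a file that Excel can open directly.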

Now remove the text file from that folder and compress the folder we named dataset into a zip file.
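
The zip step can also be scripted. The sketch below uses shutil.make_archive from the Python standard library; zip_dataset is a made-up helper name, and it assumes the text file has already been removed from the folder.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir):
    """Create <dataset_dir>.zip next to the folder and return its path."""
    dataset_dir = Path(dataset_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(dataset_dir),        # archive path without the .zip suffix
        format="zip",
        root_dir=str(dataset_dir.parent),  # paths in the archive are relative to this
        base_dir=dataset_dir.name,         # so entries start with the folder name
    )
    return Path(archive)
```

Running zip_dataset("dataset") from the parent directory would produce dataset.zip with the dataset folder as its top-level entry.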

You're Done!!!

You have successfully made a dataset in CSV format.

Conclusion

This tutorial provides a quick guide on how to make datasets in CSV format from images for data science. I hope you find this tutorial useful when you want to make a dataset. Hurray!!! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.
