Datasets from Images

This tutorial will demonstrate how you can make datasets in CSV format from images and use them for Data Science, on your laptop.
Aug 2018  · 4 min read

In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV. Here we will learn about CSV and use it to make a dataset. CSV stands for Comma Separated Values. A CSV file stores tabular data as plain text: each record sits on its own line, and commas separate the fields within a record. Files with the .csv extension can be opened in any text editor, which lets people who do not run the same database or spreadsheet applications share data with one another.
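
To make the format concrete, here is a small Python sketch that serializes a few rows to CSV text with the standard csv module. The column names (path, label) and the image paths are hypothetical and only illustrate the layout: one line per record, commas between fields.

```python
import csv
import io

# Hypothetical rows: a header followed by one record per image.
rows = [
    ["path", "label"],
    ["./india/plate1.jpg", "india"],
    ["./usa/plate1.jpg", "usa"],
]


def rows_to_csv_text(rows):
    """Serialize rows to CSV text: one line per record, commas between fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv_text(rows)
print(csv_text)
```

Opening the resulting text in any editor shows three lines, each with two comma-separated fields.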

For this tutorial we have a few requirements:

1->Microsoft Excel
2->A few images
3->Notepad++ (I suggest it because it is easy to use; WordPad, Notepad, or any other text editor also works)

With this, let’s get started.

Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the version (32-bit or 64-bit) that matches your device.

Now download a few images

Make a new folder (I named mine dataset), create a few subfolders in it, and fill those subfolders with images. I downloaded car number plates from a few parts of the world and stored them in separate folders.

Open a terminal/Command Prompt in the current directory, i.e., in the dataset folder, and run the commands below. For Windows users:

Command to get a list of folders and files in your directory:-- dir /b /s

Command to get file names and save them to a text file:-- dir /b /s *.jpg > "filename.txt"

For Linux users (Ubuntu):

Command to get a list of folders and files in your directory:-- ls -R

Command to get file names and save them to a text file:-- find . -type f -name "*.jpg" > files.txt

(A plain ls with *.jpg only matches images in the current folder, not in the subfolders, so find is used here. Its output already starts each entry with ./ and uses forward slashes.)

For macOS: macOS is POSIX compliant, so it contains the usual command-line utilities found in Unix environments, and the Linux commands work unchanged.

Command to get a list of folders and files in your directory:-- ls -R

Command to get file names and save them to a text file:-- find . -type f -name "*.jpg" > filename.txt

Here I named the file 'filename'; you can use any name you like, such as 'names.txt'. The file is saved in the directory where you ran the command. I only wanted images stored with the .jpg extension, so I used *.jpg to match them; use *.jpeg or any other pattern that matches the extension of your files.
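
If you prefer a single approach that behaves the same on Windows, Linux, and macOS, the listing step can also be sketched in Python with pathlib. This is an alternative to the shell commands, not part of the original workflow; the function names find_images and save_listing are made up for illustration.

```python
from pathlib import Path


def find_images(root, pattern="*.jpg"):
    """Return absolute paths of files under root whose names match pattern.

    Matching follows the file system's rules, so on Linux "*.jpg"
    does not match files ending in ".JPG".
    """
    root = Path(root).resolve()
    return sorted(str(p) for p in root.rglob(pattern) if p.is_file())


def save_listing(root, output_name="filename.txt", pattern="*.jpg"):
    """Write one image path per line to output_name inside root."""
    paths = find_images(root, pattern)
    listing = Path(root) / output_name
    listing.write_text("".join(p + "\n" for p in paths), encoding="utf-8")
    return paths
```

Calling save_listing(".") from inside the dataset folder would write filename.txt next to the image folders, much like the redirect (>) does in the commands.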

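Now open the text file in Notepad++, press Ctrl+F, and use the Replace tab to remove the main directory from every path. For Excel to pull images in, each entry must give the subdirectory and file name starting with "./", so replace the first backslash \ with ./ and the second one with /. For instance, an entry that ends in \usa\car1.jpg (a made-up name) becomes ./usa/car1.jpg.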
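
The same find-and-replace can be sketched in Python. This is a minimal illustration, assuming the text file holds Windows-style absolute paths; to_relative_entry and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def to_relative_entry(full_path, main_dir):
    """Strip main_dir from a Windows path and return it as './sub/file.jpg'."""
    relative = PureWindowsPath(full_path).relative_to(PureWindowsPath(main_dir))
    # as_posix() swaps backslashes for forward slashes.
    return "./" + relative.as_posix()


# Hypothetical example path, for illustration only.
example = to_relative_entry(
    r"C:\Users\me\dataset\india\plate1.jpg",
    r"C:\Users\me\dataset",
)
print(example)
```

PureWindowsPath only manipulates the text of the path, so the sketch also runs on Linux or macOS.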

After making changes to the file, save the text file.

Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.

If you want to label the images, add another column named label and fill it in according to how you want to label them. Here I labeled the images by country.

Now save the sheet as CSV (Comma delimited) in the dataset folder, alongside the folders that contain the images.
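
For larger collections, the Excel step could be replaced by a short script. The sketch below assumes every image sits in a subfolder whose name is the label you want (for example, the country), and that entries look like ./folder/file.jpg. The column name path and the helper names are assumptions, not something Excel produces.

```python
import csv
from pathlib import PurePosixPath


def label_for(entry):
    """Use the first folder of './country/file.jpg' as the label."""
    return PurePosixPath(entry).parts[0]


def write_dataset_csv(entries, csv_path):
    """Write a two-column CSV (path, label) with one row per image entry."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "label"])
        for entry in entries:
            writer.writerow([entry, label_for(entry)])
```

If your labels do not match the folder names, fill the second column by hand as described above.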

Now remove the text file from that folder and compress the dataset folder into a zip file.
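
This last step can also be scripted. The sketch below uses shutil.make_archive from the Python standard library; zip_dataset is a hypothetical helper, and the default listing name filename.txt is an assumption taken from the earlier step. Note that it deletes the text file, so only run it once the CSV has been saved.

```python
import shutil
from pathlib import Path


def zip_dataset(folder, listing_name="filename.txt"):
    """Delete the temporary listing file, then zip the folder next to itself."""
    folder = Path(folder).resolve()
    listing = folder / listing_name
    if listing.exists():
        listing.unlink()  # the text file is no longer needed
    # Creates <folder>.zip whose entries all start with the folder name.
    return shutil.make_archive(
        str(folder), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
```

The function returns the path of the archive it created, for example a dataset.zip placed beside the dataset folder.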

You're Done!!!

You have successfully made a dataset in CSV format.

Conclusion

This tutorial provides a quick guide on how to make datasets in CSV format from images for data science. I hope you find this tutorial useful when you want to make a dataset. Hurray!!! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.
