
Datasets from Images

This tutorial will demonstrate how you can make datasets in CSV format from images and use them for Data Science, on your laptop.
Aug 2018  · 4 min read

In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV. Here we will learn about CSV and use it to make a dataset. CSV stands for Comma Separated Values. A CSV file stores tabular data as plain text: each line holds one record, and commas separate the fields within that record. Files with the .csv extension are similar to plain text files, which allows people who do not run the same database or spreadsheet applications to share data with one another.
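To make the layout concrete, here is a minimal sketch using Python's standard csv module. The rows, paths, and column names are hypothetical and only illustrate the shape of the file we are about to build: one record per line, with commas between fields.

```python
import csv
import io

# Hypothetical rows: an image path and a label, one record per line.
rows = [
    ["path", "label"],
    ["./india/plate1.jpg", "India"],
    ["./usa/plate1.jpg", "USA"],
]


def to_csv_text(rows):
    """Serialize rows to CSV text: one line per record, commas between fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


csv_text = to_csv_text(rows)
print(csv_text)
```

Reading the text back with from_csv_text returns the same rows, which is why the format is convenient for sharing data between tools.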

For this tutorial we have a few requirements:

1-> Microsoft Excel
2-> A few images
3-> Notepad++ (I suggest it because it is easy to use, but you can use WordPad, Notepad, or any other text editor)

With this, let’s get started.

Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the version (32-bit or 64-bit) that matches your device.


Now download a few images

Make a new folder (I named mine dataset), create a few subfolders in it, and fill those subfolders with images. I downloaded car number plates from a few parts of the world and stored them in folders.


Open a terminal/Command Prompt in the current directory, i.e., in the dataset folder, and run the commands given below. For Windows users:

Command to get a list of folders and files in your directory: dir /b /s

Command to get file names and save them to a text file: dir /b /s *.jpg > "filename.txt"

For Linux users (Ubuntu):

Command to get a list of folders and files in your directory: ls -LR

Command to get file names and save them to a text file: find . -iname "*.jpg" > filename.txt

Here find searches the subfolders as well and writes each path starting with ./

For macOS: macOS is POSIX compliant, so it provides the usual command line utilities found in Unix environments.

Command to get a list of folders and files in your directory: ls -R

Command to get file names and save them to a text file: find . -iname "*.jpg" > filename.txt


Here I named the file 'filename', but you can name it anything you wish, such as 'names.txt'. It will be stored in the directory where you ran the command. I wanted only images stored with the extension .jpg, so I used *.jpg to select them; you can use *.jpeg, *.xml, or anything else, depending on the extension of your files.
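If you would rather not remember a different command for each operating system, the same listing can be sketched in Python. This is only an alternative under my own assumptions, not part of the original workflow: the function name list_images and its extensions parameter are made up for this example.

```python
from pathlib import Path


def list_images(root, extensions=(".jpg",)):
    """Return sorted './subfolder/file' style paths of images under root."""
    root = Path(root)
    wanted = {ext.lower() for ext in extensions}
    paths = []
    for path in root.rglob("*"):  # walk root and all of its subfolders
        if path.is_file() and path.suffix.lower() in wanted:
            # Store the path relative to root, with forward slashes.
            paths.append("./" + path.relative_to(root).as_posix())
    return sorted(paths)


def write_listing(root, out_file, extensions=(".jpg",)):
    """Write one image path per line to out_file and return the paths."""
    names = list_images(root, extensions)
    Path(out_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

Calling write_listing(".", "filename.txt") from inside the dataset folder would produce a text file similar to the one the commands create, with the extension check done case-insensitively.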


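Now open the text file in Notepad++, press Ctrl+F, and use the Replace tab to remove the main directory details. For Excel to pull in the images, each entry must give the subdirectory and filename starting with "./", so on Windows we replace the first backslash \ with ./ and the second one with /. For example, with a hypothetical entry such as C:\Users\me\dataset\India\plate1.jpg, removing C:\Users\me\dataset leaves \India\plate1.jpg, which becomes ./India/plate1.jpg.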
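The find-and-replace step above can also be sketched in Python for long listings. The helper name to_relative and the example directory are hypothetical; PureWindowsPath is used so the sketch handles backslash paths on any operating system.

```python
from pathlib import PureWindowsPath


def to_relative(line, main_dir):
    """Turn 'C:\\...\\dataset\\sub\\img.jpg' into './sub/img.jpg'."""
    full_path = PureWindowsPath(line.strip())
    # Drop the main directory prefix, then switch to forward slashes.
    relative = full_path.relative_to(PureWindowsPath(main_dir))
    return "./" + relative.as_posix()


def convert_listing(lines, main_dir):
    """Convert every non-empty line of a listing to the './sub/img.jpg' form."""
    return [to_relative(line, main_dir) for line in lines if line.strip()]
```

For instance, to_relative(r"C:\Users\me\dataset\India\plate1.jpg", r"C:\Users\me\dataset") gives "./India/plate1.jpg".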


After making changes to the file, save the text file.


Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.


If you want to label the images, make another column named label and fill it in according to how you want to label them. Here I labeled them by country.


Now save the sheet with the CSV (Comma delimited) file type in the folder (dataset) that holds the folders containing images.
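The Excel steps above (one column of paths, one column of labels, saved as CSV) can be sketched in Python as well. This sketch assumes each image sits in a subfolder named after its label, for example a country; that naming, the column headers path and label, and the function names are my own assumptions rather than something the Excel workflow requires.

```python
import csv
from pathlib import PurePosixPath


def label_from_path(relative_path):
    """Use the first folder in './folder/file.jpg' as the label."""
    parts = PurePosixPath(relative_path).parts
    # A file directly in the dataset folder has no subfolder, so no label.
    return parts[0] if len(parts) > 1 else ""


def write_dataset_csv(relative_paths, csv_file):
    """Write a two-column CSV: the image path and its label."""
    with open(csv_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "label"])  # header row
        for path in relative_paths:
            writer.writerow([path, label_from_path(path)])
```

If your folders are not named after the labels you want, you would fill in the label column by hand, just as in Excel.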


Now remove the text file from that folder and compress the folder we named dataset into a zip file.
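Zipping can be done from your file manager, but here is a minimal Python sketch of the same step using the standard shutil module. It assumes the text file has already been removed; the function name zip_dataset is hypothetical.

```python
import shutil
from pathlib import Path


def zip_dataset(folder):
    """Create '<folder>.zip' next to folder and return the archive path."""
    folder = Path(folder).resolve()
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so unzipping recreates 'dataset/...' rather than loose files.
    return shutil.make_archive(
        str(folder), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
```

Calling zip_dataset("dataset") from the parent directory would create dataset.zip beside the folder, outside of it, so the archive does not end up containing itself.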

You're Done!!!

You have successfully made a dataset in CSV format.

Conclusion

This tutorial provides a quick guide on how to make datasets in CSV format from images for data science. I hope you find this tutorial useful when you want to make a dataset. Hurray!!! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.
