Fashion Forward is a new AI-based e-commerce clothing retailer. The company wants to use image classification to automatically categorize new product listings, making it easier for customers to find what they're looking for. The classifier will also assist in inventory management by quickly sorting items.
As a data scientist tasked with implementing a garment classifier, my primary objective is to develop a machine learning model capable of accurately categorizing images of clothing items into distinct garment types such as shirts, trousers, shoes, etc.
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchmetrics import Accuracy, Precision, Recall
import torchvision
from torchvision import datasets
import torchvision.transforms as transforms

# Load datasets
train_data = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Print a summary of each split (size, root directory, transform)
print(train_data)
print(test_data)

View the dataset
class_names = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
]

fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i in range(10):
    image, label = train_data[i]
    ax = axes[i // 5, i % 5]
    ax.imshow(image.squeeze(), cmap='gray')
    ax.set_title(class_names[label], fontsize=8)
    ax.axis('off')
plt.tight_layout()
plt.show()

Model Architecture
# Obtain the number of classes for prediction
num_classes = len(train_data.classes)
num_classes

GarmentClassifier CNN Model Architecture
This convolutional neural network (CNN) is designed to classify grayscale clothing images from the FashionMNIST dataset into 10 predefined garment categories.
Architecture Overview
The model has two main components:
- Feature Extractor: Extracts visual patterns using convolutional layers
- Classifier: Fully connected layer that maps extracted features to output classes
Layer Breakdown
| Layer Type | Description |
|---|---|
| Conv2d | Input: 1 channel → 32 filters, 3×3 kernel, stride 1, no padding (28×28 → 26×26) |
| ReLU | Adds non-linearity |
| MaxPool2d | Halves the spatial dimensions with a 2×2 window (26×26 → 13×13) |
| Conv2d | 32 channels → 64 filters, 3×3 kernel, stride 1, no padding (13×13 → 11×11) |
| ReLU | Adds non-linearity |
| MaxPool2d | Halves the spatial dimensions again, rounding down (11×11 → 5×5) |
| Flatten | Converts the 64×5×5 feature maps into a 1,600-element vector for the dense layer |
| Linear | Fully connected layer with output size equal to the number of classes |
Input
- Shape: (batch_size, 1, 28, 28)
- Grayscale images from FashionMNIST
- Preprocessed with transforms.ToTensor(), which converts each image to a float tensor scaled to [0, 1]; no resizing is applied
Output
- Shape: (batch_size, num_classes)
- Raw class scores (logits) for each clothing category
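Because the output is raw logits rather than probabilities, turning a prediction into a garment name takes one more step. The following is a minimal pure-Python sketch of that step (softmax followed by argmax) for a single image. The logit values are made up for illustration and are not output from this model; the helper names softmax and predict are hypothetical.

```python
import math

# Class names in FashionMNIST label order (same list as in this notebook)
class_names = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the highest-scoring class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical logits for one image (illustrative values only)
example_logits = [0.2, -1.3, 0.1, 0.0, 0.5, -0.7, 0.3, 4.1, -0.2, 1.0]
label, confidence = predict(example_logits)
print(label)  # Sneaker
```

Softmax does not change which class scores highest, so the argmax alone is enough when only the predicted label is needed.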
Notes
- Uses only basic CNN components for simplicity and transparency
- No dropout or batch normalization layers included
- Suitable for image classification tasks with small and structured datasets
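The input size of the final Linear layer depends on how each layer changes the spatial dimensions of a 28×28 image. As a rough check, the sketch below traces those sizes using the standard output-size formula for convolution and pooling. The helper names conv_out and flattened_features are hypothetical. It assumes the nn.Conv2d default of padding=0, which is what the class definition uses, and shows padding=1 for comparison.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (rounds down)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(input_size=28, padding=0, channels=64):
    """Trace conv -> pool -> conv -> pool; return (final size, flattened length)."""
    size = conv_out(input_size, kernel=3, padding=padding)  # conv1
    size = conv_out(size, kernel=2, stride=2)               # pool1
    size = conv_out(size, kernel=3, padding=padding)        # conv2
    size = conv_out(size, kernel=2, stride=2)               # pool2
    return size, channels * size * size

print(flattened_features(padding=0))  # (5, 1600)
print(flattened_features(padding=1))  # (7, 3136)
```

With no padding the feature maps end at 5×5, which matches nn.Linear(64*5*5, num_classes). Adding padding=1 to both convolutions would require 64*7*7 input features instead.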
# Defining the CNN architecture
class GarmentClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Feature extractor: two conv -> ReLU -> max-pool stages (default padding=0)
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Classifier: 64 feature maps of size 5x5 -> num_classes logits
        self.fc = nn.Linear(64*5*5, num_classes)

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))  # (N, 32, 13, 13)
        x = self.pool2(self.relu2(self.conv2(x)))  # (N, 64, 5, 5)
        x = x.view(x.size(0), -1)                  # (N, 1600)
        x = self.fc(x)                             # (N, num_classes)
        return x

Training the Model
# Hyperparameters
batch_size = 64
learning_rate = 0.001
num_epochs = 1
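To get a feel for what these values mean in practice, the sketch below works out how many optimizer updates one epoch involves. It assumes the 60,000-image FashionMNIST training split is wrapped in a DataLoader with this batch_size and the default drop_last=False, so the final partial batch is kept. The variable names other than batch_size and num_epochs are illustrative.

```python
import math

batch_size = 64
num_epochs = 1
num_train_images = 60_000  # size of the FashionMNIST training split

# Assumes the last, smaller batch is kept (DataLoader default drop_last=False)
steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * num_epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 938 32 938
```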