DigiNsure Inc. is an innovative insurance company focused on enhancing the efficiency of processing claims and customer service interactions. Their newest initiative is digitizing all historical insurance claim documents, which includes improving the labeling of some IDs scanned from paper documents and identifying them as primary or secondary IDs.
To help them in their effort, you'll be using multi-modal learning to train an Optical Character Recognition (OCR) model. To improve the classification, the model will use both images of the scanned documents and their insurance type (home, life, auto, health, or other) as inputs. Integrating different data modalities (such as image and text) enables the model to perform better in complex scenarios, helping it capture more nuanced information. The labels that the model will be trained to identify are of two types — a primary and a secondary ID — for each image–insurance-type pair.
# Import the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
from project_utils import ProjectDataset
import pickle
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
# Load the pickled dataset of ((image, type), label) samples.
# NOTE(review): pickle.load is unsafe on untrusted input — only load files
# produced by this project, never user-supplied ones.
# Use a context manager so the file handle is closed deterministically
# (the original `pickle.load(open(...))` leaked the handle).
with open('ocr_insurance_dataset.pkl', 'rb') as _dataset_file:
    dataset = pickle.load(_dataset_file)
# Define a function to visualize scanned ID images with their corresponding insurance types and labels
def show_dataset_images(dataset, num_images=5):
    """Display up to `num_images` randomly chosen samples with their type and label.

    Each sample is an ((image_tensor, one_hot_type), label) pair; the image
    tensor is assumed to flatten to a 64x64 grayscale array — TODO confirm
    against ProjectDataset.
    """
    n = min(num_images, len(dataset))
    _, axes = plt.subplots(1, n, figsize=(20, 4))
    # Bug fix: with n == 1, plt.subplots returns a bare Axes object rather
    # than an array, so zip(axes, ...) would fail. atleast_1d handles both.
    axes = np.atleast_1d(axes)
    for ax, idx in zip(axes, np.random.choice(len(dataset), n, False)):
        img, lbl = dataset[idx]
        ax.imshow((img[0].numpy() * 255).astype(np.uint8).reshape(64, 64), cmap='gray')
        ax.axis('off')
        # Recover human-readable names by reverse lookup in the dataset mappings.
        type_name = list(dataset.type_mapping.keys())[img[1].tolist().index(1)]
        label_name = list(dataset.label_mapping.keys())[
            list(dataset.label_mapping.values()).index(lbl)
        ]
        ax.set_title(f"Type: {type_name}\nLabel: {label_name}")
    plt.show()
# Inspect 5 sample images from the dataset as a quick sanity check
show_dataset_images(dataset)

# Start coding here
# 1 - Define the OCRModel class
# Write a class to define the model's structure
class OCRModel(nn.Module):
    """Multi-modal classifier that combines a scanned-document image with a
    one-hot insurance-type vector to predict the ID label (2 logits).
    """

    def __init__(self):
        super().__init__()
        # Image branch: conv -> 2x2 max-pool -> ReLU -> flatten -> linear.
        # A 1-channel 64x64 input becomes 16 channels of 32x32 after pooling,
        # hence the 16*32*32 flattened feature count.
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 32 * 32, 128),
        )
        # Type branch: project the 5-way one-hot insurance type to 10 features.
        self.type_layer = nn.Sequential(
            nn.Linear(5, 10),
            nn.ReLU(),
        )
        # Fusion head: 128 image + 10 type features -> 2 output logits
        # (primary vs. secondary ID).
        self.classifier = nn.Sequential(
            nn.Linear(128 + 10, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, x_image, x_type):
        """Return (batch, 2) logits for a batch of images and type vectors."""
        image_features = self.image_layer(x_image)
        type_features = self.type_layer(x_type)
        # Fuse the two modalities along the feature dimension.
        fused = torch.cat((image_features, type_features), dim=1)
        return self.classifier(fused)

# 2 - Define optimizer and loss functions
# Load the data in shuffled mini-batches of 10 ((image, type), label) pairs
train_dataloader = DataLoader(
    dataset,
    batch_size=10,
    shuffle=True,
)

# Instantiate the model
model = OCRModel()

# Adam with lr=0.001; CrossEntropyLoss matches the model's 2-way logit output.
# Fix: the original last line had the section header fused onto it
# ("nn.CrossEntropyLoss()3 - Train the model"), which is a SyntaxError;
# the header is moved into the comment below.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# 3 - Train the model
# Train for ten epochs, printing the final batch loss of each epoch
# to monitor training progress.
for epoch in range(10):
    for (images, types), labels in train_dataloader:
        # Standard step: reset grads, forward, compute loss, backprop, update.
        optimizer.zero_grad()
        predictions = model(images, types)
        batch_loss = criterion(predictions, labels)
        batch_loss.backward()
        optimizer.step()
    loss = batch_loss
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")