
Recognizing multiple handwritten letters (A-Z) with a PyTorch CNN handwritten letter recognition model

ztj100 2024-10-31 16:14

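In previous articles, we covered training and recognizing handwritten letters:

Training a first PyTorch CNN handwritten letter recognition network on the EMNIST dataset

Recognizing handwritten letters with a PyTorch CNN handwritten letter recognition model

Those articles only covered recognizing a single letter. Recognizing multiple letters follows the same approach as multi-digit recognition: first preprocess the image and detect the contour of each letter, then recognize each letter, and finally annotate the results for all letters directly on the image.

Multi-digit (0-9) recognition with a CNN in PyTorch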

step 1: Building the neural network

Building on the previous article, we set up a neural network for handwritten letter recognition:

import torch
import torch.nn as nn
from PIL import Image  # image processing utilities
import PIL.ImageOps
import numpy as np
from torchvision import transforms
import cv2
import matplotlib.pyplot as plt
# ##### Parameters #######################
widthImg = 640
heightImg = 480
kernal = np.ones((5, 5))
minArea = 800  # minimum contour area (in pixels) for a region to count as a letter
# Define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(  # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,  # number of input channels
                out_channels=16,  # number of output channels
                kernel_size=5,   # kernel size
                stride=1,  # stride
                padding=2,  # to keep the output height/width equal to the input,
                            # use padding=(kernel_size-1)/2 when stride=1
            ),  # output shape (16, 28, 28)
            nn.ReLU(),  # activation
            nn.MaxPool2d(kernel_size=2),  # downsample over 2x2 windows, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(  # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),  # output shape (32, 14, 14)
            nn.ReLU(),  # activation
            nn.MaxPool2d(2),  # output shape (32, 7, 7)
        )
        # fully connected layer with 37 outputs; the size must match the saved weights
        # (with the EMNIST letters split, labels 1-26 correspond to A-Z)
        self.out = nn.Linear(32 * 7 * 7, 37)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten the feature maps to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output

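The first layer takes EMNIST data as input. EMNIST images are single-channel 28×28 images, so the input of the first layer is (1, 28, 28), with a depth of 1. We set 16 output channels and convolve the image with a 5×5 kernel, moving one pixel per step. To keep the image size unchanged, padding is set to 2, so the first convolution outputs data of shape (16, 28, 28).

After ReLU and max pooling (with a 2×2 window), the output is (16, 14, 14).

The second convolution layer uses the shorthand form nn.Conv2d(16, 32, 5, 1, 2): the first argument is the number of input channels, in_channels=16; the second is the number of output channels (filters), out_channels=32; the third is the kernel size; the fourth is the stride; and the last is padding, which keeps the input and output image sizes the same.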

        self.conv2 = nn.Sequential(  # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),  # output shape (32, 14, 14)
            nn.ReLU(),  # activation
            nn.MaxPool2d(2),  # output shape (32, 7, 7)
        )
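The shapes quoted above follow from the usual output-size formula for convolution and pooling layers, out = floor((in + 2 * padding - kernel) / stride) + 1, assuming dilation 1; nn.MaxPool2d uses a stride equal to its kernel size by default. A small pure-Python sketch (out_size and trace_shapes are hypothetical helper names) that traces a 28×28 input through both blocks:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a Conv2d or MaxPool2d layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=28):
    """Follow a square input through conv1/pool1 and conv2/pool2 of the CNN above."""
    shapes = []
    size = out_size(size, kernel=5, stride=1, padding=2)  # conv1: 5x5, stride 1, padding 2
    shapes.append((16, size, size))
    size = out_size(size, kernel=2, stride=2)             # MaxPool2d(2): stride defaults to 2
    shapes.append((16, size, size))
    size = out_size(size, kernel=5, stride=1, padding=2)  # conv2: same settings as conv1
    shapes.append((32, size, size))
    size = out_size(size, kernel=2, stride=2)             # MaxPool2d(2)
    shapes.append((32, size, size))
    return shapes


for shape in trace_shapes():
    print(shape)
# (16, 28, 28)
# (16, 14, 14)
# (32, 14, 14)
# (32, 7, 7)
```

The last shape is why the fully connected layer expects 32 * 7 * 7 = 1568 input features.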

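Finally, the fully connected layer nn.Linear() maps the flattened features, of size 32*7*7, to 37 outputs. That is the complete structure of the convolutional neural network, roughly:

input - conv - ReLU - pooling - conv - ReLU - pooling - linear - output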

Once the network is built, forward() runs the forward pass that recognizes the input image.
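To get a feel for how small this network is, its learnable parameters can be counted by hand: a Conv2d layer has in_channels × out_channels × k × k weights plus one bias per output channel, nn.Linear has in_features × out_features weights plus one bias per output, and ReLU and MaxPool2d have none. A short sketch of that arithmetic, assuming the default bias=True (conv2d_params and linear_params are hypothetical helper names):

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Weights of a k x k Conv2d plus one bias per output channel."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights of an nn.Linear layer plus one bias per output feature."""
    return in_features * out_features + (out_features if bias else 0)


layers = {
    "conv1": conv2d_params(1, 16, 5),
    "conv2": conv2d_params(16, 32, 5),
    "out": linear_params(32 * 7 * 7, 37),
}
for name, count in layers.items():
    print(name, count)
print("total", sum(layers.values()))
# conv1 416
# conv2 12832
# out 58053
# total 71301
```

In PyTorch, sum(p.numel() for p in CNN().parameters()) should give the same total, which makes a handy sanity check; most of the parameters sit in the final linear layer.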

step 2: Image preprocessing

# Preprocessing function

def preProccessing(img):
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    imgBlur = cv2.GaussianBlur(imgGray, (5, 5), 1)  # smooth to suppress noise
    imgCanny = cv2.Canny(imgBlur, 200, 200)  # edge detection: white edges on a black background
    imgDial = cv2.dilate(imgCanny, np.ones((5, 5)), iterations=2)  # dilation
    imgThres = cv2.erode(imgDial, np.ones((5, 5)), iterations=1)  # erosion
    return imgThres

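Here we use dilation and erosion to preprocess the image so that it is easier for the network to recognize: after Canny edge detection the letters become thin white edges on a black background, and dilating twice then eroding once thickens them into solid strokes, closer to the white-on-black EMNIST samples. The earlier letter and digit recognition articles could also add this preprocessing step to help the network and potentially improve accuracy.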
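Dilation replaces each pixel with the maximum value in its neighbourhood and erosion with the minimum, so dilating the thin Canny edges twice and eroding once thickens the strokes and closes small gaps between edge fragments. The NumPy sketch below only illustrates that idea with a square kernel; it is not OpenCV's implementation, and the border handling (padding with 0 for dilation and 255 for erosion, so the image border neither grows strokes nor eats into them) is an assumption meant to roughly mirror OpenCV's defaults. dilate and erode here are hypothetical helpers, not the cv2 functions.

```python
import numpy as np


def _neighbourhoods(img, k, pad_value):
    """Stack every k x k neighbourhood of a 2-D image along a new last axis."""
    r = k // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=-1,
    )


def dilate(img, k=5, iterations=1):
    """Each pixel becomes the maximum of its k x k neighbourhood (border padded with 0)."""
    out = img
    for _ in range(iterations):
        out = _neighbourhoods(out, k, 0).max(axis=-1)
    return out


def erode(img, k=5, iterations=1):
    """Each pixel becomes the minimum of its k x k neighbourhood (border padded with 255)."""
    out = img
    for _ in range(iterations):
        out = _neighbourhoods(out, k, 255).min(axis=-1)
    return out


# Two horizontal edge fragments separated by a 2-pixel gap (columns 10 and 11).
edges = np.zeros((21, 31), dtype=np.uint8)
edges[10, 5:10] = 255
edges[10, 12:17] = 255

# Same order as preProccessing: dilate twice with a 5x5 kernel, then erode once.
closed = erode(dilate(edges, k=5, iterations=2), k=5, iterations=1)
print(edges[10, 10], closed[10, 10])  # the gap pixel: 0 before, 255 after
```

Because the image is dilated more than it is eroded, the strokes end up thicker than the original edges, not just reconnected.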

step 3: Contour detection to get the position of each letter

def getContours(img):
    x, y, w, h, xx, yy, ss = 0, 0, 10, 10, 20, 20, 10  # initial values, because an image size cannot be 0
    imgGet = np.array([[], []])  # must not be empty
    Borderlist = []  # (letter image, x, y, side length) for every detected letter
    # retrieve outer contours only; OpenCV 4.x returns (contours, hierarchy),
    # while OpenCV 3.x returns (image, contours, hierarchy)
    contours, hierarchy = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > minArea:  # regions larger than 800 px are treated as letters, smaller ones as noise
            cv2.drawContours(imgCopy, [cnt], -1, (255, 0, 0), 3)  # draw the contour on imgCopy
            peri = cv2.arcLength(cnt, True)  # perimeter
            approx = cv2.approxPolyDP(cnt, 0.02 * peri, True)  # polygon approximation (corner points)
            x, y, w, h = cv2.boundingRect(approx)  # bounding rectangle
            a = (w + h) // 2
            dd = abs(w - h) // 2  # half the difference between the two sides
            imgGet = img[y:y + h, x:x + w]  # crop the letter from the preprocessed image
            if w <= h:  # make a square box: 20 px black border, plus dd on the left and right
                imgGet = cv2.copyMakeBorder(imgGet, 20, 20, 20 + dd, 20 + dd, cv2.BORDER_CONSTANT, value=[0, 0, 0])
                xx = x - dd - 10
                yy = y - 10
                ss = h + 20
                cv2.rectangle(imgCopy, (x - dd - 10, y - 10), (x + a + 10, y + h + 10), (0, 255, 0),
                              2)  # check the selection box on imgCopy
                print(a + dd, h)
            else:  # 20 px black border, plus dd on the top and bottom
                imgGet = cv2.copyMakeBorder(imgGet, 20 + dd, 20 + dd, 20, 20, cv2.BORDER_CONSTANT, value=[0, 0, 0])
                xx = x - 10
                yy = y - dd - 10
                ss = w + 20
                cv2.rectangle(imgCopy, (x - 10, y - dd - 10), (x + w + 10, y + a + 10), (0, 255, 0), 2)
                print(a + dd, w)
            Temptuple = (imgGet, xx, yy, ss)  # keep the image and its coordinates together in one tuple
            Borderlist.append(Temptuple)

    return Borderlist

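getContours separates the letter regions in the image and finds the coordinates of each letter, so every letter can be recognized individually by the CNN; this is how multiple letters are recognized in one image.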
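Padding each crop to a square before recognition matters because transforms.Resize((28, 28)) resizes to exactly 28×28 and would otherwise stretch tall or wide letters. A minimal NumPy sketch of the same idea for a single-channel crop; pad_to_square is a hypothetical helper, and it splits an odd size difference as before = diff // 2, after = diff - before, so the result is always exactly square, whereas getContours pads dd on both sides and can end up one pixel off when the sides differ by an odd number:

```python
import numpy as np


def pad_to_square(crop, margin=20):
    """Pad a 2-D crop with black pixels so it is square, plus a uniform margin.

    Hypothetical helper mirroring the copyMakeBorder calls in getContours.
    """
    h, w = crop.shape
    diff = abs(h - w)
    before, after = diff // 2, diff - diff // 2
    if w <= h:  # tall crop: widen it
        pad = ((margin, margin), (margin + before, margin + after))
    else:       # wide crop: make it taller
        pad = ((margin + before, margin + after), (margin, margin))
    return np.pad(crop, pad, mode="constant", constant_values=0)


tall = np.full((15, 10), 255, dtype=np.uint8)
wide = np.full((10, 15), 255, dtype=np.uint8)
print(pad_to_square(tall).shape, pad_to_square(wide).shape)  # (55, 55) (55, 55)
```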

step 4: Model setup

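Borderlist = []  # contour images and their coordinates
Resultlist = []  # recognition results
img = cv2.imread('55.png')
if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError('55.png')

imgCopy = img.copy()  # copy of the original image for drawing boxes and results
imgProcess = preProccessing(img)
Borderlist = getContours(imgProcess)

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])

model = CNN()
model.load_state_dict(torch.load('./model/Eminist.pth', map_location='cpu'))  # weights trained earlier
model.eval()  # switch to inference mode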

First, read the image to be recognized, then use preProccessing and getContours to get the contour position of each letter in the image.

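transforms.Compose chains several torchvision image transforms: ToPILImage converts the NumPy array to a PIL image, Grayscale makes it single-channel, Resize scales it to 28×28, and ToTensor turns it into a float tensor with values in [0, 1].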
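For reference, train_transform turns a uint8 H×W crop into a float tensor of shape (1, 28, 28) with values in [0, 1]. The NumPy sketch below reproduces those shapes and value ranges to make the data layout explicit; it assumes a single-channel crop (as produced by getContours) and swaps in a simple nearest-neighbour resize, whereas transforms.Resize interpolates bilinearly by default, so the pixel values will not match exactly. to_model_input is a hypothetical name.

```python
import numpy as np


def to_model_input(crop, size=28):
    """Rough NumPy stand-in for train_transform on a single-channel uint8 crop.

    Returns a float32 array of shape (1, size, size) with values in [0, 1].
    """
    h, w = crop.shape
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    # (transforms.Resize interpolates bilinearly by default, so values will differ.)
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    resized = crop[rows[:, None], cols[None, :]]
    x = resized.astype(np.float32) / 255.0  # like ToTensor: uint8 [0, 255] -> float [0.0, 1.0]
    return x[None, :, :]                    # add the channel axis -> (1, size, size)


crop = np.zeros((55, 55), dtype=np.uint8)
crop[20:35, 22:32] = 255  # a white block standing in for a letter stroke
x = to_model_input(crop)
print(x.shape, x.dtype, x.min(), x.max())  # (1, 28, 28) float32 0.0 1.0
```

torch.unsqueeze(img, dim=0) in step 6 then adds the batch axis, giving the (1, 1, 28, 28) shape the network expects.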

Then load the model we trained earlier.

step 5: UTF-8 character conversion

def get_mapping(num, with_type='letters'):
    """
    Compute the character for class index num from the EMNIST label mapping.
    """
    if with_type == 'byclass':
        if num <= 9:
            return chr(num + 48)  # digits: 0 -> '0'
        elif num <= 35:
            return chr(num + 55)  # uppercase letters: 10 -> 'A'
        else:
            return chr(num + 61)  # lowercase letters: 36 -> 'a'
    elif with_type == 'letters':
        return chr(num + 64)   # letters (upper and lower case merged): 1 -> 'A'
    elif with_type == 'digits':
        return chr(num + 48)   # digits: 0 -> '0'
    else:
        return num

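After recognition, the network returns a class index rather than a character. get_mapping adds a fixed offset to turn that index into the character's code point and converts it with chr(); for letters and digits these code points are the same in ASCII and UTF-8, so a character code table is all we need.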

Figure: character encoding table (UTF-8)
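The offsets in get_mapping are positions in the character table: code point 48 is '0', 65 is 'A' and 97 is 'a'. An equivalent sketch that indexes Python's string constants instead of hard-coding those offsets; label_to_char and BYCLASS are hypothetical names, and the class orders are assumed to be EMNIST's (byclass: 0-9, A-Z, a-z; letters: labels 1-26 for A-Z; digits: 0-9):

```python
import string

# Class order assumed for the EMNIST byclass split: 10 digits, 26 upper, 26 lower.
BYCLASS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def label_to_char(num, with_type="letters"):
    """Map a predicted class index to its character (hypothetical alternative to get_mapping)."""
    if with_type == "byclass":
        return BYCLASS[num]
    if with_type == "letters":
        if not 1 <= num <= 26:  # label 0 is unused in EMNIST letters
            raise ValueError(f"letters label must be 1-26, got {num}")
        return string.ascii_uppercase[num - 1]
    if with_type == "digits":
        return string.digits[num]
    return num


print(label_to_char(1), label_to_char(26), label_to_char(10, "byclass"), label_to_char(61, "byclass"))
# A Z A z
```

With the 37-output model, indices 0 and 27-36 have no letter: get_mapping would turn them into characters such as '@' or '[', whereas this sketch raises ValueError, which makes a stray prediction easier to spot.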

step 6: Neural network recognition

if len(Borderlist) != 0:  # must not be empty
    for (imgRes, x, y, s) in Borderlist:
        cv2.imshow('imgCopy', imgRes)  # show the cropped letter (for debugging)
        cv2.waitKey(0)
        # match the EMNIST orientation: mirror left-right, then rotate 90 degrees counterclockwise
        imgRes = cv2.flip(imgRes,1)
        (h, w) = imgRes.shape[:2]
        (cX,cY) = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D((cX,cY), 90, 1.0)  # positive angle = counterclockwise
        cos = np.abs(M[0, 0])
        sin = np.abs(M[0, 1])
        nW = int((h * sin) + (w * cos))  # size of the rotated canvas
        nH = int((h * cos) + (w * sin))
        M[0, 2] += (nW / 2) - cX  # shift so the result stays centred on the new canvas
        M[1, 2] += (nH / 2) - cY
        imgRes = cv2.warpAffine(imgRes, M, (nW, nH))
        cv2.imshow('imgThres',imgRes)
        cv2.waitKey(0)
        img = train_transform(imgRes)  # tensor of shape (1, 28, 28)
        img = torch.unsqueeze(img, dim=0)  # add the batch axis -> (1, 1, 28, 28)
        with torch.no_grad():
            pre = model(img)
            output = torch.squeeze(pre)
            predict = torch.softmax(output, dim=0)
            predict_cla = torch.argmax(predict).item()  # class index as a Python int
            print(get_mapping(predict_cla), predict[predict_cla].item())
            result = get_mapping(predict_cla)

        cv2.rectangle(imgCopy, (x, y), (x + s, y + s), color=(0, 255, 0), thickness=1)
        cv2.putText(imgCopy, result, (x + s // 2 - 5, y + s // 2 - 5), cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 2)
cv2.imshow('imgCopy', imgCopy)
cv2.waitKey(0)

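With the steps above, we have found the letter contours in the image. We loop over them and take each single-letter image. One thing needs special attention: the EMNIST images were mirrored left-right and then rotated 90° counterclockwise, so each crop has to be transformed the same way before it is fed to the model.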

Here we use cv2.flip(imgRes, 1) to mirror the image, and combine getRotationMatrix2D with warpAffine to rotate it; for rotation, OpenCV is less convenient than PIL.

        imgRes = cv2.flip(imgRes,1)
        (h, w) = imgRes.shape[:2] 
        (cX,cY) = (w // 2, h // 2) 
        M = cv2.getRotationMatrix2D((cX,cY), 90, 1.0) 
        cos = np.abs(M[0, 0])
        sin = np.abs(M[0, 1])
        nW = int((h * sin) + (w * cos))
        nH = int((h * cos) + (w * sin))
        M[0, 2] += (nW / 2) - cX
        M[1, 2] += (nH / 2) - cY
        imgRes = cv2.warpAffine(imgRes, M, (nW, nH))
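Mirroring left-right and then rotating 90° counterclockwise amounts to a transpose (rows and columns swapped). A quick NumPy check of that identity, using np.fliplr and np.rot90 (which rotates counterclockwise) as stand-ins for cv2.flip(imgRes, 1) and the getRotationMatrix2D/warpAffine rotation:

```python
import numpy as np

# A small non-symmetric test image: each value is unique, so any mismatch would show.
img = np.arange(12, dtype=np.uint8).reshape(3, 4)

mirrored = np.fliplr(img)      # like cv2.flip(img, 1): mirror left-right
rotated = np.rot90(mirrored)   # rotate 90 degrees counterclockwise
print(np.array_equal(rotated, img.T))  # True
print(rotated.shape)                   # (4, 3)
```

So, assuming the crop is a plain 2-D array, cv2.transpose(imgRes) or imgRes.T should give the same orientation in a single call, without building a rotation matrix; warpAffine may differ by interpolation or a one-pixel offset, so it is worth comparing the two with cv2.imshow before switching.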

Next, train_transform(imgRes) converts the image into a tensor, which is passed to the network for recognition.
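In step 6, the network's 37 raw scores (logits) are turned into probabilities with softmax, argmax picks the most likely class, and that class's probability is printed as a confidence. A NumPy sketch of the same post-processing; the logits are made-up values for illustration, and chr(cla + 64) assumes the letters mapping used by get_mapping:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of scores."""
    z = logits - logits.max()  # subtracting the max does not change the result
    e = np.exp(z)
    return e / e.sum()


logits = np.zeros(37)   # illustrative scores for the 37 output classes
logits[5] = 4.0         # pretend class 5 scored highest
probs = softmax(logits)
cla = int(np.argmax(probs))
print(cla, chr(cla + 64), round(float(probs[cla]), 3))  # 5 E 0.603
```

Because softmax is monotonic, argmax of the probabilities equals argmax of the raw logits; softmax is only needed for the confidence value.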

Once recognition is done, the result is drawn on the original image.
