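Recognizing CAPTCHAs and License Plates with Long Short-Term Memory (LSTM) Networks

Long short-term memory (LSTM) networks, CAPTCHA recognition, license plate recognition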

For an introduction to LSTM, see this article:

Long Short-Term Memory (LSTM) networks: https://blog.csdn.net/eagleuniversityeye/article/details/91345671

Data processing:

…entry original image -> reshape (flatten) -> permute (swap axes) -> input to LSTM

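When a license plate image is passed in, it starts out with shape C H W and is then reshaped to (C*H) W, as in the figure below. The slices 1, 2, 3, 4, ... are fed into an RNN one after another for recognition. An RNN keeps information about the sequence it has seen so far, so after the 3rd input the model's output contains the information for the character "川", after the 5th input it contains "川A", and so on. A plain RNN has one serious weakness, though: it only handles short-term dependencies. If the sequence is too long, the early inputs are hard to retain. LSTM, an RNN variant, addresses this by adding 3 gates inside the RNN that decide which of the earlier information to keep and which to discard, which is why an LSTM is the model used here.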

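As the figure above shows, the complete information for a character only appears when the data is combined column by column over several inputs (for example, inputs 1, 2, and 3 together cover the whole character 川). Feeding the image in row by row would make it hard to keep any single character intact, while slicing the data out column by column before passing it to the model is somewhat awkward. So the tensor in the figure above is transformed once more with a permute (axis swap), giving the shape shown in the figure below.

Each row of the permuted tensor is now one column of the original image, so looping over it row by row yields the information for each character in order.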
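
The reshape and permute steps described above can be checked on a small tensor. The following is a minimal sketch, assuming PyTorch and a random stand-in batch with the CAPTCHA size used later in this post (3 * 60 * 120); the function name `image_to_sequence` is made up for illustration.

```python
import torch


def image_to_sequence(images: torch.Tensor) -> torch.Tensor:
    """Turn N C H W images into N S V sequences, one image column per step."""
    n, c, h, w = images.shape
    flat = images.reshape(n, c * h, w)    # N V S: stack channels and rows
    return flat.permute(0, 2, 1)          # N S V: width becomes the time axis


images = torch.rand(2, 3, 60, 120)        # stand-in batch, not real data
seq = image_to_sequence(images)
print(seq.shape)                          # torch.Size([2, 120, 180])

# Time step t holds column t of the image, across all channels and rows.
t = 7
column = images[:, :, :, t].reshape(2, 3 * 60)
print(torch.equal(seq[:, t, :], column))  # True
```

After the permute, indexing the second axis walks across the image from left to right, which is the order in which the characters appear.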

1. CAPTCHA recognition with LSTM: a single model

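CAPTCHA recognition is implemented with an LSTM in a Seq2Seq structure.
The CAPTCHAs look like the figure below:

A script generated 42,000 CAPTCHAs (train: 40,000; test: 2,000). Some are sharp, some lightly blurred, some moderately blurred, and the character positions are random.
The CAPTCHAs and labels are loaded with a DataLoader. Each label is a 4*10 one-hot encoding, the network output for each image is also 4*10, and after 20 epochs of training the accuracy reached 100%, which is a good result.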
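
The 4*10 one-hot label can be illustrated with a few lines of plain Python. This is a sketch only, assuming the CAPTCHA consists of 4 digits from 0 to 9 (the post gives the 4*10 shape but not the character set); `encode_label` and `decode_output` are hypothetical helper names, not part of the original code.

```python
DIGITS = "0123456789"


def encode_label(text: str) -> list:
    """Encode a 4-digit string as a 4 x 10 one-hot matrix (list of lists)."""
    if len(text) != 4 or any(ch not in DIGITS for ch in text):
        raise ValueError("expected exactly 4 digits")
    return [[1.0 if DIGITS[k] == ch else 0.0 for k in range(10)] for ch in text]


def decode_output(rows: list) -> str:
    """Pick the highest-scoring class in each of the 4 rows."""
    return "".join(DIGITS[max(range(10), key=lambda k: row[k])] for row in rows)


label = encode_label("0382")
print(label[1])              # one-hot row for "3": the 1.0 sits at index 3
print(decode_output(label))  # 0382
```

Decoding a network output of shape 4*10 works the same way: the position of the largest value in each row is the predicted digit.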

Below is the model code; the rest of the code is not included here. The loss function is MSELoss and the optimizer is Adam.

import torch
from torch import nn


class Lstm(nn.Module):
	def __init__(self):
		super().__init__()
		self.fc1 = nn.Sequential(
			nn.Linear(180, 128),
			nn.BatchNorm1d(128),
			nn.LeakyReLU(),
		)
		self.lstm1 = nn.LSTM(128, 256, 2, batch_first=True)
		self.lstm2 = nn.LSTM(256, 128, 2, batch_first=True)
		self.fc2 = nn.Sequential(
			nn.Linear(128, 10),
		)

	def forward(self, entry):							# N C H W		N * 3 * 60 * 120
		entry = entry.reshape(-1, 3*60, 120)			# N V S			N * 180 * 120
		entry = entry.permute(0, 2, 1)					# N S V			N * 120 * 180
		entry = entry.reshape(-1, 180)					# N V			120N * 180
		fc1_out = self.fc1(entry)						# N V			120N * 128
		fc1_out = fc1_out.reshape(-1, 120, 128)			# N S V			N * 120 * 128
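		lstm1_out, _ = self.lstm1(fc1_out)				# N S V			N * 120 * 256, the network emits S outputs
		lstm1_out = lstm1_out[:, -1, :]					# N V 			N * 256, keep only the last output
		lstm1_out = lstm1_out.reshape(-1, 1, 256)		# N 1 V			N * 1 * 256
		# Next line: N 4 V		broadcast to N * 4 * 256; features are then extracted from each 256-vector
		# and fed to the loss, so training makes each V keep the features of one character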
		lstm1_out = lstm1_out.expand(lstm1_out.shape[0], 4, 256)
		lstm2_out, _ = self.lstm2(lstm1_out)			# N 4 V			N * 4 * 128
		lstm2_out = lstm2_out.reshape(-1, 128)			# 4N, V			4N * 128
		fc2_out = self.fc2(lstm2_out)					# 4N, V			4N * 10
		fc2_out = fc2_out.reshape(-1, 4, 10)			# N S V			N * 4 * 10

		return fc2_out
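
The expand step repeats the last output of the first LSTM 4 times, so the second LSTM receives the same vector at every step. Its outputs still differ from step to step because the recurrent state changes. The sketch below shows this with a random stand-in vector and an untrained LSTM; the layer sizes are copied from the model above, everything else is illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

summary = torch.rand(3, 256)                        # stand-in for lstm1_out[:, -1, :]
repeated = summary.reshape(-1, 1, 256).expand(-1, 4, 256)  # N 4 V, a view without copying
print(torch.equal(repeated[:, 0], repeated[:, 3]))  # True: every step gets the same input

decoder = nn.LSTM(256, 128, 2, batch_first=True)    # same sizes as lstm2, untrained
with torch.no_grad():
    out, _ = decoder(repeated)                      # N 4 V: N * 4 * 128
print(out.shape)                                    # torch.Size([3, 4, 128])

# The hidden and cell states are updated at every step, so the outputs
# differ even though the input vector is identical each time.
print(torch.allclose(out[:, 0], out[:, 1]))         # False
```

This is what lets training assign one character to each of the 4 steps.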

2. Separating the encoder and decoder

import torch
from torch import nn


class Encoder(nn.Module):
	def __init__(self):
		super().__init__()
		self.fc = nn.Sequential(
			nn.Linear(180, 128),
			nn.BatchNorm1d(128),
			nn.LeakyReLU(),
		)
		self.lstm = nn.LSTM(128, 256, 2, batch_first=True)			# V h num_layer

	def forward(self, x):							# N C H W		N 3 60 120
		x = x.reshape(-1, 180, 120)					# N V S			N 180 120
		x = x.permute(0, 2, 1)						# N S V			N 120 180
		x = x.reshape(-1, 180)						# N V			120N 180
		fc_out = self.fc(x)							# N V			120N 128
		fc_out = fc_out.reshape(-1, 120, 128)		# N S V			N 120 128
		lstm_out, _ = self.lstm(fc_out)				# N S V			N 120 256
		lstm_out = lstm_out[:, -1, :]				# N V			N 256
		lstm_out = lstm_out.reshape(-1, 1, 256)		# N 1 V			N 1 256
		lstm_out = lstm_out.expand(lstm_out.shape[0], 4, 256)		# N 4 256
		return lstm_out


class Decoder(nn.Module):
	def __init__(self):
		super().__init__()
		self.lstm = nn.LSTM(256, 128, 2, batch_first=True)
		self.fc = nn.Sequential(
			nn.Linear(128, 10),
		)

	def forward(self, x):
		lstm_out, _ = self.lstm(x)						# N S V			N 4 128
		lstm_out = lstm_out.reshape(-1, 128)			# N V			4N 128
		fc_out = self.fc(lstm_out)						# N V			4N 10
		fc_out = fc_out.reshape(-1, 4, 10)				# N S V			N 4 10
		return fc_out


class Net(nn.Module):
	def __init__(self):
		super().__init__()
		self.encoder = Encoder()
		self.decoder = Decoder()

	def forward(self, x):
		encoder = self.encoder(x)
		decoder = self.decoder(encoder)

		return decoder


# Just instantiate Net() directly; the optimizer likewise takes the weights of Net()
# self.net = Net().to(self.device)
# self.opt = torch.optim.Adam(self.net.parameters())

The LSTM parameters can be changed to raise or lower the model's recognition rate, at the cost of more or less computation.
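
To get a feel for that cost, the number of weights in an LSTM can be computed from its sizes. The sketch below follows the parameter layout of PyTorch's `nn.LSTM` (input weights, recurrent weights, and two bias vectors per layer) and assumes a unidirectional LSTM with biases and no projection, as in the code above; `lstm_param_count` is a hypothetical helper name.

```python
def lstm_param_count(input_size: int, hidden_size: int, num_layers: int) -> int:
    """Number of weights in a unidirectional nn.LSTM with biases."""
    total = 0
    for layer in range(num_layers):
        in_features = input_size if layer == 0 else hidden_size
        # 4 gates, each with input weights, recurrent weights and two bias entries
        total += 4 * hidden_size * (in_features + hidden_size + 2)
    return total


print(lstm_param_count(128, 256, 2))  # 921600, the encoder LSTM above
print(lstm_param_count(128, 512, 2))  # 3416064, same layer with the hidden size doubled
```

Doubling the hidden size roughly quadruples the recurrent weights, so the size is worth changing in small steps.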

3. License plate recognition

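License plate recognition works on the same principle as CAPTCHA recognition. The difference is that the first Chinese character and the following 6 characters are output separately at the end, and each gets its own loss (they could also be output together under one loss). The Chinese character is one of 29 abbreviations of provinces and municipalities, which makes it a 29-way classification. The other characters are drawn from 24 letters (plates do not use I or O) plus 10 digits, so each of the last 6 positions is a 34-way classification.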
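
The two label spaces can be written down directly. This is a sketch under assumptions: the ordering of the 34 characters (letters first, then digits) and the helper `encode_plate` are illustrative, and the province list shown is a truncated placeholder rather than the 29 entries used for training.

```python
import string

# 24 letters (no I or O) followed by 10 digits. The ordering is an assumption.
PLATE_CHARS = [c for c in string.ascii_uppercase if c not in "IO"] + list(string.digits)


def encode_plate(plate: str, provinces: list) -> tuple:
    """Split a 7-character plate into one province index and six character indices."""
    if len(plate) != 7:
        raise ValueError("expected 7 characters")
    return provinces.index(plate[0]), [PLATE_CHARS.index(c) for c in plate[1:]]


# Hypothetical, truncated province list; the real one has 29 entries.
provinces = ["京", "津", "川"]
print(len(PLATE_CHARS))                     # 34
print(encode_plate("川A12345", provinces))  # (2, [0, 25, 26, 27, 28, 29])
```

The first index is the target for the 29-way output and the six remaining indices are the targets for the 34-way output.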

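3.1 Training results

The model was trained on a dataset of 170,000 license plates. After 8 epochs of training, the 9th epoch reached 100% accuracy on a validation set that shares no samples with the training set.
(The plates have standard scale, standard angle, and standard lighting; the dataset was generated by code.) Samples are shown below.

Loss plot (17 data points saved per epoch; loss and accuracy curves over 11 epochs). Blue: loss; orange: single-character accuracy; green: accuracy with all 7 characters on the plate correct.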
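
The orange and green curves measure different things: one counts correct characters, the other counts plates on which all 7 characters are correct. A minimal sketch of both metrics, using made-up predictions rather than results from the post:

```python
def accuracy(predicted: list, expected: list) -> tuple:
    """Return (single-character accuracy, whole-plate accuracy)."""
    chars_ok = sum(p == e for ps, es in zip(predicted, expected) for p, e in zip(ps, es))
    plates_ok = sum(ps == es for ps, es in zip(predicted, expected))
    n_chars = sum(len(es) for es in expected)
    return chars_ok / n_chars, plates_ok / len(expected)


expected = ["川A12345", "京B67890"]
predicted = ["川A12345", "京B67899"]   # one wrong character on the second plate
char_acc, plate_acc = accuracy(predicted, expected)
print(f"{char_acc:.3f} {plate_acc:.3f}")  # 0.929 0.500
```

A single wrong character barely moves the per-character figure but costs the whole plate, so the whole-plate curve is always at or below the per-character curve.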

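3.2 Model

Input: N C H W (N * 3 * 40 * 150)
Output: N V and N S V (N * 29, N * 6 * 34)
29 is the 29-way classification of the Chinese character; 6*34 is 6 positions, each a 34-way classification over letters and digits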

import torch
from torch import nn


class Lstm(nn.Module):
	def __init__(self):
		super().__init__()
		self.fc1 = nn.Sequential(
			nn.Linear(120, 256),
			nn.BatchNorm1d(256),
			nn.LeakyReLU(),
		)
		self.lstm1 = nn.LSTM(256, 512, 2, batch_first=True)
		self.lstm2 = nn.LSTM(512, 256, 2, batch_first=True)
		self.fc_pai1 = nn.Sequential(
			nn.Linear(256, 29)
		)
		self.fc_pai6 = nn.Sequential(
			nn.Linear(256, 34)
		)

	def forward(self, entry):							# N C H W		N * 3 * 40 * 150
		entry = entry.reshape(-1, 3*40, 150)			# N V S			N * 120 * 150
		entry = entry.permute(0, 2, 1)					# N S V			N * 150 * 120
		entry = entry.reshape(-1, 120)					# N V			150N * 120
		fc1_out = self.fc1(entry)						# N V			150N * 256
		fc1_out = fc1_out.reshape(-1, 150, 256)			# N S V			N * 150 * 256
		lstm1_out, _ = self.lstm1(fc1_out)				# N S V			N * 150 * 512, the network emits S outputs
		lstm1_out = lstm1_out[:, -1, :]					# N V 			N * 512, keep only the last output
		lstm1_out = lstm1_out.reshape(-1, 1, 512)		# N 1 V			N * 1 * 512

		# Next line: N 7 V		broadcast to N * 7 * 512; features are then extracted from each 512-vector
		# and fed to the loss, so training makes each V keep the features of one character
		lstm1_out = lstm1_out.expand(lstm1_out.shape[0], 7, 512)
		lstm2_out, _ = self.lstm2(lstm1_out)			# N 7 V			N * 7 * 256
		pai1 = lstm2_out[:, 0, :]						# slice out position 1, the Chinese character
		pai6 = lstm2_out[:, 1:, :]						# slice out the remaining 6 characters
		pai1_out = self.fc_pai1(pai1)					# N, V			N * 29
		pai6 = pai6.reshape(-1, 256)					# 6N, V			6N * 256
		pai6_out = self.fc_pai6(pai6)					# 6N, V			6N * 34
		pai6_out = pai6_out.reshape(-1, 6, 34)			# N S V			N * 6 * 34

		return pai1_out, pai6_out
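
The two outputs each get their own loss, as described at the start of this section. The post does not say which loss function the plate model uses, so the sketch below assumes MSELoss against one-hot targets, as in the CAPTCHA model, and assumes the two losses are simply added. The tensors are random stand-ins for the model outputs and labels.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
n = 4
pai1_out = torch.rand(n, 29, requires_grad=True)     # stand-in for the first output
pai6_out = torch.rand(n, 6, 34, requires_grad=True)  # stand-in for the second output

# Random class indices standing in for real labels.
province_idx = torch.randint(0, 29, (n,))
char_idx = torch.randint(0, 34, (n, 6))

loss_fn = nn.MSELoss()                               # assumed, as in the CAPTCHA model
loss1 = loss_fn(pai1_out, F.one_hot(province_idx, 29).float())
loss6 = loss_fn(pai6_out, F.one_hot(char_idx, 34).float())
loss = loss1 + loss6                                 # equal weighting is an assumption
loss.backward()

print(loss.item() > 0)                               # True
print(pai1_out.grad.shape, pai6_out.grad.shape)
```

With a real model, `pai1_out` and `pai6_out` would come from the forward pass and the gradients would flow back into the shared LSTM layers.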

print('The End !')
