Deep learning for Taobao CAPTCHA recognition

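At the end of 2014, CAPTCHA recognition became a hot topic. All kinds of 12306 ticket-grabbing software kept appearing, and companies such as Baidu, Sogou and 360 launched their own train-ticket grabbing tools. These tools make life easier for ordinary travelers, but they also give scalpers an opening. Below are several CAPTCHAs we commonly encounter.

From a recognition standpoint (I have not studied the Baidu Tieba CAPTCHA), the Xiaomi CAPTCHA is the easiest to recognize. The 12306 and Taobao CAPTCHAs are similar: both contain touching (connected) characters, and segmenting touching characters in the image is a hard problem.

Below, Taobao CAPTCHA recognition is used as a case study to give an overview of the algorithm: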

I. Building the dataset

1. Character segmentation (image processing)

II. Character recognition (recognition algorithm)

III. MATLAB GUI and MFC demo

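I. Building the dataset

In image processing and machine learning research, building a dataset that fits your own needs is the most time- and labor-consuming step, and buying one is also expensive. The following describes how the Taobao CAPTCHA dataset was built.

The resources include a small program that generates Taobao CAPTCHAs. It fetches them directly from the Taobao website, so it needs an internet connection. Running it yields a series of CAPTCHAs, as shown in the figure below.

The CAPTCHAs obtained this way are unlabeled. This kind of CAPTCHA cannot be recognized with unsupervised learning; only supervised learning works. The images therefore have to go through "dama" (打碼), i.e. having their text typed in as labels by a CAPTCHA-typing service, which spares us from labeling them by hand, a very time-consuming job. Some labels are shown below.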

w7zc YRZY 3KKG ppvt auku 8qaq

mpcf shcu rdcy fzpx EKW2 FTJE

wfpe 8FH8 PWRB dkp5 4rqc zccq

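The labels are case-insensitive, which of course makes recognition harder. This Taobao CAPTCHA set never uses 0, O, i, 1 or l, since people could easily confuse them. That leaves the 8 digits 2 to 9 plus 23 letters, so recognition is a 31-class problem. The difficulties of this CAPTCHA are as follows:

- Strokes vary in thickness.
- Characters touch and cross each other.
- Some letters appear in upper case and others in lower case.
- There are many classes.

All of this makes it harder to get a whole CAPTCHA right, because a CAPTCHA counts as correct only if all 4 characters are recognized correctly.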
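The "all 4 characters must be correct" rule compounds per-character errors. The following sketch shows that arithmetic under the simplifying assumption that the 4 characters are recognized independently; real errors are likely correlated, for example through a bad segmentation. The function names here are illustrative only.

```matlab
% Whole-CAPTCHA accuracy versus per-character accuracy p,
% ASSUMING the 4 characters are recognized independently.
nChars = 4;
seqAcc = @(p) p.^nChars;                  % all 4 characters must be right
perCharNeeded = @(target) target.^(1/nChars);

p = [0.80 0.90 0.95 0.99];
disp([p; seqAcc(p)]);                     % row 1: per character, row 2: whole CAPTCHA
fprintf('per-character accuracy needed for 50%% overall: %.3f\n', ...
    perCharNeeded(0.5));                  % prints 0.841
```

In other words, even 90% per-character accuracy gives only about 66% of CAPTCHAs fully correct.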

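1. Character segmentation (image processing)

Character segmentation is the hardest part of CAPTCHA recognition. CAPTCHAs whose characters touch severely are often misrecognized because they are segmented poorly. The segmentation steps are described below.

1.1 Locating the characters

First, here is one CAPTCHA shown at each stage of segmentation:

I. Character localization and cropping    II. Binarization    III. Character segmentation

The process works as follows:

1. Locate where the characters lie in the whole image.
2. Crop them and split the crop into blocks. This produces the sub-images used for training and for recognition, and each sub-image carries its label.

The classes of these sub-images are 2, 3, 4, 5, 6, 7, 8, 9, Aa, Bb, Cc, Dd, Ee, Ff, Gg, Hh, Jj, Kk, Mm, Nn, Pp, Qq, Rr, Ss, Tt, Uu, Vv, Ww, Xx, Yy, Zz. Examples of training images are shown below.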
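The following sketch builds the 31-class set listed above and maps a label string such as those in the dataset (e.g. `EKW2`) to class indices 1 to 31. It folds case, because Aa, Bb, ... share one class. The index order (digits first, then letters alphabetically) matches the list above, but it is an assumption that the author's `label` vector uses exactly this order. `classes`, `label2idx` and `idx2label` are hypothetical names.

```matlab
% 31 classes: digits 2-9 and the letters a-z without i, l, o.
digitChars = '23456789';
letters    = setdiff('abcdefghijklmnopqrstuvwxyz', 'ilo');  % sorted char row, 23 letters
classes    = [digitChars letters];                          % 1 x 31 char array

% Map a 4-character label (any case) to class indices 1..31.
% A character outside the set (e.g. 'o') makes find() return empty,
% so arrayfun raises an error instead of silently mislabeling.
label2idx = @(s) arrayfun(@(c) find(classes == c), lower(s));
idx2label = @(k) classes(k);

idx = label2idx('EKW2');      % indices of 'e', 'k', 'w', '2'
disp(idx2label(idx));         % prints ekw2
```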

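As with MNIST, labeled batches need to be generated.

In total, 25,000 CAPTCHAs were scraped from the web, yielding 100,000 sub-blocks, roughly 3,000 per class. For this CAPTCHA set that is actually far from enough, but every labeled image has to go through paid labeling (打碼). The images in this set are very diverse (thin, bold, slanted, touching, and so on) and not as tidy as MNIST, so more samples are needed. Another reason is that my segmentation is poor. I have not yet come up with a better image segmentation algorithm, and I am still trying.

In total, 10 mini-batches were generated and trained with stochastic mini-batch learning. This approach is less likely to get stuck in local optima or to overfit, and that held up in my experiments. In the training script below, the first layer and the fine-tuning stage actually loop over 20 batches of 5,000 patches, and the second layer over 5 batches of 20,000.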
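Here is a minimal sketch of generating shuffled, labeled mini-batches in the MNIST style. It assumes `data` holds one 32×32 patch per column (1024 × N) and `labels` holds one class index per patch. The sizes are toy values, and `numBatches` and `batches` are illustrative names rather than the author's code.

```matlab
% Shuffle samples once, then split them into equal-sized mini-batches.
N = 1000; inputSize = 32*32; numBatches = 10;    % toy sizes, not the real dataset
data   = rand(inputSize, N) > 0.9;               % stand-in for binary patches
labels = randi(31, 1, N);                        % stand-in class indices 1..31

perm   = randperm(N);                            % same permutation for data and labels
data   = data(:, perm);
labels = labels(perm);

batchSize = floor(N / numBatches);
batches = cell(numBatches, 1);
for j = 1:numBatches
    cols = (j-1)*batchSize + 1 : j*batchSize;
    batches{j} = struct('data', double(data(:, cols)), 'label', labels(cols));
end
```

Shuffling before splitting keeps each batch a mix of classes, so consecutive updates do not each see only a few characters.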

The image-processing code is listed below.

main.m

% main.m: locate, crop, binarize and split one CAPTCHA into 4 patches
I = imread('21.jpg');
I(I>=125)=125;                      % clip bright values to 125
figure
subplot(3,4,[1,2]);
imshow(I);
title('Original CAPTCHA','FontSize',8);
gray = rgb2gray(I);
%        obj=imresize(obj,[60,80]);
I1 = double(gray)/256;
obj = imcomplement(I1);             % invert so the characters are bright
obj = im2bw(obj, graythresh(obj));  % binarize with Otsu's threshold
[ix1,iy1]=xfenge(obj);              % first/last rows containing text
[jx1,jy1]=yfenge(obj);              % first/last columns containing text
subplot(3,4,[3,4]);
imshow(I);
hold on
% draw the bounding box around the located characters
plot([jx1 jy1], [ix1-1 ix1-1], 'k-', 'linewidth', 1.4);   % top edge
plot([jx1 jx1], [ix1-1 iy1+1], 'k-', 'linewidth', 1.4);   % left edge
plot([jy1 jy1], [ix1-1 iy1+1], 'k-', 'linewidth', 1.4);   % right edge
plot([jx1 jy1], [iy1+1 iy1+1], 'k-', 'linewidth', 1.4);   % bottom edge
hold off
title('Located characters','FontSize',8);
subplot(3,4,[5,6])
obj=I(ix1:iy1,jx1:jy1,:);           % crop the character region
imshow(obj);
title('Cropped image','FontSize',8);
gray=rgb2gray(obj);
bw=im2bw(gray,0.4);                 % fixed threshold of 0.4
subplot(3,4,[7,8]);
imshow(bw);
title('Binarized CAPTCHA','FontSize',8);
bw=imresize(bw,[32,128]);           % 4 characters x 32 columns each
patches = cell(1,4);
for k = 1:4
    patches{k} = bw(:, (k-1)*32+1 : k*32);   % equal-width split
    subplot(3,4,8+k);
    imshow(patches{k});
    title(sprintf('patch%d',k),'FontSize',8);
end
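The training script later uses `inputSize = 32*32`, so each 32×32 patch must become a 1024-dimensional column. The following sketch shows that conversion for the 32×128 strip produced by main.m. Column-major vectorization (`blk(:)`) is an assumption, since the post does not show how `final_total_data.mat` was assembled. `X` and `blk` are hypothetical names.

```matlab
% Turn a 32x128 binary strip into four 1024-dimensional column vectors.
bw = rand(32, 128) > 0.8;                 % stand-in for the resized strip
nChars = 4;
w = size(bw, 2) / nChars;                 % 32 columns per character
X = zeros(32*32, nChars);
for k = 1:nChars
    blk = bw(:, (k-1)*w+1 : k*w);         % same slices as the patches in main.m
    X(:, k) = double(blk(:));             % column-major vectorization
end
```

Whatever ordering is chosen, it must be the same at training time and in the GUI, otherwise the learned weights see scrambled pixels.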

xfenge.m

function [ix,iy]=xfenge(goal1)
% Return the first (ix) and last (iy) rows that contain foreground
% pixels (value 1); both are 0 if the image has no foreground.
rows = find(any(goal1 == 1, 2));
if isempty(rows)
    ix = 0; iy = 0;
else
    ix = rows(1);
    iy = rows(end);
end

yfenge.m

function [jx,jy]=yfenge(goal1)
% Return the first (jx) and last (jy) columns that contain foreground
% pixels (value 1); both are 0 if the image has no foreground.
cols = find(any(goal1 == 1, 1));
if isempty(cols)
    jx = 0; jy = 0;
else
    jx = cols(1);
    jy = cols(end);
end
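The post names segmentation as its weak point, and main.m simply splits the crop into 4 equal widths. For reference, one widely used alternative is to cut at the emptiest column of the vertical projection (column sums) near each ideal boundary. The following sketch shows that technique; it is not the author's method. The search window `win`, the toy image and the assumption of exactly 4 characters are all illustrative.

```matlab
% Vertical-projection cutting: choose each cut at the column with the fewest
% foreground pixels within +-win columns of the equal-width boundary.
bw = false(20, 80);
bw(5:15, [3:17, 22:38, 41:58, 62:78]) = true;   % toy image with 4 blobs
nChars = 4; win = 6;
proj = sum(bw, 1);                 % foreground count per column
n = size(bw, 2);
cuts = zeros(1, nChars - 1);
for k = 1:nChars-1
    ideal = round(k * n / nChars);
    lo = max(1, ideal - win);
    hi = min(n, ideal + win);
    [~, off] = min(proj(lo:hi));   % first emptiest column in the window
    cuts(k) = lo + off - 1;
end
edges = [0 cuts n];                % character k spans edges(k)+1 : edges(k+1)
```

On touching characters the projection minimum is no longer zero, but it still tends to fall at the thinnest joint, which is where equal-width splitting most often goes wrong.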

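II. Character recognition (recognition algorithm)

The segmented sub-blocks are binarized, so each batch is a sparse matrix. For such data I consider the sparse autoencoder (SAE) from deep learning the best fit, and it is also fast to train. The structure of the whole network is as follows.

- Input layer: 1024 neurons (one 32×32 patch).
- Hidden layer 1: 700 neurons. Its feed-forward parameters are pretrained with a sparse autoencoder.
- Hidden layer 2: 400 neurons, also pretrained with a sparse autoencoder.
- Output layer: 31 neurons, connected to a softmax classifier.

Finally, the whole network is fine-tuned.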
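To make the 1024-700-400-31 shapes concrete, here is a forward pass with random weights. Sigmoid hidden units and a softmax layer without bias are assumptions. They are consistent with the UFLDL-style functions in the script, whose softmax parameters have `hiddenSize1 * numClasses` entries. The weights are toy values, not trained ones.

```matlab
% Forward pass through 1024 -> 700 -> 400 -> 31 (softmax), random weights.
sigmoid = @(z) 1 ./ (1 + exp(-z));
sizes = [1024 700 400 31];
x  = double(rand(sizes(1), 5) > 0.9);            % 5 toy binary patches
W1 = 0.01*randn(sizes(2), sizes(1));  b1 = zeros(sizes(2), 1);
W2 = 0.01*randn(sizes(3), sizes(2));  b2 = zeros(sizes(3), 1);
Ws = 0.01*randn(sizes(4), sizes(3));             % softmax weights, no bias

a1 = sigmoid(bsxfun(@plus, W1*x,  b1));          % 700 x 5
a2 = sigmoid(bsxfun(@plus, W2*a1, b2));          % 400 x 5
z  = Ws*a2;                                      % 31 x 5 class scores
z  = bsxfun(@minus, z, max(z, [], 1));           % subtract max for numerical stability
P  = bsxfun(@rdivide, exp(z), sum(exp(z), 1));   % class probabilities per column
[~, pred] = max(P, [], 1);                       % predicted class index per patch

nParams = sizes(2)*sizes(1) + sizes(2) + sizes(3)*sizes(2) + sizes(3) ...
        + sizes(4)*sizes(3);                     % 1,010,300 trainable values
```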

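Note: do not train on the full batch. It is fast, but in my experience it gets stuck in local optima and overfits, and with limited memory it simply will not run. Do not feed samples one at a time either, which is extremely slow. Mini-batches sit between these two extremes.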
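A rough estimate shows why the full batch runs out of memory. The sketch counts only the input matrix and the two hidden-layer activations in double precision (8 bytes per value). The sample counts come from the text. Gradients and the L-BFGS history are ignored, so real usage is higher.

```matlab
% Memory for inputs plus hidden activations: full dataset vs. one mini-batch.
bytesPerDouble = 8;
N = 100000; batch = 5000;
layerSizes = [1024 700 400];                      % input, hidden 1, hidden 2
fullMB  = N     * sum(layerSizes) * bytesPerDouble / 2^20;
batchMB = batch * sum(layerSizes) * bytesPerDouble / 2^20;
fprintf('full batch: %.0f MB, one mini-batch: %.0f MB\n', fullMB, batchMB);
```

This comes to roughly 1.6 GB for the full batch versus about 81 MB for a batch of 5,000.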

PS: If you need the dataset, send an email to [email protected]. The subroutines are in the resources.
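The subroutine `sparseAutoencoderCost` is not listed here. Assuming it follows the standard UFLDL sparse-autoencoder definition, each pretraining stage minimizes roughly the objective below. The parameter layout W1, W2, b1, b2 assumed by that definition matches how the script slices `opttheta`. In the formula, $m$ is the number of patches in the mini-batch, $s$ is `hiddenSize` (or `hiddenSize1` for the second layer), $\rho$ is `sparsityParam`, $\lambda$ is `lambda` and $\beta$ is `beta`. The hidden activations are $a^{(2)} = f(W^{(1)}x + b^{(1)})$ and the reconstructions are $\hat{x} = f(W^{(2)}a^{(2)} + b^{(2)})$, with $f$ the sigmoid.

```latex
J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\| \hat{x}^{(i)} - x^{(i)} \right\|^2
       + \frac{\lambda}{2}\left( \|W^{(1)}\|_F^2 + \|W^{(2)}\|_F^2 \right)
       + \beta \sum_{j=1}^{s} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right),
\qquad
\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\big(x^{(i)}\big),
\qquad
\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j}
       + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}
```

With $\rho = 0.1$, the KL term pushes each hidden unit to be active on about 10% of the patches on average.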

clear all
clc
% final_total_data.mat is expected to provide data (1024 x N, one 32x32
% patch per column) and label (N class indices in 1..31).
load('final_total_data.mat');
inputSize   = 32*32;
numClasses  = 31;      % number of classes
numLabels   = 31;
hiddenSize  = 700;     % hidden layer 1
hiddenSize1 = 400;     % hidden layer 2
sparsityParam = 0.1;   % target average activation of the hidden units
beta   = 3;            % weight of the sparsity penalty
lambda = 1e-4;         % weight decay
nTrain = 100000;       % first 100,000 patches train, the rest are held out
%% ======================================================================
%  STEP 1: pretrain hidden layer 1 with a sparse autoencoder
theta = initializeParameters(hiddenSize, inputSize);
opttheta = theta;
addpath minFunc/
options.Method = 'lbfgs';
options.maxIter = 1;       % one L-BFGS iteration per mini-batch
options.display = 'on';
for i=1:400
    fprintf('Layer 1, pass %d\n', i);
    for j=1:20             % 20 mini-batches of 5000 patches
        sub_data=data(:,(j-1)*5000+1:j*5000);
        [opttheta, loss] = minFunc( @(p) sparseAutoencoderCost(p, ...
            inputSize, hiddenSize, ...
            lambda, sparsityParam, ...
            beta, sub_data), ...
            theta, options);
        theta=opttheta;
    end
end
trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
    data);
%% ==================================================================
%  STEP 2: pretrain hidden layer 2 on the layer-1 features
theta1 = initializeParameters(hiddenSize1, hiddenSize);
opttheta1 = theta1;
options.Method = 'lbfgs';
options.maxIter = 1;
options.display = 'on';
for i=1:400
    fprintf('Layer 2, pass %d\n', i);
    for j=1:5              % 5 mini-batches of 20000 feature vectors
        trainFeaturesk=trainFeatures(:,(j-1)*20000+1:j*20000);
        [opttheta1, loss] = minFunc( @(p) sparseAutoencoderCost(p, ...
            hiddenSize, hiddenSize1, ...
            lambda, sparsityParam, ...
            beta, trainFeaturesk), ...
            theta1, options);
        theta1=opttheta1;
    end
end
trainFeatures1 = feedForwardAutoencoder(opttheta1, hiddenSize1, hiddenSize, ...
    trainFeatures);
%% ================================================
%  STEP 3: train the softmax classifier (training patches only)
fprintf('Training softmax classifier\n');
softmaxLambda = 1e-4;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSize1,numClasses,softmaxLambda,...
    trainFeatures1(:,1:nTrain),label(1:nTrain),softoptions);
theta_new = softmaxModel.optTheta(:);
%% ============================================================
%  STEP 4: stack the layers and fine-tune the whole network
% opttheta layout is [W1(:); W2(:); b1; b2], so b1 starts after both W blocks
stack = cell(2,1);
stack{1}.w = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
stack{1}.b = opttheta(2*hiddenSize*inputSize+1:2*hiddenSize*inputSize+hiddenSize);
stack{2}.w = reshape(opttheta1(1:hiddenSize1 * hiddenSize), hiddenSize1, hiddenSize);
stack{2}.b = opttheta1(2*hiddenSize1*hiddenSize+1:2*hiddenSize*hiddenSize1+hiddenSize1);
[stackparams, netconfig] = stack2params(stack);

stackedAETheta = [theta_new;stackparams];
stackedAEThetaInit = stackedAETheta;   % kept for the before-fine-tuning test
options = struct;
options.Method = 'lbfgs';
options.maxIter = 1;
options.display = 'on';
for i=1:1000
    fprintf('Fine-tuning, pass %d\n', i);
    for j=1:20
        sub_data=data(:,(j-1)*5000+1:j*5000);
        sub_label=label((j-1)*5000+1:j*5000);
        [stackedAEOptTheta,cost] = minFunc(@(p) stackedAECost(p,inputSize,hiddenSize1, ...
            numClasses,netconfig,lambda,sub_data,sub_label),stackedAETheta,options);
        stackedAETheta=stackedAEOptTheta;
    end
end
%% Test on the held-out patches
testData  = data(:,nTrain+1:end);
testLabel = label(nTrain+1:end);
pred = stackedAEPredict(stackedAEThetaInit, inputSize, hiddenSize1, ...
    numClasses, netconfig, testData);
acc1 = mean(testLabel(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc1 * 100);
pred = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSize1, ...
    numClasses, netconfig, testData);
acc = mean(testLabel(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);
save('stackedAEOptTheta.mat','stackedAEOptTheta','netconfig');

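The final parameters are saved in stackedAEOptTheta.mat.

III. MATLAB GUI and MFC demo

The MATLAB GUI needs the functions stackedAEPredict, xfenge and yfenge, plus the parameter file stackedAEOptTheta.mat and netconfig.

The code is in the resources.

MFC demo

See the resources.

The final recognition rate reaches 50%.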
