What I Learned from Barcode Recognition (from the course 《OpenCV計算機視覺產品實戰2》, "OpenCV Computer Vision Product Practice 2")

Part 0: Background

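Barcodes (one-dimensional codes) are already widely used in everyday production and life.

The traditional way to read a barcode is with a dedicated laser scanner, which scans the code to extract its information. This process involves a fair amount of manual handling and is mostly used in specialized, high-throughput settings.

In recent years, with progress in image processing and especially the growing computing power of handheld devices, image-based barcode recognition has become widespread.

A typical example is industrial control, where dedicated high-speed code readers handle code reading in industrial scenarios.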

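The problem addressed here is recognizing arbitrary barcodes in natural scenes. For comparison, a traditional approach is described here: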

https://blog.csdn.net/qq_42722197/article/details/112691078

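At present, the number of barcode formats that can be decoded is fairly limited.

The main pipeline is "locate first, then decode":

1. Barcode localization based on gradient orientation coherence.

2. Barcode decoding implemented from the barcode encoding principles.

3. Compared with ZXing and ZBar, both recognition accuracy and speed are improved; the drawback is that fewer barcode formats are supported. For reference, on the test dataset at https://github.com/SUSTech-OpenCV/BarcodeTestDataset (250 barcode images in total), the algorithm reached 96% accuracy at 20 ms per image. ZXing, tested on the same dataset for comparison, reached 64.4% accuracy at 90 ms per image.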

OpenCV already provides the corresponding functions.

For detailed usage, see:

https://zhuanlan.zhihu.com/p/367162545?ivk_sa=1024320u

#include "opencv2/barcode.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <string>
#include <vector>

using namespace cv;

int main()
{
    // If super-resolution is not used, the model paths can be omitted
    Ptr<barcode::BarcodeDetector> bardet = makePtr<barcode::BarcodeDetector>("sr.prototxt", "sr.caffemodel");
    Mat input = imread("your file path");
    Mat corners; // corner points of the detected boxes; for N barcodes the layout is [N][4][2]
    std::vector<std::string> decoded_info; // decoded strings; an empty string where decoding failed
    std::vector<barcode::BarcodeType> decoded_format; // barcode types; BarcodeType::NONE where decoding failed
    bool ok = bardet->detectAndDecode(input, decoded_info, decoded_format, corners);
    return ok ? 0 : 1;
}
 
# Python equivalent (uses the barcode module from opencv_contrib)
import cv2

bardet = cv2.barcode_BarcodeDetector()
img = cv2.imread("your file path")
ok, decoded_info, decoded_type, corners = bardet.detectAndDecode(img)

The contribution:

The barcode module has currently been placed in the contrib repository; the review discussion is at: https://github.com/opencv/opencv_contrib/pull/2757


Part 1: Barcode Localization

Localization yields the position of each barcode:

    vector<Point2f> points;
    bool ok = this->detect(img, points);

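After some research, the "gradient orientation coherence" approach was chosen from the following candidate methods:

• Locating barcodes from the texture and shape information of the barcode region
• Locating with basic morphological operations
• Detecting straight lines with the Hough transform
• Using the coherence of gradient orientations within the barcode region
• Neural-network-based barcode localization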

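(Local) gradient orientation coherence works as follows: first split the image into blocks, then compute gradient statistics for each block (how many strong-gradient pixels it contains and how consistent their orientations are), and finally keep only the blocks that pass the thresholds.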
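The per-block statistic can be sketched without OpenCV. This is a minimal sketch, assuming the gradients of one block are already available as a list; the `Grad` struct and `blockCoherence` function are hypothetical names, while the formula is the one used in `Detect::calCoherence` further down. Only squares and products of gx and gy enter the sums, so the opposite gradients on the two edges of a dark bar count as the same orientation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type: the gradient (gx, gy) of one pixel.
struct Grad {
    float gx;
    float gy;
};

// Coherence of one block, same formula as Detect::calCoherence:
// d = sqrt((Sxx - Syy)^2 + 4 Sxy^2) / (Sxx + Syy),
// with Sxx = sum gx^2, Syy = sum gy^2, Sxy = sum gx*gy over the block.
// Returns 0 for a block without gradient energy.
float blockCoherence(const std::vector<Grad>& block)
{
    float sxx = 0.f, syy = 0.f, sxy = 0.f;
    for (const Grad& g : block) {
        sxx += g.gx * g.gx;
        syy += g.gy * g.gy;
        sxy += g.gx * g.gy;
    }
    const float energy = sxx + syy;
    if (energy <= 0.f) return 0.f;
    return std::sqrt((sxx - syy) * (sxx - syy) + 4.f * sxy * sxy) / energy;
}
```

In the module a block counts as coherent when this value exceeds 0.9 and the block also holds enough strong-gradient pixels (at least 0.42 × window_size² of them).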

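Next, a modified erosion cleans up the result further.

Finally, the barcode is located through block connection (region growing), minimum-area bounding rectangle fitting, candidate region filtering, and non-maximum suppression.

This pipeline is not very complicated, but if it really delivers good results, it is well worth studying.

Let's look at the code: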

bool BarcodeDetector::detect(InputArray img, OutputArray points) const
{
    Mat inarr;
    if (!checkBarInputImage(img, inarr))  // validate the input and convert it to grayscale
    {
        points.release();
        return false;
    }
 
    Detect bardet;         
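    bardet.init(inarr); // rescale the image to a working resolution suited to detection
    bardet.localization(); // run the localization pipeline
    if (!bardet.computeTransformationPoints()) // NMS, then convert the rotated boxes to corner points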
    { return false; }
    vector<vector<Point2f>> pnts2f = bardet.getTransformationPoints();
    vector<Point2f> trans_points;
    for (auto &i : pnts2f)
    {
        for (const auto &j : i)
        {
            trans_points.push_back(j);
        }
    }
 
    updatePointsResult(points, trans_points); // write the corners to the output array
    return true;
}

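Here, checkBarInputImage validates the input (non-empty, 8-bit, larger than 40×40, with 1, 3 or 4 channels) and converts it to grayscale; it is worth reusing.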

static bool checkBarInputImage(InputArray img, Mat &gray)
{
    CV_Assert(!img.empty());
    CV_CheckDepthEQ(img.depth(), CV_8U, "");
    if (img.cols() <= 40 || img.rows() <= 40)
    {
        return false; // image data is not enough for providing reliable results
    }
    int incn = img.channels();
    CV_Check(incn, incn == 1 || incn == 3 || incn == 4, "");
    if (incn == 3 || incn == 4)
    {
        cvtColor(img, gray, COLOR_BGR2GRAY);
    }
    else
    {
        gray = img.getMat();
    }
    return true;
}

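init mainly handles scaling, since overly large images are not suitable for recognition either: if the shorter side exceeds 512 pixels, the image is shrunk with INTER_AREA so that its shorter side becomes 512, and coeff_expansion records the factor.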

void Detect::init(const Mat &src)
{
    const double min_side = std::min(src.size().width, src.size().height);
    if (min_side > 512.0)
    {
        purpose = SHRINKING;
        coeff_expansion = min_side / 512.0;
        width = cvRound(src.size().width / coeff_expansion);
        height = cvRound(src.size().height / coeff_expansion);
        Size new_size(width, height);
        resize(src, resized_barcode, new_size, 0, 0, INTER_AREA);
    }
    else
    {
        purpose = UNCHANGED;
        coeff_expansion = 1.0;
        width = src.size().width;
        height = src.size().height;
        resized_barcode = src.clone();
    }
}
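As a worked example of that arithmetic, here is a small sketch that reproduces the size computation of `init` without OpenCV. The `WorkingSize` struct and `chooseWorkingSize` function are hypothetical, and `std::lround` stands in for `cvRound` (the two differ only on exact .5 ties).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type for the working resolution chosen by init().
struct WorkingSize {
    int width;
    int height;
    double coeff;  // original size / working size (1.0 when unchanged)
    bool shrunk;
};

// Mirrors Detect::init: if the shorter side exceeds 512, scale so that the
// shorter side becomes 512; otherwise keep the original size.
WorkingSize chooseWorkingSize(int width, int height)
{
    assert(width > 0 && height > 0);
    const double min_side = std::min(width, height);
    if (min_side > 512.0) {
        const double coeff = min_side / 512.0;
        return {static_cast<int>(std::lround(width / coeff)),
                static_cast<int>(std::lround(height / coeff)), coeff, true};
    }
    return {width, height, 1.0, false};
}
```

For example, a 1920 × 1080 frame is processed at 910 × 512 with a factor of about 2.11; computeTransformationPoints later multiplies the box centers and sizes by coeff_expansion to map results back to the original image.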

localization is the real body of the pipeline, although the implementation clearly does not map one-to-one onto the stages listed above.

void Detect::localization()
{
    localization_bbox.clear();
    bbox_scores.clear();
    preprocess();  // gradients and their integral images
    // empirical window scales, relative to the shorter side
    static constexpr float SCALE_LIST[] = {0.01f, 0.03f, 0.06f, 0.08f};
    const auto min_side = static_cast<float>(std::min(width, height));
    int window_size;
    for (const float scale : SCALE_LIST)
    {
        window_size = cvRound(min_side * scale);
        calCoherence(window_size);   // per-block gradient coherence
        barcodeErode();              // special-purpose erosion
        regionGrowing(window_size);  // region growing, collects candidate boxes
    }
}
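To make the empirical scales concrete, the hypothetical helper below (not part of the module) reproduces the window sizes the loop iterates over; `std::lround` again stands in for `cvRound`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Window sizes tried by Detect::localization for a working image of
// width x height (same scale list, rounding to the nearest integer).
std::vector<int> windowSizes(int width, int height)
{
    assert(width > 0 && height > 0);
    static constexpr float SCALE_LIST[] = {0.01f, 0.03f, 0.06f, 0.08f};
    const auto min_side = static_cast<float>(std::min(width, height));
    std::vector<int> sizes;
    for (const float scale : SCALE_LIST)
        sizes.push_back(static_cast<int>(std::lround(min_side * scale)));
    return sizes;
}
```

For a 910 × 512 working image this gives windows of 5, 15, 31 and 41 pixels. With the 5-pixel window, calCoherence works on a 182 × 102 grid of blocks (integer division); the larger windows give coarser grids, presumably so that barcodes of different sizes are all covered.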

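Integral images: since the detector is implemented as a class, these intermediate results are class members. preprocess mainly computes the integral images: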

Mat resized_barcode, gradient_magnitude, coherence, orientation, edge_nums, integral_x_sq, integral_y_sq, integral_xy, integral_edges;
void Detect::preprocess()
{
    Mat scharr_x, scharr_y, temp;
    static constexpr double THRESHOLD_MAGNITUDE = 64.;
    Scharr(resized_barcode, scharr_x, CV_32F, 1, 0); // horizontal gradient gx
    Scharr(resized_barcode, scharr_y, CV_32F, 0, 1); // vertical gradient gy
    // calculate magnitude of gradient and truncate
    magnitude(scharr_x, scharr_y, temp); // per-pixel sqrt(gx^2 + gy^2)
    threshold(temp, temp, THRESHOLD_MAGNITUDE, 1, THRESH_BINARY); // empirical threshold: 1 if magnitude > 64, else 0
    temp.convertTo(gradient_magnitude, CV_8U);
    integral(gradient_magnitude, integral_edges, CV_32F); // integral of the edge mask: counts strong-gradient pixels per window
 
    for (int y = 0; y < height; y++)
    {
        auto *const x_row = scharr_x.ptr<float_t>(y);
        auto *const y_row = scharr_y.ptr<float_t>(y);
        auto *const magnitude_row = gradient_magnitude.ptr<uint8_t>(y);
        for (int pos = 0; pos < width; pos++)
        {
            if (magnitude_row[pos] == 0) // weak gradient: zero it so it does not contribute to the sums
            {
                x_row[pos] = 0;
                y_row[pos] = 0;
                continue;
            }
            if (x_row[pos] < 0) // flip (gx, gy) into the gx >= 0 half-plane; gx^2, gy^2 and gx*gy are unchanged by this
            {
                x_row[pos] *= -1;
                y_row[pos] *= -1;
            }
        }
    }
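    integral(scharr_x, temp, integral_x_sq, CV_32F, CV_32F); // integral of gx^2 (the plain sum goes to temp, unused)
    integral(scharr_y, temp, integral_y_sq, CV_32F, CV_32F); // integral of gy^2
    integral(scharr_x.mul(scharr_y), integral_xy, temp, CV_32F, CV_32F); // integral of gx*gy (the squared sum goes to temp, unused)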
}
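The reason for the integral images is that, once built, the sum over any window costs four lookups regardless of its size, so calCoherence can evaluate every block at every window size cheaply. Below is a minimal plain-C++ sketch of that idea; `integralImage` and `rectSum` are hypothetical helpers, and `rectSum` is presumably what the `CALCULATE_SUM` macro (defined elsewhere in the module, not shown in this post) does with top_row, bottom_row, left_col and right_col.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<float>>;

// Integral image with an extra leading row and column of zeros, the layout
// cv::integral uses: I[y][x] = sum of src over rows [0, y) and cols [0, x).
Image integralImage(const Image& src)
{
    const std::size_t h = src.size();
    const std::size_t w = h > 0 ? src[0].size() : 0;
    Image I(h + 1, std::vector<float>(w + 1, 0.f));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            I[y + 1][x + 1] = src[y][x] + I[y][x + 1] + I[y + 1][x] - I[y][x];
    return I;
}

// Sum of src over rows [top, bottom) and cols [left, right) in four lookups.
float rectSum(const Image& I, int top, int bottom, int left, int right)
{
    return I[bottom][right] - I[top][right] - I[bottom][left] + I[top][left];
}
```

In preprocess the same structure is built over the binary edge mask and over gx², gy² and gx·gy, and calCoherence reads each window's sums from those four tables.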


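Next, calCoherence computes the per-block gradient coherence (the "regional consistency" measure); this prepares for the region-growing step.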

// Coherence calculation
// Change coherence orientation edge_nums
// depend on width height integral_edges integral_x_sq integral_y_sq integral_xy
void Detect::calCoherence(int window_size)
{
    static constexpr float THRESHOLD_COHERENCE = 0.9f;
    int right_col, left_col, top_row, bottom_row;
    float xy, x_sq, y_sq, d, rect_area;
    const float THRESHOLD_AREA = float(window_size * window_size) * 0.42f;
    Size new_size(width / window_size, height / window_size);
    coherence = Mat(new_size, CV_8U), orientation = Mat(new_size, CV_32F), edge_nums = Mat(new_size, CV_32F);
    float top_left, top_right, bottom_left, bottom_right;
    int integral_cols = width + 1;
    const auto *edges_ptr = integral_edges.ptr<float_t>(), *x_sq_ptr = integral_x_sq.ptr<float_t>(), *y_sq_ptr = integral_y_sq.ptr<float_t>(), *xy_ptr = integral_xy.ptr<float_t>();
    for (int y = 0; y < new_size.height; y++)
    {
        auto *coherence_row = coherence.ptr<uint8_t>(y);
        auto *orientation_row = orientation.ptr<float_t>(y);
        auto *edge_nums_row = edge_nums.ptr<float_t>(y);
        if (y * window_size >= height)
        {
            continue;
        }
        top_row = y * window_size;
        bottom_row = min(height, (y + 1) * window_size);
        for (int pos = 0; pos < new_size.width; pos++)
        {
            // then calculate the column locations of the rectangle and set them to -1
            // if they are outside the matrix bounds
            if (pos * window_size >= width)
            {
                continue;
            }
            left_col = pos * window_size;
            right_col = min(width, (pos + 1) * window_size);
            //we had an integral image to count non-zero elements
            CALCULATE_SUM(edges_ptr, rect_area)
            if (rect_area < THRESHOLD_AREA)
            {
                // smooth region
                coherence_row[pos] = 0;
                continue;
            }
            CALCULATE_SUM(x_sq_ptr, x_sq)
            CALCULATE_SUM(y_sq_ptr, y_sq)
            CALCULATE_SUM(xy_ptr, xy)
            // get the values of the rectangle corners from the integral image - 0 if outside bounds
            d = sqrt((x_sq - y_sq) * (x_sq - y_sq) + 4 * xy * xy) / (x_sq + y_sq);
            if (d > THRESHOLD_COHERENCE)  // coherent block: likely part of a barcode
            {
                coherence_row[pos] = 255;
                orientation_row[pos] = atan2(x_sq - y_sq, 2 * xy) / 2.0f;
                edge_nums_row[pos] = rect_area;
            }
            else
            {
                coherence_row[pos] = 0;
            }
        }
    }
}
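The coherence d computed above has a standard reading in terms of the 2 × 2 structure tensor of a block. Writing S_xx, S_yy and S_xy for the window sums x_sq, y_sq and xy read from the integral images (sums of gx², gy² and gx·gy), the tensor's eigenvalues give:

```latex
J = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{xy} & S_{yy} \end{pmatrix},
\qquad
\lambda_{1,2} = \frac{S_{xx} + S_{yy} \pm \sqrt{(S_{xx} - S_{yy})^2 + 4 S_{xy}^2}}{2}

d = \frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}
  = \frac{\sqrt{(S_{xx} - S_{yy})^2 + 4 S_{xy}^2}}{S_{xx} + S_{yy}},
\qquad 0 \le d \le 1
```

So d = 1 when all gradients in the block are parallel (λ₂ = 0, as across a barcode's bars), d = 0 when the gradient energy is spread evenly over all directions, and calCoherence keeps blocks with d > 0.9. The stored orientation is, as in the code, atan2(x_sq - y_sq, 2 * xy) / 2.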

The special-purpose erosion was new to me.

The erosion kernels are designed by hand:

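Kernel 0 (top-left and bottom-right neighbors):
255, 0, 0,
0, 0, 0,
0, 0, 255

Kernel 1 (top-right and bottom-left neighbors):
0, 0, 255,
0, 0, 0,
255, 0, 0

Kernel 2 (left and right neighbors):
0, 0, 0,
255, 0, 255,
0, 0, 0

Kernel 3 (top and bottom neighbors):
0, 255, 0,
0, 0, 0,
0, 255, 0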

 
inline const std::array<Mat, 4> &getStructuringElement()
{
    // Each 3x3 kernel marks the two opposite neighbors along one direction
    static const std::array<Mat, 4> structuringElement{
            Mat_<uint8_t>{{3, 3}, {255, 0, 0, 0, 0, 0, 0, 0, 255}},  // top-left & bottom-right
            Mat_<uint8_t>{{3, 3}, {0, 0, 255, 0, 0, 0, 255, 0, 0}},  // top-right & bottom-left
            Mat_<uint8_t>{{3, 3}, {0, 0, 0, 255, 0, 255, 0, 0, 0}},  // left & right
            Mat_<uint8_t>{{3, 3}, {0, 255, 0, 0, 0, 0, 0, 255, 0}}}; // top & bottom
    return structuringElement;
}

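Finally, the results of dilating with the four kernels are added up: a coherent block survives only if, in at least three of the four directions, one of its two neighbors is also coherent.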

void Detect::barcodeErode()
{
    static const std::array<Mat, 4> &structuringElement = getStructuringElement();
    Mat m0, m1, m2, m3;
    dilate(coherence, m0, structuringElement[0]);
    dilate(coherence, m1, structuringElement[1]);
    dilate(coherence, m2, structuringElement[2]);
    dilate(coherence, m3, structuringElement[3]);
    int sum;
    for (int y = 0; y < coherence.rows; y++)
    {
        auto coherence_row = coherence.ptr<uint8_t>(y);
        auto m0_row = m0.ptr<uint8_t>(y);
        auto m1_row = m1.ptr<uint8_t>(y);
        auto m2_row = m2.ptr<uint8_t>(y);
        auto m3_row = m3.ptr<uint8_t>(y);
        for (int pos = 0; pos < coherence.cols; pos++)
        {
            if (coherence_row[pos] != 0)
            {
                sum = m0_row[pos] + m1_row[pos] + m2_row[pos] + m3_row[pos];
                // keep the block only if at least 3 of the 4 directions fire
                // (3 * 255 = 765 > 600, while 2 * 255 = 510)
                coherence_row[pos] = sum > 600 ? 255 : 0;
            }
        }
    }
}
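To see what this erosion actually keeps, here is a hypothetical pure-C++ equivalent on a small grid (values 0 or 255, like `coherence`). Each direction "fires" if either of its two opposite neighbors is set, which is what dilating with a kernel that has 255 at exactly those two cells produces; treating out-of-grid cells as unset is an assumption matching dilate's default border handling. `Grid`, `PAIRS` and `barcodeErodeSketch` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Binary grid of blocks: 0 = not coherent, 255 = coherent.
using Grid = std::vector<std::vector<std::uint8_t>>;

// Two opposite neighbor offsets {dy1, dx1, dy2, dx2}, one entry per kernel.
constexpr std::array<std::array<int, 4>, 4> PAIRS = {{
    {-1, -1, 1, 1},   // kernel 0: top-left & bottom-right
    {-1, 1, 1, -1},   // kernel 1: top-right & bottom-left
    {0, -1, 0, 1},    // kernel 2: left & right
    {-1, 0, 1, 0}}};  // kernel 3: top & bottom

Grid barcodeErodeSketch(const Grid& src)
{
    const int rows = static_cast<int>(src.size());
    const int cols = rows > 0 ? static_cast<int>(src[0].size()) : 0;
    // Cells outside the grid never count as set.
    auto isSet = [&](int y, int x) {
        return y >= 0 && y < rows && x >= 0 && x < cols && src[y][x] != 0;
    };
    Grid dst = src;  // like the original, decisions are based on the input only
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            if (src[y][x] == 0) continue;
            int sum = 0;  // plays the role of m0 + m1 + m2 + m3
            for (const auto& p : PAIRS)
                if (isSet(y + p[0], x + p[1]) || isSet(y + p[2], x + p[3]))
                    sum += 255;
            dst[y][x] = sum > 600 ? 255 : 0;  // at least 3 of the 4 directions
        }
    }
    return dst;
}
```

Running it shows the effect: isolated blocks and one-block-wide lines disappear, while the interior and even the corners of a solid 3 × 3 patch survive.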

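It is somewhat involved, but the way it handles region borders looks genuinely interesting. I can write a dedicated post on these details later; no need to rush.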

// will change localization_bbox bbox_scores
// will change coherence,
// depend on coherence orientation edge_nums
void Detect::regionGrowing(int window_size)
{
    static constexpr float LOCAL_THRESHOLD_COHERENCE = 0.95f, THRESHOLD_RADIAN =
            PI / 30, LOCAL_RATIO = 0.5f, EXPANSION_FACTOR = 1.2f;
    static constexpr uint THRESHOLD_BLOCK_NUM = 35;
    Point pt_to_grow, pt;                       //point to grow
    float src_value;
    float cur_value;
    float edge_num;
    float rect_orientation;
    float sin_sum, cos_sum;
    uint counter;
    //grow direction
    static constexpr int DIR[8][2] = {{-1, -1},
                                      {0,  -1},
                                      {1,  -1},
                                      {1,  0},
                                      {1,  1},
                                      {0,  1},
                                      {-1, 1},
                                      {-1, 0}};
    vector<Point2f> growingPoints, growingImgPoints;
    for (int y = 0; y < coherence.rows; y++)
    {
        auto *coherence_row = coherence.ptr<uint8_t>(y);
        for (int x = 0; x < coherence.cols; x++)
        {
            if (coherence_row[x] == 0)
            {
                continue;
            }
            // flag
            coherence_row[x] = 0;
            growingPoints.clear();
            growingImgPoints.clear();
            pt = Point(x, y);
            cur_value = orientation.at<float_t>(pt);
            sin_sum = sin(2 * cur_value);
            cos_sum = cos(2 * cur_value);
            counter = 1;
            edge_num = edge_nums.at<float_t>(pt);
            growingPoints.push_back(pt);
            growingImgPoints.push_back(Point(pt));
            while (!growingPoints.empty())
            {
                pt = growingPoints.back();
                growingPoints.pop_back();
                src_value = orientation.at<float_t>(pt);
                //growing in eight directions
                for (auto i : DIR)
                {
                    pt_to_grow = Point(pt.x + i[0], pt.y + i[1]);
                    //check if out of boundary
                    if (!isValidCoord(pt_to_grow, coherence.size()))
                    {
                        continue;
                    }
                    if (coherence.at<uint8_t>(pt_to_grow) == 0)
                    {
                        continue;
                    }
                    cur_value = orientation.at<float_t>(pt_to_grow);
                    if (abs(cur_value - src_value) < THRESHOLD_RADIAN ||
                        abs(cur_value - src_value) > PI - THRESHOLD_RADIAN)
                    {
                        coherence.at<uint8_t>(pt_to_grow) = 0;
                        sin_sum += sin(2 * cur_value);
                        cos_sum += cos(2 * cur_value);
                        counter += 1;
                        edge_num += edge_nums.at<float_t>(pt_to_grow);
                        growingPoints.push_back(pt_to_grow);                 //push next point to grow back to stack
                        growingImgPoints.push_back(pt_to_grow);
                    }
                }
            }
            //minimum block num
            if (counter < THRESHOLD_BLOCK_NUM)
            {
                continue;
            }
            float local_coherence = (sin_sum * sin_sum + cos_sum * cos_sum) / static_cast<float>(counter * counter);
            // minimum local gradient orientation_arg coherence_arg
            if (local_coherence < LOCAL_THRESHOLD_COHERENCE)
            {
                continue;
            }
            RotatedRect minRect = minAreaRect(growingImgPoints);
            if (edge_num < minRect.size.area() * float(window_size * window_size) * LOCAL_RATIO ||
                static_cast<float>(counter) < minRect.size.area() * LOCAL_RATIO)
            {
                continue;
            }
            const float local_orientation = atan2(cos_sum, sin_sum) / 2.0f;
            // only orientation_arg is approximately equal to the rectangle orientation_arg
            rect_orientation = (minRect.angle) * PI / 180.f;
            if (minRect.size.width < minRect.size.height)
            {
                rect_orientation += (rect_orientation <= 0.f ? HALF_PI : -HALF_PI);
                std::swap(minRect.size.width, minRect.size.height);
            }
            if (abs(local_orientation - rect_orientation) > THRESHOLD_RADIAN &&
                abs(local_orientation - rect_orientation) < PI - THRESHOLD_RADIAN)
            {
                continue;
            }
            minRect.angle = local_orientation * 180.f / PI;
            minRect.size.width *= static_cast<float>(window_size) * EXPANSION_FACTOR;
            minRect.size.height *= static_cast<float>(window_size);
            minRect.center.x = (minRect.center.x + 0.5f) * static_cast<float>(window_size);
            minRect.center.y = (minRect.center.y + 0.5f) * static_cast<float>(window_size);
            localization_bbox.push_back(minRect);
            bbox_scores.push_back(edge_num);
        }
    }
}
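The core of regionGrowing is a stack-based flood fill over the coherent blocks. The sketch below is a simplified, OpenCV-free version under stated assumptions: it uses nested vectors instead of Mat, a hypothetical `growRegion` function and `GrowResult` struct, and it leaves the later checks (at least THRESHOLD_BLOCK_NUM = 35 blocks, local coherence of at least 0.95, minAreaRect fitting) to the caller. Like the original, it compares each neighbor with the block it is grown from rather than with the seed, and it treats orientations modulo π.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct GrowResult {
    int count;             // number of blocks in the region (counter)
    float localCoherence;  // ((sum sin 2t)^2 + (sum cos 2t)^2) / count^2
};

// Hypothetical, simplified version of the flood fill in Detect::regionGrowing.
// coherent: blocks still available for growing (cleared as they are consumed).
// orientation: per-block orientation in radians, meaningful modulo pi.
GrowResult growRegion(std::vector<std::vector<bool>>& coherent,
                      const std::vector<std::vector<float>>& orientation,
                      int seedY, int seedX, float thresh)
{
    const float kPi = 3.14159265f;
    const int rows = static_cast<int>(coherent.size());
    const int cols = rows > 0 ? static_cast<int>(coherent[0].size()) : 0;
    assert(coherent[seedY][seedX]);
    coherent[seedY][seedX] = false;  // consume the seed, as the original does
    float sinSum = std::sin(2 * orientation[seedY][seedX]);
    float cosSum = std::cos(2 * orientation[seedY][seedX]);
    int count = 1;
    std::vector<std::pair<int, int>> stack;
    stack.emplace_back(seedY, seedX);
    while (!stack.empty()) {
        const int y = stack.back().first;
        const int x = stack.back().second;
        stack.pop_back();
        const float src = orientation[y][x];
        for (int dy = -1; dy <= 1; ++dy) {       // eight neighbors
            for (int dx = -1; dx <= 1; ++dx) {
                const int ny = y + dy, nx = x + dx;
                if ((dy == 0 && dx == 0) || ny < 0 || ny >= rows || nx < 0 || nx >= cols)
                    continue;
                if (!coherent[ny][nx]) continue;
                const float cur = orientation[ny][nx];
                const float diff = std::fabs(cur - src);
                // orientations are modulo pi: a difference near pi also counts as "same"
                if (diff < thresh || diff > kPi - thresh) {
                    coherent[ny][nx] = false;
                    sinSum += std::sin(2 * cur);
                    cosSum += std::cos(2 * cur);
                    ++count;
                    stack.emplace_back(ny, nx);
                }
            }
        }
    }
    const float lc = (sinSum * sinSum + cosSum * cosSum) / static_cast<float>(count * count);
    return {count, lc};
}
```

Averaging sin 2θ and cos 2θ instead of θ itself is what makes orientations near 0 and near π count as the same direction: the local coherence is 1 for a perfectly uniform region and drops toward 0 as orientations spread out.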

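I have never built up much hands-on experience with region-growing code. So the plan is to start with experiments: compare the existing traditional methods with this more elaborate one, find its weaknesses, and learn from the comparison.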

bool Detect::computeTransformationPoints()
{
    bbox_indices.clear();
    transformation_points.clear();
    RotatedRect rect;
    Point2f temp[4];
    const float THRESHOLD_SCORE = float(width * height) / 300.f;
    dnn::NMSBoxes(localization_bbox, bbox_scores, THRESHOLD_SCORE, 0.1f, bbox_indices); // non-maximum suppression
    transformation_points.reserve(bbox_indices.size()); // reserve after NMS, once the count is known
    for (const auto &bbox_index : bbox_indices)
    {
        rect = localization_bbox[bbox_index];
        if (purpose == ZOOMING)
        {
            rect.center /= coeff_expansion;
            rect.size.height /= static_cast<float>(coeff_expansion);
            rect.size.width /= static_cast<float>(coeff_expansion);
        }
        else if (purpose == SHRINKING)
        {
            rect.center *= coeff_expansion;
            rect.size.height *= static_cast<float>(coeff_expansion);
            rect.size.width *= static_cast<float>(coeff_expansion);
        }
        rect.points(temp);
        transformation_points.emplace_back(vector<Point2f>{temp[0], temp[1], temp[2], temp[3]});
    }
    return !transformation_points.empty();
}

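This function mainly performs non-maximum suppression, maps the surviving boxes back to original-image coordinates (undoing the scaling from init), and stores their corner points where the caller can retrieve them.

Take object detection as an example: detection produces a large number of candidate boxes around the same object, and these boxes may overlap. Non-maximum suppression keeps the best bounding box and removes the redundant ones.

In a face-detection demo, the left image shows the candidate boxes, each with a confidence score; without non-maximum suppression, several boxes appear around each face. The right image shows the result after non-maximum suppression, which matches the expected face-detection output.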
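Greedy non-maximum suppression itself is short. This sketch is a simplification: it uses axis-aligned boxes and a hypothetical `Box` type so that IoU is easy to compute, whereas the module passes RotatedRect boxes to cv::dnn::NMSBoxes, which measures overlap between rotated rectangles. The two thresholds play the same roles as in the call above: a score threshold (width * height / 300 in the module, applied to the summed edge counts in bbox_scores) and an IoU threshold (0.1 there).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical axis-aligned box: top-left corner (x, y), width w, height h.
struct Box {
    float x, y, w, h;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b)
{
    const float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS: drop boxes scoring at or below scoreThresh, visit the rest by
// descending score, and keep a box only if its IoU with every kept box is
// at most nmsThresh. Returns the indices of the kept boxes.
std::vector<int> nms(const std::vector<Box>& boxes, const std::vector<float>& scores,
                     float scoreThresh, float nmsThresh)
{
    std::vector<int> order;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i)
        if (scores[i] > scoreThresh) order.push_back(i);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return scores[a] > scores[b]; });
    std::vector<int> kept;
    for (int i : order) {
        bool keep = true;
        for (int k : kept)
            if (iou(boxes[i], boxes[k]) > nmsThresh) { keep = false; break; }
        if (keep) kept.push_back(i);
    }
    return kept;
}
```

With a low IoU threshold such as 0.1, almost any overlap with a stronger candidate suppresses a box, which suits barcodes since two genuine barcodes rarely overlap.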

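Part 2: Barcode Decoding

Here, the fairly mature WeChat super-resolution module is called directly, which is also worth a look.

The optimized super-resolution strategy means that small barcodes are upscaled with super-resolution, and barcodes of different sizes are processed differently.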

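The core of the decoding algorithm is a vector-distance computation based on how barcodes are encoded. Because a barcode format encodes each symbol as a fixed number of bars and spaces, once the bar/space module widths are agreed upon, each fixed group of bars and spaces can be read as a vector and matched against the predefined encoding patterns; the pattern with the best match is taken as the result.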
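The matching step can be illustrated with EAN-13 digits. This is a minimal sketch using a simplified sum-of-absolute-differences distance; the module's actual distance measure is not shown in the post. `L_PATTERNS` is the EAN-13 odd-parity ("L") table (space, bar, space, bar widths in modules, 7 modules per digit), the same table ZXing uses; a real decoder would also try the even-parity ("G") patterns for the left half and the "R" patterns for the right half. `matchDigit` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// EAN-13 "L" patterns: widths of space, bar, space, bar in modules.
constexpr std::array<std::array<int, 4>, 10> L_PATTERNS = {{
    {3, 2, 1, 1}, {2, 2, 2, 1}, {2, 1, 2, 2}, {1, 4, 1, 1}, {1, 1, 3, 2},
    {1, 2, 3, 1}, {1, 1, 1, 4}, {1, 3, 1, 2}, {1, 2, 1, 3}, {3, 1, 1, 2}}};

// Normalize four measured run lengths (in pixels) to 7 modules and return the
// digit whose pattern is closest (sum of absolute differences), or -1 if the
// runs are empty.
int matchDigit(const std::array<int, 4>& runs)
{
    const int total = runs[0] + runs[1] + runs[2] + runs[3];
    if (total <= 0) return -1;
    const float unit = total / 7.0f;  // estimated pixels per module
    int best = -1;
    float bestDist = std::numeric_limits<float>::max();
    for (int d = 0; d < 10; ++d) {
        float dist = 0.f;
        for (int k = 0; k < 4; ++k)
            dist += std::fabs(runs[k] / unit - L_PATTERNS[d][k]);
        if (dist < bestDist) { bestDist = dist; best = d; }
    }
    return best;
}
```

For example, runs of {7, 4, 2, 3} pixels normalize to roughly {3.06, 1.75, 0.88, 1.31} modules, which is closest to the pattern {3, 2, 1, 1} of digit 0 even though no run is an exact multiple of the module width.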
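Decoding works one scan line at a time. Because of noise, merged bars and spaces, and similar problems, the result from a single line is quite uncertain, so multiple lines are scanned: scanning and decoding uniformly distributed lines smooths out some of the imperfections of the binarization step.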
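Concretely: first search the scan line for the start guard; after finding it, read and decode the left half, then find the middle guard, read and decode the right half, and finally find the end guard, then generate the leading digit and verify the check digit for the whole barcode (EAN-13 is used as the example here; other formats differ). Each line yields its own decoding result, so the results are put to a vote: only the most frequent result is returned, and only if it makes up more than 50% of the valid results. This part builds on ZXing's implementation with some improvements (voting, etc.).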
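Two of these steps are easy to show in isolation: the EAN-13 check digit and the vote across scan lines. This is a sketch under stated assumptions: `ean13CheckDigit` and `voteScanLines` are hypothetical names, an empty string stands for a line that failed to decode, and the strict "more than half" boundary is an assumption. Generating the leading digit (from the parity pattern of the left half) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// EAN-13 check digit from the first 12 digits: weights 1, 3, 1, 3, ...
// from the left; check = (10 - sum % 10) % 10. Returns -1 on malformed input.
int ean13CheckDigit(const std::string& first12)
{
    if (first12.size() != 12) return -1;
    int sum = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        const char c = first12[i];
        if (c < '0' || c > '9') return -1;
        sum += (c - '0') * (i % 2 == 0 ? 1 : 3);
    }
    return (10 - sum % 10) % 10;
}

// Majority vote over per-line results ("" = the line failed). Returns the most
// frequent result only if it makes up more than half of the valid results.
std::string voteScanLines(const std::vector<std::string>& lineResults)
{
    std::map<std::string, int> counts;
    int valid = 0;
    for (const auto& r : lineResults) {
        if (r.empty()) continue;
        ++counts[r];
        ++valid;
    }
    std::string best;
    int bestCount = 0;
    for (const auto& kv : counts)
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    return 2 * bestCount > valid ? best : std::string();
}
```

For instance, the first twelve digits 400638133393 give check digit 1, and three scan lines reading 4006381333931, 4006381333931 and 4006381333937 (plus one failed line) vote for 4006381333931, since it holds 2 of the 3 valid results.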
Switching binarizers and decoders means that, until decoding succeeds, every decoder and every binarization method is tried in turn.
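The fallback loop can be sketched generically. Everything here is hypothetical: the `Binarizer` and `Decoder` function types, the `decodeWithFallback` name, and the choice of binarizer as the outer loop, since the post does not say in which order the combinations are tried.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types: a binarizer turns a grayscale scan line into bars (true)
// and spaces (false); a decoder returns "" when it fails.
using Line = std::vector<int>;
using Bits = std::vector<bool>;
using Binarizer = std::function<Bits(const Line&)>;
using Decoder = std::function<std::string(const Bits&)>;

// Try every binarizer/decoder combination and stop at the first success.
std::string decodeWithFallback(const Line& line, const std::vector<Binarizer>& binarizers,
                               const std::vector<Decoder>& decoders)
{
    for (const auto& binarize : binarizers) {
        const Bits bits = binarize(line);
        for (const auto& decode : decoders) {
            std::string result = decode(bits);
            if (!result.empty()) return result;
        }
    }
    return "";
}
```

Putting the cheaper or more commonly successful binarizer first keeps the average cost low, because later combinations only run when earlier ones fail.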
Both detection and decoding contain inner loops. I had previously implemented parts of this, but never organized it systematically.

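Part 3: Lessons from the OpenCV PR

1. Start as early as possible.

2. Commit in small, frequent steps.

3. Follow the coding conventions.

4. Communicate thoroughly.

In addition, submitting to opencv_extra has some particular points to watch out for; I probably have not solved that problem yet.

References: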

void cv::magnitude(InputArray x, InputArray y, OutputArray magnitude)

Python:
cv.magnitude(x, y[, magnitude]) -> magnitude

#include <opencv2/core.hpp>

Calculates the magnitude of 2D vectors.

The function cv::magnitude calculates the magnitude of 2D vectors formed from the corresponding elements of x and y arrays:

$$\texttt{dst}(I) = \sqrt{\texttt{x}(I)^2 + \texttt{y}(I)^2}$$

Parameters

x: floating-point array of x-coordinates of the vectors.
y: floating-point array of y-coordinates of the vectors; it must have the same size as x.
magnitude: output array of the same size and type as x.

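void cv::integral(InputArray src, OutputArray sum, OutputArray sqsum, OutputArray tilted, int sdepth = -1, int sqdepth = -1)

Python:
cv.integral(src[, sum[, sdepth]]) -> sum
cv.integral2(src[, sum[, sqsum[, sdepth[, sqdepth]]]]) -> sum, sqsum
cv.integral3(src[, sum[, sqsum[, tilted[, sdepth[, sqdepth]]]]]) -> sum, sqsum, tilted

Calculates the integral of an image. For a W×H source, sum is (W+1)×(H+1):

$$\texttt{sum}(X, Y) = \sum_{x < X,\, y < Y} \texttt{image}(x, y)$$

C++ also provides the overloads integral(src, sum, sdepth = -1) and integral(src, sum, sqsum, sdepth = -1, sqdepth = -1); preprocess() above uses both.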
