Face Recognition: Principles of MTCNN

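Face detection means locating the faces in an image. The input is an image that may contain faces, and the output is a set of rectangular boxes marking the face positions.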

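Face alignment. Faces in raw images can differ widely in pose and position, so for uniform downstream processing each face has to be "straightened". This requires detecting facial landmarks, such as the positions of the eyes, nose, and mouth and points along the face contour. Given these landmarks, an affine transformation can warp every face into a common alignment, which removes as much of the error caused by pose differences as possible.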
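The alignment step can be sketched as a least-squares fit: find the 2×3 affine matrix that maps the detected landmarks onto a fixed template, then apply it. This is a minimal NumPy sketch, not the alignment code of any particular library. The names `estimate_affine` and `apply_affine` and the `TEMPLATE` coordinates are hypothetical, chosen only for illustration.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix that maps src_pts onto dst_pts."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    # Homogeneous coordinates: each row becomes [x, y, 1].
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve src_h @ M.T = dst for M.T (shape 3x2) in the least-squares sense.
    m_t, _, _, _ = np.linalg.lstsq(src_h, dst, rcond=None)
    return m_t.T

def apply_affine(matrix, pts):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical template: left eye, right eye, nose, left and right mouth corner.
TEMPLATE = np.array([[38.0, 52.0], [74.0, 52.0], [56.0, 72.0],
                     [42.0, 92.0], [70.0, 92.0]])

# Simulate a detected face that is rotated, scaled, and shifted.
theta = np.deg2rad(15.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
detected = (TEMPLATE @ rot.T) * 1.5 + np.array([30.0, 10.0])

M = estimate_affine(detected, TEMPLATE)
aligned = apply_affine(M, detected)
```

In practice the same matrix would be used to warp the image pixels, not just the landmark coordinates.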

MTCNN Network Structure

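MTCNN consists of three neural networks: P-Net, R-Net, and O-Net. Before these networks are applied, the original image is rescaled to several sizes to form an "image pyramid", and each scaled image is then passed through the network. The reason is that faces in the original image appear at different scales: some are large and some are small. Small faces can be detected on an enlarged image, and large faces on a shrunken one. This way, all faces are detected at one uniform scale.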
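The pyramid can be described by a list of scale factors. The sketch below is an illustration under assumptions: the function name `pyramid_scales`, the minimum face size of 20 pixels, and the shrink factor of 0.709 are values chosen for the example and are not stated in the text above. Only the 12-pixel network input comes from the P-Net description.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_input=12):
    """Return the scale factors of an image pyramid (illustrative defaults)."""
    scales = []
    # The first scale maps a face of min_face_size pixels onto the 12-pixel input.
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    # Keep shrinking until the shorter side no longer fits one network input.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales

scales = pyramid_scales(100, 100)
```

If `min_face_size` is smaller than 12, the first scale is greater than 1, which corresponds to detecting small faces on an enlarged image.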

P-Net

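The input to P-Net is a 3-channel RGB image that is 12 pixels wide and 12 pixels high. The network decides whether this 12×12 image contains a face, and it outputs the face box and the landmark positions.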

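The output has three parts:

Face classification: whether the image is a face. The output vector has shape 1×1×2 and holds the probabilities that the image is or is not a face.
Box regression: the precise position of the box. The 12×12 patch fed to P-Net is not necessarily a perfect face box. The face may not be exactly square, or the patch may sit too far to the left or right, so the network outputs the offset of the current box relative to the ideal face box. A box in an image can be described by four numbers: the x coordinate of its top-left corner, the y coordinate of its top-left corner, its width, and its height. Box regression therefore outputs the relative offset of the top-left x coordinate, the relative offset of the top-left y coordinate, the error in the width, and the error in the height. The output vector has shape 1×1×4.
Landmark localization: the positions of 5 facial landmarks, namely the left eye, the right eye, the nose, the left mouth corner, and the right mouth corner. Each landmark needs an x and a y coordinate, so the output has 10 dimensions in total (shape 1×1×10).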
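To make the box-regression output concrete, the sketch below applies four predicted offsets to a candidate box. It assumes the offsets are expressed as fractions of the current box width and height. That parameterization is an assumption for illustration, and real implementations may encode the offsets differently. The name `refine_box` is hypothetical.

```python
def refine_box(box, offsets):
    """Apply relative regression offsets to a box.

    box:     (x, y, w, h) with (x, y) the top-left corner.
    offsets: (dx, dy, dw, dh), assumed to be fractions of the box size.
    """
    x, y, w, h = box
    dx, dy, dw, dh = offsets
    return (x + dx * w, y + dy * h, w * (1.0 + dw), h * (1.0 + dh))

# A 12x12 patch that sits too far left and too low, and is too narrow.
refined = refine_box((10, 20, 12, 12), (0.25, -0.5, 0.5, 0.0))
```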

R-Net

Every region that P-Net flags as a possible face is resized to 24×24 and fed into R-Net for a further decision.

O-Net

All regions that survive are then resized to 48×48 and fed into the final network, O-Net.

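From P-Net to R-Net and finally to O-Net, the input image grows larger, the convolutional layers have more channels, and the networks have more layers, so their face-detection accuracy is expected to increase at each stage. At the same time, P-Net runs fastest, R-Net is second, and O-Net is slowest. Three networks are used because applying O-Net directly to every region of the image would be very slow. Instead, P-Net performs a first pass of filtering, R-Net filters the remaining candidates again, and only what is left is handed to O-Net, which is the most accurate but the slowest. Each stage reduces the number of candidates the next stage has to examine, which cuts the total processing time substantially.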
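The coarse-to-fine filtering described above can be sketched as a generic cascade. The scoring functions here are toy stand-ins, not the real P-Net, R-Net, or O-Net, and the thresholds are arbitrary. The sketch only shows how each stage shrinks the workload of the next.

```python
def cascade(candidates, stages):
    """Filter candidates through (score_fn, threshold) stages, cheapest first.

    Returns the survivors and the number of candidates each stage evaluated.
    """
    evaluations = []
    for score_fn, threshold in stages:
        evaluations.append(len(candidates))
        candidates = [c for c in candidates if score_fn(c) >= threshold]
    return candidates, evaluations

# Toy setup: each candidate is an integer "faceness" value from 0 to 99.
stages = [
    (lambda c: c, 50),  # stand-in for the fast, coarse first stage
    (lambda c: c, 80),  # stand-in for the middle stage
    (lambda c: c, 95),  # stand-in for the slow, accurate last stage
]
survivors, evaluations = cascade(list(range(100)), stages)
```

In this toy run the last and most expensive stage scores only 20 of the 100 candidates.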

Center Loss

Reference paper: A Discriminative Feature Learning Approach for Deep Face Recognition (http://ydwen.github.io/papers/WenECCV16.pdf)

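Ideally, the distance between "vector representations" should directly reflect how similar two faces are:

For two face images of the same person, the Euclidean distance between the corresponding vectors should be small.
For two face images of different people, the Euclidean distance between the corresponding vectors should be large.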

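The original CNN model uses the softmax loss. Softmax is a loss between classes, and for faces each class is one person. Although the softmax loss can tell individuals apart, it places no explicit requirement on the distances between the vector representations of each class.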

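Center loss does not optimize distances between samples directly. It keeps the original classification model but additionally assigns a class center to every class (person). The features of images from the same class should lie as close as possible to their own class center, while the classification loss that is kept alongside it keeps different classes apart.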

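Let the input face image be $x_{i}$ and its class be $y_{i}$. Each class is given a class center, written $c_{y_{i}}$. The feature $f(x_{i})$ of every face image should be as close as possible to its center $c_{y_{i}}$, so the center loss is defined as

$$L_{i}=\frac{1}{2}\left \| f(x_{i})-c_{y_{i}}\right \|^{2}$$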

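The center loss over several images is simply the sum of their individual values:

$$L_{center}=\sum_{i}L_{i}=\frac{1}{2}\sum_{i}\left \| f(x_{i})-c_{y_{i}}\right \|^{2}$$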

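This is a very simple definition, but one question remains: how is the center $c_{y_{i}}$ of each class determined? In theory, the best center for class $y_{i}$ is the mean of the features of all its images. With that definition, however, $c_{y_{i}}$ would have to be recomputed over all images at every gradient-descent step, which is far too expensive. An approximation is used instead: the centers $c_{y_{i}}$ are first initialized randomly, and then within each batch the gradient of $L_{i}=\frac{1}{2}\left \| f(x_{i})-c_{y_{i}}\right \|^{2}$ with respect to $c_{y_{i}}$ is also computed for the centers that appear in the batch and used to update them. In addition, a classification model cannot be trained with the center loss alone. The softmax loss must be included as well, so the final loss has two parts, $L=L_{softmax}+\lambda L_{center}$, where λ is a hyperparameter.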
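The per-batch approximation can be illustrated with a small NumPy sketch. It computes the summed center loss for one batch and moves each center along the negative gradient, which is $f(x_{i})-c_{y_{i}}$. The function name `center_loss_step` and the step size `lr` are hypothetical, and the centers start at zero instead of random values so that the numbers are easy to check by hand.

```python
import numpy as np

def center_loss_step(features, labels, centers, lr=0.5):
    """Return (loss, updated_centers) for one batch.

    features: [batch_size, feature_dim] array of f(x_i)
    labels:   [batch_size] integer class ids y_i
    centers:  [num_classes, feature_dim] array of c_y
    lr:       hypothetical step size for the center update
    """
    diff = features - centers[labels]   # f(x_i) - c_{y_i}
    loss = 0.5 * np.sum(diff ** 2)      # sum of L_i over the batch
    new_centers = centers.copy()
    # dL_i/dc_{y_i} = -(f(x_i) - c_{y_i}), so stepping against the gradient
    # adds lr * diff. np.add.at accumulates correctly for repeated labels.
    np.add.at(new_centers, labels, lr * diff)
    return loss, new_centers

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
centers = np.zeros((2, 2))

loss_before, centers = center_loss_step(features, labels, centers)
loss_after, _ = center_loss_step(features, labels, centers)
```

After one update the center of class 0 sits at the mean of its two features, and the loss on the same batch drops.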

As the figure shows, the larger the weight λ of the center loss, the more pronounced the "cohesion" of the learned features.

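import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.scatter_sub)

def center_loss(features, label, alfa, nrof_classes):
    """Center loss based on the paper "A Discriminative Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    :param features: features extracted by the deep convolutional network, [batch_size, feature_dim]
    :param label: class labels, [batch_size, 1]
    :param alfa: fraction of the old center that is kept; each center moves by (1 - alfa) times its difference from the feature
    :param nrof_classes: total number of classes, int
    :return: the center loss and the updated centers
    """
    nrof_features = features.get_shape()[1]
    # The centers are updated manually below, not by the optimizer, hence trainable=False
    centers = tf.get_variable('centers', [nrof_classes, nrof_features], dtype=tf.float32,
        initializer=tf.constant_initializer(0), trainable=False)
    label = tf.reshape(label, [-1])
    centers_batch = tf.gather(centers, label)
    diff = (1 - alfa) * (centers_batch - features)  # scaled gradient of the loss w.r.t. the centers
    centers = tf.scatter_sub(centers, label, diff)  # update the class centers
    # Make the loss depend on the update so the centers change whenever the loss is evaluated
    with tf.control_dependencies([centers]):
        loss = tf.reduce_mean(tf.square(features - centers_batch))
    return loss, centers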

Triplet Loss

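Each training step draws three face images from the training data: the first is denoted $x_{i}^{a}$, the second $x_{i}^{p}$, and the third $x_{i}^{n}$. In such a "triplet", $x_{i}^{a}$ and $x_{i}^{p}$ are images of the same person, while $x_{i}^{n}$ is a face image of a different person. The distance $\left \| f(x_{i}^{a})-f(x_{i}^{p}) \right \|_{2}$ should therefore be small, and the distance $\left \| f(x_{i}^{a})-f(x_{i}^{n}) \right \|_{2}$ should be large. Strictly speaking, the triplet loss requires the following inequality to hold:

$$\left \| f(x_{i}^{a})-f(x_{i}^{p}) \right \|_{2}^{2}+\alpha < \left \| f(x_{i}^{a})-f(x_{i}^{n}) \right \|_{2}^{2}$$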

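That is, the squared distance between faces of the same person must be smaller than the squared distance between faces of different people by at least $\alpha$. Based on this, the loss function is designed as

$$L_{i}=\left [ \left \| f(x_{i}^{a})-f(x_{i}^{p}) \right \|_{2}^{2}+\alpha -\left \| f(x_{i}^{a})-f(x_{i}^{n}) \right \|_{2}^{2} \right ]_{+}$$

where $[z]_{+}=\max(z,0)$.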

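With this design, a triplet whose distances satisfy the margin requirement produces no loss, so $L_{i}=0$. When the distances violate the requirement, the loss equals $\left \| f(x_{i}^{a})-f(x_{i}^{p}) \right \|_{2}^{2}+\alpha -\left \| f(x_{i}^{a})-f(x_{i}^{n}) \right \|_{2}^{2}$. In addition, training fixes $\left \| f(x) \right \|=1$ so that the features cannot drift "infinitely far" apart.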
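A small worked example helps to check the two cases. The NumPy sketch below normalizes the embeddings to unit length and evaluates the per-triplet loss. The helper names `l2_normalize` and `triplet_loss_np` and the margin of 0.2 are choices made for this illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale every row to unit Euclidean norm, i.e. ||f(x)|| = 1."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def triplet_loss_np(anchor, positive, negative, alpha):
    """Per-triplet loss max(||a - p||^2 + alpha - ||a - n||^2, 0)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)

anchor = l2_normalize(np.array([[2.0, 0.0], [3.0, 0.0]]))
positive = l2_normalize(np.array([[5.0, 0.0], [3.0, 4.0]]))
negative = l2_normalize(np.array([[0.0, 2.0], [4.0, 3.0]]))

losses = triplet_loss_np(anchor, positive, negative, alpha=0.2)
```

The first triplet has squared distances 0 and 2, so the margin is satisfied and its loss is 0. The second has squared distances 0.8 and 0.4, so its loss is 0.8 + 0.2 - 0.4 = 0.6.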

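The triplet loss optimizes distances directly, so it addresses the problem of learning a face feature representation. During training, however, choosing the triplets takes considerable care. If triplets are always chosen at random, the model converges properly but does not reach its best performance. If "hard example mining" is added, meaning that the hardest triplets are always chosen for training, the model often fails to converge properly. For this reason it has been proposed to train only on "semi-hard" data each time, which lets the model converge while still reaching good performance. Training a face model with the triplet loss also usually requires a very large face dataset to give good results.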
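The three kinds of negatives can be separated with a simple mask. The sketch below uses the usual definition of "semi-hard": a negative that is farther from the anchor than the positive but still inside the margin, so `pos < neg < pos + alpha`. The function name `semi_hard_negatives` is hypothetical, and the distances are made-up numbers.

```python
import numpy as np

def semi_hard_negatives(pos_dist_sqr, neg_dists_sqr, alpha):
    """Indices of negatives with pos_dist_sqr < neg_dist_sqr < pos_dist_sqr + alpha."""
    neg = np.asarray(neg_dists_sqr, dtype=np.float64)
    mask = (neg > pos_dist_sqr) & (neg < pos_dist_sqr + alpha)
    return np.flatnonzero(mask)

# Squared anchor-positive distance 0.5 and four candidate negatives:
#   0.3  -> hard (closer to the anchor than the positive)
#   0.6  -> semi-hard
#   0.65 -> semi-hard
#   0.9  -> easy (already beyond the margin, contributes no loss)
idx = semi_hard_negatives(0.5, [0.3, 0.6, 0.65, 0.9], alpha=0.2)
```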

def triplet_loss(anchor, positive, negative, alpha):
    """Calculate the triplet loss according to the FaceNet paper
    
    Args:
      anchor: the embeddings for the anchor images.
      positive: the embeddings for the positive images.
      negative: the embeddings for the negative images.
      alpha: the margin required between the positive and negative squared distances.
  
    Returns:
      the triplet loss according to the FaceNet paper as a float tensor.
    """
    with tf.variable_scope('triplet_loss'):
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
        
        basic_loss = tf.add(tf.subtract(pos_dist,neg_dist), alpha)
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
      
    return loss

import numpy as np

def select_triplets(embeddings, nrof_images_per_class, image_paths, people_per_batch, alpha):
    """
    Select the triplets for training
    :param embeddings: image feature vectors extracted by the deep network, [?, embedding_dim]
    :param nrof_images_per_class: list with the number of images of each person
    :param image_paths: image paths, in the same order as the embeddings
    :param people_per_batch: number of classes (people) in each batch
    :param alpha: margin between positive and negative squared distances
    :return: shuffled list of triplets, number of anchor-positive pairs examined, number of triplets selected
    """
    emb_start_idx = 0
    num_trips = 0
    triplets = []

    for i in range(people_per_batch):
        nrof_images = int(nrof_images_per_class[i])
        for j in range(1, nrof_images):
            a_idx = emb_start_idx + j - 1  # anchor index
            # Squared distance from the anchor to every embedding in the batch
            neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings), 1)
            for pair in range(j, nrof_images):
                p_idx = emb_start_idx + pair  # positive index
                # Squared distance between the anchor and the positive
                pos_dist_sqr = np.sum(np.square(embeddings[a_idx] - embeddings[p_idx]))
                # Mask the distances to images of the anchor's own class with NaN
                neg_dists_sqr[emb_start_idx:emb_start_idx + nrof_images] = np.nan
                # Keep the negatives that violate the margin: d_neg - d_pos < alpha
                all_neg = np.where(neg_dists_sqr - pos_dist_sqr < alpha)[0]
                nrof_random_negs = all_neg.shape[0]
                if nrof_random_negs > 0:
                    # Pick one of the qualifying negatives at random
                    rnd_idx = np.random.randint(nrof_random_negs)
                    n_idx = all_neg[rnd_idx]
                    triplets.append((image_paths[a_idx], image_paths[p_idx], image_paths[n_idx]))

                num_trips += 1

        emb_start_idx += nrof_images

    np.random.shuffle(triplets)
    return triplets, num_trips, len(triplets)

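import itertools
import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
import facenet  # helper module of the facenet project; sample_people is assumed to be defined alongside train

def train(args, sess, dataset, epoch, image_paths_placeholder, labels_placeholder, labels_batch,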
          batch_size_placeholder, learning_rate_placeholder, phase_train_placeholder, enqueue_op, input_queue, global_step, 
          embeddings, loss, train_op, summary_op, summary_writer, learning_rate_schedule_file,
          embedding_size, anchor, positive, negative, triplet_loss):
    batch_number = 0
    
    if args.learning_rate>0.0:
        lr = args.learning_rate
    else:
        lr = facenet.get_learning_rate_from_file(learning_rate_schedule_file, epoch)
    while batch_number < args.epoch_size:
        # Randomly sample people_per_batch*images_per_person images from the whole dataset, keeping images of the same class together
        image_paths, num_per_class = sample_people(dataset, args.people_per_batch, args.images_per_person)
        
        print('Running forward pass on sampled images: ', end='')
        start_time = time.time()
        nrof_examples = args.people_per_batch * args.images_per_person
        labels_array = np.reshape(np.arange(nrof_examples),(-1,3))
        image_paths_array = np.reshape(np.expand_dims(np.array(image_paths),1), (-1,3))
        # Enqueue the people_per_batch*images_per_person images
        sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array})
        emb_array = np.zeros((nrof_examples, embedding_size))
        nrof_batches = int(np.ceil(nrof_examples / args.batch_size))
        # Compute the embeddings of the people_per_batch*images_per_person images, dequeuing as we go; the queue is empty once this finishes
        for i in range(nrof_batches):
            batch_size = min(nrof_examples-i*args.batch_size, args.batch_size)
            emb, lab = sess.run([embeddings, labels_batch], feed_dict={batch_size_placeholder: batch_size, 
                learning_rate_placeholder: lr, phase_train_placeholder: True})
            emb_array[lab,:] = emb
        print('%.3f' % (time.time()-start_time))

        # Select the "semi-hard" data for training
        print('Selecting suitable triplets for training')
        triplets, nrof_random_negs, nrof_triplets = select_triplets(emb_array, num_per_class, 
            image_paths, args.people_per_batch, args.alpha)
        selection_time = time.time() - start_time
        print('(nrof_random_negs, nrof_triplets) = (%d, %d): time=%.3f seconds' % 
            (nrof_random_negs, nrof_triplets, selection_time))

        # Perform training on the selected triplets
        nrof_batches = int(np.ceil(nrof_triplets*3/args.batch_size))
        triplet_paths = list(itertools.chain(*triplets))
        labels_array = np.reshape(np.arange(len(triplet_paths)),(-1,3))
        triplet_paths_array = np.reshape(np.expand_dims(np.array(triplet_paths),1), (-1,3))
        # Enqueue the "semi-hard" data
        sess.run(enqueue_op, {image_paths_placeholder: triplet_paths_array, labels_placeholder: labels_array})
        nrof_examples = len(triplet_paths)
        train_time = 0
        i = 0
        emb_array = np.zeros((nrof_examples, embedding_size))
        loss_array = np.zeros((nrof_triplets,))
        # Train batch by batch
        while i < nrof_batches:
            start_time = time.time()
            batch_size = min(nrof_examples-i*args.batch_size, args.batch_size)
            feed_dict = {batch_size_placeholder: batch_size, learning_rate_placeholder: lr, phase_train_placeholder: True}
            err, _, step, emb, lab = sess.run([loss, train_op, global_step, embeddings, labels_batch], feed_dict=feed_dict)
            emb_array[lab,:] = emb
            loss_array[i] = err
            duration = time.time() - start_time
            print('Epoch: [%d][%d/%d]\tTime %.3f\tLoss %2.3f' %
                  (epoch, batch_number+1, args.epoch_size, duration, err))
            batch_number += 1
            i += 1
            train_time += duration
            
        # Add validation loss and accuracy to summary
        summary = tf.Summary()
        #pylint: disable=maybe-no-member
        summary.value.add(tag='time/selection', simple_value=selection_time)
        summary_writer.add_summary(summary, step)
    return step
