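facenet face recognition: understanding the principles (Part 3)

The previous two articles covered how to use and run the facenet face recognition code, but did not explain the underlying principles. This article gives a brief explanation.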

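1. Principle

In face recognition, how do we recognize a new person when a new face image is added to the image database? Do we have to modify the network's final softmax output layer, add one more output, and then retrain the whole network? That is clearly impractical.

So what do we do instead? The more common approach is to use the Euclidean distance D between the feature vectors of two images to measure how different they are and to decide whether they belong to the same person. The more similar two face images are, the smaller D is; the more different they are, the larger D is.

We set a threshold τ: if the distance is less than τ, the two faces are judged to be the same person; if it is greater than τ, they are judged to be different people.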
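As a rough illustration of the threshold rule above, here is a minimal NumPy sketch. The function names, the toy 3-dimensional vectors, and the threshold value tau=1.1 are all assumptions made for this example; in practice τ has to be tuned on real data.

```python
import numpy as np

def euclidean_distance(emb_a, emb_b):
    """Euclidean distance D between two embedding vectors."""
    return float(np.linalg.norm(np.asarray(emb_a) - np.asarray(emb_b)))

def is_same_person(emb_a, emb_b, tau=1.1):
    """Judge two faces as the same person when D is below the threshold tau.

    tau=1.1 is an illustrative value, not one taken from the article.
    """
    return euclidean_distance(emb_a, emb_b) < tau

# Toy unit-length embeddings (the FaceNet paper uses 128 dimensions).
alice_1 = np.array([0.6, 0.8, 0.0])
alice_2 = np.array([0.8, 0.6, 0.0])
bob = np.array([0.0, 0.0, 1.0])

print(is_same_person(alice_1, alice_2))  # True  (D is about 0.28)
print(is_same_person(alice_1, bob))      # False (D is about 1.41)
```

Enrolling a new person then only requires storing that person's embedding; the network itself is not retrained.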

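2. Network structure

Let's start with the network structure of facenet.

As the network diagram shows, the Batch is followed by the Deep architecture (Inception ResNet v1), then the L2 norm step, then the embedding layer, and finally the triplet loss.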

L2: L2 normalization. The network output is scaled to unit L2 norm; otherwise the values would be too large.

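EMBEDDING: the embedding layer, a mapping from one feature space to another.

Triplet Loss: a triplet consists of an Anchor (A), a Negative (N), and a Positive (P). As the names suggest, we want the Anchor and the Positive to be as close as possible (Positive means the same person) and the Anchor and the Negative to be as far apart as possible (Negative means a different person).

Before training, however, the Anchor may be close to the Negative and far from the Positive, as in the left-hand picture. After learning, this turns into the desired arrangement shown on the right: the Anchor is far from the Negative and close to the Positive.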
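To make the L2 step described above concrete, here is a small NumPy sketch of row-wise L2 normalization. The helper name l2_normalize and the eps guard are assumptions for this example, not code from facenet.

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-10):
    """Scale each row of x to unit L2 norm (eps guards against division by zero)."""
    squared_norm = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(squared_norm, eps))

# Raw network outputs with very different magnitudes.
raw = np.array([[3.0, 4.0], [300.0, 400.0], [-5.0, 0.0]])
emb = l2_normalize(raw)

print(emb)                          # rows: [0.6, 0.8], [0.6, 0.8], [-1.0, 0.0]
print(np.linalg.norm(emb, axis=1))  # every row now has norm 1
```

After normalization only the direction of the vector matters, and the squared distance between any two embeddings stays between 0 and 4, which keeps the distances used by the loss in a bounded range.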

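3. Triplet Loss function

We want the distance from f(A) to f(P) to be less than or equal to the distance from f(A) to f(N). Comparing the squared norms and adding a margin a gives the following inequality:

$$\|f(A) - f(P)\|^2 + a \le \|f(A) - f(N)\|^2$$

Here a is a constant (alpha in the code). Without it, the network could satisfy the constraint by mapping everything to 0: if f always outputs 0, then 0 - 0 ≤ 0 holds trivially. a also acts as a margin that widens the gap between the Anchor-Positive pair and the Anchor-Negative pair. In the code it commonly defaults to 0.2.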

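The overall loss function, summed over all triplets i (the code below averages over the batch instead of summing):

$$L = \sum_{i} \max\left(\|f(A_i) - f(P_i)\|^2 - \|f(A_i) - f(N_i)\|^2 + a,\ 0\right)$$

The left term is the intra-class distance and the right term is the inter-class distance. Optimizing with gradient descent pushes the intra-class distance down and the inter-class distance up, which is how the loss keeps shrinking. To keep the loss from going below 0, the value is compared with 0, as the code shows.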

def triplet_loss(anchor, positive, negative, alpha):
    """Calculate the triplet loss according to the FaceNet paper
    
    Args:
      anchor: the embeddings for the anchor images.
      positive: the embeddings for the positive images.
      negative: the embeddings for the negative images.
      alpha: the margin added between the positive and negative squared distances.
  
    Returns:
      the triplet loss according to the FaceNet paper as a float tensor.
    """
    with tf.variable_scope('triplet_loss'):
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
        
        basic_loss = tf.add(tf.subtract(pos_dist,neg_dist), alpha)
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
      
    return loss
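For readers who want to check the arithmetic without TensorFlow, the same computation can be sketched in NumPy. The name triplet_loss_np and the toy embeddings are assumptions for this example; the steps mirror the function above line by line.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    """NumPy version of the triplet loss: mean over the batch of max(d_ap - d_an + alpha, 0)."""
    pos_dist = np.sum(np.square(anchor - positive), axis=1)  # squared distance A to P
    neg_dist = np.sum(np.square(anchor - negative), axis=1)  # squared distance A to N
    basic_loss = pos_dist - neg_dist + alpha
    return float(np.mean(np.maximum(basic_loss, 0.0)))

anchor   = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[0.6, 0.8], [1.0, 0.0]])
negative = np.array([[0.8, 0.6], [0.0, 1.0]])

# Triplet 1: 0.8 - 0.4 + 0.2 = 0.6  (violates the margin, contributes to the loss)
# Triplet 2: 0.0 - 2.0 + 0.2 = -1.8 (already satisfied, clipped to 0)
loss = triplet_loss_np(anchor, positive, negative)
print(loss)  # approximately 0.3
```

The second triplet shows why the comparison with 0 matters: a triplet that already satisfies the margin contributes nothing instead of a negative value.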

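center_loss function

Besides the triplet_loss function above, facenet.py also contains a center_loss function, the method from the paper "A Discriminative Feature Learning Approach for Deep Face Recognition". It finds a center for each class and minimizes the distance from every sample feature of that class to the center, which makes each class more compact.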

def center_loss(features, label, alfa, nrof_classes):
    """Center loss based on the paper "A Discriminative Feature Learning Approach for Deep Face Recognition"
       (http://ydwen.github.io/papers/WenECCV16.pdf)
    """
    nrof_features = features.get_shape()[1]
    centers = tf.get_variable('centers', [nrof_classes, nrof_features], dtype=tf.float32,
        initializer=tf.constant_initializer(0), trainable=False)
    label = tf.reshape(label, [-1])
    centers_batch = tf.gather(centers, label)
    diff = (1 - alfa) * (centers_batch - features)
    centers = tf.scatter_sub(centers, label, diff)
    with tf.control_dependencies([centers]):
        loss = tf.reduce_mean(tf.square(features - centers_batch))
    return loss, centers
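The center update can be hard to picture from the gather/scatter calls alone, so here is a NumPy sketch of one step. The name center_loss_np is an assumption, and this sketch computes the loss against the centers as they were before the update, which is how I read the code above; treat it as an illustration rather than an exact reproduction of TensorFlow's execution order.

```python
import numpy as np

def center_loss_np(features, labels, centers, alfa):
    """One center-loss step: return the loss and the updated copy of centers."""
    centers_batch = centers[labels]                        # like tf.gather
    loss = float(np.mean(np.square(features - centers_batch)))
    diff = (1.0 - alfa) * (centers_batch - features)
    new_centers = centers.copy()
    # Like tf.scatter_sub: repeated labels accumulate their updates.
    np.subtract.at(new_centers, labels, diff)
    return loss, new_centers

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
centers = np.zeros((2, 2))                                 # same zero initialization as above

loss, new_centers = center_loss_np(features, labels, centers, alfa=0.5)
print(loss)         # 14 / 6, approximately 2.333
print(new_centers)  # [[2. 0.], [0. 1.]]
```

Each center moves a fraction (1 - alfa) of the way toward the features of its class, so a larger alfa means slower-moving centers.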

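4. Training

4.1 Triplet Selection

As for how to choose triplets to form the training set: A, P, and N could be picked at random, subject to the rule that A and P are the same person while A and N are different people. The problem is that with random selection the constraint d(A, P) + a ≤ d(A, N) is satisfied very easily, because for randomly chosen images A and N are very likely to differ far more than A and P. The network then learns very little and ends up with poor robustness. To quote a passage from Andrew Ng's Deep Learning course:

So what we want is to choose triplets A, P, and N that are hard to train on. A hard triplet is one where d(A, P) is very close to d(A, N), that is d(A, P) ≈ d(A, N). The learning algorithm then has to work hard to make the right-hand side d(A, N) larger or the left-hand side d(A, P) smaller, so that there is a margin of at least a between the two sides.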
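The distinction between triplets that teach the network something and triplets that do not can be sketched as a small classifier over the two distances. The function name and the labels "easy", "semi-hard", and "hard" are terminology chosen for this example and do not appear in the text above.

```python
def triplet_category(d_ap, d_an, a=0.2):
    """Classify a triplet from d(A, P), d(A, N) and the margin a."""
    if d_ap + a <= d_an:
        return "easy"       # constraint already satisfied, loss is 0
    if d_ap < d_an:
        return "semi-hard"  # negative is farther than the positive, but inside the margin
    return "hard"           # negative is at least as close as the positive

print(triplet_category(0.3, 1.5))  # easy
print(triplet_category(0.5, 0.6))  # semi-hard
print(triplet_category(0.9, 0.4))  # hard
```

Randomly sampled triplets mostly land in the "easy" branch, where the loss and its gradient are zero, which is why random selection trains so slowly.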

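The facenet paper says the same: for training we select the farthest image of the same face $\LARGE x_{i}^{p}$ (hard positive) and the nearest image of a different face $\LARGE x_{i}^{n}$ (hard negative), as shown in the figure above.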
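A minimal sketch of this selection within one batch, assuming the embeddings have already been computed. The function names are made up for this example, and this is a simplified illustration rather than the selection code that facenet actually ships.

```python
import numpy as np

def pairwise_sq_dists(emb):
    """Matrix of squared Euclidean distances between all rows of emb."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def hardest_triplets(emb, labels):
    """For each anchor i, return (i, farthest same-label index, nearest other-label index)."""
    labels = np.asarray(labels)
    dists = pairwise_sq_dists(emb)
    same = labels[:, None] == labels[None, :]
    triplets = []
    for i in range(len(labels)):
        pos_mask = same[i].copy()
        pos_mask[i] = False                  # the anchor is not its own positive
        neg_mask = ~same[i]
        if not pos_mask.any() or not neg_mask.any():
            continue                         # no valid triplet for this anchor
        hard_pos = int(np.argmax(np.where(pos_mask, dists[i], -np.inf)))
        hard_neg = int(np.argmin(np.where(neg_mask, dists[i], np.inf)))
        triplets.append((i, hard_pos, hard_neg))
    return triplets

# Five 2-D embeddings on a line: three images of person 0, two of person 1.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.2, 0.0], [3.0, 0.0]])
labels = [0, 0, 0, 1, 1]
triplets = hardest_triplets(emb, labels)
print(triplets)  # [(0, 2, 3), (1, 2, 3), (2, 0, 3), (3, 4, 2), (4, 3, 2)]
```

Index 2 (person 0) and index 3 (person 1) sit close together, so they keep showing up as each other's hard negative.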

4.2 Classifier

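The .pkl file is generated with classifier.py, which trains an SVM classifier. The overall flow is roughly: the CNN forward output passes through L2 into the embedding layer, and the resulting embedding output features are passed to the SVM classifier for training. The trained classifier is then saved as a pickle file. When the classify command runs, it loads the SVM model, and your test images are compared against the classes in the SVM model. The .pkl file stores the model (model) and the class names (class_names), both of which can be read back directly.

The SVM code has two modes: TRAIN (trains the SVM model) and CLASSIFY (loads the SVM model). The code makes this clearer: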

            # Run forward pass to calculate embeddings
            print('Calculating features for images')
            nrof_images = len(paths)
            nrof_batches_per_epoch = int(math.ceil(1.0*nrof_images / args.batch_size))
            emb_array = np.zeros((nrof_images, embedding_size))
            for i in range(nrof_batches_per_epoch):
                start_index = i*args.batch_size
                end_index = min((i+1)*args.batch_size, nrof_images)
                paths_batch = paths[start_index:end_index]
                images = facenet.load_data(paths_batch, False, False, args.image_size)
                feed_dict = { images_placeholder:images, phase_train_placeholder:False }
                emb_array[start_index:end_index,:] = sess.run(embeddings, feed_dict=feed_dict)
            
            classifier_filename_exp = os.path.expanduser(args.classifier_filename)

            if (args.mode=='TRAIN'):
                # Train classifier
                print('Training classifier')
                model = SVC(kernel='linear', probability=True)      # use SVM classifier
                model.fit(emb_array, labels)
            
                # Create a list of class names
                class_names = [ cls.name.replace('_', ' ') for cls in dataset]

                # Saving classifier model
                with open(classifier_filename_exp, 'wb') as outfile:
                    pickle.dump((model, class_names), outfile)
                print('Saved classifier model to file "%s"' % classifier_filename_exp)
                
            elif (args.mode=='CLASSIFY'):
                # Classify images
                print('Testing classifier')
                with open(classifier_filename_exp, 'rb') as infile:
                    (model, class_names) = pickle.load(infile)

                print('Loaded classifier model from file "%s"' % classifier_filename_exp)

                predictions = model.predict_proba(emb_array)
                best_class_indices = np.argmax(predictions, axis=1)
                best_class_probabilities = predictions[np.arange(len(best_class_indices)), best_class_indices]
                
                for i in range(len(best_class_indices)):
                    print('%4d  %s: %.3f' % (i, class_names[best_class_indices[i]], best_class_probabilities[i]))
                    
                accuracy = np.mean(np.equal(best_class_indices, labels))
                print('Accuracy: %.3f' % accuracy)
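The excerpt above depends on a TensorFlow session and command-line arguments, so here is a self-contained sketch of just the TRAIN and CLASSIFY halves. The embeddings are synthetic clusters generated with NumPy, the person names are hypothetical, and the pickle is kept in memory with pickle.dumps instead of being written to a file; random_state is added only to make the run repeatable.

```python
import pickle

import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for emb_array: 3 people, 20 unit-length "embeddings" each.
rng = np.random.default_rng(0)
class_names = ["Alice", "Bob", "Carol"]          # hypothetical identities
labels = np.repeat(np.arange(3), 20)
emb_array = np.eye(3)[labels] + 0.05 * rng.normal(size=(60, 3))
emb_array /= np.linalg.norm(emb_array, axis=1, keepdims=True)

# TRAIN: fit the SVM and serialize (model, class_names), as classifier.py does.
model = SVC(kernel="linear", probability=True, random_state=0)
model.fit(emb_array, labels)
blob = pickle.dumps((model, class_names))

# CLASSIFY: load the pair back and pick the most probable class per image.
loaded_model, loaded_names = pickle.loads(blob)
predictions = loaded_model.predict_proba(emb_array)
best_class_indices = np.argmax(predictions, axis=1)
accuracy = np.mean(np.equal(best_class_indices, labels))

print(loaded_names[best_class_indices[0]])
print("Accuracy: %.3f" % accuracy)
```

Storing class_names next to the model is what lets the CLASSIFY step turn a column index of predict_proba back into a person's name.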

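There are already some fairly detailed walkthroughs of the facenet code online; see this blog post: FaceNet源碼解讀2：史上最全的FaceNet源碼使用方法和講解（二）.

Face recognition still has many problems in practical applications, and new algorithms keep appearing in response. For more on cutting-edge techniques, see this article, which I think is quite good: 格靈深瞳：人臉識別最新進展以及工業級大規模人臉識別實踐探討

Previous article: facenet face recognition library setup and usage (Part 2)

References: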

https://blog.csdn.net/fire_light_/article/details/79592804
