A Walkthrough of the MultiLayer Perceptron Module in Mahout's Source Code

A while back, while studying neural networks, I worked with BPNN. With an eye toward scaling the model out in a distributed setting, I decided to try Mahout's MultiLayer Perceptron (MLP). I downloaded and read through the module's source code, and this post records my study notes, partly so I don't forget the details later, and partly so we can all learn together.

The Mahout version used here is 0.10, simply because Apache seems to have removed the MLP module in Mahout 0.11 (at least I could not find it there).

Module path: mr.src.main.java.org.apache.mahout.classifier.mlp

This directory contains five .java files:

[Image: directory listing of the mlp package, showing five files:
MultilayerPerceptron.java, NeuralNetwork.java, NeuralNetworkFunctions.java,
RunMultilayerPerceptron.java, TrainMultilayerPerceptron.java]

Each file defines a class named after the file. MultilayerPerceptron is a subclass of NeuralNetwork, and the latter is the core of the whole NN module; NeuralNetworkFunctions defines the mathematical functions used throughout the module; the last two files wrap the module's training process (TrainMultilayerPerceptron) and its prediction process (RunMultilayerPerceptron). Here we mainly study the NeuralNetwork class and its implementation.

The NeuralNetwork class contains a number of member variables and methods. The main ones are listed below.

Main member variables of the Mahout neural network module and their getter/setter methods:

| Member variable      | Getter                    | Setter                    |
| -------------------- | ------------------------- | ------------------------- |
| learningRate         | getLearningRate()         | setLearningRate()         |
| momentumWeight       | getMomentumWeight()       | setMomentumWeight()       |
| regularizationWeight | getRegularizationWeight() | setRegularizationWeight() |
| trainingMethod       | getTrainingMethod()       | setTrainingMethod()       |
| costFunction         | getCostFunction()         | setCostFunction()         |

Main member methods of the Mahout neural network module:

- addLayer(int size, boolean isFinalLayer, String squashingFunctionName): adds a new layer to the network. size is the number of neurons in the layer, isFinalLayer marks whether this is the network's final layer, and squashingFunctionName names the layer's activation (squashing) function.
- trainOnline(Vector trainingInstance): trains the model online; the argument is the vector formed by the input features followed by the expected output features.
- getOutput(Vector instance): computes the model's output for the given input feature vector.
- setModelPath(String modelPath): sets the model path to modelPath.
- writeModelToFile(): writes the model to the previously specified modelPath.

The trainOnline() method implements the training process. Let's look inside it:

  public void trainOnline(Vector trainingInstance) {
    Matrix[] matrices = trainByInstance(trainingInstance);
    updateWeightMatrices(matrices);
  }
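This two-step contract — compute a set of weight updates from a single instance, then apply them immediately — is the essence of online (stochastic) training. The same pattern can be sketched self-containedly with a single linear neuron; everything below is illustrative, not Mahout code:

```java
// Illustrative only: one linear neuron trained online, mirroring the
// trainOnline = "compute update" + "apply update" split. Not Mahout code.
public class OnlineTrainSketch {
  static double weight = 0.0;             // the model's single parameter
  static final double LEARNING_RATE = 0.1;

  // Analogue of trainByInstance(): compute the update but do not apply it
  static double updateByInstance(double x, double label) {
    double prediction = weight * x;
    // Gradient step for squared error: -lr * (prediction - label) * x
    return -LEARNING_RATE * (prediction - label) * x;
  }

  // Analogue of trainOnline(): compute the update, then apply it
  static void trainOnline(double x, double label) {
    weight += updateByInstance(x, label);
  }

  public static void main(String[] args) {
    for (int step = 0; step < 100; step++) {
      trainOnline(1.0, 2.0);              // learn y = 2x from one example
    }
    System.out.println(weight);           // converges toward 2.0
  }
}
```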
That is, it first runs trainByInstance() and stores the result in matrices, then applies it with updateWeightMatrices(matrices). Next, trainByInstance():

 public Matrix[] trainByInstance(Vector trainingInstance) {
    // validate training instance
    int inputDimension = layerSizeList.get(0) - 1;
    int outputDimension = layerSizeList.get(this.layerSizeList.size() - 1);
    Preconditions.checkArgument(inputDimension + outputDimension == trainingInstance.size(),
        String.format("The dimension of training instance is %d, but requires %d.", trainingInstance.size(),
            inputDimension + outputDimension));


    if (trainingMethod.equals(TrainingMethod.GRADIENT_DESCENT)) {
      return trainByInstanceGradientDescent(trainingInstance);
    }
    throw new IllegalArgumentException("Training method is not supported.");
  }
In this method, the dimension of the trainingInstance argument equals the sum of the input feature dimension and the output feature dimension. On entry, the method uses the layerSizeList field to look up the node counts of the network's first (input) layer and last (output) layer; the "- 1" accounts for the input layer's extra bias node. checkArgument() then verifies that the input and output dimensions together match the size of the supplied instance. After this validation, the sample is trained with the GRADIENT_DESCENT method (currently the only training method the module supports), delegating to trainByInstanceGradientDescent():

private Matrix[] trainByInstanceGradientDescent(Vector trainingInstance) {
    int inputDimension = layerSizeList.get(0) - 1;

    Vector inputInstance = new DenseVector(layerSizeList.get(0));
    inputInstance.set(0, 1); // add bias
    for (int i = 0; i < inputDimension; ++i) {
      inputInstance.set(i + 1, trainingInstance.get(i));
    }

    Vector labels =
        trainingInstance.viewPart(inputInstance.size() - 1, trainingInstance.size() - inputInstance.size() + 1);

    // initialize weight update matrices
    Matrix[] weightUpdateMatrices = new Matrix[weightMatrixList.size()];
    for (int m = 0; m < weightUpdateMatrices.length; ++m) {
      weightUpdateMatrices[m] =
          new DenseMatrix(weightMatrixList.get(m).rowSize(), weightMatrixList.get(m).columnSize());
    }

    List<Vector> internalResults = getOutputInternal(inputInstance);

    Vector deltaVec = new DenseVector(layerSizeList.get(layerSizeList.size() - 1));
    Vector output = internalResults.get(internalResults.size() - 1);

    final DoubleFunction derivativeSquashingFunction =
        NeuralNetworkFunctions.getDerivativeDoubleFunction(squashingFunctionList.get(squashingFunctionList.size() - 1));

    final DoubleDoubleFunction costFunction =
        NeuralNetworkFunctions.getDerivativeDoubleDoubleFunction(costFunctionName);

    Matrix lastWeightMatrix = weightMatrixList.get(weightMatrixList.size() - 1);

    for (int i = 0; i < deltaVec.size(); ++i) {
      double costFuncDerivative = costFunction.apply(labels.get(i), output.get(i + 1));
      // Add regularization
      costFuncDerivative += regularizationWeight * lastWeightMatrix.viewRow(i).zSum();
      deltaVec.set(i, costFuncDerivative);
      deltaVec.set(i, deltaVec.get(i) * derivativeSquashingFunction.apply(output.get(i + 1)));
    }

    // Start from previous layer of output layer
    for (int layer = layerSizeList.size() - 2; layer >= 0; --layer) {
      deltaVec = backPropagate(layer, deltaVec, internalResults, weightUpdateMatrices[layer]);
    }

    prevWeightUpdatesList = Arrays.asList(weightUpdateMatrices);

    return weightUpdateMatrices;
  }
This method is longer, so let's analyze it block by block.

First it parses the argument, separating the input features from the output features: the input features are written (preceded by a bias entry of 1) into inputInstance, and the output features into labels. It then initializes weightUpdateMatrices, one matrix per weight layer, to hold the updates.

Next, getOutputInternal() computes the output of every layer in turn; the last of these, the output layer's result, is stored in output, and deltaVec is allocated to hold the output layer's error. The method then fetches the final layer's derivativeSquashingFunction and the costFunction (both fixed when the NN instance was created), as well as lastWeightMatrix, the weight matrix feeding the output layer.

After that, for each output unit it computes costFuncDerivative from the corresponding entries of labels and output (the default cost function is squared error), adds the regularization term weighted by regularizationWeight, and multiplies by the derivative of the squashing function, producing the final deltaVec.

Once that is done, deltaVec is propagated backwards through the earlier layers by the backPropagate() method, which refines deltaVec layer by layer and fills in the update matrices; finally the updated weightUpdateMatrices is returned.

  private Vector backPropagate(int currentLayerIndex, Vector nextLayerDelta,
                               List<Vector> outputCache, Matrix weightUpdateMatrix) {

    // Get layer related information
    final DoubleFunction derivativeSquashingFunction =
        NeuralNetworkFunctions.getDerivativeDoubleFunction(squashingFunctionList.get(currentLayerIndex));
    Vector curLayerOutput = outputCache.get(currentLayerIndex);
    Matrix weightMatrix = weightMatrixList.get(currentLayerIndex);
    Matrix prevWeightMatrix = prevWeightUpdatesList.get(currentLayerIndex);

    // Next layer is not output layer, remove the delta of bias neuron
    if (currentLayerIndex != layerSizeList.size() - 2) {
      nextLayerDelta = nextLayerDelta.viewPart(1, nextLayerDelta.size() - 1);
    }

    Vector delta = weightMatrix.transpose().times(nextLayerDelta);

    delta = delta.assign(curLayerOutput, new DoubleDoubleFunction() {
      @Override
      public double apply(double deltaElem, double curLayerOutputElem) {
        return deltaElem * derivativeSquashingFunction.apply(curLayerOutputElem);
      }
    });

    // Update weights
    for (int i = 0; i < weightUpdateMatrix.rowSize(); ++i) {
      for (int j = 0; j < weightUpdateMatrix.columnSize(); ++j) {
        weightUpdateMatrix.set(i, j, -learningRate * nextLayerDelta.get(i) *
            curLayerOutput.get(j) + this.momentumWeight * prevWeightMatrix.get(i, j));
      }
    }

    return delta;
  }
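The two formulas at the heart of backPropagate() — the delta propagated to the current layer, and the momentum-smoothed weight update — can be restated with plain arrays. This is a sketch assuming a sigmoid squashing function (whose derivative, expressed in terms of the output o, is o·(1−o)); it is illustrative, not Mahout code:

```java
// Illustrative re-statement of backPropagate()'s math with plain arrays.
// Assumes sigmoid activation, so f'(o) = o * (1 - o). Not Mahout code.
public class BackPropSketch {

  // Delta for the current layer: (W^T * nextDelta), elementwise times f'(output)
  static double[] layerDelta(double[][] weight, double[] nextDelta, double[] output) {
    double[] delta = new double[output.length];
    for (int j = 0; j < output.length; j++) {
      double sum = 0.0;
      for (int i = 0; i < nextDelta.length; i++) {
        sum += weight[i][j] * nextDelta[i];           // (W^T * nextDelta)[j]
      }
      delta[j] = sum * output[j] * (1 - output[j]);   // times sigmoid derivative
    }
    return delta;
  }

  // Weight update with learning rate and momentum, as in the nested loop above
  static double[][] weightUpdate(double[] nextDelta, double[] output,
                                 double[][] prevUpdate, double lr, double momentum) {
    double[][] update = new double[nextDelta.length][output.length];
    for (int i = 0; i < nextDelta.length; i++) {
      for (int j = 0; j < output.length; j++) {
        update[i][j] = -lr * nextDelta[i] * output[j] + momentum * prevUpdate[i][j];
      }
    }
    return update;
  }
}
```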

The above is the implementation of the core MLP algorithm. A few details not covered here, such as how costFuncDerivative is computed, how the squashing function is applied, and how backpropagation works in general, can be found in any NN textbook; the implementation here matches the textbook algorithm exactly, so I won't repeat it. What I do want to highlight is the model's serialization and deserialization, because this step is a prerequisite for scaling a model out in a distributed setting.

In the mlp module, serialization and deserialization are implemented by the write() and readFields() methods. Source code:

 public void write(DataOutput output) throws IOException {
    // Write model type
    WritableUtils.writeString(output, modelType);
    // Write learning rate
    output.writeDouble(learningRate);
    // Write model path
    if (modelPath != null) {
      WritableUtils.writeString(output, modelPath);
    } else {
      WritableUtils.writeString(output, "null");
    }

    // Write regularization weight
    output.writeDouble(regularizationWeight);
    // Write momentum weight
    output.writeDouble(momentumWeight);
    // Write cost function
    WritableUtils.writeString(output, costFunctionName);

    // Write layer size list
    output.writeInt(layerSizeList.size());
    for (Integer aLayerSizeList : layerSizeList) {
      output.writeInt(aLayerSizeList);
    }

    WritableUtils.writeEnum(output, trainingMethod);

    // Write squashing functions
    output.writeInt(squashingFunctionList.size());
    for (String aSquashingFunctionList : squashingFunctionList) {
      WritableUtils.writeString(output, aSquashingFunctionList);
    }

    // Write weight matrices
    output.writeInt(this.weightMatrixList.size());
    for (Matrix aWeightMatrixList : weightMatrixList) {
      MatrixWritable.writeMatrix(output, aWeightMatrixList);
    }
  }

  /**
   * Read the fields of the model from input.
   * 
   * @param input The input instance.
   * @throws IOException
   */
  public void readFields(DataInput input) throws IOException {
    // Read model type
    modelType = WritableUtils.readString(input);
    if (!modelType.equals(this.getClass().getSimpleName())) {
      throw new IllegalArgumentException("The specified location does not contains the valid NeuralNetwork model.");
    }
    // Read learning rate
    learningRate = input.readDouble();
    // Read model path
    modelPath = WritableUtils.readString(input);
    if (modelPath.equals("null")) {
      modelPath = null;
    }

    // Read regularization weight
    regularizationWeight = input.readDouble();
    // Read momentum weight
    momentumWeight = input.readDouble();

    // Read cost function
    costFunctionName = WritableUtils.readString(input);

    // Read layer size list
    int numLayers = input.readInt();
    layerSizeList = new ArrayList<>();
    for (int i = 0; i < numLayers; i++) {
      layerSizeList.add(input.readInt());
    }

    trainingMethod = WritableUtils.readEnum(input, TrainingMethod.class);

    // Read squash functions
    int squashingFunctionSize = input.readInt();
    squashingFunctionList = new ArrayList<>();
    for (int i = 0; i < squashingFunctionSize; i++) {
      squashingFunctionList.add(WritableUtils.readString(input));
    }

    // Read weights and construct matrices of previous updates
    int numOfMatrices = input.readInt();
    weightMatrixList = new ArrayList<>();
    prevWeightUpdatesList = new ArrayList<>();
    for (int i = 0; i < numOfMatrices; i++) {
      Matrix matrix = MatrixWritable.readMatrix(input);
      weightMatrixList.add(matrix);
      prevWeightUpdatesList.add(new DenseMatrix(matrix.rowSize(), matrix.columnSize()));
    }
  }
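The discipline behind this pair of methods is that readFields() must consume fields in exactly the order write() produced them, with variable-length collections preceded by a count. The same round-trip pattern can be exercised with plain java.io streams; this sketch uses writeUTF/readUTF in place of Hadoop's WritableUtils and covers only a few representative fields — it is illustrative, not Mahout code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Round-trip sketch: write a few model-like fields, then read them back in
// the same order. Uses DataOutputStream/DataInputStream and writeUTF instead
// of Hadoop's WritableUtils; the field-order discipline is the point.
public class SerializationSketch {

  static byte[] write(String modelType, double learningRate, List<Integer> layerSizes)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(modelType);            // model type first, as in write()
    out.writeDouble(learningRate);
    out.writeInt(layerSizes.size());    // length prefix, then each entry
    for (int size : layerSizes) {
      out.writeInt(size);
    }
    return bytes.toByteArray();
  }

  static Object[] read(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    String modelType = in.readUTF();    // fields are read back in write order
    double learningRate = in.readDouble();
    int numLayers = in.readInt();
    List<Integer> layerSizes = new ArrayList<>();
    for (int i = 0; i < numLayers; i++) {
      layerSizes.add(in.readInt());
    }
    return new Object[] {modelType, learningRate, layerSizes};
  }
}
```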



