Linear Regression
- Simple (univariate) linear regression
- $h_\theta(x) = \theta_0 + \theta_1 x$ (1)
- Multivariate linear regression
- $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$, where $x_0 = 1$ (2)
Loss Function
- $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2$ (3)
- The factor 1/2 makes the coefficient become 1 after differentiation; the squared term is the true value minus the estimate
- Our goal is to find the minimum of this function
The least squares (normal equation) method has rather strict requirements: computing the matrix inverse is slow, and it requires $X^T X$ to be full rank.
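To see why the inverse and the rank condition matter, recall the standard closed-form least squares solution (a textbook result, stated here for reference):

$$
\hat{\theta} = (X^T X)^{-1} X^T y
$$

Here $X$ is the $m \times n$ design matrix and $y$ the target vector; the inversion costs roughly $O(n^3)$ and only exists when $X^T X$ is full rank, which is exactly the limitation noted above.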
Gradient Descent
- Gradient descent
- Our goal is to find the minimum of $J(\theta)$
- The gradient direction is determined by the partial derivatives of $J(\theta)$ with respect to $\theta$. Since we are seeking a minimum, we move in the direction opposite to the gradient; if the function is convex on the interval, this means that when updating $\theta$, we increase it wherever the gradient is negative
- $\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$ (4)
- Here $\alpha$ is the learning rate: if it is too large we easily overshoot the minimum; if it is too small we need too many iterations and convergence slows down
- If there is only a single sample (see the derivation after this list):
- $\frac{\partial J(\theta)}{\partial \theta_j} = \left( h_\theta(x) - y \right) x_j$ (5)
- When there is more than one sample, substituting (5) into (3) and taking partial derivatives, each parameter $\theta_j$ moves along the gradient direction as follows:
- $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$ (6)
- Here $m$ denotes all the samples
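Filling in the step between (3) and (5): for a single sample $(x, y)$, the chain rule gives

$$
\frac{\partial J(\theta)}{\partial \theta_j}
= \frac{\partial}{\partial \theta_j}\,\frac{1}{2}\bigl(h_\theta(x) - y\bigr)^2
= \bigl(h_\theta(x) - y\bigr)\,\frac{\partial h_\theta(x)}{\partial \theta_j}
= \bigl(h_\theta(x) - y\bigr)\,x_j
$$

Summing this over all $m$ samples and plugging it into update rule (4) yields the batch rule (6).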
- Stochastic gradient descent
- Batch gradient descent processes all samples before updating the gradient, so each iteration costs O(mn); when the sample set is very large we use stochastic gradient descent instead
- Each time a single sample is read, $\theta$ is updated immediately
- Although this is fast, it oscillates around the minimum (local minimum?) and may never fully converge
- To reduce computational cost, a threshold is set on how much $\theta$ changes between iterations (see the sketch below)
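A minimal sketch of this idea in plain Scala. The names (`sgd`, `tol`) and the toy data are illustrative, not from MLlib; the real distributed implementation is analyzed in the source walkthrough below.

```scala
// Stochastic gradient descent with a convergence threshold on the weight change.
object SgdSketch {
  def sgd(data: Array[(Array[Double], Double)],
          alpha: Double, tol: Double, maxIter: Int): Array[Double] = {
    var theta = Array.fill(data.head._1.length)(0.0)
    var iter = 0
    var diff = Double.MaxValue
    while (iter < maxIter && diff > tol) {
      val old = theta.clone()
      for ((x, y) <- data) {
        // prediction error: y - h_theta(x)
        val err = y - x.zip(theta).map { case (xi, t) => xi * t }.sum
        // single-sample update, i.e. rule (6) with m = 1
        theta = theta.zip(x).map { case (t, xi) => t + alpha * err * xi }
      }
      // stop once the L2 norm of the weight change drops below the threshold
      diff = math.sqrt(theta.zip(old).map { case (a, b) => (a - b) * (a - b) }.sum)
      iter += 1
    }
    theta
  }

  def main(args: Array[String]): Unit = {
    // toy data for y = 2x; features are (1.0, x) so theta(0) acts as the intercept
    val data = Array(
      (Array(1.0, 1.0), 2.0),
      (Array(1.0, 2.0), 4.0),
      (Array(1.0, 3.0), 6.0))
    println(sgd(data, alpha = 0.05, tol = 1e-8, maxIter = 2000).mkString(", "))
  }
}
```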
Source Code Analysis
MLlib source analysis
Setting up the linear regression
org/apache/spark/mllib/regression/LinearRegression.scala
```scala
// Companion object of LinearRegressionWithSGD (can be thought of as a singleton).
// It mainly defines the train methods, which pass the training parameters and the
// RDD on to the run method; it is the entry point to the LinearRegressionWithSGD class.
object LinearRegressionWithSGD

// Linear regression model based on stochastic gradient descent.
// Loss function: f(weights) = 1/n ||A weights - y||^2
class LinearRegressionWithSGD private[mllib] ( // visible within the mllib package
    private var stepSize: Double,            // iteration step size
    private var numIterations: Int,          // number of iterations
    private var regParam: Double,            // regularization parameter
    private var miniBatchFraction: Double)   // fraction of samples used per iteration
  extends GeneralizedLinearAlgorithm[LinearRegressionModel] with Serializable {

  // gradient descent with the least squares loss function
  private val gradient = new LeastSquaresGradient()
  // simple gradient update, no regularization
  private val updater = new SimpleUpdater()

  @Since("0.8.0")
  // build the gradient descent optimizer
  override val optimizer = new GradientDescent(gradient, updater)
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  // ...
}
```
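A short usage sketch of this entry point. `LinearRegressionWithSGD.train(input, numIterations, stepSize)` is the real MLlib API; the toy data, app name, and master setting are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object LinearRegressionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lr-sgd").setMaster("local[*]"))
    // toy data for y = 2x; in practice this would be loaded from a file
    val data = sc.parallelize(Seq(
      LabeledPoint(2.0, Vectors.dense(1.0)),
      LabeledPoint(4.0, Vectors.dense(2.0)),
      LabeledPoint(6.0, Vectors.dense(3.0))))
    // train() delegates to run() via the companion object described above
    val model = LinearRegressionWithSGD.train(data, 100, 0.1)
    println(s"weights = ${model.weights}, intercept = ${model.intercept}")
    sc.stop()
  }
}
```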
The run method used for training
org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
```scala
/**
 * Run the algorithm with the configured parameters on an input RDD
 * of LabeledPoint entries starting from the initial weights provided.
 */
@Since("1.0.0")
def run(input: RDD[LabeledPoint], initialWeights: Vector): M = {
  // feature dimension: take the size of features from the first record of the RDD
  if (numFeatures < 0) {
    numFeatures = input.map(_.features.size).first()
  }

  // input sanity check
  if (input.getStorageLevel == StorageLevel.NONE) {
    logWarning("The input data is not directly cached, which may hurt performance if its" +
      " parent RDDs are also uncached.")
  }

  // Check the data properties before running the optimizer
  if (validateData && !validators.forall(func => func(input))) {
    throw new SparkException("Input validation failed.")
  }

  /**
   * Feature scaling: during optimization, convergence depends on the conditioning
   * of the training data, and scaling the features improves the convergence rate.
   * Currently this is only applied to logistic regression.
   */
  val scaler = if (useFeatureScaling) {
    new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
  } else {
    null
  }

  // Prepend an extra variable consisting of all 1.0's for the intercept.
  // This adds the bias term, i.e. the constant θ0.
  // TODO: Apply feature scaling to the weight vector instead of input data.
  val data =
    if (addIntercept) {
      if (useFeatureScaling) {
        input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
      } else {
        // append the bias after the features and persist in memory
        input.map(lp => (lp.label, appendBias(lp.features))).cache()
      }
    } else {
      if (useFeatureScaling) {
        input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
      } else {
        input.map(lp => (lp.label, lp.features))
      }
    }

  /**
   * TODO: For better convergence, in logistic regression, the intercepts should be computed
   * from the prior probability distribution of the outcomes; for linear regression,
   * the intercept should be set as the average of response.
   * Initial weights, including the added bias term.
   */
  val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
    appendBias(initialWeights)
  } else {
    /** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
    initialWeights
  }

  // weight optimization: run gradient descent and return the optimal weights
  val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept)

  // extract the intercept
  val intercept = if (addIntercept && numOfLinearPredictor == 1) {
    weightsWithIntercept(weightsWithIntercept.size - 1)
  } else {
    0.0
  }

  // extract the weights
  var weights = if (addIntercept && numOfLinearPredictor == 1) {
    Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1))
  } else {
    weightsWithIntercept
  }

  /**
   * The weights and intercept are trained in the scaled space; we're converting them back to
   * the original scale (the weights must be transformed back when feature scaling was used).
   * Math shows that if we only perform standardization without subtracting means, the intercept
   * will not be changed. w_i = w_i' / v_i where w_i' is the coefficient in the scaled space, w_i
   * is the coefficient in the original space, and v_i is the variance of the column i.
   */
  if (useFeatureScaling) {
    if (numOfLinearPredictor == 1) {
      weights = scaler.transform(weights)
    } else {
      /**
       * For `numOfLinearPredictor > 1`, we have to transform the weights back to the original
       * scale for each set of linear predictor. Note that the intercepts have to be explicitly
       * excluded when `addIntercept == true` since the intercepts are part of weights now.
       */
      var i = 0
      val n = weights.size / numOfLinearPredictor
      val weightsArray = weights.toArray
      while (i < numOfLinearPredictor) {
        val start = i * n
        val end = (i + 1) * n - { if (addIntercept) 1 else 0 }
        val partialWeightsArray = scaler.transform(
          Vectors.dense(weightsArray.slice(start, end))).toArray
        System.arraycopy(partialWeightsArray, 0, weightsArray, start, partialWeightsArray.length)
        i += 1
      }
      weights = Vectors.dense(weightsArray)
    }
  }

  // Warn at the end of the run as well, for increased visibility.
  if (input.getStorageLevel == StorageLevel.NONE) {
    logWarning("The input data was not directly cached, which may hurt performance if its" +
      " parent RDDs are also uncached.")
  }

  // Unpersist cached data
  if (data.getStorageLevel != StorageLevel.NONE) {
    data.unpersist(false)
  }

  // return the trained model
  createModel(weights, intercept)
}
```
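To make the bias handling concrete, here is a standalone sketch of what `appendBias` does (MLlib's real helper is `org.apache.spark.mllib.util.MLUtils.appendBias`; the function below is an illustrative re-implementation):

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Appends a constant 1.0 to the feature vector, so that the last learned
// weight acts as the intercept θ0 -- which is why run() above reads the
// intercept from the last position of weightsWithIntercept.
def appendBiasSketch(features: Vector): Vector =
  Vectors.dense(features.toArray :+ 1.0)

val withBias = appendBiasSketch(Vectors.dense(0.5, 2.0)) // [0.5, 2.0, 1.0]
```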
Solving for the weights with gradient descent
org/apache/spark/mllib/optimization/GradientDescent.scala
The call chain is: optimizer.optimize -> GradientDescent.optimize -> GradientDescent.runMiniBatchSGD
```scala
def runMiniBatchSGD(
    data: RDD[(Double, Vector)],
    gradient: Gradient,
    updater: Updater,
    stepSize: Double,
    numIterations: Int,
    regParam: Double,
    miniBatchFraction: Double,
    initialWeights: Vector,
    convergenceTol: Double): (Vector, Array[Double]) = {

  // history of the loss across the iterations
  val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
  // Record previous weight and current one to calculate solution vector difference
  var previousWeights: Option[Vector] = None
  var currentWeights: Option[Vector] = None

  // number of training samples m
  val numExamples = data.count()

  // if no data, return initial weights to avoid NaNs
  if (numExamples == 0) {
    logWarning("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
    return (initialWeights, stochasticLossHistory.toArray)
  }

  if (numExamples * miniBatchFraction < 1) {
    logWarning("The miniBatchFraction is too small")
  }

  // Initialize weights as a column vector
  var weights = Vectors.dense(initialWeights.toArray)
  val n = weights.size

  /**
   * For the first iteration, the regVal will be initialized as sum of weight squares
   * if it's L2 updater; for L1 updater, the same logic is followed.
   */
  // this call to compute performs the first iteration; the regularization value
  // is initialized to the weighted square sum of the weights
  var regVal = updater.compute(
    weights, Vectors.zeros(weights.size), 0, 1, regParam)._2

  var converged = false // indicates whether converged based on convergenceTol
  var i = 1
  // iterative weight computation
  while (!converged && i <= numIterations) {
    // broadcast the weights so every executor receives the current values
    val bcWeights = data.context.broadcast(weights)
    // Sample a subset (fraction miniBatchFraction) of the total data. For the sampled
    // subset, use the RDD's treeAggregate to compute each sample's gradient vector and
    // loss, then accumulate them across all samples.
    // compute and sum up the subgradients on this subset (this is one map-reduce)
    val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
      .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))( // initial value
        // merge one sample v into the accumulator c
        seqOp = (c, v) => {
          // c: (grad, loss, count), v: (label, features)
          // compute returns the loss; the gradient is accumulated into c._1 in place
          val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
          (c._1, c._2 + l, c._3 + 1)
        },
        // merge two accumulators
        combOp = (c1, c2) => {
          // c: (grad, loss, count)
          (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
        })
    bcWeights.destroy(blocking = false)

    if (miniBatchSize > 0) {
      /**
       * lossSum is computed using the weights from the previous iteration
       * and regVal is the regularization value computed in the previous iteration as well.
       */
      // record the loss, then update the weights
      stochasticLossHistory += lossSum / miniBatchSize + regVal
      val update = updater.compute(
        weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
        stepSize, i, regParam)
      weights = update._1
      regVal = update._2

      previousWeights = currentWeights
      currentWeights = Some(weights)
      if (previousWeights != None && currentWeights != None) {
        converged = isConverged(previousWeights.get, currentWeights.get, convergenceTol)
      }
    } else {
      logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
    }
    i += 1
  }

  logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
    stochasticLossHistory.takeRight(10).mkString(", ")))

  // return the weights and the loss history
  (weights, stochasticLossHistory.toArray)
}

// check for convergence
private def isConverged(
    previousWeights: Vector,
    currentWeights: Vector,
    convergenceTol: Double): Boolean = {
  // To compare with convergence tolerance.
  val previousBDV = previousWeights.asBreeze.toDenseVector
  val currentBDV = currentWeights.asBreeze.toDenseVector

  // This represents the difference of updated weights in the iteration.
  val solutionVecDiff: Double = norm(previousBDV - currentBDV)

  solutionVecDiff < convergenceTol * Math.max(norm(currentBDV), 1.0)
}
```
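The (gradient, loss, count) accumulation above is just a `treeAggregate` over triples. A stripped-down illustration with scalars in place of Breeze vectors (the data and names here are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TreeAggregateSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("agg-sketch").setMaster("local[*]"))
    val losses = sc.parallelize(Seq(0.5, 1.5, 2.0, 4.0))
    // accumulate (sum, count), analogous to (gradientSum, lossSum, miniBatchSize)
    val (sum, count) = losses.treeAggregate((0.0, 0L))(
      seqOp = (c, v) => (c._1 + v, c._2 + 1),               // fold one value into the accumulator
      combOp = (c1, c2) => (c1._1 + c2._1, c1._2 + c2._2))  // merge two partition accumulators
    println(s"mean loss = ${sum / count}") // 2.0
    sc.stop()
  }
}
```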
Gradient computation
gradient.compute computes the gradient and loss for each sample; linear regression uses LeastSquaresGradient, i.e. least squares.
org/apache/spark/mllib/optimization/Gradient.scala
```scala
/**
 * :: DeveloperApi ::
 * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
 * This is correct for the averaged least squares loss function (mean squared error)
 *   L = 1/2n ||A weights - y||^2
 * See also the documentation for the precise formulation.
 */
@DeveloperApi
class LeastSquaresGradient extends Gradient {
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    // h_theta(x) - y
    val diff = dot(data, weights) - label
    val loss = diff * diff / 2.0
    val gradient = data.copy
    scal(diff, gradient)
    // gradient value: x * (h_theta(x) - y)
    (gradient, loss)
  }

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val diff = dot(data, weights) - label // h_theta(x) - y
    axpy(diff, data, cumGradient)         // cumGradient += x * (h_theta(x) - y)
    diff * diff / 2.0
  }
}
```
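A quick sanity check of `compute` on one sample; the values are illustrative:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.LeastSquaresGradient

// With x = (1.0, 2.0), weights = (0.5, 0.5) and y = 2.5:
//   h(x) = 1.0 * 0.5 + 2.0 * 0.5 = 1.5, so diff = 1.5 - 2.5 = -1.0
val (grad, loss) = new LeastSquaresGradient().compute(
  Vectors.dense(1.0, 2.0), 2.5, Vectors.dense(0.5, 0.5))
println(grad) // [-1.0, -2.0] = x * diff
println(loss) // 0.5 = diff^2 / 2
```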
Weight update
org/apache/spark/mllib/optimization/Updater.scala
```scala
/**
 * :: DeveloperApi ::
 * A simple updater for gradient descent *without* any regularization.
 * Uses a step-size decreasing with the square root of the number of iterations.
 */
@DeveloperApi
class SimpleUpdater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // the effective step size decays with the inverse square root of the iteration count
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    // weights := weights - thisIterStepSize * gradient
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    // no regularization, so the regularization value (second component) is 0
    (Vectors.fromBreeze(brzWeights), 0)
  }
}
```
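The decaying step size means later iterations take smaller steps. A quick illustration of the schedule for stepSize = 1.0 (a standalone snippet, not MLlib code):

```scala
// SimpleUpdater's step-size schedule: the effective step shrinks as 1/sqrt(iter)
val stepSize = 1.0
for (iter <- Seq(1, 4, 9, 16, 25))
  println(f"iter=$iter%2d  step=${stepSize / math.sqrt(iter)}%.3f")
// iter= 1  step=1.000
// iter= 4  step=0.500
// iter= 9  step=0.333
// iter=16  step=0.250
// iter=25  step=0.200
```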