Learning Spark MLlib: Chi-Square Test

Spark: chi-square test

  • Basic principles of the chi-square test
  • Basic steps of the chi-square test
  • Code
  • Output

Code

import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by Administrator on 2017/2/8 0008.
  */
object ChiSqlTest {
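  /*
  Run the chi-square test on a Vector and on a Matrix.

      *********************************************************
      * The chi-square test measures how far the observed values
      * of a sample deviate from the expected values. The size of
      * that deviation determines the chi-square statistic: the
      * larger the statistic, the further the observations are
      * from the expected values; the smaller it is, the closer
      * they are. A statistic of 0 means the observed and expected
      * values agree exactly.
      * *********************************************************
   */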
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName(this.getClass.getSimpleName.filter(!_.equals('$')))
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)
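    // Goodness-of-fit test: with no expected vector given, a uniform distribution is assumed.
    val vd = Vectors.dense(1, 2, 3, 4, 5)
    val vResult = Statistics.chiSqTest(vd)
    println(s"Vector chi-square test: $vResult")
    // Independence test on a 3x2 contingency table; the values are in column-major order.
    val mtx = Matrices.dense(3, 2, Array(1, 3, 5, 2, 4, 6))
    val mtxResult = Statistics.chiSqTest(mtx)
    println(s"Matrix chi-square test: $mtxResult")

    // Independence test on a 2x2 contingency table.
    val mtx2 = Matrices.dense(2, 2, Array(1, 2, 3, 4))
    printChiSqTest(mtx2)
    sc.stop()
    // Each summary prints the method, degrees of freedom, statistic, and p-value.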
  }

  def printChiSqTest(matrix: Matrix): Unit = {
    val mtxResult = Statistics.chiSqTest(matrix)
    println(mtxResult)
  }
}
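
As a rough cross-check of the first call, the sketch below recomputes the Pearson statistic, the sum of (observed - expected)^2 / expected over all categories, by hand in plain Scala without Spark. It assumes the expected distribution is uniform, which is what the MLlib documentation describes for chiSqTest when only an observed vector is passed. GoodnessOfFitSketch and pearson are hypothetical names introduced only for this illustration.

```scala
// Hand-rolled Pearson goodness-of-fit statistic (illustrative helper, not part of MLlib).
object GoodnessOfFitSketch {

  /** Sum over all categories of (observed - expected)^2 / expected. */
  def pearson(observed: Array[Double], expected: Array[Double]): Double = {
    require(observed.length == expected.length,
      "observed and expected must have the same length")
    observed.zip(expected).map { case (o, e) => (o - e) * (o - e) / e }.sum
  }

  def main(args: Array[String]): Unit = {
    val observed = Array(1.0, 2.0, 3.0, 4.0, 5.0)
    // Assumption: uniform expected counts, i.e. the total spread evenly over the categories.
    val expected = Array.fill(observed.length)(observed.sum / observed.length)
    val statistic = pearson(observed, expected)
    // A goodness-of-fit test over k categories has k - 1 degrees of freedom.
    val degreesOfFreedom = observed.length - 1
    println(s"statistic = $statistic, degrees of freedom = $degreesOfFreedom")
  }
}
```

For the vector (1, 2, 3, 4, 5) every expected count is 3, so the statistic is (4 + 1 + 0 + 1 + 4) / 3 = 10/3, in line with the value of about 3.33 in the run output, and the degrees of freedom are 5 - 1 = 4.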

Output

17/04/01 18:55:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/04/01 18:55:46 INFO SparkUI: Started SparkUI at http://121.48.185.192:4040
17/04/01 18:55:46 INFO Executor: Starting executor ID driver on host localhost
17/04/01 18:55:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56289.
17/04/01 18:55:46 INFO NettyBlockTransferService: Server created on 56289
17/04/01 18:55:46 INFO BlockManagerMaster: Trying to register BlockManager
17/04/01 18:55:46 INFO BlockManagerMasterEndpoint: Registering block manager localhost:56289 with 457.9 MB RAM, BlockManagerId(driver, localhost, 56289)
17/04/01 18:55:46 INFO BlockManagerMaster: Registered BlockManager
Vector chi-square test: Chi squared test summary:
method: pearson
degrees of freedom = 4 
statistic = 3.333333333333333 
pValue = 0.5036682742334986 
No presumption against null hypothesis: observed follows the same distribution as expected..
Matrix chi-square test: Chi squared test summary:
method: pearson
degrees of freedom = 2 
statistic = 0.14141414141414144 
pValue = 0.931734784568187 
No presumption against null hypothesis: the occurrence of the outcomes is statistically independent..
Chi squared test summary:
method: pearson
degrees of freedom = 1 
statistic = 0.07936507936507939 
pValue = 0.7781596861761658 
No presumption against null hypothesis: the occurrence of the outcomes is statistically independent..

Process finished with exit code 0
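
The two matrix results above can be cross-checked by hand. The sketch below is a plain-Scala illustration, not MLlib's implementation, and assumes the standard Pearson independence test, in which the expected count of a cell is (row total * column total) / grand total. The values are read in column-major order, the layout Matrices.dense uses. IndependenceSketch, pearson, and cell are hypothetical names introduced only for this example.

```scala
// Hand-rolled Pearson independence statistic for a contingency table (illustrative only).
object IndependenceSketch {

  /** Returns (statistic, degrees of freedom); `values` is column-major. */
  def pearson(numRows: Int, numCols: Int, values: Array[Double]): (Double, Int) = {
    require(values.length == numRows * numCols, "values must fill the whole table")
    // Column-major lookup: column j starts at offset j * numRows.
    def cell(i: Int, j: Int): Double = values(j * numRows + i)

    val rowSums = Array.tabulate(numRows)(i => (0 until numCols).map(j => cell(i, j)).sum)
    val colSums = Array.tabulate(numCols)(j => (0 until numRows).map(i => cell(i, j)).sum)
    val total = values.sum

    val statistic = (for {
      i <- 0 until numRows
      j <- 0 until numCols
    } yield {
      // Expected count under independence of rows and columns.
      val expected = rowSums(i) * colSums(j) / total
      val diff = cell(i, j) - expected
      diff * diff / expected
    }).sum

    (statistic, (numRows - 1) * (numCols - 1))
  }

  def main(args: Array[String]): Unit = {
    println(pearson(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)))
    println(pearson(2, 2, Array(1.0, 2.0, 3.0, 4.0)))
  }
}
```

For the 3x2 table the row totals are 3, 7, 11 and the column totals are 9, 12, which gives a statistic of about 0.1414 with (3 - 1) * (2 - 1) = 2 degrees of freedom. The 2x2 table gives about 0.0794 with 1 degree of freedom. Both agree with the summaries above.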