Mahout: Common Machine Learning Algorithms and Their Categories

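I have been studying Hadoop recently but never organized the things I came across, so here are the common algorithms grouped by category: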

Recommender systems (recommendation engines):

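  1. User-based collaborative filtering (UserCF): a nearest-neighbor algorithm, easy to implement

  2. Item-based collaborative filtering (ItemCF): fast, and easy to run as a distributed computation

  3. Slope One: @Deprecated as of Mahout 0.8

  4. KNN linear interpolation item-based recommender: a nearest-neighbor algorithm, @Deprecated as of Mahout 0.8

  5. SVD recommender: singular value decomposition; requires dimensionality reduction and heavy preprocessing

  6. Tree cluster-based recommender: tree-structured (hierarchical) clustering, heavy preprocessing, @Deprecated as of Mahout 0.8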
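To make item-based collaborative filtering (item 2) concrete, here is a minimal sketch in plain Python, not Mahout's API. It assumes a tiny hypothetical rating table, uses cosine similarity between item columns (one of several possible similarity measures), and predicts a user's score for an unrated item as a similarity-weighted average of that user's existing ratings. All names and data are illustrative.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "D": 5.0},
    "carol": {"B": 4.0, "C": 5.0, "D": 2.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} column vectors."""
    items = {}
    for user, prefs in ratings.items():
        for item, rating in prefs.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(u, v):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    if dot == 0.0:
        return 0.0
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, target, items=None):
    """Score `target` for `user` as a similarity-weighted average of the
    user's existing ratings (only positively similar items contribute)."""
    items = items if items is not None else item_vectors(ratings)
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = cosine(items[item], items[target])
        if sim > 0.0:
            num += sim * rating
            den += sim
    return num / den if den else 0.0


def recommend(ratings, user, n=2):
    """Return up to n items the user has not rated, best predicted score first."""
    # Built once per call here; item-based systems typically precompute
    # item-item similarities offline instead.
    items = item_vectors(ratings)
    unseen = [i for i in items if i not in ratings[user]]
    scored = sorted(((predict(ratings, user, i, items), i) for i in unseen), reverse=True)
    return [i for _, i in scored[:n]]
```

Because item-item similarities depend only on the rating matrix, they can be computed in a batch job and reused for every user, which is why the item-based approach distributes well. Mahout's own item-based recommender (`GenericItemBasedRecommender` in the Taste package) lets you plug in other similarity measures such as log-likelihood or Pearson correlation; cosine is just the choice made for this sketch.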


Classification algorithms:


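    1. Support vector machine (SVM)

    2. Logistic regression (LR)

    3. Stochastic gradient descent (SGD), strictly an optimization method rather than a model; it is commonly used to train classifiers such as logistic regression

    4. Neural networks

    5. Random forest (RF): frequently used in the Tmall recommendation algorithm competition (as RF + GBDT); can be parallelized with MapReduce

    6. Naive Bayes; there is also a complementary variant, Complementary Naive Bayes (cbayes), which usually performs better than plain Naive Bayes; can be parallelized with MapReduce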
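Items 2 and 3 (LR and SGD) are usually used together: logistic regression is the model and SGD is how it is fitted; Mahout's SGD classifiers such as `OnlineLogisticRegression` pair them this way. The sketch below is plain Python, not Mahout code: a binary logistic regression trained by per-example SGD on the log-loss, with a hypothetical learning rate, epoch count, and linearly separable toy data.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(y = 1 | x) under a linear logistic model with weights w and bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_lr_sgd(xs, ys, lr=0.5, epochs=500, seed=0):
    """Fit binary logistic regression with per-example SGD on the log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            x, y = xs[i], ys[i]
            g = predict_proba(w, b, x) - y  # d(log-loss)/d(score) for one example
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


# Hypothetical, linearly separable toy data: the label is 1 when x0 + x1 > 1
XS = [[0.0, 0.0], [0.2, 0.3], [0.4, 0.1], [1.0, 1.0], [0.9, 0.8], [1.2, 0.5]]
YS = [0, 0, 0, 1, 1, 1]
```

The per-example gradient is simply (p - y) times the feature vector, so each update touches one example only. That is what makes SGD suitable for online or streaming training, at the cost of noisier steps than full-batch gradient descent.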



Clustering algorithms:


    1. Canopy clustering

    2. k-means clustering

    3. Hierarchical clustering
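Canopy clustering (item 1) is commonly used as a cheap first pass that picks the number and starting positions of k-means centroids (item 2). The sketch below shows that pipeline in plain Python, not Mahout's API: `canopy` uses the usual loose and tight distance thresholds `t1 > t2`, and `kmeans` is plain Lloyd iteration seeded with the canopy centers. Function names, thresholds, and the 2-D points are hypothetical.

```python
import math

# Hypothetical 2-D points forming two well-separated groups
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]


def canopy(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).

    Returns a list of (center, members). Each remaining candidate within t1
    of a new center joins that canopy; candidates within t2 are also removed
    from the candidate list, so they can never start a canopy of their own.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering requires t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center] + [p for p in remaining if math.dist(p, center) <= t1]
        canopies.append((center, members))
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centers, max_iter=50):
    """Lloyd's k-means from the given initial centers; stops when centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Move each center to the mean of its cluster; keep empty clusters in place.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

With `t1 = 4.0` and `t2 = 2.0` the two groups above produce two canopies, so k-means receives k = 2 without it being specified by hand. In practice the hard part is choosing the thresholds: a small `t2` yields many canopies and therefore many clusters.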


Frequent pattern mining:
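Frequent pattern mining finds the itemsets that occur in at least a minimum number of transactions (the support threshold). Mahout's own implementation is a parallel FP-Growth; the sketch below deliberately swaps in the simpler Apriori algorithm, which returns the same frequent itemsets but scans the data once per itemset size. It is plain Python with hypothetical names and basket data, not Mahout's API.

```python
from itertools import combinations

# Hypothetical market-basket transactions
TRANSACTIONS = [
    ["milk", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "diapers", "beer"],
    ["milk", "bread", "diapers", "beer"],
    ["milk", "bread", "diapers"],
]


def apriori(transactions, min_support):
    """Level-wise Apriori search.

    Returns {frozenset(itemset): count} for every itemset that appears in
    at least `min_support` transactions.
    """
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates,
        # pruning any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        frequent = {}
        for cand in candidates:
            count = sum(1 for basket in baskets if cand <= basket)
            if count >= min_support:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result
```

With a support threshold of 3, all four single items and four of the six pairs are frequent; the only surviving candidate triple, {milk, bread, diapers}, appears in just two transactions and is dropped. FP-Growth avoids the repeated scans by compressing the transactions into a prefix tree, which is the property that makes it attractive for large, distributed data.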


Common algorithms supported by the latest Mahout release (0.9):

Latest release version 0.9 has

  • User and Item based recommenders

  • Matrix factorization based recommenders

  • K-Means, Fuzzy K-Means clustering

  • Latent Dirichlet Allocation

  • Singular Value Decomposition

  • Logistic regression classifier

  • (Complementary) Naive Bayes classifier

  • Random forest classifier

  • High performance java collections

  • A vibrant community



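Also note the announcement on the Mahout website: Mahout no longer accepts new MapReduce-based algorithm implementations, and new development is moving to Apache Spark, so keep an eye on Spark.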

The original announcement:

Mahout News

25 April 2014 - Goodbye MapReduce

The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.

We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.

Furthermore, there is an experimental contribution undergoing which aims to integrate the h20 platform into Mahout.

