I have been studying Hadoop for a while but never organized what I have come across. Common algorithms, by category:
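Recommender systems (recommendation engines):
User-based collaborative filtering (UserCF): a nearest-neighbor algorithm, easy to implement
Item-based collaborative filtering (ItemCF): fast, easy to run as a distributed computation
SlopeOne: @Deprecated at mahout 0.8
KNN linear interpolation item-based recommender: a nearest-neighbor algorithm, @Deprecated at mahout 0.8
SVD recommender: singular value decomposition; involves dimensionality reduction and a lot of preprocessing
Tree cluster-based recommender: tree clustering, a lot of preprocessing, @Deprecated at mahout 0.8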
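To make the UserCF entry above concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not Mahout code: the `ratings` data, the `cosine` and `recommend` helpers, and the choice of cosine similarity over co-rated items are all assumptions made for illustration (other similarity measures, such as Pearson correlation, are equally common).

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 3.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=2):
    """Predict ratings for unseen items from the k most similar users."""
    me = ratings[user]
    # Rank the other users by similarity and keep the k nearest neighbours.
    neighbours = sorted(
        ((cosine(me, other), name)
         for name, other in ratings.items() if name != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, name in neighbours:
        if sim <= 0:
            continue
        for item, rating in ratings[name].items():
            if item in me:
                continue  # only score items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average rating, best candidates first.
    return sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
```

Calling `recommend("alice", ratings)` returns a list of `(predicted_rating, item)` pairs for items Alice has not rated. ItemCF follows the same idea but compares item columns instead of user rows, which is what makes it easier to precompute and distribute.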
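Classification algorithms:
Support vector machine (SVM)
Logistic regression (LR)
Stochastic gradient descent (SGD): an optimization method used to train models such as LR
Neural networks
Random forest (RF): often used in the Tmall recommendation algorithm contest (RF + GBDT); can be parallelized with MapReduce
Naive Bayes: there is also a complementary variant, Complementary Naive Bayes (cbayes), which generally performs better than plain Naive Bayes; can be parallelized with MapReduce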
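The LR and SGD entries above go together in practice: SGD is a common way to fit a logistic regression model. The following is a minimal sketch in plain Python, not Mahout's implementation; the function names, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(samples, epochs=200, lr=0.1, seed=0):
    """Fit logistic regression with one gradient step per training sample."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in a new random order each epoch
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical, linearly separable toy data: label 1 when x0 + x1 is large.
samples = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0),
           ([2.0, 2.0], 1), ([2.0, 3.0], 1), ([3.0, 2.0], 1)]
w, b = train_sgd(samples)
```

Because each update touches only one sample, SGD needs little memory and handles streaming data well, which is one reason it is popular for training on large data sets.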
Clustering algorithms:
Canopy clustering
K-means clustering
Hierarchical clustering
Frequent pattern mining
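As an illustration of the k-means entry above, here is a minimal single-machine sketch of Lloyd's algorithm in plain Python. It is not Mahout's MapReduce implementation; the `kmeans` function, its parameters, the random choice of initial centroids, and the toy points are assumptions for illustration. (In Mahout, canopy clustering is often run first to pick the initial centroids; this sketch skips that step.)

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
```

The assignment step is independent per point and the update step is a per-cluster average, which is why k-means maps naturally onto a map phase and a reduce phase.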
Commonly used algorithms supported by the latest Mahout release (0.9)
Latest release version 0.9 has
User and Item based recommenders
Matrix factorization based recommenders
K-Means, Fuzzy K-Means clustering
Latent Dirichlet Allocation
Singular Value Decomposition
Logistic regression classifier
(Complementary) Naive Bayes classifier
Random forest classifier
High performance java collections
A vibrant community
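Also note the announcement on the Mahout website: Mahout no longer accepts new MapReduce algorithm implementations, so keep an eye on Spark going forward.
Original text: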
Mahout News
25 April 2014 - Goodbye MapReduce
The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.
We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.
Furthermore, there is an experimental contribution underway which aims to integrate the H2O platform into Mahout.