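Download the dating dataset from http://libimseti.cz.
The ratings.dat file inside is 257 MB, comma-separated, and contains user ID, profile ID, and rating (profile IDs and user IDs are not anonymized with the same scheme).
The dataset has already been preprocessed: users who produced fewer than 20 ratings were removed, as were users who gave every profile the same score.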
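To make the two filtering rules concrete, here is a minimal sketch of how such a filter could be written. It is not the script the dataset's publishers used; the class name RatingFilterSketch, the method filter, and the sample lines are all made up for illustration, and the only thing taken from the dataset is the userID,profileID,rating line layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the two preprocessing rules; not the original script.
final class RatingFilterSketch {

    // Keeps only the lines of users who have at least minRatings ratings
    // and who used more than one distinct rating value.
    public static List<String> filter(List<String> lines, int minRatings) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Set<String>> distinctValues = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");   // userID,profileID,rating
            counts.merge(fields[0], 1, Integer::sum);
            distinctValues.computeIfAbsent(fields[0], k -> new HashSet<>()).add(fields[2]);
        }
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String user = line.split(",")[0];
            if (counts.get(user) >= minRatings && distinctValues.get(user).size() > 1) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data; the threshold is 3 here only to keep the sample small.
        List<String> lines = Arrays.asList(
            "1,10,5", "1,11,7", "1,12,5",   // user 1: three ratings, varied values
            "2,10,4", "2,11,4", "2,12,4",   // user 2: same value for every profile
            "3,10,9");                      // user 3: too few ratings
        // Only user 1's lines survive both rules.
        System.out.println(filter(lines, 3));
    }
}
```

For the real file the threshold would be 20, and the lines would be streamed rather than held in a list, since the file is 257 MB.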
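According to "Mahout in Action", the best configuration is a user-based recommender that uses Euclidean distance similarity with a neighborhood of 2 users.
The evaluation program written on that basis is as follows: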
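import java.io.File;
import java.io.IOException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public static void evaluateDateData() throws IOException, TasteException
{
    // Load the comma-separated userID,profileID,rating file.
    DataModel model = new FileDataModel(new File("F:\\mahout\\libimseti\\libimseti-complete\\libimseti\\ratings.dat"));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
            // User-based recommender: Euclidean distance similarity, 2 nearest neighbors.
            UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
            return new GenericUserBasedRecommender(model, neighborhood, similarity);
        }
    };
    // Train on 95% of each user's ratings; evaluate on only 5% of the users.
    double score = evaluator.evaluate(builder, null, model, 0.95, 0.05);
    System.out.println(score);
}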
The run produced a score of 0.8415841584158418.
The JVM heap option for this run was set to -Xmx1024m.
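For reading the score: AverageAbsoluteDifferenceRecommenderEvaluator reports the average absolute difference between predicted and actual ratings over the held-out data, so lower is better and 0.0 would mean perfect predictions. The sketch below shows that arithmetic on made-up numbers. It is not Mahout's implementation; the class name AverageAbsoluteDifferenceSketch, the method name, and the sample ratings are hypothetical.

```java
// Hypothetical illustration of the metric; not Mahout's own code.
final class AverageAbsoluteDifferenceSketch {

    // Mean of |actual - predicted| over all held-out ratings.
    public static double averageAbsoluteDifference(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        // Made-up ratings, not taken from the dataset.
        double[] actual = {8.0, 3.0, 10.0, 5.0};
        double[] predicted = {7.0, 3.5, 9.0, 6.5};
        // Differences are 1.0, 0.5, 1.0, 1.5, so this prints 1.0.
        System.out.println(averageAbsoluteDifference(actual, predicted));
    }
}
```

Read this way, a score of about 0.84 means the predicted ratings were off by roughly 0.84 rating points on average for the sampled users.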