Recommending on a dating dataset with Mahout

Download the dating dataset from http://libimseti.cz

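The ratings.dat file inside it is 257 MB. It is comma-separated, and each line holds a user ID, a profile ID, and a rating. (Profile IDs and user IDs were not anonymized with the same scheme, so the two kinds of ID cannot be matched against each other.)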
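To make the file layout concrete, here is a minimal sketch of parsing one such line by hand. Mahout's FileDataModel does this parsing on its own, so this is only to show the format; the class name RatingLine and the sample line are made up for illustration.

```java
// Hypothetical helper, not part of Mahout: holds one line of ratings.dat.
final class RatingLine {
    final long userId;
    final long profileId;
    final float rating;

    RatingLine(long userId, long profileId, float rating) {
        this.userId = userId;
        this.profileId = profileId;
        this.rating = rating;
    }

    // Parses a line of the form "userID,profileID,rating".
    static RatingLine parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
        }
        return new RatingLine(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Float.parseFloat(parts[2].trim()));
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the real file.
        RatingLine r = parse("1,133,8");
        System.out.println("user " + r.userId + " rated profile " + r.profileId + " as " + r.rating);
    }
}
```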

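The dataset has already been preprocessed: users who produced fewer than 20 ratings were removed, and so were users who gave the same rating to every profile.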
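The downloaded file is already filtered, so none of this needs to be run. Still, the rule itself could be sketched roughly like this. It is not the script the dataset's authors used; the class and method names are hypothetical, and only the threshold of 20 comes from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the preprocessing rule, not the original script.
final class RatingFilter {
    // Minimum number of ratings a user must have produced to be kept.
    static final int MIN_RATINGS = 20;

    // Keep a user only if they rated enough profiles and did not
    // give every profile the same rating.
    static boolean keepUser(List<Integer> ratings) {
        if (ratings.size() < MIN_RATINGS) {
            return false;
        }
        int first = ratings.get(0);
        for (int r : ratings) {
            if (r != first) {
                return true; // at least two different rating values
            }
        }
        return false; // all ratings identical
    }

    // Applies keepUser to a map of userID -> ratings given by that user.
    static Map<Long, List<Integer>> filter(Map<Long, List<Integer>> ratingsByUser) {
        Map<Long, List<Integer>> kept = new HashMap<>();
        for (Map.Entry<Long, List<Integer>> e : ratingsByUser.entrySet()) {
            if (keepUser(e.getValue())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```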

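According to Mahout in Action, the best configuration for this dataset is a user-based recommender with Euclidean distance similarity and a neighborhood size of 2.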
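To show what "Euclidean distance with 2 neighbors" means in practice, here is a simplified plain-Java sketch of the idea: measure the distance d between two users over the profiles they both rated, turn it into a similarity 1/(1+d), and keep the N most similar users. This is an illustration only and not Mahout's code; Mahout's EuclideanDistanceSimilarity may normalize the distance differently, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Euclidean-distance similarity and a nearest-N neighborhood.
final class EuclideanSketch {
    // Each map is profileID -> rating for one user.
    // Returns a value in (0, 1], or 0 when the users share no rated profile.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSquares += diff * diff;
                common++;
            }
        }
        if (common == 0) {
            return 0.0;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    // Returns the IDs of the n users most similar to userId, most similar first.
    static List<Long> nearest(long userId, Map<Long, Map<Long, Double>> ratings, int n) {
        Map<Long, Double> target = ratings.get(userId);
        List<Long> others = new ArrayList<>();
        for (Long id : ratings.keySet()) {
            if (id != userId) {
                others.add(id);
            }
        }
        // Sort by descending similarity to the target user.
        others.sort((x, y) -> Double.compare(
                similarity(target, ratings.get(y)),
                similarity(target, ratings.get(x))));
        return new ArrayList<>(others.subList(0, Math.min(n, others.size())));
    }
}
```

With n set to 2, only the two closest users contribute to each estimated rating, which is the neighborhood size the book settles on.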

An evaluation program based on that configuration looks like this:

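    import java.io.File;
    import java.io.IOException;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public static void evaluateDateData() throws IOException, TasteException
    {
        // Load the comma-separated ratings file (userID,profileID,rating).
        DataModel model = new FileDataModel(new File("F:\\mahout\\libimseti\\libimseti-complete\\libimseti\\ratings.dat"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderBuilder builder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // User-based recommender: Euclidean distance similarity, 2 nearest neighbors.
                UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Train on 95% of each user's ratings; evaluate on only 5% of the users to keep the run short.
        double score = evaluator.evaluate(builder, null, model, 0.95, 0.05);
        System.out.println(score);
    }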

The run printed 0.8415841584158418, meaning the estimated ratings were off from the actual ones by about 0.84 on average.
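As a minimal sketch of how an average absolute difference of this kind is computed (lower is better), consider the following. It is not Mahout's implementation; the class name and the numbers in main are made up for illustration.

```java
// Hypothetical sketch of the metric, not Mahout's evaluator.
final class MeanAbsoluteDifference {
    // Averages |actual - estimated| over all held-out ratings.
    static double score(double[] actual, double[] estimated) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - estimated[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        // Made-up held-out ratings and the recommender's estimates for them.
        double[] actual = {8, 3, 10, 5};
        double[] estimated = {7, 4, 10, 5};
        System.out.println(score(actual, estimated));
    }
}
```

Because the evaluator samples the data randomly, the score can differ from one run to the next.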

The JVM option for this run was set to -Xmx1024m.
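If it is unclear whether the heap setting actually reached the JVM (for example when launching from an IDE), a quick check along these lines can help. The class name is hypothetical; Runtime.maxMemory() is standard Java, and the value it reports may be somewhat below the configured limit.

```java
// Hypothetical helper for checking the effective maximum heap size.
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```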
