Testing the naive Bayes algorithm with Mahout

This test follows section 14.6 of Mahout in Action. The steps and their output are recorded below:

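1: Convert the data in each directory under 20news-bydate-train and 20news-bydate-test into a simple text file in which every line starts with the directory name and contains all the words of a document. The mahout commands used are: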

mahout prepare20newsgroups -p 20news-bydate-train/ -o 20news-train/ -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8

mahout prepare20newsgroups -p 20news-bydate-test/ -o 20news-test/ -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8
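To make the output of this step concrete, here is a minimal Python sketch of the conversion. It is not Mahout's implementation: the function name prepare and the regex tokenizer are hypothetical stand-ins (the real command runs Lucene's StandardAnalyzer), and the tab between the label and the words is an assumption about the file layout. The one-file-per-category naming (<label>.txt) matches the file names that appear in the test log in step 6.

```python
import re
import tempfile
from pathlib import Path

# Crude stand-in for Lucene's StandardAnalyzer: lowercase alphanumeric runs.
TOKEN = re.compile(r"[a-z0-9]+")


def prepare(parent_dir: Path, output_dir: Path, charset: str = "utf-8") -> None:
    """Write one <label>.txt per category directory, one document per line."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for category in sorted(p for p in parent_dir.iterdir() if p.is_dir()):
        lines = []
        for doc in sorted(p for p in category.iterdir() if p.is_file()):
            text = doc.read_text(encoding=charset, errors="replace").lower()
            tokens = TOKEN.findall(text)
            # Each line: category label, a tab (assumed), then the tokens.
            lines.append(category.name + "\t" + " ".join(tokens))
        target = output_dir / (category.name + ".txt")
        target.write_text("".join(line + "\n" for line in lines), encoding=charset)


# Tiny demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "20news-bydate-train"
    (src / "sci.space").mkdir(parents=True)
    (src / "sci.space" / "0001").write_text("Launch window: 3 days.", encoding="utf-8")
    out = Path(tmp) / "20news-train"
    prepare(src, out)
    print((out / "sci.space.txt").read_text(encoding="utf-8"), end="")
```

With 20 category directories as input this yields 20 text files, which is consistent with the "Total input paths to process : 20" line in the training log.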

2: Start the cluster.

start-all.sh

3: Copy the 20news-train directory generated in step 1 to HDFS.

hadoop fs -put 20news-train /user/root/

4: Train on the samples with the naive Bayes algorithm to generate 20news-model. The command and its output are as follows:

mahout trainclassifier -i 20news-train -o 20news-model -type bayes -ng 1 -source hdfs


Running on hadoop, using HADOOP_HOME=/usr/Hadoop/hadoop-0.20.2

No HADOOP_CONF_DIR set, using /usr/Hadoop/hadoop-0.20.2/conf

13/06/19 09:52:11 INFO bayes.TrainClassifier: Training Bayes Classifier

13/06/19 09:52:12 INFO bayes.BayesDriver: Reading features...

13/06/19 09:52:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/06/19 09:52:12 INFO mapred.FileInputFormat: Total input paths to process : 20

13/06/19 09:52:13 INFO mapred.JobClient: Running job: job_201306190949_0001

13/06/19 09:52:14 INFO mapred.JobClient:  map 0% reduce 0%

13/06/19 09:52:27 INFO mapred.JobClient:  map 7% reduce 0%

13/06/19 09:52:30 INFO mapred.JobClient:  map 9% reduce 0%

13/06/19 09:52:33 INFO mapred.JobClient:  map 10% reduce 0%

13/06/19 09:52:42 INFO mapred.JobClient:  map 18% reduce 3%

13/06/19 09:52:45 INFO mapred.JobClient:  map 20% reduce 3%

13/06/19 09:52:54 INFO mapred.JobClient:  map 24% reduce 3%

13/06/19 09:52:57 INFO mapred.JobClient:  map 29% reduce 6%

13/06/19 09:53:00 INFO mapred.JobClient:  map 30% reduce 6%

13/06/19 09:53:06 INFO mapred.JobClient:  map 35% reduce 8%

13/06/19 09:53:09 INFO mapred.JobClient:  map 39% reduce 8%

13/06/19 09:53:12 INFO mapred.JobClient:  map 40% reduce 11%

13/06/19 09:53:15 INFO mapred.JobClient:  map 44% reduce 11%

13/06/19 09:53:18 INFO mapred.JobClient:  map 45% reduce 11%

13/06/19 09:53:21 INFO mapred.JobClient:  map 49% reduce 13%

13/06/19 09:53:24 INFO mapred.JobClient:  map 50% reduce 13%

13/06/19 09:53:27 INFO mapred.JobClient:  map 55% reduce 15%

13/06/19 09:53:33 INFO mapred.JobClient:  map 60% reduce 15%

13/06/19 09:53:36 INFO mapred.JobClient:  map 60% reduce 16%

13/06/19 09:53:39 INFO mapred.JobClient:  map 65% reduce 16%

13/06/19 09:53:42 INFO mapred.JobClient:  map 70% reduce 20%

13/06/19 09:53:51 INFO mapred.JobClient:  map 80% reduce 23%

13/06/19 09:53:57 INFO mapred.JobClient:  map 80% reduce 25%

13/06/19 09:54:00 INFO mapred.JobClient:  map 90% reduce 26%

13/06/19 09:54:09 INFO mapred.JobClient:  map 100% reduce 26%

13/06/19 09:54:12 INFO mapred.JobClient:  map 100% reduce 30%

13/06/19 09:54:18 INFO mapred.JobClient:  map 100% reduce 33%

13/06/19 09:54:24 INFO mapred.JobClient:  map 100% reduce 67%

13/06/19 09:54:30 INFO mapred.JobClient:  map 100% reduce 100%

13/06/19 09:54:32 INFO mapred.JobClient: Job complete: job_201306190949_0001

13/06/19 09:54:32 INFO mapred.JobClient: Counters: 18

13/06/19 09:54:32 INFO mapred.JobClient:   Job Counters

13/06/19 09:54:32 INFO mapred.JobClient:     Launched reduce tasks=1

13/06/19 09:54:32 INFO mapred.JobClient:     Launched map tasks=20

13/06/19 09:54:32 INFO mapred.JobClient:     Data-local map tasks=20

13/06/19 09:54:32 INFO mapred.JobClient:   FileSystemCounters

13/06/19 09:54:32 INFO mapred.JobClient:     FILE_BYTES_READ=95754881

13/06/19 09:54:32 INFO mapred.JobClient:     HDFS_BYTES_READ=16537368

13/06/19 09:54:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=148988140

13/06/19 09:54:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=36447002

13/06/19 09:54:32 INFO mapred.JobClient:   Map-Reduce Framework

13/06/19 09:54:32 INFO mapred.JobClient:     Reduce input groups=901416

13/06/19 09:54:32 INFO mapred.JobClient:     Combine output records=1473727

13/06/19 09:54:32 INFO mapred.JobClient:     Map input records=11314

13/06/19 09:54:32 INFO mapred.JobClient:     Reduce shuffle bytes=51554535

13/06/19 09:54:32 INFO mapred.JobClient:     Reduce output records=754846

13/06/19 09:54:32 INFO mapred.JobClient:     Spilled Records=4131595

13/06/19 09:54:32 INFO mapred.JobClient:     Map output bytes=205586582

13/06/19 09:54:32 INFO mapred.JobClient:     Map input bytes=16537368

13/06/19 09:54:32 INFO mapred.JobClient:     Combine input records=6337086

13/06/19 09:54:32 INFO mapred.JobClient:     Map output records=6337086

13/06/19 09:54:32 INFO mapred.JobClient:     Reduce input records=1473727

13/06/19 09:54:32 INFO bayes.BayesDriver: Calculating Tf-Idf...

13/06/19 09:54:32 INFO common.BayesTfIdfDriver: Counts of documents in Each Label

13/06/19 09:54:32 INFO common.BayesTfIdfDriver: {rec.motorcycles=598.0, comp.windows.x=593.0, talk.politics.guns=546.0, talk.politics.mideast=564.0, talk.religion.misc=377.0, rec.sport.baseball=597.0, rec.autos=594.0, rec.sport.hockey=600.0, comp.sys.mac.hardware=578.0, comp.sys.ibm.pc.hardware=590.0, sci.space=593.0, talk.politics.misc=465.0, sci.electronics=591.0, comp.graphics=584.0, sci.crypt=595.0, sci.med=594.0, soc.religion.christian=599.0, alt.atheism=480.0, misc.forsale=585.0, comp.os.ms-windows.misc=591.0}

13/06/19 09:54:32 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0, minDf=1, gramSize=1}

13/06/19 09:54:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/06/19 09:54:32 INFO mapred.FileInputFormat: Total input paths to process : 3

13/06/19 09:54:33 INFO mapred.JobClient: Running job: job_201306190949_0002

13/06/19 09:54:34 INFO mapred.JobClient:  map 0% reduce 0%

13/06/19 09:54:49 INFO mapred.JobClient:  map 66% reduce 0%

13/06/19 09:55:01 INFO mapred.JobClient:  map 100% reduce 0%

13/06/19 09:55:07 INFO mapred.JobClient:  map 100% reduce 22%

13/06/19 09:55:13 INFO mapred.JobClient:  map 100% reduce 100%

13/06/19 09:55:15 INFO mapred.JobClient: Job complete: job_201306190949_0002

13/06/19 09:55:15 INFO mapred.JobClient: Counters: 18

13/06/19 09:55:15 INFO mapred.JobClient:   Job Counters

13/06/19 09:55:15 INFO mapred.JobClient:     Launched reduce tasks=1

13/06/19 09:55:15 INFO mapred.JobClient:     Launched map tasks=3

13/06/19 09:55:15 INFO mapred.JobClient:     Data-local map tasks=3

13/06/19 09:55:15 INFO mapred.JobClient:   FileSystemCounters

13/06/19 09:55:15 INFO mapred.JobClient:     FILE_BYTES_READ=54669917

13/06/19 09:55:15 INFO mapred.JobClient:     HDFS_BYTES_READ=36446070

13/06/19 09:55:15 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82004984

13/06/19 09:55:15 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=15645417

13/06/19 09:55:15 INFO mapred.JobClient:   Map-Reduce Framework

13/06/19 09:55:15 INFO mapred.JobClient:     Reduce input groups=304129

13/06/19 09:55:15 INFO mapred.JobClient:     Combine output records=608257

13/06/19 09:55:15 INFO mapred.JobClient:     Map input records=754826

13/06/19 09:55:15 INFO mapred.JobClient:     Reduce shuffle bytes=27334946

13/06/19 09:55:15 INFO mapred.JobClient:     Reduce output records=304129

13/06/19 09:55:15 INFO mapred.JobClient:     Spilled Records=1824770

13/06/19 09:55:15 INFO mapred.JobClient:     Map output bytes=28610110

13/06/19 09:55:15 INFO mapred.JobClient:     Map input bytes=36445773

13/06/19 09:55:15 INFO mapred.JobClient:     Combine input records=754826

13/06/19 09:55:15 INFO mapred.JobClient:     Map output records=754826

13/06/19 09:55:15 INFO mapred.JobClient:     Reduce input records=608257

13/06/19 09:55:15 INFO bayes.BayesDriver: Calculating weight sums for labels and features...

13/06/19 09:55:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/06/19 09:55:15 INFO mapred.FileInputFormat: Total input paths to process : 1

13/06/19 09:55:16 INFO mapred.JobClient: Running job: job_201306190949_0003

13/06/19 09:55:17 INFO mapred.JobClient:  map 0% reduce 0%

13/06/19 09:55:31 INFO mapred.JobClient:  map 100% reduce 0%

13/06/19 09:55:43 INFO mapred.JobClient:  map 100% reduce 100%

13/06/19 09:55:45 INFO mapred.JobClient: Job complete: job_201306190949_0003

13/06/19 09:55:45 INFO mapred.JobClient: Counters: 18

13/06/19 09:55:45 INFO mapred.JobClient:   Job Counters

13/06/19 09:55:45 INFO mapred.JobClient:     Launched reduce tasks=1

13/06/19 09:55:45 INFO mapred.JobClient:     Launched map tasks=2

13/06/19 09:55:45 INFO mapred.JobClient:     Data-local map tasks=2

13/06/19 09:55:45 INFO mapred.JobClient:   FileSystemCounters

13/06/19 09:55:45 INFO mapred.JobClient:     FILE_BYTES_READ=11395006

13/06/19 09:55:45 INFO mapred.JobClient:     HDFS_BYTES_READ=15646192

13/06/19 09:55:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=17092570

13/06/19 09:55:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=5156501

13/06/19 09:55:45 INFO mapred.JobClient:   Map-Reduce Framework

13/06/19 09:55:45 INFO mapred.JobClient:     Reduce input groups=146591

13/06/19 09:55:45 INFO mapred.JobClient:     Combine output records=201494

13/06/19 09:55:45 INFO mapred.JobClient:     Map input records=304128

13/06/19 09:55:45 INFO mapred.JobClient:     Reduce shuffle bytes=5697500

13/06/19 09:55:45 INFO mapred.JobClient:     Reduce output records=146591

13/06/19 09:55:45 INFO mapred.JobClient:     Spilled Records=604482

13/06/19 09:55:45 INFO mapred.JobClient:     Map output bytes=23703690

13/06/19 09:55:45 INFO mapred.JobClient:     Map input bytes=15645194

13/06/19 09:55:45 INFO mapred.JobClient:     Combine input records=912384

13/06/19 09:55:45 INFO mapred.JobClient:     Map output records=912384

13/06/19 09:55:45 INFO mapred.JobClient:     Reduce input records=201494

13/06/19 09:55:45 INFO bayes.BayesDriver: Calculating the weight Normalisation factor for each class...

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: {rec.motorcycles=10950.08247078713, comp.windows.x=9140.40229191363, talk.politics.guns=9717.884898541553, talk.politics.mideast=9774.792829912312, talk.religion.misc=6253.280625101324, rec.sport.baseball=9964.975295683822, rec.autos=10318.471983615944, rec.sport.hockey=9689.106187278217, comp.sys.mac.hardware=9294.329591214286, comp.sys.ibm.pc.hardware=9261.965098786126, sci.space=10877.81456432966, talk.politics.misc=8292.138753814019, sci.electronics=10382.850213940757, comp.graphics=9327.325741885199, sci.crypt=10401.387454343632, sci.med=10654.852600564873, soc.religion.christian=9581.585347264707, alt.atheism=7503.494393077384, misc.forsale=10119.779786780977, comp.os.ms-windows.misc=9063.881127401353}

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for each Label and for each Features

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: 190570.40125624838

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count

13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: 146570.0

13/06/19 09:55:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/06/19 09:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1

13/06/19 09:55:46 INFO mapred.JobClient: Running job: job_201306190949_0004

13/06/19 09:55:47 INFO mapred.JobClient:  map 0% reduce 0%

13/06/19 09:55:58 INFO mapred.JobClient:  map 100% reduce 0%

13/06/19 09:56:10 INFO mapred.JobClient:  map 100% reduce 100%

13/06/19 09:56:12 INFO mapred.JobClient: Job complete: job_201306190949_0004

13/06/19 09:56:12 INFO mapred.JobClient: Counters: 18

13/06/19 09:56:12 INFO mapred.JobClient:   Job Counters

13/06/19 09:56:12 INFO mapred.JobClient:     Launched reduce tasks=1

13/06/19 09:56:12 INFO mapred.JobClient:     Launched map tasks=2

13/06/19 09:56:12 INFO mapred.JobClient:     Data-local map tasks=2

13/06/19 09:56:12 INFO mapred.JobClient:   FileSystemCounters

13/06/19 09:56:12 INFO mapred.JobClient:     FILE_BYTES_READ=757

13/06/19 09:56:12 INFO mapred.JobClient:     HDFS_BYTES_READ=15646192

13/06/19 09:56:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1584

13/06/19 09:56:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=932

13/06/19 09:56:12 INFO mapred.JobClient:   Map-Reduce Framework

13/06/19 09:56:12 INFO mapred.JobClient:     Reduce input groups=20

13/06/19 09:56:12 INFO mapred.JobClient:     Combine output records=21

13/06/19 09:56:12 INFO mapred.JobClient:     Map input records=304128

13/06/19 09:56:12 INFO mapred.JobClient:     Reduce shuffle bytes=397

13/06/19 09:56:12 INFO mapred.JobClient:     Reduce output records=20

13/06/19 09:56:12 INFO mapred.JobClient:     Spilled Records=42

13/06/19 09:56:12 INFO mapred.JobClient:     Map output bytes=10423028

13/06/19 09:56:12 INFO mapred.JobClient:     Map input bytes=15645194

13/06/19 09:56:12 INFO mapred.JobClient:     Combine input records=304128

13/06/19 09:56:12 INFO mapred.JobClient:     Map output records=304128

13/06/19 09:56:12 INFO mapred.JobClient:     Reduce input records=21

13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-docCount

13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-termDocCount

13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-featureCount

13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-wordFreq

13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-tfIdf/trainer-vocabCount

13/06/19 09:56:12 INFO driver.MahoutDriver: Program took 240921 ms
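The numbers in the training log can be cross-checked against each other. The short Python sketch below copies the per-label document counts, the Sigma_k values, and Sigma_kSigma_j from the log above, then checks two things: that the document counts add up to the "Map input records" of the first job (11314), and that the Sigma_k values add up to the logged Sigma_kSigma_j. The variable names are my own; only the values come from the log.

```python
# Per-label document counts, copied from the BayesTfIdfDriver log line.
doc_counts = {
    "rec.motorcycles": 598.0, "comp.windows.x": 593.0, "talk.politics.guns": 546.0,
    "talk.politics.mideast": 564.0, "talk.religion.misc": 377.0,
    "rec.sport.baseball": 597.0, "rec.autos": 594.0, "rec.sport.hockey": 600.0,
    "comp.sys.mac.hardware": 578.0, "comp.sys.ibm.pc.hardware": 590.0,
    "sci.space": 593.0, "talk.politics.misc": 465.0, "sci.electronics": 591.0,
    "comp.graphics": 584.0, "sci.crypt": 595.0, "sci.med": 594.0,
    "soc.religion.christian": 599.0, "alt.atheism": 480.0, "misc.forsale": 585.0,
    "comp.os.ms-windows.misc": 591.0,
}

# Per-label weight sums (Sigma_k), copied from the BayesThetaNormalizerDriver log line.
sigma_k = {
    "rec.motorcycles": 10950.08247078713, "comp.windows.x": 9140.40229191363,
    "talk.politics.guns": 9717.884898541553, "talk.politics.mideast": 9774.792829912312,
    "talk.religion.misc": 6253.280625101324, "rec.sport.baseball": 9964.975295683822,
    "rec.autos": 10318.471983615944, "rec.sport.hockey": 9689.106187278217,
    "comp.sys.mac.hardware": 9294.329591214286,
    "comp.sys.ibm.pc.hardware": 9261.965098786126, "sci.space": 10877.81456432966,
    "talk.politics.misc": 8292.138753814019, "sci.electronics": 10382.850213940757,
    "comp.graphics": 9327.325741885199, "sci.crypt": 10401.387454343632,
    "sci.med": 10654.852600564873, "soc.religion.christian": 9581.585347264707,
    "alt.atheism": 7503.494393077384, "misc.forsale": 10119.779786780977,
    "comp.os.ms-windows.misc": 9063.881127401353,
}

sigma_k_sigma_j = 190570.40125624838  # logged total over all labels and features
map_input_records = 11314             # "Map input records" of job ..._0001

total_docs = int(sum(doc_counts.values()))
sigma_total = sum(sigma_k.values())

print("documents:", total_docs, "matches job input:", total_docs == map_input_records)
print("sum of Sigma_k:", sigma_total, "difference:", sigma_total - sigma_k_sigma_j)
```

Both checks agree with the log, which suggests each line of the prepared training files is one document and that Sigma_kSigma_j is simply the sum of the per-label sums.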


5: The 20news-model generated above resides on HDFS. Copy it to the local file system.

hadoop fs -get /user/root/20news-model /usr/Mahout/dataset/


6: Test the model. The command and its output are as follows:

mahout testclassifier -d 20news-test -m 20news-model -type bayes -ng 1 -source hdfs -method sequential

Running on hadoop, using HADOOP_HOME=/usr/Hadoop/hadoop-0.20.2

No HADOOP_CONF_DIR set, using /usr/Hadoop/hadoop-0.20.2/conf

13/06/19 10:11:54 INFO bayes.TestClassifier: Loading model from: {basePath=20news-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=20news-test}

13/06/19 10:11:54 INFO bayes.TestClassifier: Testing Bayes Classifier

13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_j/part-00000

13/06/19 10:11:55 INFO io.SequenceFileModelReader: Read 50000 feature weights

13/06/19 10:11:55 INFO io.SequenceFileModelReader: Read 100000 feature weights

13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_k/part-00000

13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_kSigma_j/part-00000

13/06/19 10:11:55 INFO io.SequenceFileModelReader: 190570.40125624838

13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-thetaNormalizer/part-00000

13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-tfIdf/trainer-tfIdf/part-00000

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.sport.baseball -127395.14399316712 547567.2698760114 -0.23265660860630674

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.crypt -189010.62350617294 547567.2698760114 -0.3451824714595736

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.sport.hockey -166203.2548335905 547567.2698760114 -0.3035302947731423

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.guns -198793.14260997035 547567.2698760114 -0.3630478911841903

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: soc.religion.christian -158106.48187003663 547567.2698760114 -0.2887434851718539

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.electronics -138650.82033374818 547567.2698760114 -0.25321239592195427

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.os.ms-windows.misc -547567.2698760114 547567.2698760114 -1.0

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: misc.forsale -141981.48005545404 547567.2698760114 -0.2592950453148956

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.religion.misc -134885.60852883724 547567.2698760114 -0.2463361416020722

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: alt.atheism -134262.4272892253 547567.2698760114 -0.24519805086163582

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.windows.x -172513.19965389522 547567.2698760114 -0.3150538922696353

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.mideast -189368.63272082788 547567.2698760114 -0.3458362892356726

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.sys.ibm.pc.hardware -134535.56471897085 547567.2698760114 -0.24569687072317975

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.62827571077 547567.2698760114 -0.22156844455510047

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.26282429774 547567.2698760114 -0.2531657212741868

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216

13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253

13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from talk.politics.mideast.txt

13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from comp.sys.mac.hardware.txt

13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from rec.sport.baseball.txt

13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from misc.forsale.txt

13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from talk.religion.misc.txt

13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from rec.motorcycles.txt

13/06/19 10:12:00 INFO bayes.TestClassifier: Classified instances from sci.electronics.txt

13/06/19 10:12:00 INFO bayes.TestClassifier: Classified instances from sci.space.txt

13/06/19 10:12:01 INFO bayes.TestClassifier: Classified instances from talk.politics.guns.txt

13/06/19 10:12:01 INFO bayes.TestClassifier: Classified instances from rec.sport.hockey.txt

13/06/19 10:12:02 INFO bayes.TestClassifier: Classified instances from alt.atheism.txt

13/06/19 10:12:02 INFO bayes.TestClassifier: Classified instances from comp.graphics.txt

13/06/19 10:12:03 INFO bayes.TestClassifier: Classified instances from comp.sys.ibm.pc.hardware.txt

13/06/19 10:12:03 INFO bayes.TestClassifier: Classified instances from comp.windows.x.txt

13/06/19 10:12:04 INFO bayes.TestClassifier: Classified instances from talk.politics.misc.txt

13/06/19 10:12:04 INFO bayes.TestClassifier: Classified instances from rec.autos.txt

13/06/19 10:12:05 INFO bayes.TestClassifier: Classified instances from sci.crypt.txt

13/06/19 10:12:05 INFO bayes.TestClassifier: Classified instances from sci.med.txt

13/06/19 10:12:06 INFO bayes.TestClassifier: Classified instances from comp.os.ms-windows.misc.txt

13/06/19 10:12:07 INFO bayes.TestClassifier: Classified instances from soc.religion.christian.txt

13/06/19 10:12:07 INFO bayes.TestClassifier: =======================================================

Summary

-------------------------------------------------------

Correctly Classified Instances          :       5997       79.6203%

Incorrectly Classified Instances        :       1535       20.3797%

Total Classified Instances              :       7532


=======================================================

Confusion Matrix

-------------------------------------------------------

a        b        c        d        e        f        g        h        i        j        k        l        m        n        o        p        q        r        s        t        <--Classified as

385      0        7        0        0        0        0        3        0        0        0        0        0        1        0        0        1        0        0        0         |  397       a     = rec.sport.baseball

3        372      1        1        0        3        1        1        0        0        1        0        0        3        0        0        1        7        0        2         |  396       b     = sci.crypt

7        2        384      0        1        2        0        1        0        0        0        0        0        0        0        2        0        0        0        0         |  399       c     = rec.sport.hockey

3        12       0        327      1        4        1        3        0        0        0        1        0        0        1        5        2        1        0        3         |  364       d     = talk.politics.guns

5        0        1        0        368      2        2        2        0        2        0        0        1        0        2        3        0        3        0        7         |  398       e     = soc.religion.christian

1        14       0        0        0        321      7        6        0        0        0        0        11       5        3        7        4        11       0        3         |  393       f     = sci.electronics

4        9        0        0        0        2        258      4        0        0        4        0        51       7        5        7        1        41       0        1         |  394       g     = comp.os.ms-windows.misc

1        0        1        0        0        5        2        343      0        0        0        0        11       8        1        6        9        2        0        1         |  390       h     = misc.forsale

9        9        2        33       102      0        0        0        16       24       0        5        1        3        13       11       6        5        0        12        |  251       i     = talk.religion.misc

4        13       2        10       85       7        1        0        1        134      0        5        2        1        13       18       4        2        0        17        |  319       j     = alt.atheism

1        5        0        0        0        4        11       4        0        0        287      0        11       6        2        3        0        60       0        1         |  395       k     = comp.windows.x

5        7        0        3        11       1        0        2        0        0        0        337      0        0        0        5        2        1        0        2         |  376       l     = talk.politics.mideast

0        1        0        0        0        24       20       12       0        0        1        0        292      29       1        2        0        10       0        0         |  392       m     = comp.sys.ibm.pc.hardware

3        1        0        0        0        14       7        10       0        0        0        0        6        329      4        2        3        6        0        0         |  385       n     = comp.sys.mac.hardware

1        2        0        1        1        4        0        1        0        0        1        0        0        0        370      0        0        9        0        4         |  394       o     = sci.space

1        0        0        0        0        2        0        3        0        0        0        0        1        0        0        384      6        0        0        1         |  398       p     = rec.motorcycles

1        0        2        0        0        6        0        11       0        0        0        0        2        0        1        8        364      1        0        0         |  396       q     = rec.autos

5        10       0        0        0        10       7        6        0        0        14       1        11       11       8        1        2        301      0        2         |  389       r     = comp.graphics

8        32       1        109      9        2        0        1        0        0        0        1        0        3        19       20       5        1        87       12        |  310       s     = talk.politics.misc

3        0        2        1        4        13       0        7        0        0        0        0        1        1        5        10       3        8        0        338       |  396       t     = sci.med

Default Category: unknown: 20
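The summary only reports overall accuracy, but the confusion matrix also gives per-class precision and recall. The Python sketch below transcribes the matrix above (rows are the true newsgroup, columns are the predicted one) and derives those figures. The names LABELS, MATRIX and per_class_metrics are my own, and this is a reading aid rather than anything Mahout produces.

```python
LABELS = [
    "rec.sport.baseball", "sci.crypt", "rec.sport.hockey", "talk.politics.guns",
    "soc.religion.christian", "sci.electronics", "comp.os.ms-windows.misc",
    "misc.forsale", "talk.religion.misc", "alt.atheism", "comp.windows.x",
    "talk.politics.mideast", "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware",
    "sci.space", "rec.motorcycles", "rec.autos", "comp.graphics",
    "talk.politics.misc", "sci.med",
]

# Rows a..t of the confusion matrix, in the same order as LABELS.
MATRIX = [
    [385, 0, 7, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [3, 372, 1, 1, 0, 3, 1, 1, 0, 0, 1, 0, 0, 3, 0, 0, 1, 7, 0, 2],
    [7, 2, 384, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0],
    [3, 12, 0, 327, 1, 4, 1, 3, 0, 0, 0, 1, 0, 0, 1, 5, 2, 1, 0, 3],
    [5, 0, 1, 0, 368, 2, 2, 2, 0, 2, 0, 0, 1, 0, 2, 3, 0, 3, 0, 7],
    [1, 14, 0, 0, 0, 321, 7, 6, 0, 0, 0, 0, 11, 5, 3, 7, 4, 11, 0, 3],
    [4, 9, 0, 0, 0, 2, 258, 4, 0, 0, 4, 0, 51, 7, 5, 7, 1, 41, 0, 1],
    [1, 0, 1, 0, 0, 5, 2, 343, 0, 0, 0, 0, 11, 8, 1, 6, 9, 2, 0, 1],
    [9, 9, 2, 33, 102, 0, 0, 0, 16, 24, 0, 5, 1, 3, 13, 11, 6, 5, 0, 12],
    [4, 13, 2, 10, 85, 7, 1, 0, 1, 134, 0, 5, 2, 1, 13, 18, 4, 2, 0, 17],
    [1, 5, 0, 0, 0, 4, 11, 4, 0, 0, 287, 0, 11, 6, 2, 3, 0, 60, 0, 1],
    [5, 7, 0, 3, 11, 1, 0, 2, 0, 0, 0, 337, 0, 0, 0, 5, 2, 1, 0, 2],
    [0, 1, 0, 0, 0, 24, 20, 12, 0, 0, 1, 0, 292, 29, 1, 2, 0, 10, 0, 0],
    [3, 1, 0, 0, 0, 14, 7, 10, 0, 0, 0, 0, 6, 329, 4, 2, 3, 6, 0, 0],
    [1, 2, 0, 1, 1, 4, 0, 1, 0, 0, 1, 0, 0, 0, 370, 0, 0, 9, 0, 4],
    [1, 0, 0, 0, 0, 2, 0, 3, 0, 0, 0, 0, 1, 0, 0, 384, 6, 0, 0, 1],
    [1, 0, 2, 0, 0, 6, 0, 11, 0, 0, 0, 0, 2, 0, 1, 8, 364, 1, 0, 0],
    [5, 10, 0, 0, 0, 10, 7, 6, 0, 0, 14, 1, 11, 11, 8, 1, 2, 301, 0, 2],
    [8, 32, 1, 109, 9, 2, 0, 1, 0, 0, 0, 1, 0, 3, 19, 20, 5, 1, 87, 12],
    [3, 0, 2, 1, 4, 13, 0, 7, 0, 0, 0, 0, 1, 1, 5, 10, 3, 8, 0, 338],
]


def per_class_metrics(matrix):
    """Return a (precision, recall) pair for every class of a square matrix."""
    size = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(size)]
    metrics = []
    for i, row in enumerate(matrix):
        hits = row[i]
        row_sum = sum(row)
        recall = hits / row_sum if row_sum else 0.0
        precision = hits / col_sums[i] if col_sums[i] else 0.0
        metrics.append((precision, recall))
    return metrics


correct = sum(MATRIX[i][i] for i in range(len(MATRIX)))
total = sum(sum(row) for row in MATRIX)
accuracy = correct / total
metrics = per_class_metrics(MATRIX)

print(f"correct={correct} total={total} accuracy={accuracy:.4%}")
# List classes from lowest to highest recall.
for label, (precision, recall) in sorted(zip(LABELS, metrics), key=lambda x: x[1][1]):
    print(f"{label:26s} precision={precision:.3f} recall={recall:.3f}")
```

The diagonal sums to the 5997 correctly classified instances in the summary. The weakest class in this run is talk.religion.misc: only 16 of its 251 test documents are recognised, and 102 of them are assigned to soc.religion.christian.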


