Datasets for data mining, information retrieval, knowledge discovery, and more

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at http://www.research.att.com/~lewis/reuters21578.html

The following site lists a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

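3. I collected many test datasets that anyone writing a paper will surely need, at least for checking how well an algorithm performs.
Some of these links may no longer be accessible, but some of them should still work: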

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

statlib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

A site about data mining for funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and the Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html

Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
Original source: http://www.cnblogs.com/bobomouse/archive/2007/05/26/760513.html

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jar file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jar file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jar file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
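
A jar file is an ordinary zip archive, so the WEKA dataset jars can be inspected without Java. The following is only a minimal sketch: it assumes the datasets inside the jar are stored as .arff files (I have not verified every archive), and the function names list_arff_entries and read_entry_text are made up for illustration.

```python
import io
import zipfile


def list_arff_entries(jar_source):
    """Return the sorted names of .arff entries inside a jar (zip) archive.

    jar_source may be a filesystem path or a binary file-like object.
    Assumption: the datasets in the jar use the .arff extension.
    """
    with zipfile.ZipFile(jar_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".arff")
        )


def read_entry_text(jar_source, entry_name, encoding="utf-8"):
    """Read one archive entry and decode it as text."""
    with zipfile.ZipFile(jar_source) as archive:
        return archive.read(entry_name).decode(encoding, errors="replace")


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a download.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr("demo/example.arff", "@relation example\n")
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    for entry in list_arff_entries(demo):
        print(entry)
```

With a real download, pass the local path instead of the in-memory buffer, for example list_arff_entries("datasets-UCI.jar").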

Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

Provided by another person:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html

The following site lists a variety of datasets:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm


Dataset links on KDnuggets:
http://www.kdnuggets.com/datasets/index.html

Another very good resource is http://kdd.ics.uci.edu/, which hosts the following datasets (grouped by application area):

Direct Marketing
  KDD CUP 1998 Data

GIS
  Forest CoverType

Indexing
  Corel Image Features
  Pseudo Periodic Synthetic Time Series

Intrusion Detection
  KDD CUP 1999 Data

Process Control
  Synthetic Control Chart Time Series

Recommendation Systems
  Entree Chicago Recommendation Data

Robots
  Pioneer-1 Mobile Robot Data
  Robot Execution Failures

Sign Language Recognition
  Australian Sign Language Data
  High-quality Australian Sign Language Data

Text Categorization
  20 Newsgroups Data
  Reuters-21578 Text Categorization Collection
  NSF Research Awards Abstracts 1990-2003

World Wide Web
  Microsoft Anonymous Web Data
  MSNBC Anonymous Web Data
  Syskill Webert Web Data

Here is one more, found on a foreign blog: http://www.fs.fed.us/fire/fuelman/
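
Since this list repeats several links and some of them may no longer be accessible, a small script can help tidy it up. The sketch below is only one possible approach: the names URL_PATTERN, extract_unique_urls and is_reachable are hypothetical, the regular expression is a rough approximation of URL syntax, and the reachability check is best-effort (a site that blocks scripts may still open fine in a browser).

```python
import http.client
import re
import urllib.error
import urllib.request

# Rough pattern: a scheme followed by characters that are legal in a URL.
# Restricting to ASCII keeps adjacent non-ASCII text out of the match.
URL_PATTERN = re.compile(
    r"(?:https?|ftp)://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+"
)


def extract_unique_urls(text):
    """Return URLs found in text, in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)'")
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def is_reachable(url, timeout=10.0):
    """Best-effort check: True if the URL opens without raising an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        return False
```

To use it, paste the text of this post into a string, call extract_unique_urls on it, and run is_reachable on each result to see which links still respond.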
