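This program calls the Weka API and the libsvm toolkit to preprocess a fund database, then visualizes the processed data with the Chart.js framework. The implementation flow is described below.
Reading the data
Connecting to the database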
Class.forName("com.mysql.jdbc.Driver").newInstance();
String url = "jdbc:mysql://localhost:3306/test";
String user = "root";
String password = "";
Connection conn = DriverManager.getConnection(url, user, password);
Statement st = conn.createStatement();
Querying the data
Query all stock trading records for October 12, 2015 (a Monday).
ResultSet rs = st.executeQuery("SELECT * FROM history where Time='2015-10-12,1'");
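The Time value in the query pairs a date with a number. Judging only from the two values used in this document (2015-10-12,1 for a Monday), the number looks like the ISO weekday, but that reading is an assumption. The sketch below parses such a key and checks the assumption; the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

public class TimeKey {
    /** Returns the date part of a key such as "2015-10-12,1". */
    public static LocalDate parseDate(String key) {
        String[] parts = key.split(",");
        return LocalDate.parse(parts[0].trim()); // ISO-8601 yyyy-MM-dd
    }

    /** True when the numeric suffix equals the date's ISO weekday (Monday = 1). */
    public static boolean suffixMatchesWeekday(String key) {
        String[] parts = key.split(",");
        int suffix = Integer.parseInt(parts[1].trim());
        DayOfWeek day = parseDate(key).getDayOfWeek();
        return day.getValue() == suffix;
    }

    public static void main(String[] args) {
        System.out.println(parseDate("2015-10-12,1").getDayOfWeek());
        System.out.println(suffixMatchesWeekday("2015-10-13,2"));
    }
}
```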
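Data processing
This part uses the J48 and SVM algorithms from the Weka API to process the data.
Connecting Weka to the database
Query the records whose Increase is at least 0.1.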
InstanceQuery query = new InstanceQuery();
query.setDatabaseURL("jdbc:mysql://localhost:3306/test");
query.setUsername("root");
query.setPassword("");
query.setQuery("SELECT `Open`,Highest,Lowest,`Close`, `Change`,Increase,Amplitude,HandsAll,Money from history where Increase>=0.1;");
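Data preprocessing
Data preprocessing includes missing-value handling, standardization, normalization, and discretization.
- Missing-value handling
Class weka.filters.unsupervised.attribute.ReplaceMissingValues. For numeric attributes, missing values are replaced with the mean; for nominal attributes, they are replaced with the mode (the most frequent value).
- Standardization (Standardize)
Class weka.filters.unsupervised.attribute.Standardize. Standardizes all numeric attributes in the given dataset to zero mean and unit variance.
- Normalization (Normalize)
Class weka.filters.unsupervised.attribute.Normalize. Normalizes all numeric attribute values in the given dataset, except the class attribute. The resulting values lie in the interval [0,1] by default, but the scale and translation parameters can map them to any interval. For example, with scale=2.0 and translation=-1.0, attribute values are normalized to the interval [-1,+1].
- Discretization (Discretize)
Classes weka.filters.supervised.attribute.Discretize and weka.filters.unsupervised.attribute.Discretize. These perform supervised and unsupervised discretization respectively, converting numeric attributes in the dataset into nominal attributes.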
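To make the arithmetic behind standardization and normalization concrete, here is a plain-Java sketch that does not use Weka. The class name Scaling is hypothetical, and the use of the sample standard deviation is an assumption of this sketch rather than a statement about Weka's internals.

```java
public class Scaling {
    /** Rescales values to zero mean and unit variance (needs at least two values). */
    public static double[] standardize(double[] v) {
        double mean = 0;
        for (double x : v) mean += x;
        mean /= v.length;
        double ss = 0;
        for (double x : v) ss += (x - mean) * (x - mean);
        double sd = Math.sqrt(ss / (v.length - 1)); // sample standard deviation
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = sd == 0 ? 0 : (v[i] - mean) / sd;
        }
        return out;
    }

    /** Min-max scales values to [0,1], then applies scale and translation. */
    public static double[] normalize(double[] v, double scale, double translation) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            double unit = max == min ? 0 : (v[i] - min) / (max - min);
            out[i] = unit * scale + translation;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] increase = {1, 2, 3};
        System.out.println(java.util.Arrays.toString(standardize(increase)));
        System.out.println(java.util.Arrays.toString(normalize(increase, 2.0, -1.0)));
    }
}
```

With scale=2.0 and translation=-1.0 the smallest input maps to -1 and the largest to +1, which is the [-1,+1] case mentioned in the list.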
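Data preprocessing is the prerequisite for every data mining algorithm. A raw data source can rarely be fed directly to a data mining algorithm.
To make the data usable by data mining algorithms without damaging the structure of the business data, the data source must be preprocessed to produce the algorithm's input data.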
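As one small instance of such a preprocessing step, the mean-replacement rule for numeric missing values can be sketched in plain Java. This is not Weka's implementation: the class name MeanImputer is hypothetical, and using NaN as the "missing" marker is an assumption made for the example.

```java
public class MeanImputer {
    /** Replaces NaN entries with the mean of the values that are present. */
    public static double[] replaceMissing(double[] v) {
        double sum = 0;
        int present = 0;
        for (double x : v) {
            if (!Double.isNaN(x)) {
                sum += x;
                present++;
            }
        }
        double mean = present == 0 ? 0 : sum / present; // all-missing column falls back to 0
        double[] out = v.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = mean;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] close = {1.0, Double.NaN, 3.0};
        System.out.println(java.util.Arrays.toString(replaceMissing(close)));
    }
}
```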
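Unsupervised filters
1. Add: adds a new attribute to the dataset; the new attribute contains only missing values. Optional parameters:
- attributeIndex: position of the attribute, counted from 1; "last" is the last position and "first" is the first
- attributeName: name of the attribute
- attributeType: type of the attribute, usually one of four choices
- dateFormat: date format, see ISO-8601
- nominalLabels: nominal labels, multiple values separated by commas
3. AddID: as the name suggests, adds an ID attribute.
4. AddNoise: applies only to nominal attributes; changes a given percentage of the values.
5. Center: shifts numeric attributes so that their mean is 0.
6. ChangeDateFormat: changes the date format.
7. Copy: copies the specified attributes and names each copy "Copy of XX".
8. Discretize: discretization by simple binning. Parameters:
- attributeIndices: range of attributes, e.g. 1-5 or first-last
- bins: number of bins
9. FirstOrder: replaces the n-th value with the difference between the (n+1)-th value and the n-th value.
10. MathExpression: similar to AddExpression but supports more operations; the support for MAX and MIN is especially useful. The supported operators are: +, -, *, /, pow, log, abs, cos, exp, sqrt, tan, sin, ceil, floor, rint, (, ), A, MEAN, MAX, MIN, SD, COUNT, SUM, SUMSQUARED, ifelse
11. Reorder: reorders the attributes. Entering 2-last,1 moves the first attribute to the end; entering 1,3,5 drops all the other attributes.
12. Standardize: much like Center, but additionally scales to unit variance.
13. StringToNominal: converts String attributes to Nominal attributes.
14. SwapValues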
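The FirstOrder rule in the list above (each value replaced by the difference between its successor and itself) is easy to check by hand. A minimal plain-Java sketch follows; the class name is hypothetical and the code only illustrates the arithmetic, not Weka's filter.

```java
public class FirstOrderDiff {
    /** Returns v[i+1] - v[i] for every adjacent pair; the result has one element fewer. */
    public static double[] diff(double[] v) {
        if (v.length < 2) {
            return new double[0];
        }
        double[] out = new double[v.length - 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = v[i + 1] - v[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(diff(new double[]{1, 4, 9, 16})));
    }
}
```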
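Next are the filters in the weka.filters.unsupervised.instance package:
1. NonSparseToSparse
Converts all incoming instances to sparse format.
2. Normalize
Normalizes the whole set of instances.
3. RemoveFolds
Outputs one cross-validation fold of the dataset. Stratification is not supported; use the supervised version if it is needed.
4. RemoveRange
Removes the instances in the specified range.
5. Resample
Random sampling: produces a new, smaller sample from the existing instances.
6. SubsetByExpression
Filters instances according to a rule; supports logical operators, ceiling, absolute value, and so on.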
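The idea behind Resample, drawing a smaller random sample from the existing instances, can be sketched without Weka. The class below is hypothetical; it samples without replacement and uses a fixed seed for repeatability, both of which are choices made for this sketch rather than a description of the filter's defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Subsample {
    /** Returns a random subsample holding the given percentage of the rows. */
    public static <T> List<T> sample(List<T> rows, double percent, long seed) {
        List<T> copy = new ArrayList<>(rows);        // leave the input untouched
        Collections.shuffle(copy, new Random(seed)); // same seed, same order
        int n = (int) Math.round(rows.size() * percent / 100.0);
        return new ArrayList<>(copy.subList(0, n));
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sample(rows, 30.0, 42L).size());
    }
}
```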
The implementation is as follows:
Instances data = query.retrieveInstances();
data.setClassIndex(0); // index of the class attribute (the first attribute is 0); data.numAttributes() returns the total number of attributes
Discretize discretize = new Discretize();
String[] options = new String[6];
options[0] = "-B";
options[1] = "8";
options[2] = "-M";
options[3] = "-1.0";
options[4] = "-R";
options[5] = "2-last";
discretize.setOptions(options);
discretize.setInputFormat(data);
Instances newInstances1 = Filter.useFilter(data, discretize);
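The -B 8 option above asks for eight bins. Assuming equal-width binning, the mapping from a numeric value to a bin index can be sketched in plain Java as follows. The class name is hypothetical, and the sketch illustrates the idea only; it does not reproduce Weka's exact cut points or labels.

```java
public class EqualWidthBins {
    /** Maps each value to a bin index in [0, bins - 1] over the range [min, max]. */
    public static int[] discretize(double[] v, int bins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double width = (max - min) / bins;
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            int b = width == 0 ? 0 : (int) ((v[i] - min) / width);
            out[i] = Math.min(b, bins - 1); // the maximum value belongs to the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[] amplitude = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(java.util.Arrays.toString(discretize(amplitude, 8)));
    }
}
```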
Saving locally
DataSink.write("C://j48.arff", data);
A j48.arff file will then appear in the root directory of drive C.
Classifying the preprocessed data
Taking the J48 algorithm as an example, the code is as follows:
File inputFile = new File("C://j48.arff"); // training data file
ArffLoader atf = new ArffLoader();
atf.setFile(inputFile);
Instances instancesTrain = atf.getDataSet(); // load the training file
instancesTrain.setClassIndex(0); // index of the class attribute (the first attribute is 0); instancesTrain.numAttributes() returns the number of attributes
double sum = instancesTrain.numInstances(); // number of instances to evaluate
double right = 0.0; // number of correct predictions
Classifier m_classifier = new J48();
m_classifier.buildClassifier(instancesTrain); // train
for (int i = 0; i < sum; i++) { // check each prediction
    // The class column must hold the correct answers for this comparison to be meaningful.
    if (m_classifier.classifyInstance(instancesTrain.instance(i)) == instancesTrain.instance(i).classValue()) {
        right++; // one more correct prediction
    }
}
out.println("J48:" + (right / sum));
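The ratio right / sum printed above is the classification accuracy. The same calculation, separated from Weka, looks like the sketch below; the class name and the array-based signature are assumptions made for illustration.

```java
public class Accuracy {
    /** Fraction of positions where the predicted class index equals the actual one. */
    public static double of(double[] predicted, double[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (predicted.length == 0) {
            return 0.0;
        }
        int right = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) {
                right++;
            }
        }
        return (double) right / predicted.length;
    }

    public static void main(String[] args) {
        double[] predicted = {0, 1, 1, 2};
        double[] actual = {0, 1, 2, 2};
        System.out.println("accuracy = " + of(predicted, actual));
    }
}
```

Because the snippet above scores the model on the file it was trained from, the figure it reports is a training accuracy; a held-out test set would be needed to estimate performance on unseen data.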
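Displaying the data
Chart.js, version 1.0.2, is a simple, lightweight charting framework based on the HTML5 canvas. It can draw many kinds of charts, including line charts, bar charts, and rose (polar area) charts.
Including the Chart.js file
Copy the entire downloaded file into the WebRoot root directory, as shown in the figure.
First, the Chart.js file must be included in the page. The library defines a Chart variable in the global namespace.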
<script src="./Chart.js"></script>
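Creating a chart
To create a chart, we instantiate a Chart object. To do so, we first need to pass in the 2d context of the canvas on which the chart will be drawn. An example follows.
- HTML code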
<div id="left" style="width:40%">
<canvas id="canvas" height="512" width="512"></canvas>
</div>
- JS code
window.myBar = new Chart(document.getElementById("canvas")
.getContext("2d")).Bar(barChartData, {
responsive : true
});
We can also use jQuery to get the context of the canvas: first get the DOM node we need from the jQuery collection, then call the getContext("2d") method on that node.
//Get context with jQuery - using jQuery's .get() method.
var ctx = $("#myChart").get(0).getContext("2d");
//This will get the first returned node in the jQuery collection.
var myNewChart = new Chart(ctx);
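Once a Chart object has been instantiated on the chosen canvas, Chart.js automatically scales it for retina displays.
With the Chart object set up, we can go on to create any of the chart types that Chart.js provides. The following example shows how to draw a polar area chart.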
new Chart(ctx).PolarArea(data,options);
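Custom table
Define a table so that the data read through JDBC can be passed to JavaScript. Give every td an id, and fill it with the values fetched from the database using the rs.getString() method.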
<tr>
<td width="100" id="Index"><%=rs.getString("Index")%></td>
<td width="100" id="Time"><%=rs.getString("Time")%></td>
<td width="100" id="Open"><%=rs.getString("Open")%></td>
<td width="100" id="Highest"><%=rs.getString("Highest")%></td>
<td width="100" id="Lowest"><%=rs.getString("Lowest")%></td>
<td width="100" id="Close"><%=rs.getString("Close")%></td>
<td width="100" id="Change"><%=rs.getString("Change")%></td>
<td width="100" id="Increase"><%=rs.getString("Increase")%></td>
<td width="100" id="Amplitude"><%=rs.getString("Amplitude")%></td>
<td width="100" id="HandsAll"><%=rs.getString("HandsAll")%></td>
<td width="100" id="Money"><%=rs.getString("Money")%></td>
<td width="100" id="J48 classification precision" value="J48"><%=right / sum%></td>
</tr>
Data structure
Use document.getElementById().innerHTML to pass the data in the td elements into the data: arrays.
var barChartData =
{
labels : ["Open", "Highest", "Lowest", "Close", "Change",
"Increase", "Amplitude", "HandsAll", "Money", "Accuracy"],
datasets : [
{
fillColor : "rgba(220,220,220,0.5)",
strokeColor : "rgba(220,220,220,0.8)",
highlightFill : "rgba(220,220,220,0.75)",
highlightStroke : "rgba(220,220,220,1)",
data : [
document.getElementById("Open").innerHTML,
document.getElementById("Highest").innerHTML,
document.getElementById("Lowest").innerHTML,
document.getElementById("Close").innerHTML,
document.getElementById("Change").innerHTML,
document.getElementById("Increase").innerHTML,
document.getElementById("Amplitude").innerHTML,
document.getElementById("HandsAll").innerHTML,
document.getElementById("Money").innerHTML,
document
.getElementById("J48 classification precision").innerHTML, ]
},
{
fillColor : "rgba(151,187,205,0.5)",
strokeColor : "rgba(151,187,205,0.8)",
highlightFill : "rgba(151,187,205,0.75)",
highlightStroke : "rgba(151,187,205,1)",
data : [
document.getElementById("Opens").innerHTML,
document.getElementById("Highests").innerHTML,
document.getElementById("Lowests").innerHTML,
document.getElementById("Closes").innerHTML,
document.getElementById("Changes").innerHTML,
document.getElementById("Increases").innerHTML,
document.getElementById("Amplitudes").innerHTML,
document.getElementById("HandsAlls").innerHTML,
document.getElementById("Moneys").innerHTML,
document
.getElementById("SVM classification precision").innerHTML, ]
}
]
}
Obtaining the prediction data
package cn.zju.edu.test;
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Iterator;
/*import weka.classifiers.functions.LibSVM;*/
public class DataUtil {
public DataUtil() throws Exception{
Class.forName("com.mysql.jdbc.Driver").newInstance();
String url = "jdbc:mysql://localhost:3306/test";
String user = "root";
String password = "";
Connection conn = DriverManager.getConnection(url, user, password);
Statement st = conn.createStatement();
Statement st1 = conn.createStatement();
ResultSet rs = st
.executeQuery("SELECT Increase FROM history where Time='2015-10-12,1'");
ResultSet rss = st1
.executeQuery("SELECT Increase FROM history where Time='2015-10-13,2'");
while (rs.next()&&rss.next()) {
String str = rs.getString("Increase");
String str2=rss.getString("Increase");
newFile("./train.txt", str+","+str); // save the increases on October 12, 2015 for training
newFile("./test.txt", str2+","+str2); // save the increases on October 13, 2015 for testing
}
}
public static void newFile(String filePathAndName, String fileContent) {
try {
File myFilePath = new File(filePathAndName.toString());
if (!myFilePath.exists()) { // create the file if it does not exist
myFilePath.createNewFile();
}
// new FileWriter(myFilePath, true) appends to the file
// new FileWriter(myFilePath) overwrites the existing content
FileWriter resultFile = new FileWriter(myFilePath, true);
PrintWriter myFile = new PrintWriter(resultFile);
// Append the content as a new line; existing content is kept
myFile.println(fileContent);
myFile.close(); // flushes and also closes the underlying FileWriter
} catch (Exception e) {
e.printStackTrace();
}
}
}
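The append-to-file helper above can also be written with java.nio, which creates the file when needed and closes it automatically. This is an alternative sketch, not the author's code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

public class LineAppender {
    /** Appends one line to the file, creating the file first if it does not exist. */
    public static void appendLine(Path file, String line) {
        try {
            Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // surface the failure instead of only printing it
        }
    }

    public static void main(String[] args) {
        Path train = Paths.get("train-demo.txt"); // hypothetical file name
        appendLine(train, "0.12,0.12");
        appendLine(train, "0.15,0.15");
    }
}
```

Since the file is opened in append mode, running the program twice doubles the number of lines; deleting the file before each run avoids training on duplicated rows.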
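SVR prediction class
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;
import libsvm.svm_parameter;
import libsvm.svm_problem;
public class Predict {
public static void main(String[] args) throws Exception {
new DataUtil(); // fetch the training and test data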
List<Double> label = new ArrayList<Double>();
List<svm_node[]> nodeSet = new ArrayList<svm_node[]>();
getData(nodeSet, label, "./train.txt");
int dataRange = nodeSet.get(0).length;
svm_node[][] datas = new svm_node[nodeSet.size()][dataRange]; // training-set vectors
for (int i = 0; i < datas.length; i++) {
for (int j = 0; j < dataRange; j++) {
datas[i][j] = nodeSet.get(i)[j];
}
}
double[] lables = new double[label.size()]; // labels of the training vectors
for (int i = 0; i < lables.length; i++) {
lables[i] = label.get(i);
}
// Define the svm_problem object
svm_problem problem = new svm_problem();
problem.l = nodeSet.size(); // number of vectors
problem.x = datas; // training-set vectors
problem.y = lables; // corresponding label array
// Define the svm_parameter object
svm_parameter param = new svm_parameter();
param.svm_type = svm_parameter.EPSILON_SVR;
param.kernel_type = svm_parameter.LINEAR;
param.cache_size = 100;
param.eps = 0.00001;
param.C = 1.9;
// Check the parameters: svm.svm_check_parameter() returns null if they are valid, otherwise an error description
System.out.println(svm.svm_check_parameter(problem, param));
// Train the SVR model with svm.svm_train()
svm_model model = svm.svm_train(problem, param);
// Load the test data
List<Double> testlabel = new ArrayList<Double>();
List<svm_node[]> testnodeSet = new ArrayList<svm_node[]>();
getData(testnodeSet, testlabel, "./test.txt");
svm_node[][] testdatas = new svm_node[testnodeSet.size()][dataRange]; // test-set vectors
for (int i = 0; i < testdatas.length; i++) {
for (int j = 0; j < dataRange; j++) {
testdatas[i][j] = testnodeSet.get(i)[j];
}
}
double[] testlables = new double[testlabel.size()]; // labels of the test vectors
for (int i = 0; i < testlables.length; i++) {
testlables[i] = testlabel.get(i);
}
// Predict the label of each test vector
double err = 0.0;
for (int i = 0; i < testdatas.length; i++) {
double truevalue = testlables[i];
System.out.print("Actual: " + truevalue + " ");
double predictValue = svm.svm_predict(model, testdatas[i]);
System.out.println("Predicted: " + predictValue);
err += Math.abs(predictValue - truevalue);
Class.forName("com.mysql.jdbc.Driver").newInstance();
String url = "jdbc:mysql://localhost:3306/test";
String user = "root";
String password = "";
Connection conn = DriverManager.getConnection(url, user, password);
Statement st = conn.createStatement();
st.executeUpdate("insert into predictresult(truevalue,predictvalue) values('"+truevalue+"'"+","+"'"+predictValue+"');");
conn.close();
DataUtil.newFile("./result.txt", "Actual: "+truevalue + " "+"Predicted: "+predictValue+" "+"err=" + err / datas.length);
}
/*System.out.println("err=" + err / datas.length);*/
}
public static void getData(List<svm_node[]> nodeSet, List<Double> label, String filename) {
try {
FileReader fr = new FileReader(new File(filename));
BufferedReader br = new BufferedReader(fr);
String line = null;
while ((line = br.readLine()) != null) {
String[] datas = line.split(",");
svm_node[] vector = new svm_node[datas.length - 1];
for (int i = 0; i < datas.length - 1; i++) {
svm_node node = new svm_node();
node.index = i + 1;
node.value = Double.parseDouble(datas[i]);
vector[i] = node;
}
nodeSet.add(vector);
double lablevalue = Double.parseDouble(datas[datas.length - 1]);
label.add(lablevalue);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
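Two pieces of the class above can be exercised without libsvm: the line format read by getData (comma-separated features followed by the label) and the absolute-error average accumulated in err. The sketch below isolates both; the class and method names are hypothetical.

```java
public class SvrData {
    /** Returns every field except the last one as a feature value. */
    public static double[] parseFeatures(String line) {
        String[] fields = line.split(",");
        double[] features = new double[fields.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(fields[i].trim());
        }
        return features;
    }

    /** Returns the last field as the label (the regression target). */
    public static double parseLabel(String line) {
        String[] fields = line.split(",");
        return Double.parseDouble(fields[fields.length - 1].trim());
    }

    /** Mean of |predicted - actual| over all pairs. */
    public static double meanAbsoluteError(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double err = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            err += Math.abs(predicted[i] - actual[i]);
        }
        return err / predicted.length;
    }

    public static void main(String[] args) {
        String line = "0.5,0.5"; // same shape as the lines written to train.txt
        System.out.println(parseFeatures(line).length + " feature(s), label " + parseLabel(line));
        System.out.println(meanAbsoluteError(new double[]{1, 2}, new double[]{2, 4}));
    }
}
```

Note that each training line written by DataUtil repeats the same value as feature and label, so a model fitted to it mostly learns the identity mapping.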
Bar chart parameters
Bar.defaults = {
//Boolean - If we show the scale above the chart data
scaleOverlay : false,
//Boolean - If we want to override with a hard coded scale
scaleOverride : false,
//** Required if scaleOverride is true **
//Number - The number of steps in a hard coded scale
scaleSteps : null,
//Number - The value jump in the hard coded scale
scaleStepWidth : null,
//Number - The scale starting value
scaleStartValue : null,
//String - Colour of the scale line
scaleLineColor : "rgba(0,0,0,.1)",
//Number - Pixel width of the scale line
scaleLineWidth : 1,
//Boolean - Whether to show labels on the scale
scaleShowLabels : false,
//Interpolated JS string - can access value
scaleLabel : "<%=value%>",
//String - Scale label font declaration for the scale label
scaleFontFamily : "'Arial'",
//Number - Scale label font size in pixels
scaleFontSize : 12,
//String - Scale label font weight style
scaleFontStyle : "normal",
//String - Scale label font colour
scaleFontColor : "#666",
//Boolean - Whether grid lines are shown across the chart
scaleShowGridLines : true,
//String - Colour of the grid lines
scaleGridLineColor : "rgba(0,0,0,.05)",
//Number - Width of the grid lines
scaleGridLineWidth : 1,
//Boolean - If there is a stroke on each bar
barShowStroke : true,
//Number - Pixel width of the bar stroke
barStrokeWidth : 2,
//Number - Spacing between each of the X value sets
barValueSpacing : 5,
//Number - Spacing between data sets within X values
barDatasetSpacing : 1,
//Boolean - Whether to animate the chart
animation : true,
//Number - Number of animation steps
animationSteps : 60,
//String - Animation easing effect
animationEasing : "easeOutQuart",
//Function - Fires when the animation is complete
onAnimationComplete : null
}
Radar chart parameters
Radar.defaults = {
//Boolean - If we show the scale above the chart data
scaleOverlay : false,
//Boolean - If we want to override with a hard coded scale
scaleOverride : false,
//** Required if scaleOverride is true **
//Number - The number of steps in a hard coded scale
scaleSteps : null,
//Number - The value jump in the hard coded scale
scaleStepWidth : null,
//Number - The centre starting value
scaleStartValue : null,
//Boolean - Whether to show lines for each scale point
scaleShowLine : true,
//String - Colour of the scale line
scaleLineColor : "rgba(0,0,0,.1)",
//Number - Pixel width of the scale line
scaleLineWidth : 1,
//Boolean - Whether to show labels on the scale
scaleShowLabels : false,
//Interpolated JS string - can access value
scaleLabel : "<%=value%>",
//String - Scale label font declaration for the scale label
scaleFontFamily : "'Arial'",
//Number - Scale label font size in pixels
scaleFontSize : 12,
//String - Scale label font weight style
scaleFontStyle : "normal",
//String - Scale label font colour
scaleFontColor : "#666",
//Boolean - Show a backdrop to the scale label
scaleShowLabelBackdrop : true,
//String - The colour of the label backdrop
scaleBackdropColor : "rgba(255,255,255,0.75)",
//Number - The backdrop padding above & below the label in pixels
scaleBackdropPaddingY : 2,
//Number - The backdrop padding to the side of the label in pixels
scaleBackdropPaddingX : 2,
//Boolean - Whether we show the angle lines out of the radar
angleShowLineOut : true,
//String - Colour of the angle line
angleLineColor : "rgba(0,0,0,.1)",
//Number - Pixel width of the angle line
angleLineWidth : 1,
//String - Point label font declaration
pointLabelFontFamily : "'Arial'",
//String - Point label font weight
pointLabelFontStyle : "normal",
//Number - Point label font size in pixels
pointLabelFontSize : 12,
//String - Point label font colour
pointLabelFontColor : "#666",
//Boolean - Whether to show a dot for each point
pointDot : true,
//Number - Radius of each point dot in pixels
pointDotRadius : 3,
//Number - Pixel width of point dot stroke
pointDotStrokeWidth : 1,
//Boolean - Whether to show a stroke for datasets
datasetStroke : true,
//Number - Pixel width of dataset stroke
datasetStrokeWidth : 2,
//Boolean - Whether to fill the dataset with a colour
datasetFill : true,
//Boolean - Whether to animate the chart
animation : true,
//Number - Number of animation steps
animationSteps : 60,
//String - Animation easing effect
animationEasing : "easeOutQuart",
//Function - Fires when the animation is complete
onAnimationComplete : null
}
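Running
Right-click and choose Run As -> MyEclipse Server Application. After the server starts, open localhost:8080/dataviewt/index.jsp in a browser to view the different visualizations. They mainly compare the market figures of the stocks for the two Time values 2015-10-12,1 and 2015-10-13,2.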