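Stock software such as Tongdaxin (通達信), Tonghuashun (同花順), and Dazhihui (大智慧) lets you watch prices and trends in real time and do some simple stock screening and quantitative analysis. For more complex work, such as regression analysis or association analysis, these tools fall short. It is better to fetch historical and real-time stock data, store it in a database, and then connect to that database from other tools such as SPSS, SAS, Excel, or a general-purpose programming language to run the quantitative analysis you need.
The first step is to find an interface that serves stock data. Sina, Yahoo, and Tencent all provide interfaces for real-time stock data. Here the Yahoo interface is used for historical data and the Tencent interface for closing data.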
(1) Project structure
(2) Database connection pool
connectionpool.py
# -*- coding: UTF-8 -*-
'''
Create a connection pool.
'''
from DBUtils import PooledDB
import MySQLdb

maxconn = 30      # maximum number of connections
mincached = 10    # minimum number of idle connections
maxcached = 20    # maximum number of idle connections
maxshared = 30    # maximum number of shared connections
# Database address: user#password#host#port#database#charset
connstring = "root#root#127.0.0.1#3307#pystock#utf8"
dbtype = "mysql"  # use MySQL as the storage database


def createConnectionPool(connstring, dbtype):
    db_conn = connstring.split("#")
    if dbtype != 'mysql':
        return None
    try:
        return PooledDB.PooledDB(
            MySQLdb,
            user=db_conn[0], passwd=db_conn[1], host=db_conn[2],
            port=int(db_conn[3]), db=db_conn[4], charset=db_conn[5],
            mincached=mincached, maxcached=maxcached,
            maxshared=maxshared, maxconnections=maxconn)
    except Exception as e:
        raise Exception('conn datasource Excepts,%s!!!(%s).' % (db_conn[2], str(e)))


pool = createConnectionPool(connstring, dbtype)
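The `#`-separated connection string is easy to get wrong when a field is missing. As a minimal sketch (written for Python 3, unlike the Python 2 listings in this post), the string can be validated and turned into named fields before it reaches the pool. `DbConfig` and `parse_connstring` are hypothetical names introduced only for this illustration.

```python
from collections import namedtuple

# Hypothetical helper: the field order follows the connstring in connectionpool.py.
DbConfig = namedtuple("DbConfig", "user passwd host port db charset")


def parse_connstring(connstring, sep="#"):
    """Split 'user#passwd#host#port#db#charset' into a DbConfig."""
    parts = connstring.split(sep)
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d" % len(parts))
    user, passwd, host, port, db, charset = parts
    # The port is the only numeric field.
    return DbConfig(user, passwd, host, int(port), db, charset)


config = parse_connstring("root#root#127.0.0.1#3307#pystock#utf8")
print(config.host, config.port, config.db)
```

Failing early on a malformed string gives a clearer error than an `IndexError` raised from inside the pool setup.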
(3) Database operations
DBOperator.py
# -*- coding: UTF-8 -*-
'''
Created on 2015-3-13
@author: Casey
'''
import MySQLdb
from stockmining.stocks.setting import LoggerFactory
import connectionpool


class DBOperator(object):

    def __init__(self):
        self.logger = LoggerFactory.getLogger('DBOperator')
        self.conn = None

    def connDB(self):
        # Single connection:
        # self.conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="root",
        #                             db="pystock", port=3307, charset="utf8")
        # Take a connection from the pool
        self.conn = connectionpool.pool.connection()
        return self.conn

    def closeDB(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def _cursor(self):
        # Every operation needs an open connection.
        if self.conn is None:
            raise MySQLdb.Error('No connection')
        return self.conn.cursor()

    def insertIntoDB(self, table, row):
        try:
            cursor = self._cursor()
            keys = list(row)
            # Values are passed as parameters so that the driver escapes them.
            sql = "insert into %s (%s) values (%s)" % (
                table, ",".join(keys), ",".join(["%s"] * len(keys)))
            param = tuple(row[key] for key in keys)
            self.logger.debug("%s %r" % (sql, param))
            cursor.execute(sql, param)
            self.conn.commit()
            cursor.close()
        except MySQLdb.Error as e:
            self.logger.error("Mysql Error: %s" % str(e))
            if self.conn is not None:
                self.conn.rollback()

    def execute(self, sql):
        try:
            cursor = self._cursor()
            n = cursor.execute(sql)
            cursor.close()
            return n
        except MySQLdb.Error as e:
            self.logger.error("Mysql Error: %s" % str(e))

    def findBySQL(self, sql):
        try:
            cursor = self._cursor()
            cursor.execute(sql)
            rows = cursor.fetchall()
            cursor.close()
            return rows
        except MySQLdb.Error as e:
            self.logger.error("Mysql Error: %s" % str(e))

    def findByCondition(self, table, fields, wheres):
        try:
            cursor = self._cursor()
            conditions = " and ".join(where.key + "=%s" for where in wheres)
            sql = "select %s from %s where %s" % (",".join(fields), table, conditions)
            param = tuple(where.value for where in wheres)
            self.logger.debug(sql)
            cursor.execute(sql, param)
            rows = cursor.fetchall()
            cursor.close()
            return rows
        except MySQLdb.Error as e:
            self.logger.error("Mysql Error: %s" % str(e))
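The core of `insertIntoDB` is turning a dictionary into a parameterized INSERT statement. The following is a minimal Python 3 sketch of that step on its own, without a database; `build_insert` is a hypothetical helper name, and the sample row reuses values from the Yahoo response shown later in this post.

```python
def build_insert(table, row):
    """Return (sql, params) for a parameterized INSERT statement."""
    # Sorting the columns keeps the generated statement reproducible.
    columns = sorted(row)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "insert into %s (%s) values (%s)" % (table, ", ".join(columns), placeholders)
    # Values travel separately so the database driver can escape them.
    params = tuple(row[column] for column in columns)
    return sql, params


sql, params = build_insert(
    "stock_quote_yahoo",
    {"code": 600000, "date": "2010-09-30", "close": "12.95"},
)
print(sql)
print(params)
```

Only the values are bound as parameters. Table and column names cannot be bound this way, so they should come from trusted code rather than from user input.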
(4) Logging
LoggerFactory.py
# -*- coding: UTF-8 -*-
'''
Created on 2015-3-11
@author: Casey
'''
import logging


def getLogger(name):
    '''Return a logger with the given name.'''
    # %(asctime)s stamps each record when it is written.
    logging.basicConfig(
        level=logging.DEBUG,
        format='%(asctime)s : %(name)s LINE %(lineno)-4d %(levelname)-8s %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        filename="d:\\stocks\\stock.log",
        filemode='w')
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach the console handler once, so repeated calls do not duplicate output.
        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(name)s: LINE %(lineno)-4d : %(levelname)-8s %(message)s')
        console.setFormatter(formatter)
        logger.addHandler(console)
    return logger


if __name__ == '__main__':
    getLogger("www").debug("www")
(5) Fetching historical stock data
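Parameters:
s: stock symbol
a: start date, month
b: start date, day
c: start date, year
d: end date, month
e: end date, day
f: end date, year
g: time period
(Pay attention to the month parameters: the value is the real month minus 1. To request September data, write 08.)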
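The zero-based month is the easiest parameter to get wrong, so it helps to build the URL from real dates. This is a minimal Python 3 sketch under that assumption; `build_history_url` is a hypothetical helper, and the padding (two digits for months, none for days) simply mirrors the example URL in this section.

```python
from datetime import date

BASE_URL = "http://ichart.yahoo.com/table.csv"


def build_history_url(symbol, start, end, period="d"):
    """Build the query URL; months are sent as (real month - 1)."""
    return "%s?s=%s&a=%02d&b=%d&c=%d&d=%02d&e=%d&f=%d&g=%s" % (
        BASE_URL, symbol,
        start.month - 1, start.day, start.year,
        end.month - 1, end.day, end.year,
        period,
    )


url = build_history_url("600000.SS", date(2010, 9, 25), date(2010, 10, 8))
print(url)
```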
Example: query the daily data of Shanghai Pudong Development Bank (600000) from 2010-09-25 to 2010-10-08
http://ichart.yahoo.com/table.csv?s=600000.SS&a=08&b=25&c=2010&d=09&e=8&f=2010&g=d
Returns:
Date,Open,High,Low,Close,Volume,Adj Close
2010-09-30,12.37,12.99,12.32,12.95,76420500,12.95
2010-09-29,12.20,12.69,12.12,12.48,79916400,12.48
2010-09-28,12.92,12.92,12.57,12.58,63988100,12.58
2010-09-27,13.00,13.02,12.89,12.94,43203600,12.94
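The response is plain CSV, so the standard `csv` module can turn it into typed rows. A minimal Python 3 sketch, using the rows shown above as input; `parse_history` and the output key names are assumptions chosen to match the column names used elsewhere in this post.

```python
import csv
import io

SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2010-09-30,12.37,12.99,12.32,12.95,76420500,12.95
2010-09-29,12.20,12.69,12.12,12.48,79916400,12.48
2010-09-28,12.92,12.92,12.57,12.58,63988100,12.58
2010-09-27,13.00,13.02,12.89,12.94,43203600,12.94
"""


def parse_history(text):
    """Convert the CSV response into a list of dictionaries with numeric values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": record["Date"],
            "open": float(record["Open"]),
            "high": float(record["High"]),
            "low": float(record["Low"]),
            "close": float(record["Close"]),
            "volume": int(record["Volume"]),
            "adj_close": float(record["Adj Close"]),
        })
    return rows


rows = parse_history(SAMPLE)
print(len(rows), rows[0]["date"], rows[0]["close"])
```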
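Because the data volume is large, a full run takes a long time, so a multithreaded mode is worth considering for fetching the data. The single-threaded mode: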
# -*- coding: UTF-8 -*-
'''
Created on 2015-3-1
@author: Casey
'''
import urllib2

from setting import params
from db import *

dbOperator = DBOperator()
table = "stock_quote_yahoo"


def isStockExitsInDate(table, stock, date):
    '''Check whether a row already exists for this stock on this date.'''
    sql = "select * from " + table + " where code = '%d' and date='%s'" % (stock, date)
    n = dbOperator.execute(sql)
    return n is not None and n >= 1


def getHistoryStockData(code, dataurl):
    try:
        r = urllib2.Request(dataurl)
        try:
            stdout = urllib2.urlopen(r, data=None, timeout=3)
        except Exception as e:
            # urlopen raises for HTTP errors such as 404 (unknown code)
            print ">>>>>> Exception: " + str(e)
            return None
        stdoutInfo = stdout.read().decode(params.codingtype).encode('utf-8')
        tempData = stdoutInfo.replace('"', '')
        for stockQuote in tempData.split("\n"):
            stockInfo = stockQuote.strip().split(",")
            # Keep complete data rows only; this also skips the header line.
            if len(stockInfo) == 7 and stockInfo[0] != 'Date':
                if not isStockExitsInDate(table, code, stockInfo[0]):
                    stockDetail = {}
                    stockDetail["date"] = stockInfo[0]
                    stockDetail["open"] = stockInfo[1]       # open
                    stockDetail["high"] = stockInfo[2]       # high
                    stockDetail["low"] = stockInfo[3]        # low
                    stockDetail["close"] = stockInfo[4]      # close
                    stockDetail["volume"] = stockInfo[5]     # volume
                    stockDetail["adj_close"] = stockInfo[6]  # adjusted close
                    stockDetail["code"] = code               # stock code
                    dbOperator.insertIntoDB(table, stockDetail)
        return tempData
    except Exception as err:
        print ">>>>>> Exception: " + str(dataurl) + " " + str(err)
        return None


def get_stock_history():
    # Months are zero-based, so a=01 and d=01 below mean February.
    # Shanghai, 2005-2015 history
    for code in range(600001, 602100):
        dataUrl = "http://ichart.yahoo.com/table.csv?s=%d.SS&a=01&b=01&c=2005&d=01&e=01&f=2015&g=d" % code
        print getHistoryStockData(code, dataUrl)
    # Shenzhen, 2005-2015 history
    for code in range(1, 1999):
        dataUrl = "http://ichart.yahoo.com/table.csv?s=%06d.SZ&a=01&b=01&c=2005&d=01&e=01&f=2015&g=d" % code
        print getHistoryStockData(code, dataUrl)
    # SME board
    for code in range(2001, 2999):
        dataUrl = "http://ichart.yahoo.com/table.csv?s=%06d.SZ&a=01&b=01&c=2005&d=01&e=01&f=2015&g=d" % code
        print getHistoryStockData(code, dataUrl)
    # ChiNext board
    for code in range(300001, 300400):
        dataUrl = "http://ichart.yahoo.com/table.csv?s=%d.SZ&a=01&b=01&c=2005&d=01&e=01&f=2015&g=d" % code
        print getHistoryStockData(code, dataUrl)


def main():
    "main function"
    dbOperator.connDB()
    get_stock_history()
    dbOperator.closeDB()


if __name__ == '__main__':
    main()
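The multithreaded mode mentioned before the listing could be sketched with a thread pool, since most of the running time is spent waiting on the network. This is a Python 3 sketch, not the author's implementation: `fetch_history` is a hypothetical stand-in for the real download-and-store step, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_history(code):
    """Placeholder for the real download-and-store step (hypothetical)."""
    return "%06d" % code


def fetch_all(codes, worker=fetch_history, max_workers=8):
    """Run the worker over all codes in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(worker, codes))


results = fetch_all(range(1, 6))
print(results)
```

In a real run each worker would need its own database connection from the pool, because a single connection should not be used by several threads at once.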
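(6) Fetching real-time prices and capital-flow data
A: Real-time price data comes from the Tencent interface. Shanghai: http://qt.gtimg.cn/q=sh<int>, Shenzhen: http://qt.gtimg.cn/q=sz<int>
For example, to fetch the real-time data of Ping An Bank, request http://qt.gtimg.cn/q=sz000001, which returns a string containing the stock data: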
v_sz000001="51~平安銀行~000001~11.27~11.27~11.30~316703~151512~165192~11.27~93~11.26~4352~11.25~4996~11.24~1037~11.23~1801~11.28~1181~11.29~2108~11.30~1075~11.31~1592~11.32~1118~15:00:24/11.27/3146/S/3545407/17948|14:56:59/11.26/15/S/16890/17787|14:56:56/11.25/404/S/454693/17783|14:56:54/11.26/173/B/194674/17780|14:56:51/11.26/306/B/344526/17777|14:56:47/11.26/16/B/18016/17773~20151029150142~0.00~0.00~11.36~11.25~11.26/313557/354285045~316703~35783~0.27~7.38~~11.36~11.25~0.98~1330.32~1612.59~1.03~12.40~10.14~";
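There is a lot of data. Splitting on ~ and counting from 0, the most useful fields are: 1 name; 2 code; 3 price; 4 previous close; 5 today's open; 6 volume (lots); 7 outer volume; 8 inner volume; 9 bid 1; 10 bid 1 volume; 11 bid 2; 12 bid 2 volume; 13 bid 3; 14 bid 3 volume; 15 bid 4; 16 bid 4 volume; 17 bid 5; 18 bid 5 volume; 19 ask 1; 20 ask 1 volume; 21 ask 2; 22 ask 2 volume; 23 ask 3; 24 ask 3 volume; 25 ask 4; 26 ask 4 volume; 27 ask 5; 28 ask 5 volume; 30 time; 31 change; 32 change rate; 33 high; 34 low; 35 price/volume (lots)/turnover amount; 38 turnover rate; 39 P/E ratio; 43 amplitude; 44 circulating market value; 45 total market value; 46 P/B ratio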
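Parsing the quotation string comes down to cutting out the quoted payload, splitting it on `~`, and picking fields by position. A minimal Python 3 sketch follows; the positions are counted against the sample response above, the sample is abridged (the recent-trades field is shortened to one trade), and `parse_quotation` and `FIELDS` are hypothetical names.

```python
# Sample response, abridged: the recent-trades field (29) holds a single trade.
SAMPLE = (
    'v_sz000001="51~平安銀行~000001~11.27~11.27~11.30~316703~151512~165192~'
    '11.27~93~11.26~4352~11.25~4996~11.24~1037~11.23~1801~'
    '11.28~1181~11.29~2108~11.30~1075~11.31~1592~11.32~1118~'
    '15:00:24/11.27/3146/S/3545407/17948~20151029150142~0.00~0.00~11.36~11.25~'
    '11.26/313557/354285045~316703~35783~0.27~7.38~~11.36~11.25~0.98~'
    '1330.32~1612.59~1.03~12.40~10.14~";'
)

# Field name -> position after splitting the payload on "~".
FIELDS = {
    "name": 1, "code": 2, "price": 3, "yesterday_close": 4, "today_open": 5,
    "volume": 6, "datetime": 30, "updown": 31, "updown_rate": 32,
    "highest_price": 33, "lowest_price": 34, "turnover_rate": 38, "pe_rate": 39,
}


def parse_quotation(raw):
    """Return a dict of selected fields, or None if the response is too short."""
    if raw.count('"') < 2:
        return None
    parts = raw.split('"')[1].split("~")
    if len(parts) <= max(FIELDS.values()):
        return None
    quote = {key: parts[index] for key, index in FIELDS.items()}
    # Field 35 is compound: price/volume/amount; the last part is the amount.
    quote["turnover_amount"] = parts[35].split("/")[-1]
    return quote


quote = parse_quotation(SAMPLE)
print(quote["code"], quote["price"], quote["datetime"])
```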
B: Capital-flow data also comes from the Tencent interface. Shanghai: http://qt.gtimg.cn/q=ff_sh<int>, Shenzhen: http://qt.gtimg.cn/q=ff_sz<int>
For example, the capital-flow data of Ping An Bank, http://qt.gtimg.cn/q=ff_sz000001:
v_ff_sz000001="sz000001~21162.20~24136.40~-2974.20~-8.31~14620.87~11646.65~2974.22~8.31~35783.07~261502.0~261158.3~平安銀行~20151029~20151028^37054.20^39358.20~20151027^39713.50^42230.70~20151026^82000.80^83689.90~20151023^81571.30^71743.10";
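The more important fields: 1 main-force inflow; 2 main-force outflow; 3 main-force net flow; 4 main-force net flow as a percentage of total capital flow; 5 retail inflow; 6 retail outflow; 7 retail net flow; 8 retail net flow as a percentage of total capital flow; 9 total capital flow; 12 name; 13 date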
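The capital-flow string can be parsed the same way. This minimal Python 3 sketch uses the sample response above and reads only the documented positions 1 to 9, 12, and 13; the trailing `date^value^value` entries are left alone because their meaning is not described here. `parse_cash_flow` is a hypothetical name, and the dictionary keys follow the column names used in the listing in this section.

```python
SAMPLE = (
    'v_ff_sz000001="sz000001~21162.20~24136.40~-2974.20~-8.31~14620.87~11646.65~'
    '2974.22~8.31~35783.07~261502.0~261158.3~平安銀行~20151029~'
    '20151028^37054.20^39358.20~20151027^39713.50^42230.70~'
    '20151026^82000.80^83689.90~20151023^81571.30^71743.10";'
)


def parse_cash_flow(raw):
    """Return the documented capital-flow fields, or None if the response is too short."""
    if raw.count('"') < 2:
        return None
    parts = raw.split('"')[1].split("~")
    if len(parts) < 14:  # position 13 (date) must exist
        return None
    return {
        "code": parts[0][2:],  # drop the "sz"/"sh" prefix
        "main_in_cash": float(parts[1]),
        "main_out_cash": float(parts[2]),
        "main_net_cash": float(parts[3]),
        "main_net_rate": float(parts[4]),
        "private_in_cash": float(parts[5]),
        "private_out_cash": float(parts[6]),
        "private_net_cash": float(parts[7]),
        "private_net_rate": float(parts[8]),
        "total_cash": float(parts[9]),
        "name": parts[12],
        "date": parts[13],
    }


flow = parse_cash_flow(SAMPLE)
print(flow["code"], flow["date"], flow["main_net_cash"])
```

A quick sanity check on the sample: inflow minus outflow matches the net flow for both the main-force and the retail figures.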
Fetching real-time prices and capital-flow data with multiple threads and the database connection pool:
# -*- coding: UTF-8 -*-
'''
Created on 2015-3-2
@author: Casey
'''
import time
import threading
import urllib2

from db import *
from setting import *

'''
Shanghai codes:  '600001' .. '602100'
Shenzhen codes:  '000001' .. '001999'
SME board:       '002001' .. '002999'
ChiNext board:   '300001' .. '300400'
'''


class StockTencent(object):
    # database tables
    __stockTables = {'cash': 'stock_cash_tencent', 'quotation': 'stock_quotation_tencent'}

    def __init__(self):
        '''Initialise the logger and the database helper.'''
        self.__logger = LoggerFactory.getLogger('StockTencent')
        self.__dbOperator = DBOperator()

    def main(self):
        # Each thread works on its own StockTencent instance and therefore on its
        # own pooled connection, which the task opens and closes itself.
        tasks = [StockTencent().getStockCash, StockTencent().getStockQuotation]
        threads = [threading.Thread(target=task) for task in tasks]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    def __isStockExitsInDate(self, table, stock, date, column='date'):
        '''Check whether a row already exists for this stock and date.'''
        sql = "select * from %s where code = '%s' and %s = '%s'" % (table, stock, column, date)
        n = self.__dbOperator.execute(sql)
        return n is not None and n >= 1

    def __stockSymbols(self):
        '''Yield exchange-prefixed symbols such as sh600001 or sz000001.'''
        for code in range(600001, 602100):   # Shanghai
            yield "sh%d" % code
        for code in range(1, 1999):          # Shenzhen
            yield "sz%06d" % code
        for code in range(2001, 2999):       # SME board
            yield "sz%06d" % code
        for code in range(300001, 300400):   # ChiNext board
            yield "sz%d" % code

    def __getDataFromUrl(self, dataUrl):
        '''Read the response, retrying once after 10 seconds on failure.'''
        for attempt in range(2):
            try:
                stdout = urllib2.urlopen(urllib2.Request(dataUrl), data=None, timeout=3)
                stdoutInfo = stdout.read().decode(params.codingtype).encode('utf-8')
                tempData = stdoutInfo.replace('"', '')
                self.__logger.debug(tempData)
                return tempData
            except Exception as e:
                self.__logger.error(">>>>>> Exception: " + str(e))
                if attempt == 0:
                    time.sleep(10)
        return None

    def __getStockCashDetail(self, dataUrl):
        '''Fetch and store the capital-flow details of one stock.'''
        tempData = self.__getDataFromUrl(dataUrl)
        if tempData is None:
            return False
        # Parse the capital-flow data; index 13 (date) must exist.
        stockInfo = tempData.split('~')
        if len(stockInfo) < 14 or stockInfo[0].find('pv_none') != -1:
            return
        table = self.__stockTables['cash']
        code = stockInfo[0].split('=')[1][2:]
        date = stockInfo[13]
        if not self.__isStockExitsInDate(table, code, date):
            stockCash = {}
            stockCash['code'] = code
            stockCash['main_in_cash'] = stockInfo[1]
            stockCash['main_out_cash'] = stockInfo[2]
            stockCash['main_net_cash'] = stockInfo[3]
            stockCash['main_net_rate'] = stockInfo[4]
            stockCash['private_in_cash'] = stockInfo[5]
            stockCash['private_out_cash'] = stockInfo[6]
            stockCash['private_net_cash'] = stockInfo[7]
            stockCash['private_net_rate'] = stockInfo[8]
            stockCash['total_cash'] = stockInfo[9]
            stockCash['name'] = stockInfo[12].decode('utf8')
            stockCash['date'] = date
            # insert into the database
            self.__dbOperator.insertIntoDB(table, stockCash)

    def getStockQuotationDetail(self, dataUrl):
        '''Fetch and store the quotation details of one stock.'''
        tempData = self.__getDataFromUrl(dataUrl)
        if tempData is None:
            return False
        # Index 46 (P/B ratio) must exist.
        stockInfo = tempData.split('~')
        if len(stockInfo) < 47 or stockInfo[0].find('pv_none') != -1:
            return
        if stockInfo[3] == '0.00':   # no current price
            return
        table = self.__stockTables['quotation']
        code = stockInfo[2]
        date = stockInfo[30]
        if not self.__isStockExitsInDate(table, code, date, 'datetime'):
            stockQuotation = {}
            stockQuotation['code'] = code
            stockQuotation['name'] = stockInfo[1].decode('utf8')
            stockQuotation['price'] = stockInfo[3]
            stockQuotation['yesterday_close'] = stockInfo[4]
            stockQuotation['today_open'] = stockInfo[5]
            stockQuotation['volume'] = stockInfo[6]
            stockQuotation['outer_sell'] = stockInfo[7]
            stockQuotation['inner_buy'] = stockInfo[8]
            stockQuotation['buy_one'] = stockInfo[9]
            stockQuotation['buy_one_volume'] = stockInfo[10]
            stockQuotation['buy_two'] = stockInfo[11]
            stockQuotation['buy_two_volume'] = stockInfo[12]
            stockQuotation['buy_three'] = stockInfo[13]
            stockQuotation['buy_three_volume'] = stockInfo[14]
            stockQuotation['buy_four'] = stockInfo[15]
            stockQuotation['buy_four_volume'] = stockInfo[16]
            stockQuotation['buy_five'] = stockInfo[17]
            stockQuotation['buy_five_volume'] = stockInfo[18]
            stockQuotation['sell_one'] = stockInfo[19]
            stockQuotation['sell_one_volume'] = stockInfo[20]
            stockQuotation['sell_two'] = stockInfo[21]
            stockQuotation['sell_two_volume'] = stockInfo[22]
            stockQuotation['sell_three'] = stockInfo[23]
            stockQuotation['sell_three_volume'] = stockInfo[24]
            stockQuotation['sell_four'] = stockInfo[25]
            stockQuotation['sell_four_volume'] = stockInfo[26]
            stockQuotation['sell_five'] = stockInfo[27]
            stockQuotation['sell_five_volume'] = stockInfo[28]
            stockQuotation['datetime'] = stockInfo[30]
            stockQuotation['updown'] = stockInfo[31]
            stockQuotation['updown_rate'] = stockInfo[32]
            stockQuotation['heighest_price'] = stockInfo[33]
            stockQuotation['lowest_price'] = stockInfo[34]
            stockQuotation['volume_amout'] = stockInfo[35].split('/')[2]
            stockQuotation['turnover_rate'] = stockInfo[38]
            stockQuotation['pe_rate'] = stockInfo[39]
            stockQuotation['viberation_rate'] = stockInfo[43]
            stockQuotation['circulated_stock'] = stockInfo[44]
            stockQuotation['total_stock'] = stockInfo[45]
            stockQuotation['pb_rate'] = stockInfo[46]
            self.__dbOperator.insertIntoDB(table, stockQuotation)

    def getStockCash(self):
        '''Collect capital-flow data for all stocks.'''
        self.__logger.debug("Start: collecting capital-flow data")
        self.__dbOperator.connDB()
        symbol = None
        try:
            for symbol in self.__stockSymbols():
                self.__getStockCashDetail("http://qt.gtimg.cn/q=ff_" + symbol)
        except Exception as err:
            self.__logger.error(">>>>>> Exception: " + str(symbol) + " " + str(err))
        finally:
            self.__dbOperator.closeDB()
        self.__logger.debug("End: collecting capital-flow data")

    def getStockQuotation(self):
        '''Collect quotation data for all stocks.'''
        self.__logger.debug("Start: collecting quotation data")
        self.__dbOperator.connDB()
        symbol = None
        try:
            for symbol in self.__stockSymbols():
                self.getStockQuotationDetail("http://qt.gtimg.cn/q=" + symbol)
        except Exception as err:
            self.__logger.error(">>>>>> Exception: " + str(symbol) + " " + str(err))
        finally:
            self.__dbOperator.closeDB()
        self.__logger.debug("End: collecting quotation data")


if __name__ == '__main__':
    StockTencent().main()
(7) Add the scripts to the system's scheduled tasks to collect the data after the market closes
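The scheduling itself is left to the operating system, but the script can also guard against being started too early. A minimal Python 3 sketch, assuming a 15:00 close on weekdays and ignoring public holidays; `is_after_close` is a hypothetical helper, not part of the project code.

```python
from datetime import datetime, time

MARKET_CLOSE = time(15, 0)  # assumption: the session ends at 15:00 local time


def is_after_close(now):
    """True on a weekday once the closing time has passed (holidays ignored)."""
    return now.weekday() < 5 and now.time() >= MARKET_CLOSE


print(is_after_close(datetime(2015, 10, 29, 15, 1, 42)))
```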
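(8) The collected data is now ready for analysis, for example:
Find the stock with the largest main-force net inflow on October 28: select * from stock_cash_tencent where date = '20151028' and main_net_cash = (select max(main_net_cash) from stock_cash_tencent where date = '20151028')
It turns out to be Xingrong Environment (興蓉環境): it rose on heavy volume that day, closed lower the next day, and had main-force capital flowing in for several days in a row.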
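The shape of this query can be tried out without a MySQL server. The Python 3 sketch below uses an in-memory SQLite database with three made-up rows (the codes `AAA`, `BBB`, `CCC` and their values are purely illustrative, not market data) and only the columns the query needs. It filters on the date in the outer query as well as in the subquery, so that a row from another day with the same value is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock_cash_tencent (code text, date text, main_net_cash real)")
conn.executemany(
    "insert into stock_cash_tencent values (?, ?, ?)",
    [
        ("AAA", "20151028", 120.5),   # illustrative rows only
        ("BBB", "20151028", 980.0),
        ("CCC", "20151027", 980.0),   # same value, different day
    ],
)

query = """
select code, date, main_net_cash from stock_cash_tencent
where date = ?
  and main_net_cash = (select max(main_net_cash)
                       from stock_cash_tencent where date = ?)
"""
top = conn.execute(query, ("20151028", "20151028")).fetchall()
print(top)
```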
Analysis in Excel:
Capital-flow analysis of Ping An Bank