[Quant] Learn Python Machine Learning and Quantitative Trading in 4 Days: Notes 4 (p21~p25)


2020-07-02 15:02


Platform: https://www.ricequant.com/quant
API 1: https://www.ricequant.com/doc/rqdata-institutional#research-API-get_fundamentals
API 2: https://www.ricequant.com/doc/api/python/chn#wizard-stock
rice quant ipynb

p21 Standardizing factor data

Video: https://www.bilibili.com/video/av55456917?p=21

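# 2. Standardization
from sklearn.preprocessing import StandardScaler

std = StandardScaler()
# The original call passed the 1-D Series fund['pe_ratio_3md'] and raised an error.
# The most likely cause: recent scikit-learn versions require 2-D input,
# so select the column with double brackets to get a one-column DataFrame.
std.fit_transform(fund[['pe_ratio_3md']])

def stand(factor):
    '''Hand-written standardization (z-score)'''
    mean = factor.mean()
    std = factor.std()
    
    return (factor - mean)/std

fund['pe_ratio_stand'] = stand(fund['pe_ratio_3md'])
fund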
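
To make the relationship between the two approaches concrete, here is a minimal sketch on made-up numbers (the values below are hypothetical stand-ins for fund['pe_ratio_3md'], not Ricequant data). It assumes the only difference between StandardScaler and a manual z-score is the degrees-of-freedom setting: StandardScaler divides by the population standard deviation (ddof=0), while pandas' std() defaults to ddof=1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical factor values standing in for fund['pe_ratio_3md']
factor = pd.Series([10.0, 12.0, 15.0, 20.0, 43.0], name="pe_ratio_3md")

# StandardScaler needs 2-D input, so convert the Series to a one-column frame
scaled = StandardScaler().fit_transform(factor.to_frame()).ravel()

# Manual z-score; ddof=0 matches StandardScaler (pandas defaults to ddof=1)
manual = ((factor - factor.mean()) / factor.std(ddof=0)).to_numpy()

# True when both approaches agree
print(np.allclose(scaled, manual))
```

With the default factor.std() (ddof=1), as in stand() above, the results differ from StandardScaler by a constant scale factor, which does not change the ranking of stocks.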


p22 Introduction to market-cap neutralization

Video: https://www.bilibili.com/video/av55456917?p=22

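1. Market-cap neutralization

Purpose: keep the selected stocks from being too concentrated. (Reason: most factors are assumed to carry some market-cap effect.)
It removes the market-cap effect embedded in the other factors.
2. Removing it by regression
Fit a regression of the factor on market cap and obtain the coefficients.
The difference between the factor's actual value and the regression's predicted value (the residual) is the part that market cap does not explain.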
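
The regression idea can be checked on synthetic data. The sketch below is only an illustration under assumed inputs: market_cap and factor are randomly generated, and the 0.8 dependence is an arbitrary choice. It shows that the residual of an ordinary least-squares fit is linearly uncorrelated with market cap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: a factor that partly depends on market cap
market_cap = rng.normal(size=200)
factor = 0.8 * market_cap + rng.normal(size=200)

# Regress the factor on market cap (features must be 2-D)
x = market_cap.reshape(-1, 1)
lr = LinearRegression().fit(x, factor)

# Residual = actual factor value minus the value predicted from market cap
residual = factor - lr.predict(x)

corr_before = np.corrcoef(market_cap, factor)[0, 1]
corr_after = np.corrcoef(market_cap, residual)[0, 1]
print(corr_before, corr_after)
```

The correlation before neutralization is clearly positive, while the correlation of the residual with market cap is zero up to floating-point error. Note that this removes only the linear dependence.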

p23 Case study: implementing market-cap neutralization and the backtest stock-selection results

Video: https://www.bilibili.com/video/av55456917?p=23

# 1. Fetch the data
q = query(fundamentals.eod_derivative_indicator.pb_ratio,
         fundamentals.eod_derivative_indicator.market_cap)

fund = get_fundamentals(q, entry_date='2018-01-03')[:, '2018-01-03', :]

#fund[:3]

# 2. Remove outliers from the factor data, using the 3x MAD rule by default
fund['pb_ratio'] = mad(fund['pb_ratio'])
fund['market_cap'] = mad(fund['market_cap'])
# Outlier removal on the market-cap factor is optional

# 3. Choose the feature and target for the regression
# The feature must be passed in as a 2-D array
x = fund['market_cap'].values.reshape(-1,1) # note the .values
y = fund['pb_ratio']

# 4. Fit a linear regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x, y)
print(lr.coef_, lr.intercept_)

# 5. Predict each value, then take actual factor value - predicted value;
#    the residual is the market-cap-neutralized factor
y_predict = lr.predict(x)

res = y - y_predict
fund['pb_ratio'] = res

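1. Neutralization:

Reason: prevent concentrated stock selection during the backtest
Principle: build a regression relationship

2. Stock selection with and without market-cap neutralization

With market-cap neutralization: the holdings are periodically spread across different stocks
Without market-cap neutralization: the selected stocks are relatively concentrated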
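
A rough way to see why selection becomes less concentrated is to compare the market cap of the selected stocks with and without neutralization. The sketch below uses synthetic, standardized data only (the variables cap and pb and the helper pick_low are hypothetical, and this is not the backtest from the video). It picks the lowest 20% by factor value in both cases and compares the average market cap of each selection.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1000

# Synthetic, standardized data (not real market data)
cap = pd.Series(rng.normal(size=n), name="market_cap")
pb = 0.8 * cap + rng.normal(size=n)

# Neutralize pb against market cap: keep only the regression residual
x = cap.to_numpy().reshape(-1, 1)
pb_neutral = pb - LinearRegression().fit(x, pb).predict(x)

def pick_low(factor, q=0.2):
    """Return the index of entries at or below the q-quantile of the factor."""
    return factor[factor <= factor.quantile(q)].index

# Average (standardized) market cap of each selection
raw_mean_cap = cap.loc[pick_low(pb)].mean()
neutral_mean_cap = cap.loc[pick_low(pb_neutral)].mean()
print(raw_mean_cap, neutral_mean_cap)
```

In this toy setup the raw selection is pulled strongly toward small caps, while the neutralized selection has an average market cap close to that of the whole universe.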

p24 Summary and analysis of the market-cap neutralization results

Video: https://www.bilibili.com/video/av55456917?p=24

p25 Summary

Full code:

from sklearn.linear_model import LinearRegression

def init(context):
    scheduler.run_monthly(get_data, tradingday=1)

def get_data(context, bar_dict):
    # Query the data for the two factors
    fund = get_fundamentals(
            query(fundamentals.eod_derivative_indicator.pb_ratio,
                fundamentals.eod_derivative_indicator.market_cap))
    context.fund = fund.T

    # Process the factor data: outlier removal, standardization, market-cap neutralization
    treat_data(context)

    # Select stocks by price-to-book ratio (lower is better)
    # Take the 20% quantile and keep the stocks at or below it
    pb = context.fund['pb_ratio']
    context.stock_list = pb[pb <= pb.quantile(0.2)].index

def treat_data(context):
    # Processing logic for the price-to-book factor

    # Remove outliers from pb_ratio, then standardize it
    context.fund['pb_ratio'] = mad(context.fund.pb_ratio)
    context.fund['pb_ratio'] = stand(context.fund['pb_ratio'])

    # For stock selection, neutralize pb_ratio against market cap
    # Feature: market cap
    # Target: price-to-book factor
    # .values is needed: reshape works on the NumPy array, not on the Series
    x = context.fund['market_cap'].values.reshape(-1,1)
    y = context.fund['pb_ratio']

    # Fit the linear regression and neutralize
    lr = LinearRegression()
    lr.fit(x, y)

    y_predict = lr.predict(x)

    context.fund['pb_ratio'] = y - y_predict



# before_trading is called once per day, before the strategy starts trading
def before_trading(context):
    pass

def handle_bar(context, bar_dict):
    pass

# after_trading is called once per day, after trading ends
def after_trading(context):
    pass

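import numpy as np

def mad(factor):
    '''Outlier removal by median absolute deviation (MAD)'''
    # 1. Find the median
    me = np.median(factor)
    
    # 2. Absolute deviation of each factor value from the median: |x - median|
    # 3. Median of the absolute deviations: mad = median(|x - median|)
    mad = np.median(abs(factor - me))
    
    # 4. Compute MAD_e = 1.4826 * MAD, then choose the parameter n
    # n = 3 means clipping at 3 scaled MADs from the median
    # Upper and lower bounds
    up = me + (3* 1.4826* mad)
    down = me - (3* 1.4826* mad)
    
    # Clip values outside the bounds (np.where returns a NumPy array)
    factor = np.where(factor>up, up, factor)
    factor = np.where(factor<down, down, factor)
    
    return factor

def stand(factor):
    '''Hand-written standardization (z-score)'''
    mean = factor.mean()
    std = factor.std()
    
    return (factor - mean)/std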
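
As a small worked example of the MAD rule used by mad() above, the sketch below applies the same bounds to a made-up Series. It is a variant, not the code from the course: the name mad_clip and the use of pandas' Series.clip are my own choices, made so that the index (stock codes in practice) is preserved instead of being dropped by np.where.

```python
import numpy as np
import pandas as pd

def mad_clip(factor, n=3):
    """Clip values more than n scaled MADs from the median, keeping the index."""
    med = factor.median()
    mad = (factor - med).abs().median()
    up = med + n * 1.4826 * mad
    down = med - n * 1.4826 * mad
    return factor.clip(lower=down, upper=up)

# Made-up values with one obvious outlier
s = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0], index=list("abcde"))
clipped = mad_clip(s)
print(clipped)
```

Here the median is 3 and the MAD is 1, so the bounds are 3 ± 3 × 1.4826 = (-1.4478, 7.4478). Only the value 100 is pulled in to the upper bound; the other four values are unchanged.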

To be continued
