only integer scalar arrays can be converted to a scalar index

Original

2020-02-23 22:47

The error in the title appeared while I was using StratifiedShuffleSplit for cross-validation.

Here is how I found the problem and fixed it:

from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, train_size=0.7, random_state=42)
for train_index, test_index in sss.split(features, labels):
    X_train, X_test = features[train_index], features[test_index]  # feature rows for the training and test sets
    y_train, y_test = labels[train_index], labels[test_index]  # class labels for the training and test sets

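The usage above is taken from the StratifiedShuffleSplit documentation. When I ran the program, this block failed with an encoding error, UnicodeDecodeError: 'gbk' codec can't decode bytes in position 69-70: illegal multibyte sequence. After some investigation I traced the failure to this line: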

y_train, y_test = labels[train_index], labels[test_index]  # class labels for the training and test sets

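To narrow it down, I printed labels[train_index] just before that line and got the error in the title, TypeError: only integer scalar arrays can be converted to a scalar index. So the encoding error actually came down to an array indexing problem.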

After changing it to np.array(labels)[train_index], the print succeeded.

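At first I suspected that a newer version of NumPy required this usage. The more likely cause is simpler: my labels variable is a Python list (whose elements are arrays), and a list cannot be indexed directly with an array of indices. A list accepts only a single integer or a slice as an index, so the index array returned by split() is rejected. Converting the list to a NumPy array first makes this kind of indexing possible.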
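A minimal sketch of the failure and the fix, assuming labels is a plain Python list. The data and the helper name index_list_with_array are made up for illustration and are not from the original program.

```python
import numpy as np

# Hypothetical data: a plain Python list of class labels.
labels = [0, 1, 0, 1, 1]
# An index array like the ones StratifiedShuffleSplit.split() yields.
train_index = np.array([0, 2, 4])


def index_list_with_array(seq, idx):
    """Return the TypeError message raised by seq[idx], or None if it works."""
    try:
        seq[idx]
    except TypeError as exc:
        return str(exc)
    return None


# Indexing the list with a multi-element index array raises TypeError.
error_message = index_list_with_array(labels, train_index)
print("list indexing failed with:", error_message)

# Converting to a NumPy array first allows indexing with an index array.
selected = np.asarray(labels)[train_index]
print("array indexing returned:", selected)
```

np.asarray is used here instead of np.array only because it avoids a copy when the input is already an array; either call fixes the indexing.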

The final version below runs correctly:

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, train_size=0.7, random_state=42)

for train_index, test_index in sss.split(features, labels):
    print("TRAIN:", train_index, "TEST:", test_index)  # the index arrays for this split
    X_train, X_test = features[train_index], features[test_index]  # feature rows for the training and test sets
    print("labels[train_index]", np.array(labels)[train_index])
    y_train, y_test = np.array(labels)[train_index], np.array(labels)[test_index]  # class labels for the training and test sets
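The same loop can also be written so that the conversion happens once, before splitting, instead of on every iteration. The following is only a sketch under assumptions: the features and labels are synthetic stand-ins (20 samples, two balanced classes), and n_splits is reduced to 3 to keep the run short.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in data (an assumption, not the original dataset).
features = np.arange(40).reshape(20, 2)
labels = [0] * 10 + [1] * 10  # a plain Python list, as in the failing case

# Convert once up front so both objects accept index arrays.
features = np.asarray(features)
labels = np.asarray(labels)

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=42)

split_sizes = []
for train_index, test_index in sss.split(features, labels):
    X_train, X_test = features[train_index], features[test_index]
    y_train, y_test = labels[train_index], labels[test_index]
    # Record the sizes so the splits can be checked afterwards.
    split_sizes.append((len(y_train), len(y_test)))

print(split_sizes)
```

Converting before the loop keeps the loop body identical to the documented usage and avoids rebuilding the array on each split.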

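Summary: when the encoding error first appeared, I kept fixating on the encoding itself. I had read the same array before without any decoding problem, so I reasoned that building the training and validation sets should not cause one either. That cost me a lot of time.

When something breaks, don't just agonize over the surface symptom. Check the code section by section and line by line until you find the root cause, and the error will resolve itself.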
