Progress indicator during pandas operations

Question:

I regularly perform pandas operations on data frames with more than 15 million or so rows, and I'd love to have access to a progress indicator for particular operations.

Does a text-based progress indicator for pandas split-apply-combine operations exist?

For example, in something like:

df_users.groupby(['userID', 'requestDate']).apply(feature_rollup)

where feature_rollup is a somewhat involved function that takes many DF columns and creates new user columns through various methods. These operations can take a while for large data frames, so I'd like to know if it is possible to have text-based output in an IPython notebook that updates me on the progress.

So far, I've tried canonical loop progress indicators for Python, but they don't interact with pandas in any meaningful way.

I'm hoping there's something I've overlooked in the pandas library/documentation that allows one to know the progress of a split-apply-combine. A simple implementation would maybe look at the total number of data frame subsets upon which the apply function is working and report progress as the completed fraction of those subsets.

Is this perhaps something that needs to be added to the library?


Solution:
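
The simple implementation described in the question (count the subsets, then report the completed fraction as the apply function works through them) can be sketched without any pandas-specific hooks by wrapping the function itself. This is only a minimal sketch, not a built-in pandas feature: the name with_progress and its parameters (total, stream, every) are hypothetical and chosen for illustration.

```python
import sys


def with_progress(func, total, stream=sys.stdout, every=1):
    """Wrap func so that each call reports the completed fraction of `total`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of expected calls")

    state = {"done": 0}

    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Clamp the counter so an unexpected extra call never reports > 100%.
        state["done"] = min(state["done"] + 1, total)
        if state["done"] % every == 0 or state["done"] == total:
            pct = 100.0 * state["done"] / total
            # "\r" returns to the start of the line so the text is overwritten.
            stream.write("\r%d/%d groups (%.1f%%)" % (state["done"], total, pct))
            stream.flush()
        return result

    return wrapper
```

With pandas, one would presumably build the grouping first, e.g. grouped = df_users.groupby(['userID', 'requestDate']), take the number of subsets from grouped.ngroups, and then call grouped.apply(with_progress(feature_rollup, grouped.ngroups)). Some pandas versions may call the applied function one extra time on the first group, which is why the counter is clamped to total. For 15 million rows, a larger every value keeps the output itself from slowing things down. The tqdm library also offers a pandas integration (tqdm.pandas() followed by progress_apply on the grouped object) that may be a more polished alternative if an extra dependency is acceptable.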

Reference 1: https://en.stackoom.com/question/1G3Yk
Reference 2: https://stackoom.com/question/1G3Yk