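Example: Time Event Log Analysis

Time event log

A personal time-tracking tool. Key points:

  • Use dida365.com as the GTD tool
  • Record each event's category and time spent in the task title using a special format, e.g. "[編程] javascript exercism [1h]"; practice data is available for download
  • Export the data
  • Analyze the data

Reading the data

Parse and read the data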

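%matplotlib inline
import subprocess

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.font_manager import FontManager

# Helper: list the Chinese fonts known to both the system (via fontconfig's fc-list) and matplotlib
def get_support_chinese_font():
    fm = FontManager()
    mat_fonts = set(f.name for f in fm.ttflist)

    # check_output returns bytes on Python 3, so decode before splitting
    output = subprocess.check_output('fc-list :lang=zh -f "%{family}\n"', shell=True).decode('utf-8')
    print('*' * 10, 'Chinese fonts available on the system', '*' * 10)
    print(output)
    zh_fonts = set(f.split(',', 1)[0] for f in output.split('\n'))
    available = mat_fonts & zh_fonts

    print('*' * 10, 'Chinese fonts usable by matplotlib', '*' * 10)
    for f in available:
        print(f)
    return available

mpl.rcParams['font.sans-serif'] = ['Arial Unicode MS']  # default font; it must contain Chinese glyphs
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from rendering as a box in saved figures

# Date parser: keep only the calendar date of each timestamp
def _date_parser(dstr):
    return pd.Timestamp(dstr).date()

# Note: date_parser is deprecated since pandas 2.0; on newer releases the index may need to be parsed after loading
data = pd.read_csv('data/dida365.csv', header=3, index_col='Due Date', parse_dates=True, date_parser=_date_parser)
data.head()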
List Name Title Content Is Checklist Reminder Repeat Priority Status Completed Time Order Timezone Is All Day
Due Date
2016-05-24 自我成長 [編程] javascript exercism [1h] NaN N NaN NaN 0 2 2016-05-25T14:15:10+0000 -235295488344064 Asia/Shanghai True
2016-05-23 自我成長 [編程] javascript exercism [0.5h] NaN N NaN NaN 0 2 2016-05-24T15:59:08+0000 -234195976716288 Asia/Shanghai True
2016-05-23 自我成長 [編程] clojure ring request [2h] 閱讀 ring.util.request 源碼\r N NaN NaN 0 2 2016-05-24T15:58:56+0000 -233096465088512 Asia/Shanghai True
2016-05-22 自我成長 [編程] clojure ring 入門 [30m] NaN N NaN NaN 0 2 2016-05-23T15:03:24+0000 -231996953460736 Asia/Shanghai True
2016-05-22 自我成長 [探索發現] 體驗 iMac 開發環境 [3h] iMac 的屏幕體驗很棒,但使用非SSD硬盤速度上和mpb想着非常多。\r N NaN NaN 0 2 2016-05-23T14:33:35+0000 -230897441832960 Asia/Shanghai True

Data cleaning

  • We only care about events that are completed or achieved, i.e. events with Status != 0
  • Only the List Name and Title fields are needed
df = data[data['Status'] != 0].loc[:, ['List Name', 'Title']]
df.head()
Due Date    List Name  Title
2016-05-24  自我成長   [編程] javascript exercism [1h]
2016-05-23  自我成長   [編程] javascript exercism [0.5h]
2016-05-23  自我成長   [編程] clojure ring request [2h]
2016-05-22  自我成長   [編程] clojure ring 入門 [30m]
2016-05-22  自我成長   [探索發現] 體驗 iMac 開發環境 [3h]
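The filter and the column selection can also be written as a single .loc call. The sketch below runs on a made-up three-row frame (the rows and the name toy are assumptions for illustration, not the real export) and should give the same shape of result as the two-step version.

```python
import pandas as pd

# Toy rows that mimic the export; the values are invented for illustration.
toy = pd.DataFrame(
    {
        'List Name': ['自我成長', '自我成長', '自我成長'],
        'Title': ['[編程] demo A [1h]', '[閱讀] demo B [30m]', '[編程] demo C [2h]'],
        'Status': [2, 0, 2],
        'Priority': [0, 0, 0],
    },
    index=pd.to_datetime(['2016-05-24', '2016-05-23', '2016-05-22']),
)

# Row mask (Status != 0) and column list passed to .loc together
done = toy.loc[toy['Status'] != 0, ['List Name', 'Title']]
print(done)
```

Only the two rows whose Status is non-zero remain, and only the two requested columns are kept.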

Parsing the data

Parse the event category and the time spent

import re

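# Tag parser: returns the text inside a leading [...], or '其他' ("other") when there is none
def parse_tag(value):
    m = re.match(r'^(\[(.*?)\])?.*$', value)
    if m and m.group(2):
        return m.group(2)
    else:
        return '其他'

# Duration parser: reads a trailing [<number>h] or [<number>m] and returns the value in hours
def parse_duration(value):
    m = re.match(r'^.+?\[(.*?)([hm]?)\]$', value)
    if m:
        dur = 0
        try:
            dur = float(m.group(1))
        except ValueError as e:
            # float() raises ValueError when the bracket does not contain a number
            print('parse duration error: \n%s' % e)
        if m.group(2) == 'm':
            dur = dur / 60.0
        return dur
    else:
        return 0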
    
titles = df['Title']
df['Tag'] = titles.map(parse_tag)
df['Duration'] = titles.map(parse_duration)
df.head()
Due Date    List Name  Title                                  Tag       Duration
2016-05-24  自我成長   [編程] javascript exercism [1h]        編程      1.0
2016-05-23  自我成長   [編程] javascript exercism [0.5h]      編程      0.5
2016-05-23  自我成長   [編程] clojure ring request [2h]       編程      2.0
2016-05-22  自我成長   [編程] clojure ring 入門 [30m]         編程      0.5
2016-05-22  自我成長   [探索發現] 體驗 iMac 開發環境 [3h]     探索發現  3.0
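As a cross-check on the title format, here is a minimal sketch that extracts the tag and the duration in one pass with named groups. The names TITLE_RE and parse_title are hypothetical, and the pattern is stricter than the two functions above (it insists on a numeric duration), so treat it as an illustration rather than a drop-in replacement.

```python
import re

# Hypothetical single-pass pattern: optional "[tag]" prefix, required "[<number><h|m>]" suffix
TITLE_RE = re.compile(
    r'^(?:\[(?P<tag>[^\]]*)\])?.*?\[(?P<num>\d+(?:\.\d+)?)(?P<unit>[hm]?)\]\s*$'
)

def parse_title(title):
    """Return (tag, hours). The tag falls back to '其他' ("other"), hours to 0.0."""
    m = TITLE_RE.match(title)
    if not m:
        # No usable duration: still try to recover a leading tag
        tag_only = re.match(r'^\[([^\]]+)\]', title)
        return (tag_only.group(1) if tag_only else '其他', 0.0)
    hours = float(m.group('num'))
    if m.group('unit') == 'm':
        hours /= 60.0  # minutes to hours
    return (m.group('tag') or '其他', hours)

for t in ['[編程] javascript exercism [1h]',
          '[編程] clojure ring 入門 [30m]',
          'no tag here [2h]',
          '[閱讀] unfinished']:
    print(t, '->', parse_title(t))
```

Running a handful of known titles through a parser like this before mapping it over the whole column makes it easier to spot entries that were logged in the wrong format.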
df.count()

[Out:]
List Name    232
Title        232
Tag          232
Duration     232
dtype: int64
# First date in the data
start_date = df.index.min().date()
start_date

[Out:]
datetime.date(2015, 12, 2)
# Last date in the data
end_date = df.index.max().date()
end_date

[Out:]
datetime.date(2016, 5, 24)

Data analysis

Time overview

How much time do I invest in myself per day on average? -> total time / total days

end_date - start_date

[Out:]
datetime.timedelta(174)
df['Duration'].sum() 

[Out:]
482.19999999999999
df['Duration'].sum() / (end_date - start_date).days

[Out:]
2.7712643678160918
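One detail of this average: (end_date - start_date).days counts the gaps between the two dates, so it is one less than the number of calendar days when both the first and the last day are included. The sketch below reuses the figures shown above (482.2 hours, 2015-12-02 to 2016-05-24) to compare the two denominators; the variable names are only illustrative.

```python
from datetime import date

start_date = date(2015, 12, 2)
end_date = date(2016, 5, 24)
total_hours = 482.2  # df['Duration'].sum() as reported above

elapsed_days = (end_date - start_date).days  # difference between the two dates
calendar_days = elapsed_days + 1             # counts both the first and the last day

avg_elapsed = total_hours / elapsed_days
avg_calendar = total_hours / calendar_days
print(elapsed_days, calendar_days)
print(round(avg_elapsed, 2), round(avg_calendar, 2))
```

The difference is small over a span this long, but it grows for short logging periods.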

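Effort allocation

# Sum only the numeric Duration column; recent pandas versions no longer drop text columns silently
tag_list = df.groupby('Tag')[['Duration']].sum()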
tag_list
Tag       Duration
寫作      49.0
探索發現  54.5
機器學習  33.5
電影      50.8
編程      243.4
閱讀      51.0
tag_list['Duration'].plot(kind='pie', figsize=(8, 8), fontsize=16, autopct='%1.2f%%')

[Figure: pie chart of total hours per tag, labelled with percentages]
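The percentages that autopct prints on the pie can be reproduced by hand from the per-tag totals listed above. This is a small plain-Python sketch; the dictionary simply copies those totals, and the names hours_by_tag and share are made up for the example.

```python
# Per-tag totals copied from the tag_list output above
hours_by_tag = {
    '寫作': 49.0,
    '探索發現': 54.5,
    '機器學習': 33.5,
    '電影': 50.8,
    '編程': 243.4,
    '閱讀': 51.0,
}

total = sum(hours_by_tag.values())
share = {tag: 100.0 * hours / total for tag, hours in hours_by_tag.items()}

# Largest share first
for tag, pct in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print('%s: %.2f%%' % (tag, pct))
```

Programming (編程) accounts for roughly half of all logged hours, which is what the pie chart shows visually.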

Focus

The ability to keep studying one skill over a long period

# '編程' is the "programming" tag
programming = df[df['Tag'] == '編程']
programming.head()
Due Date    List Name  Title                                Tag   Duration
2016-05-24  自我成長   [編程] javascript exercism [1h]      編程  1.0
2016-05-23  自我成長   [編程] javascript exercism [0.5h]    編程  0.5
2016-05-23  自我成長   [編程] clojure ring request [2h]     編程  2.0
2016-05-22  自我成長   [編程] clojure ring 入門 [30m]       編程  0.5
2016-05-22  自我成長   [編程] javascript exercism [0.5h]    編程  0.5
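# resample(..., how='sum') was removed from pandas; call .sum() on the resampler instead
# (from pandas 2.2 on, the month-end alias 'M' is spelled 'ME')
programming[['Duration']].resample('M').sum().to_period(freq='M').plot(kind='bar', figsize=(8, 8), fontsize=16)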

[Figure: bar chart of monthly programming hours]
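Because the spelling of the monthly frequency alias for resample has changed between pandas releases, a version-neutral way to get the same monthly totals is to group by the month period of the index. The following is a sketch on invented data (the frame toy and its values are assumptions), not the author's original call.

```python
import pandas as pd

# Invented daily entries spanning two months
toy = pd.DataFrame(
    {'Duration': [1.0, 0.5, 2.0, 3.0]},
    index=pd.to_datetime(['2016-04-29', '2016-04-30', '2016-05-01', '2016-05-22']),
)

# Convert each timestamp to its month period and sum within each month
monthly = toy['Duration'].groupby(toy.index.to_period('M')).sum()
print(monthly)
```

The result is indexed by month periods, so it can be plotted as a bar chart without a separate to_period step.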

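Effort allocation over continuous time

With time on the horizontal axis, look at how effort is distributed.

# Why not call df.pivot() directly? Because the row index has duplicate entries, e.g. 2016-05-23
date_tags = df.reset_index().groupby(['Due Date', 'Tag'])[['Duration']].sum()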
date_tags
Duration
Due Date Tag
2015-12-02 寫作 3.0
2015-12-04 閱讀 3.0
2015-12-06 寫作 4.0
機器學習 3.0
2015-12-07 寫作 1.0
2015-12-08 機器學習 1.0
編程 4.0
2015-12-09 寫作 4.0
2015-12-10 探索發現 0.5
編程 5.5
2015-12-11 寫作 1.5
編程 4.0
閱讀 4.0
2015-12-12 寫作 2.0
機器學習 1.5
2015-12-13 編程 6.0
2015-12-14 閱讀 1.0
2015-12-15 機器學習 2.5
閱讀 1.0
2015-12-16 探索發現 1.0
機器學習 1.5
編程 3.0
閱讀 1.0
2015-12-17 機器學習 2.0
2015-12-18 寫作 1.5
機器學習 1.0
編程 3.0
2015-12-19 探索發現 7.0
閱讀 0.5
2015-12-20 寫作 1.0
... ... ...
2016-04-24 編程 3.5
2016-04-25 編程 3.0
2016-04-26 編程 3.0
2016-04-29 編程 2.0
2016-04-30 編程 2.0
2016-05-01 編程 3.0
2016-05-02 編程 2.0
2016-05-03 編程 2.0
2016-05-04 編程 3.0
2016-05-05 編程 4.0
2016-05-06 編程 4.0
2016-05-07 編程 4.0
2016-05-08 編程 4.0
2016-05-09 編程 4.0
2016-05-10 編程 4.0
2016-05-11 編程 2.0
2016-05-12 編程 3.0
2016-05-13 探索發現 1.0
編程 3.0
2016-05-14 探索發現 1.0
編程 5.0
2016-05-15 編程 1.0
2016-05-17 編程 3.0
2016-05-18 編程 2.0
2016-05-19 編程 1.0
2016-05-20 編程 4.0
2016-05-22 探索發現 3.0
編程 1.0
2016-05-23 編程 2.5
2016-05-24 編程 1.0
# Use the tags as column labels
dates = date_tags.reset_index().pivot(index='Due Date', columns='Tag', values='Duration')
dates
Tag 寫作 探索發現 機器學習 電影 編程 閱讀
Due Date
2015-12-02 3.0 NaN NaN NaN NaN NaN
2015-12-04 NaN NaN NaN NaN NaN 3.0
2015-12-06 4.0 NaN 3.0 NaN NaN NaN
2015-12-07 1.0 NaN NaN NaN NaN NaN
2015-12-08 NaN NaN 1.0 NaN 4.0 NaN
2015-12-09 4.0 NaN NaN NaN NaN NaN
2015-12-10 NaN 0.5 NaN NaN 5.5 NaN
2015-12-11 1.5 NaN NaN NaN 4.0 4.0
2015-12-12 2.0 NaN 1.5 NaN NaN NaN
2015-12-13 NaN NaN NaN NaN 6.0 NaN
2015-12-14 NaN NaN NaN NaN NaN 1.0
2015-12-15 NaN NaN 2.5 NaN NaN 1.0
2015-12-16 NaN 1.0 1.5 NaN 3.0 1.0
2015-12-17 NaN NaN 2.0 NaN NaN NaN
2015-12-18 1.5 NaN 1.0 NaN 3.0 NaN
2015-12-19 NaN 7.0 NaN NaN NaN 0.5
2015-12-20 1.0 4.0 NaN NaN NaN NaN
2015-12-21 NaN NaN NaN NaN NaN 0.5
2015-12-22 NaN 2.0 NaN NaN 8.0 NaN
2015-12-23 NaN 1.0 NaN NaN NaN NaN
2015-12-24 NaN NaN NaN NaN NaN 0.5
2015-12-25 2.0 NaN NaN NaN NaN 1.5
2015-12-26 NaN NaN NaN NaN 2.0 1.0
2015-12-29 NaN NaN NaN NaN NaN 2.0
2015-12-30 NaN NaN NaN NaN NaN 1.0
2016-01-01 NaN NaN NaN NaN NaN 5.0
2016-01-02 NaN NaN NaN NaN 2.0 2.0
2016-01-03 NaN NaN NaN NaN 3.5 NaN
2016-01-04 NaN NaN NaN NaN 6.5 NaN
2016-01-05 2.0 2.0 NaN NaN NaN NaN
... ... ... ... ... ... ...
2016-04-21 NaN 2.0 NaN NaN 5.0 NaN
2016-04-22 NaN NaN NaN NaN 6.0 2.0
2016-04-23 NaN NaN NaN NaN 3.0 NaN
2016-04-24 NaN NaN NaN NaN 3.5 NaN
2016-04-25 NaN NaN NaN NaN 3.0 NaN
2016-04-26 NaN NaN NaN NaN 3.0 NaN
2016-04-29 NaN NaN NaN NaN 2.0 NaN
2016-04-30 NaN NaN NaN NaN 2.0 NaN
2016-05-01 NaN NaN NaN NaN 3.0 NaN
2016-05-02 NaN NaN NaN NaN 2.0 NaN
2016-05-03 NaN NaN NaN NaN 2.0 NaN
2016-05-04 NaN NaN NaN NaN 3.0 NaN
2016-05-05 NaN NaN NaN NaN 4.0 NaN
2016-05-06 NaN NaN NaN NaN 4.0 NaN
2016-05-07 NaN NaN NaN NaN 4.0 NaN
2016-05-08 NaN NaN NaN NaN 4.0 NaN
2016-05-09 NaN NaN NaN NaN 4.0 NaN
2016-05-10 NaN NaN NaN NaN 4.0 NaN
2016-05-11 NaN NaN NaN NaN 2.0 NaN
2016-05-12 NaN NaN NaN NaN 3.0 NaN
2016-05-13 NaN 1.0 NaN NaN 3.0 NaN
2016-05-14 NaN 1.0 NaN NaN 5.0 NaN
2016-05-15 NaN NaN NaN NaN 1.0 NaN
2016-05-17 NaN NaN NaN NaN 3.0 NaN
2016-05-18 NaN NaN NaN NaN 2.0 NaN
2016-05-19 NaN NaN NaN NaN 1.0 NaN
2016-05-20 NaN NaN NaN NaN 4.0 NaN
2016-05-22 NaN 3.0 NaN NaN 1.0 NaN
2016-05-23 NaN NaN NaN NaN 2.5 NaN
2016-05-24 NaN NaN NaN NaN 1.0 NaN
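The groupby followed by pivot can also be collapsed into one pivot_table call, which aggregates duplicate (date, tag) pairs instead of failing on them. The sketch below uses a four-row toy frame (invented values) to show both the error from pivot and the aggregated result; it is an alternative worth knowing, not what the analysis above ran.

```python
import pandas as pd

# Two entries share the same (date, tag) pair, as in the real log
toy = pd.DataFrame({
    'Due Date': pd.to_datetime(['2016-05-23', '2016-05-23', '2016-05-22', '2016-05-22']),
    'Tag': ['編程', '編程', '編程', '探索發現'],
    'Duration': [0.5, 2.0, 0.5, 3.0],
})

# pivot() refuses to reshape when an (index, column) pair occurs more than once
try:
    toy.pivot(index='Due Date', columns='Tag', values='Duration')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() sums the duplicates while reshaping
wide = toy.pivot_table(index='Due Date', columns='Tag', values='Duration', aggfunc='sum')
print(wide)
```

Combinations that never occur stay NaN, exactly as in the pivoted table above.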
# Fill in the missing dates so that days without any study become visible
full_dates = dates.reindex(pd.date_range(start_date, end_date)).fillna(0)
full_dates
Tag 寫作 探索發現 機器學習 電影 編程 閱讀
2015-12-02 3.0 0.0 0.0 0 0.0 0.0
2015-12-03 0.0 0.0 0.0 0 0.0 0.0
2015-12-04 0.0 0.0 0.0 0 0.0 3.0
2015-12-05 0.0 0.0 0.0 0 0.0 0.0
2015-12-06 4.0 0.0 3.0 0 0.0 0.0
2015-12-07 1.0 0.0 0.0 0 0.0 0.0
2015-12-08 0.0 0.0 1.0 0 4.0 0.0
2015-12-09 4.0 0.0 0.0 0 0.0 0.0
2015-12-10 0.0 0.5 0.0 0 5.5 0.0
2015-12-11 1.5 0.0 0.0 0 4.0 4.0
2015-12-12 2.0 0.0 1.5 0 0.0 0.0
2015-12-13 0.0 0.0 0.0 0 6.0 0.0
2015-12-14 0.0 0.0 0.0 0 0.0 1.0
2015-12-15 0.0 0.0 2.5 0 0.0 1.0
2015-12-16 0.0 1.0 1.5 0 3.0 1.0
2015-12-17 0.0 0.0 2.0 0 0.0 0.0
2015-12-18 1.5 0.0 1.0 0 3.0 0.0
2015-12-19 0.0 7.0 0.0 0 0.0 0.5
2015-12-20 1.0 4.0 0.0 0 0.0 0.0
2015-12-21 0.0 0.0 0.0 0 0.0 0.5
2015-12-22 0.0 2.0 0.0 0 8.0 0.0
2015-12-23 0.0 1.0 0.0 0 0.0 0.0
2015-12-24 0.0 0.0 0.0 0 0.0 0.5
2015-12-25 2.0 0.0 0.0 0 0.0 1.5
2015-12-26 0.0 0.0 0.0 0 2.0 1.0
2015-12-27 0.0 0.0 0.0 0 0.0 0.0
2015-12-28 0.0 0.0 0.0 0 0.0 0.0
2015-12-29 0.0 0.0 0.0 0 0.0 2.0
2015-12-30 0.0 0.0 0.0 0 0.0 1.0
2015-12-31 0.0 0.0 0.0 0 0.0 0.0
... ... ... ... ... ... ...
2016-04-25 0.0 0.0 0.0 0 3.0 0.0
2016-04-26 0.0 0.0 0.0 0 3.0 0.0
2016-04-27 0.0 0.0 0.0 0 0.0 0.0
2016-04-28 0.0 0.0 0.0 0 0.0 0.0
2016-04-29 0.0 0.0 0.0 0 2.0 0.0
2016-04-30 0.0 0.0 0.0 0 2.0 0.0
2016-05-01 0.0 0.0 0.0 0 3.0 0.0
2016-05-02 0.0 0.0 0.0 0 2.0 0.0
2016-05-03 0.0 0.0 0.0 0 2.0 0.0
2016-05-04 0.0 0.0 0.0 0 3.0 0.0
2016-05-05 0.0 0.0 0.0 0 4.0 0.0
2016-05-06 0.0 0.0 0.0 0 4.0 0.0
2016-05-07 0.0 0.0 0.0 0 4.0 0.0
2016-05-08 0.0 0.0 0.0 0 4.0 0.0
2016-05-09 0.0 0.0 0.0 0 4.0 0.0
2016-05-10 0.0 0.0 0.0 0 4.0 0.0
2016-05-11 0.0 0.0 0.0 0 2.0 0.0
2016-05-12 0.0 0.0 0.0 0 3.0 0.0
2016-05-13 0.0 1.0 0.0 0 3.0 0.0
2016-05-14 0.0 1.0 0.0 0 5.0 0.0
2016-05-15 0.0 0.0 0.0 0 1.0 0.0
2016-05-16 0.0 0.0 0.0 0 0.0 0.0
2016-05-17 0.0 0.0 0.0 0 3.0 0.0
2016-05-18 0.0 0.0 0.0 0 2.0 0.0
2016-05-19 0.0 0.0 0.0 0 1.0 0.0
2016-05-20 0.0 0.0 0.0 0 4.0 0.0
2016-05-21 0.0 0.0 0.0 0 0.0 0.0
2016-05-22 0.0 3.0 0.0 0 1.0 0.0
2016-05-23 0.0 0.0 0.0 0 2.5 0.0
2016-05-24 0.0 0.0 0.0 0 1.0 0.0
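Once every calendar day has a row, the days with no logged study can be listed directly instead of being read off the table. This is a minimal sketch on a toy frame (dates and hours invented for illustration); on the real data the same two lines could be applied to full_dates.

```python
import pandas as pd

# Sparse toy log: 2016-05-21 and 2016-05-23 are missing
toy = pd.DataFrame(
    {'編程': [2.0, 1.0, 3.0], '閱讀': [0.0, 1.5, 0.0]},
    index=pd.to_datetime(['2016-05-20', '2016-05-22', '2016-05-24']),
)

# Same technique as above: reindex onto a continuous range, fill the gaps with 0
full = toy.reindex(pd.date_range('2016-05-20', '2016-05-24')).fillna(0)

# A day is idle when the hours across all tags add up to zero
daily_total = full.sum(axis=1)
idle_days = daily_total[daily_total == 0].index
print(list(idle_days.strftime('%Y-%m-%d')))
```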
# Draw a stacked bar chart
full_dates.plot(kind='bar', stacked=True, figsize=(16, 8))

[Figure: stacked bar chart of daily hours per tag]

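# Monthly totals; .sum() replaces the removed how='sum' argument (the alias 'M' is spelled 'ME' from pandas 2.2 on)
full_dates.resample('M').sum().to_period('M').plot(kind='bar', stacked=True, figsize=(8, 8))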

[Figure: stacked bar chart of monthly hours per tag]
