Python: single-stepping through os.walk to understand topdown

In Python (this post uses Python 2), a directory tree is usually traversed with code like this:

    import os
    from os.path import join, getsize
    for root, dirs, files in os.walk('python/Lib/email'):
        print root, "consumes",
        print sum([getsize(join(root, name)) for name in files]),
        print "bytes in", len(files), "non-directory files"
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories

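root is the path of the directory currently being visited (on the first iteration, the top-level directory passed to os.walk), dirs is the list of names of the subdirectories directly under root, and files is the list of names of the non-directory files directly under root.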
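For readers on Python 3, the loop above can be sketched roughly as follows. This is only an illustration: `directory_sizes` is a hypothetical helper name, and the temporary directory with `a.jnt` and `b/c.txt` is made up so the snippet can run anywhere.

```python
import os
import tempfile
from os.path import getsize, join


def directory_sizes(top):
    """Hypothetical helper: map each visited directory to (bytes, file count)."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        total = sum(getsize(join(root, name)) for name in files)
        sizes[os.path.relpath(root, top)] = (total, len(files))
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories
    return sizes


# Build a small throwaway tree (assumed layout, for illustration only).
with tempfile.TemporaryDirectory() as top:
    os.mkdir(join(top, "b"))
    with open(join(top, "a.jnt"), "wb") as fh:
        fh.write(b"12345")
    with open(join(top, "b", "c.txt"), "wb") as fh:
        fh.write(b"123")
    sizes = directory_sizes(top)

for rel, (total, count) in sorted(sizes.items()):
    print(rel, "consumes", total, "bytes in", count, "non-directory files")
```

Here root is reported relative to the starting directory, so the top-level directory shows up as `.`.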


While reading the source today I got a bit lost, because it combines a generator (yield) with recursion.

def walk(top, topdown=True, onerror=None, followlinks=False):
    
    import pdb             # I added these two lines myself to turn on single-step debugging.
    pdb.set_trace()        # n: next step, p xxx: inspect xxx, l: list the current code lines
    
    islink, join, isdir = path.islink, path.join, path.isdir


    try:
        # Note that listdir and error are globals in this module due
        # to earlier import-*.
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x
    if not topdown:
        yield top, dirs, nondirs

My question was mainly about the topdown parameter:

When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune the
    search, or to impose a specific order of visiting. 

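According to the docs, when topdown=True you can modify the list of subdirectories in place, and walk will then only recurse into the ones still left in it, so the search can be cut down??

OK, that left me completely baffled. What on earth? The only option was to single-step through it and see.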


    try:
        # Note that listdir and error are globals in this module due
        # to earlier import-*.
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
This part sorts the entries of the directory top into subdirectories (dirs) and everything else (nondirs). Nothing difficult.


My directory layout is:

E:\projects\myApp_emits\myApp
E:\projects\myApp_emits\myApp\a.jnt
E:\projects\myApp_emits\myApp\b
E:\projects\myApp_emits\myApp\b\c.txt

My calling code is:

import os
des_folder = 'e:/projects/myApp_emits/myApp'
a = os.walk(des_folder, topdown=True)

parent, dir, files = a.next()
print parent, dir, files

parent, dir, files = a.next()
print parent, dir, files


For the next part, look at the topdown=True case first. The source simplifies to:

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x

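A quick explanation:

Because os.walk contains yield, it is no longer an ordinary function but a generator function. Calling it returns a generator, call it a, and calling a.next() repeatedly hands back the values that follow each yield, one at a time.

E.g. with yield top, dirs, nondirs, a call to a.next() returns top, dirs, nondirs, and then the whole generator is suspended. The next next() call resumes execution at the statement right after that yield and runs until another yield is reached. If none is reached, the generator is finished and next() raises StopIteration. (Odd, how did I end up giving a yield tutorial...)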
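The suspend-and-resume behaviour described above can be seen in a toy generator. This is a minimal sketch written for Python 3, where `a.next()` is spelled `next(a)`; the names `demo` and `events` are made up for illustration and have nothing to do with os.walk itself.

```python
events = []  # records the order in which the generator body actually runs


def demo():
    events.append("start")
    yield "first"
    events.append("resumed after first")
    yield "second"
    events.append("resumed after second")


gen = demo()             # creating the generator runs none of the body
ran_before_first_next = list(events)

first = next(gen)        # runs the body up to the first yield, then suspends
second = next(gen)       # resumes after the first yield, stops at the second

try:
    next(gen)            # resumes after the second yield; no yield is left
    finished = False
except StopIteration:
    finished = True
```

Nothing is appended to `events` until the first `next` call, which is also why a `pdb.set_trace()` placed inside a generator body only triggers once iteration starts.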


So with our calling code the result is:

e:/projects/myApp_emits/myApp ['b'] ['a.jnt']
e:/projects/myApp_emits/myApp\b [] ['c.txt']


Now the topdown=False case. The source simplifies to:

    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x
    if not topdown:
        yield top, dirs, nondirs


The result is:

e:/projects/myApp_emits/myApp\b [] ['c.txt']
e:/projects/myApp_emits/myApp ['b'] ['a.jnt']
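To reproduce both orderings without my E: drive layout, here is a small sketch for Python 3 that builds the same shape of tree (one file plus a subdirectory b containing one file) in a temporary directory. `visit_order` is a hypothetical helper name, and paths are reported relative to the starting directory, so the top level appears as `.`.

```python
import os
import tempfile


def visit_order(top, topdown):
    """Return the directories os.walk yields, relative to top, in yield order."""
    return [os.path.relpath(root, top)
            for root, dirs, files in os.walk(top, topdown=topdown)]


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "b"))
    for rel in ("a.jnt", os.path.join("b", "c.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x")
    order_topdown = visit_order(top, topdown=True)
    order_bottomup = visit_order(top, topdown=False)

print(order_topdown)    # ['.', 'b']
print(order_bottomup)   # ['b', '.']
```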


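Conclusion:

Comparing the two results, the topdown parameter turns out to be simple: with True the top-level directory is yielded first, and with False the walk starts from the subdirectories and yields the top-level directory last. This also explains the sentence in the docs: with topdown=True the yield happens before the for name in dirs loop, so whatever the caller removes from dirs in place is never recursed into, while with topdown=False the recursion has already finished by the time dirs is yielded, so modifying it changes nothing.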
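The pruning behaviour from the docs quoted earlier can be checked directly. The following is a sketch for Python 3 under made-up assumptions: `walk_skipping` is a hypothetical helper, and the tree (a `CVS` directory with a nested `inner` directory, plus `src`) exists only for this demonstration.

```python
import os
import tempfile


def walk_skipping(top, skip_name, topdown):
    """Walk top, removing skip_name from dirs in place; return visited dirs."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        visited.append(os.path.relpath(root, top))
        if skip_name in dirs:
            dirs.remove(skip_name)  # in-place edit of the list handed out by walk
    return sorted(visited)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "CVS", "inner"))
    os.makedirs(os.path.join(top, "src"))
    pruned = walk_skipping(top, "CVS", topdown=True)
    not_pruned = walk_skipping(top, "CVS", topdown=False)

print(pruned)       # CVS and everything under it is skipped
print(not_pruned)   # removal comes too late, CVS is still visited
```

With topdown=True the removal takes effect because walk has not yet started iterating over dirs when control returns to the caller. With topdown=False the subdirectories have already been visited by the time the parent is yielded.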


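P.S.

Go slowly when single-stepping into recursion, or you will get dizzy. This example is a mild one and did not make my head spin. Try tornado's yield plus decorators and you will lose your way in life within minutes.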

