Python Web Crawlers (11): csv

Introduction

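  • Comma-separated values (CSV) are sometimes also called character-separated values, because the delimiter does not have to be a comma
  • A CSV file stores tabular data as plain text
  • A CSV file consists of any number of records, separated by some kind of line break
  • Each record consists of fields, separated by some other character or string, most commonly a comma or a tab
  • Usually, all records have exactly the same sequence of fields
  • There is no universal standard for the CSV format, although RFC 4180 gives a basic description
  • The character encoding is likewise unspecified, but 7-bit ASCII is the most basic common encoding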
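To make the "delimiter does not have to be a comma" point concrete, here is a minimal sketch that parses tab-separated text with the standard csv module. The sample text and the use of io.StringIO in place of a real file are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical tab-separated sample; io.StringIO stands in for a real file.
tsv_text = "name\tcity\nAlice\tParis\nBob\tNew York\n"

# The delimiter keyword overrides the comma used by the default 'excel' dialect.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

for row in rows:
    print(row)
```

The same delimiter keyword is accepted by csv.writer, so a file can be written back out with the separator it was read with.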

Reading and Writing Files

The csv module is used mainly to read and write files in csv format.

reader

def reader(iterable, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ 
    """
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
    
    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    
    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines).
    """
    pass

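Sample data (contents of csv_data.txt):

Sample data
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///

import csv

# newline='' lets the csv module handle line endings itself
with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.reader(fp)
    title = next(data)  # first line of the file
    print(type(title))
    print(title)
    for row in data:
        print(row)

# No fp.close() is needed: the with block closes the file on exit.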

Output:

<class 'list'>
['Sample data']
['aaa', 'bbb', 'ccc', 'ddd']
['111', '222', '333', '444']
['+++', '---', '***', '///']

As the output shows, every row produced by reader is a list of strings.
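The reader docstring notes that one row "can span multiple input lines". The sketch below illustrates that with quoted fields: a quoted field may contain the delimiter or even a line break and still comes back as a single list element. The sample text is made up for this example, and io.StringIO again stands in for a file.

```python
import csv
import io

# Hypothetical sample: the second record's quoted field contains a line break.
text = 'id,comment\n1,"hello, world"\n2,"line one\nline two"\n'

reader = csv.reader(io.StringIO(text))
header = next(reader)   # first record: the column names
records = list(reader)  # remaining records, each a list of strings

for record in records:
    print(record)
```

Four input lines produce only three records here, because the quoted line break is kept inside the field rather than ending the row.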

DictReader

DictReader is a class:

class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                 dialect="excel", *args, **kwds):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey = restkey          # key to catch long rows
        self.restval = restval          # default value for short rows
        self.reader = reader(f, dialect, *args, **kwds)
        self.dialect = dialect
        self.line_num = 0

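Sample data (contents of csv_data.txt):

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///

import csv

with open('csv_data.txt', 'r', newline='') as fp:
    data = csv.DictReader(fp)  # field names are taken from the first line
    for row in data:
        print(row['first'], row['second'], row['third'], row['forth'])

# No fp.close() is needed: the with block closes the file on exit.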

Output:

aaa bbb ccc ddd
111 222 333 444
+++ --- *** ///

As the output shows, DictReader lets you access each row as a dictionary keyed by the field names from the first line.
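The restkey and restval parameters in the DictReader signature cover rows whose length does not match the header. A minimal sketch follows, assuming a made-up sample with one long row and one short row. The key name "extra" and the filler "N/A" are arbitrary choices for this example.

```python
import csv
import io

# Hypothetical sample: the first data row is too long, the second too short.
text = "first,second,third\na,b,c,d,e\n1,2\n"

reader = csv.DictReader(io.StringIO(text), restkey="extra", restval="N/A")
rows = [dict(row) for row in reader]

for row in rows:
    print(row)
```

Surplus values are collected into a list under the restkey key, and missing fields are filled with restval.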

writer

def writer(fileobj, dialect='excel', *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ 
    """
    csv_writer = csv.writer(fileobj [, dialect='excel']
                                [optional keyword args])
        for row in sequence:
            csv_writer.writerow(row)
    
        [or]
    
        csv_writer = csv.writer(fileobj [, dialect='excel']
                                [optional keyword args])
        csv_writer.writerows(rows)
    
    The "fileobj" argument can be any object that supports the file API.
    """
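    pass

import csv

title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///']
]

# newline='' prevents extra blank lines between rows on Windows
with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(title)    # header row
    writer.writerows(value)   # all data rows at once

# No fp.close() is needed: the with block closes the file on exit.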

Output:

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
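The sample values above contain no commas or quotes, so it is worth seeing what writer does when they do. This is a small sketch with made-up rows, writing to an in-memory io.StringIO buffer instead of a file so the result can be inspected directly.

```python
import csv
import io

# In-memory buffer used in place of a file opened with newline=''.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

writer.writerow(["id", "comment"])
writer.writerow([1, "hello, world"])   # contains the delimiter
writer.writerow([2, 'say "hi"'])       # contains the quote character

output = buffer.getvalue()
print(output)
```

With the default 'excel' dialect, only fields that need it are quoted, embedded quotes are doubled, and every row ends with \r\n. That line ending is why the file is opened with newline='' when writing.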

DictWriter

DictWriter is also a class:

class DictWriter:
    def __init__(self, f, fieldnames, restval="", extrasaction="raise",
                 dialect="excel", *args, **kwds):
        self.fieldnames = fieldnames    # list of keys for the dict
        self.restval = restval          # for writing short dicts
        if extrasaction.lower() not in ("raise", "ignore"):
            raise ValueError("extrasaction (%s) must be 'raise' or 'ignore'"
                             % extrasaction)
        self.extrasaction = extrasaction
        self.writer = writer(f, dialect, *args, **kwds)

In the same way, DictWriter can be used to write data to a csv file in dictionary form.

import csv

title = ['first', 'second', 'third', 'forth']
value = [
    ['aaa', 'bbb', 'ccc', 'ddd'],
    ['111', '222', '333', '444'],
    ['+++', '---', '***', '///']
]

with open('csc_saved.csv', 'w', newline='') as fp:
    writer = csv.DictWriter(fp, title)
    writer.writeheader()  # writes the field names as the first row
    for row in value:
        # pair each field name with its value to build the row dict
        writer.writerow(dict(zip(title, row)))

# No fp.close() is needed: the with block closes the file on exit.

Output:

first,second,third,forth
aaa,bbb,ccc,ddd
111,222,333,444
+++,---,***,///
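The restval and extrasaction parameters in the DictWriter signature decide what happens when a dictionary does not match the field names exactly. A minimal sketch follows, with made-up rows and an in-memory buffer in place of a file. The "-" filler and the "unused" key are arbitrary choices for this example.

```python
import csv
import io

fieldnames = ["first", "second", "third"]
buffer = io.StringIO(newline="")

# restval fills in missing keys; extrasaction="ignore" drops unknown keys
# instead of raising ValueError (the default "raise" behaviour).
writer = csv.DictWriter(buffer, fieldnames, restval="-", extrasaction="ignore")
writer.writeheader()
writer.writerow({"first": "aaa", "second": "bbb"})  # "third" is missing
writer.writerow({"first": "111", "second": "222", "third": "333", "unused": "x"})

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Leaving extrasaction at its default is usually the safer choice while developing a crawler, since an unexpected key then fails loudly instead of being silently discarded.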