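Today I needed to pull some numbers out of a file a colleague sent me. It turned out to be a database file with a .dbf extension, nearly 1 GB in size. My machine has no database software such as Access installed. I found a DBF viewer that could open the file, but filtering was awkward and exporting to Excel was no easier. What to do? Naturally, see whether Python can help!
I looked through many Python projects for working with DBF files and found dbfread the most convenient. Project page: https://github.com/zycool/dbfread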
1. Installation
pip install dbfread
2. Opening a DBF file
>>> from dbfread import DBF
>>> table = DBF('people.dbf')
>>> for record in table:
...     print(record)
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])
>>> len(table)
2
>>> for record in table.deleted:
...     print(record)
OrderedDict([('NAME', 'Deleted Guy'), ('BIRTHDATE', datetime.date(1979, 12, 22))])
>>> len(table.deleted)
1
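For the task that started this post (filtering a large file and getting it into Excel), iterating over the table record by record is probably enough. Below is a minimal sketch, not something from the dbfread project: export_filtered and export_dbf are hypothetical helper names, and the sketch assumes records are dict-like (as in the output above) and that the table exposes its column names as table.field_names.

```python
import csv


def export_filtered(records, field_names, out_path, keep=lambda rec: True):
    """Write dict-like records to a CSV file, one at a time.

    Only records for which keep(rec) is true are written.
    Returns the number of rows written (header not counted).
    """
    written = 0
    # utf-8-sig adds a BOM, which helps Excel detect the encoding;
    # newline='' is what the csv module expects for output files.
    with open(out_path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=field_names)
        writer.writeheader()
        for rec in records:
            if keep(rec):
                writer.writerow(rec)
                written += 1
    return written


def export_dbf(dbf_path, out_path, keep=lambda rec: True):
    """Hypothetical wrapper: stream a DBF file into a CSV file."""
    from dbfread import DBF  # imported here so export_filtered stays dependency-free

    table = DBF(dbf_path)  # records are read from disk as we iterate
    return export_filtered(table, table.field_names, out_path, keep)
```

A call such as export_dbf('people.dbf', 'people.csv', keep=lambda r: r['NAME'] != 'Bob') would then produce a CSV file that Excel can open, without holding the whole table in memory.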
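3. By default, only one record is held in memory at a time; everything else is read straight from disk as you iterate. Passing load=True loads all records into memory, which gives you random access: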
>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
Bob
>>> print(table.records[0]['NAME'])
Alice
Alternatively, you can load the records later by calling table.load(). This is useful when you want to look at the header before you commit to loading anything. For example, you can make a function which returns a list of tables in a directory and load only the ones you need.
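The directory idea could look roughly like the sketch below. The function names find_dbf_paths and tables_with_field are made up for illustration, and choosing tables by a field name is just one possible criterion; the sketch assumes that creating a DBF object reads only the header and that field_names is available before loading.

```python
import os


def find_dbf_paths(directory):
    """Return the paths of all .dbf files in a directory (any letter case)."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith('.dbf')
    )


def tables_with_field(directory, field_name):
    """Open every DBF in a directory, but load only those that have field_name."""
    from dbfread import DBF  # imported lazily; find_dbf_paths works without it

    loaded = []
    for path in find_dbf_paths(directory):
        table = DBF(path)  # no records are read into memory yet
        if field_name in table.field_names:
            table.load()  # now pull the records into memory
            loaded.append(table)
    return loaded
```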
If you just want a list of records and you don’t care about the other table attributes you can do:
>>> records = list(DBF('people.dbf'))
You can unload records again with table.unload().
If the table is not loaded, the records and deleted attributes return RecordIterator objects.
Loading or iterating over records will open the DBF and memo file once for each iteration. This means the DBF object doesn’t hold any files open, only the RecordIterator object does.
Character Encodings
All text fields and memos (except binary ones) will be returned as unicode strings.
dbfread will try to detect the character encoding (code page) used in the file by looking at the language_driver byte. If this fails it reverts to ASCII. You can override this by passing encoding='my-encoding'. The encoding is available in the encoding attribute.
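When detection falls back to ASCII and the text contains, say, Chinese characters, you have to pick the override yourself. The sketch below is one rough way to narrow it down, not a feature of dbfread: first_working_encoding and sniff_dbf_encoding are hypothetical helpers, the candidate list is only a guess, and the sketch assumes the raw=True option returns field values as undecoded bytes. Note that some encodings (latin-1, for example) accept any byte sequence, so a match only means "decodes without error", not "is correct".

```python
from itertools import islice


def first_working_encoding(raw, candidates):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def sniff_dbf_encoding(dbf_path, candidates=('ascii', 'gbk', 'big5'), sample_size=100):
    """Hypothetical helper: test candidate encodings on the first few records."""
    from dbfread import DBF  # imported lazily; only needed for real files

    # Assumption: with raw=True every field value comes back as bytes.
    table = DBF(dbf_path, raw=True)
    sample = b''.join(
        value
        for record in islice(table, sample_size)
        for value in record.values()
        if isinstance(value, bytes)
    )
    return first_working_encoding(sample, candidates)
```

The result can then be passed on, for example DBF('people.dbf', encoding='gbk'), and checked by eye on a few records.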
Memo Files
If there is at least one memo field in the file dbfread will look for the corresponding memo file. For buildings.dbf this would be buildings.fpt (for Visual FoxPro) or buildings.dbt (for other databases).
Since the Windows file system is case preserving, the file names may end up mixed case. For example, you could have:
Buildings.dbf
BUILDINGS.DBT
This creates problems in Linux, where file names are case sensitive. dbfread gets around this by ignoring case in file names. You can turn this off by passing ignorecase=False.
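To make the idea of ignoring case concrete, here is a small illustration of how such a lookup can be done on a case-sensitive file system. This is only a sketch of the general technique with a made-up function name (find_ignoring_case); it is not dbfread's actual implementation.

```python
import os


def find_ignoring_case(path):
    """Return the existing path whose file name matches path regardless of
    letter case, or None if there is no such file."""
    directory, wanted = os.path.split(path)
    directory = directory or '.'  # a bare file name means the current directory
    for name in os.listdir(directory):
        if name.lower() == wanted.lower():
            return os.path.join(directory, name)
    return None
```

With the two files from the example above, asking for buildings.dbt would return the path of BUILDINGS.DBT.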
If the memo file is missing you will get a MissingMemoFile exception. If you still want the rest of the data you can pass ignore_missing_memofile=True. All memo field values will now be returned as None, as would be the case if there was no memo.
dbfread has full support for Visual FoxPro (.FPT) and dBase III (.DBT) memo files. It reads dBase IV (also .DBT) memo files, but only if they use the default block size of 512 bytes. (This will be fixed if I can find more files to study.)
Record Factories
If you don’t want records returned as collections.OrderedDict you can use the recfactory argument to provide your own record factory.
A record factory is a function that takes a list of (name, value) pairs and returns a record. You can do whatever you like with this data. Here’s a function that creates a record object with fields as attributes:
class Record(object):
    def __init__(self, items):
        for (name, value) in items:
            setattr(self, name, value)

for record in DBF('people.dbf', recfactory=Record, lowernames=True):
    print(record.name, record.birthdate)
If you pass recfactory=None you will get the original (name, value) list. (This is a shortcut for recfactory=lambda items: items.)
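Since a record factory only needs to accept a list of (name, value) pairs, other record shapes are easy to try. The following sketch builds immutable namedtuple records; make_namedtuple_factory is a hypothetical helper, not part of dbfread. It uses rename=True so that field names that are not valid Python identifiers are replaced with positional names instead of raising an error.

```python
from collections import namedtuple


def make_namedtuple_factory(typename='Record'):
    """Return a record factory that turns (name, value) pairs into namedtuples."""
    cache = {}  # one namedtuple class per distinct set of field names

    def factory(items):
        names = tuple(name for name, _ in items)
        cls = cache.get(names)
        if cls is None:
            # rename=True swaps invalid identifiers for _0, _1, ...
            cls = cache[names] = namedtuple(typename, names, rename=True)
        return cls(*(value for _, value in items))

    return factory
```

It would be passed as DBF('people.dbf', recfactory=make_namedtuple_factory(), lowernames=True), after which fields can be read as record.name and record.birthdate, as in the class-based example.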
Custom Field Types
If the included field types are not enough you can add your own by subclassing FieldParser.
As a silly example, here is how you can read text (C) fields in reverse:
from dbfread import DBF, FieldParser

class MyFieldParser(FieldParser):
    def parseC(self, field, data):
        # Return strings reversed. data is raw bytes, so strip the
        # trailing spaces and NUL padding with a bytes argument first.
        return data.rstrip(b' \0').decode()[::-1]

for record in DBF('files/people.dbf', parserclass=MyFieldParser):
    print(record['NAME'])
and here’s how you can return invalid values as InvalidValue instead of raising ValueError:
from dbfread import DBF, FieldParser, InvalidValue

class MyFieldParser(FieldParser):
    def parse(self, field, data):
        try:
            return FieldParser.parse(self, field, data)
        except ValueError:
            # Wrap the raw bytes instead of letting the error propagate.
            return InvalidValue(data)

table = DBF('invalid_value.dbf', parserclass=MyFieldParser)
for i, record in enumerate(table):
    for name, value in record.items():
        if isinstance(value, InvalidValue):
            print('records[{}][{!r}] == {!r}'.format(i, name, value))
This will print:
records[0][u'BIRTHDATE'] == InvalidValue(b'NotAYear')