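Introduction
MySQL is a relational database management system. A relational database stores data in separate tables rather than putting all of it in one big warehouse, which improves speed and flexibility.
GUI
Working with a database from the command line (cmd) is sometimes inconvenient, so a GUI is often used to edit and modify the database. Common GUI tools for MySQL include MySQL Workbench and Navicat; the former is the official GUI.
Python libraries
If we want to store scraped data in a database, Python needs a connection to MySQL. Common Python drivers for MySQL include MySQLdb, mysqlclient, and PyMySQL. Choose whichever one suits your installed Python version.
Connecting to the database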
import pymysql
# Connect to the local MySQL server
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
cursor = db.cursor()
# execute() returns the number of rows, so fetch them one at a time
for i in range(cursor.execute('show databases')):
    print(cursor.fetchone())
db.close()
Output:
('information_schema',)
('mysql',)
('performance_schema',)
('pymysql',)
('sys',)
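The example above hardcodes the credentials. As a minimal sketch of one alternative, the helper below builds the same keyword arguments from environment variables. The MYSQL_* variable names and both function names are assumptions made for illustration; they are not part of PyMySQL.

```python
import os


def connection_settings(env=None):
    """Build keyword arguments for pymysql.connect() from environment variables.

    The MYSQL_* names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "database": env.get("MYSQL_DATABASE", "pymysql"),
        "port": int(env.get("MYSQL_PORT", "3306")),
    }


def open_connection():
    """Open a connection using the settings above."""
    # Imported here so that connection_settings() can be used without the driver.
    import pymysql
    return pymysql.connect(**connection_settings())
```

With this in place, db = open_connection() can stand in for the explicit pymysql.connect(...) call, and the password no longer appears in the script.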
Data operations
Inserting data
import pymysql
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
command = """
insert into sheet1(id,name) value(105,'qianduo')
"""
cursor = db.cursor()
# Print the table before the insert
for i in range(cursor.execute('select * from sheet1')):
    print(cursor.fetchone())
cursor.execute(command)
# Commit so that the insert takes effect
db.commit()
# Print the table again after the insert
for i in range(cursor.execute('select * from sheet1')):
    print(cursor.fetchone())
db.close()
Output:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
Besides writing the values directly into the command as above, you can also pass the parameters separately from the command:
import pymysql
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
# Every placeholder in value(...) should be %s, whether or not the argument is really a str; the driver handles the conversion internally
command = """
insert into sheet1(id,name) value(%s,%s)
"""
cursor = db.cursor()
cursor.execute(command,(106,'sunjie'))
db.commit()
for i in range(cursor.execute('select * from sheet1')):
    print(cursor.fetchone())
db.close()
Output:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
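The same %s placeholders also work when several rows are inserted at once. The sketch below assumes the sheet1(id, name) table from the examples above and uses the cursor's executemany method; the helper name insert_rows is made up for illustration.

```python
def insert_rows(conn, rows):
    """Insert several (id, name) pairs into sheet1 and commit them together.

    `conn` is an open connection, for example the `db` object returned by
    pymysql.connect() in the examples above.
    """
    sql = "insert into sheet1(id, name) values (%s, %s)"
    cursor = conn.cursor()
    try:
        # One call sends every parameter tuple; the driver does the quoting.
        cursor.executemany(sql, rows)
        conn.commit()
        # rowcount is the DB-API attribute holding the number of affected rows.
        return cursor.rowcount
    finally:
        cursor.close()
```

A call such as insert_rows(db, [(107, 'zhouba'), (108, 'wujiu')]) would add two rows; the ids and names here are placeholders, not data from the original table.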
Fetching data
- fetchone: fetches one row per call
- fetchall: fetches all of the remaining rows in the result
- fetchmany: fetches the specified number of rows
import pymysql
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
command = """
select * from sheet1
"""
cursor = db.cursor()
# fetchone: one row per call
for i in range(cursor.execute(command)):
    print(cursor.fetchone())
print("***************")
# fetchall: every row at once
cursor.execute(command)
print(cursor.fetchall())
print("***************")
# fetchmany: only the first three rows
cursor.execute(command)
print(cursor.fetchmany(3))
db.close()
Output:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
***************
((101, 'zhangsan'), (102, 'lisi'), (103, 'wangwu'), (104, 'zhaoliu'), (105, 'qianduo'), (106, 'sunjie'))
***************
((101, 'zhangsan'), (102, 'lisi'), (103, 'wangwu'))
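Each fetch call continues from where the previous one stopped, so calling fetchmany repeatedly walks through the result in chunks. The generator below is a small sketch of that pattern; iter_batches is a hypothetical helper name, and it assumes the cursor has already executed a query.

```python
def iter_batches(cursor, size=3):
    """Yield lists of rows from an already-executed cursor, `size` rows at a time."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            # An empty result means the cursor has reached the end.
            break
        yield list(batch)
```

After cursor.execute('select * from sheet1'), looping with for batch in iter_batches(cursor, 3) processes the rows three at a time instead of loading them all with fetchall.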
Deleting data
import pymysql
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
delete_command = """
delete from sheet1 where id=101
"""
select_command = """
select * from sheet1
"""
cursor = db.cursor()
# Print the table before the delete
for i in range(cursor.execute(select_command)):
    print(cursor.fetchone())
print('*********************')
cursor.execute(delete_command)
# The same connection already sees the delete, even before commit
for i in range(cursor.execute(select_command)):
    print(cursor.fetchone())
db.commit()
db.close()
Output:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
*********************
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
Updating data
import pymysql
db = pymysql.connect(
    host='localhost',
    user='root',
    password='password',
    database='pymysql',
    port=3306
)
command = """
update sheet1 set name='gaojie' where id=102
"""
select_command = """
select * from sheet1
"""
cursor = db.cursor()
cursor.execute(command)
# Print the table after the update
for i in range(cursor.execute(select_command)):
    print(cursor.fetchone())
# Commit so that the update takes effect
db.commit()
db.close()
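The insert, delete, and update examples all follow the same pattern: execute, then commit. As a sketch only, that pattern can be wrapped in one helper that also rolls back if the statement fails. The name execute_write is an assumption for illustration and is not a PyMySQL function.

```python
def execute_write(conn, sql, params=None):
    """Run one modifying statement; commit on success, roll back on error.

    Returns whatever cursor.execute() returns, which for PyMySQL is the
    number of affected rows.
    """
    cursor = conn.cursor()
    try:
        affected = cursor.execute(sql, params)
        conn.commit()
        return affected
    except Exception:
        # Undo the partial change, then let the caller see the error.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

For example, execute_write(db, "update sheet1 set name=%s where id=%s", ("gaojie", 102)) performs the update shown above with the values passed as parameters, and a return value of 0 suggests that no row matched the id.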
Procedure
- First, call connect to create a Connection object and connect to the database:
class Connection(object):
    """
    Representation of a socket with a mysql server.
    The proper way to get an instance of this class is to call
    connect().
    Establish a connection to the MySQL database. Accepts several
    arguments:
    :param host: Host where the database server is located
    :param user: Username to log in as
    :param password: Password to use.
    :param database: Database to use, None to not use a particular one.
    :param port: MySQL port to use, default is usually OK. (default: 3306)
    :param bind_address: When the client has multiple network interfaces, specify
        the interface from which to connect to the host. Argument can be
        a hostname or an IP address.
    :param unix_socket: Optionally, you can use a unix socket rather than TCP/IP.
    :param read_timeout: The timeout for reading from the connection in seconds (default: None - no timeout)
    :param write_timeout: The timeout for writing to the connection in seconds (default: None - no timeout)
    :param charset: Charset you want to use.
    :param sql_mode: Default SQL_MODE to use.
    :param read_default_file:
        Specifies my.cnf file to read these parameters from under the [client] section.
    :param conv:
        Conversion dictionary to use instead of the default one.
        This is used to provide custom marshalling and unmarshaling of types.
        See converters.
    :param use_unicode:
        Whether or not to default to unicode strings.
        This option defaults to true for Py3k.
    :param client_flag: Custom flags to send to MySQL. Find potential values in constants.CLIENT.
    :param cursorclass: Custom cursor class to use.
    :param init_command: Initial SQL statement to run when connection is established.
    :param connect_timeout: Timeout before throwing an exception when connecting.
        (default: 10, min: 1, max: 31536000)
    :param ssl:
        A dict of arguments similar to mysql_ssl_set()'s parameters.
    :param read_default_group: Group to read from in the configuration file.
    :param compress: Not supported
    :param named_pipe: Not supported
    :param autocommit: Autocommit mode. None means use server default. (default: False)
    :param local_infile: Boolean to enable the use of LOAD DATA LOCAL command. (default: False)
    :param max_allowed_packet: Max size of packet sent to server in bytes. (default: 16MB)
        Only used to limit size of "LOAD LOCAL INFILE" data packet smaller than default (16KB).
    :param defer_connect: Don't explicitly connect on contruction - wait for connect call.
        (default: False)
    :param auth_plugin_map: A dict of plugin names to a class that processes that plugin.
        The class will take the Connection object as the argument to the constructor.
        The class needs an authenticate method taking an authentication packet as
        an argument. For the dialog plugin, a prompt(echo, prompt) method can be used
        (if no authenticate method) for returning a string from the user. (experimental)
    :param server_public_key: SHA256 authenticaiton plugin public key value. (default: None)
    :param db: Alias for database. (for compatibility to MySQLdb)
    :param passwd: Alias for password. (for compatibility to MySQLdb)
    :param binary_prefix: Add _binary prefix on bytes and bytearray. (default: False)
    See `Connection <https://www.python.org/dev/peps/pep-0249/#connection-objects>`_ in the
    specification.
    """
    _sock = None
    _auth_plugin_name = ''
    _closed = False
    _secure = False
    def __init__(self, host=None, user=None, password="",
                 database=None, port=0, unix_socket=None,
                 charset='', sql_mode=None,
                 read_default_file=None, conv=None, use_unicode=None,
                 client_flag=0, cursorclass=Cursor, init_command=None,
                 connect_timeout=10, ssl=None, read_default_group=None,
                 compress=None, named_pipe=None,
                 autocommit=False, db=None, passwd=None, local_infile=False,
                 max_allowed_packet=16*1024*1024, defer_connect=False,
                 auth_plugin_map=None, read_timeout=None, write_timeout=None,
                 bind_address=None, binary_prefix=False, program_name=None,
                 server_public_key=None):
        if use_unicode is None and sys.version_info[0] > 2:
            use_unicode = True
        if db is not None and database is None:
            database = db
        if passwd is not None and not password:
            password = passwd
        if compress or named_pipe:
            raise NotImplementedError("compress and named_pipe arguments are not supported")
        self._local_infile = bool(local_infile)
        if self._local_infile:
            client_flag |= CLIENT.LOCAL_FILES
        if read_default_group and not read_default_file:
            if sys.platform.startswith("win"):
                read_default_file = "c:\\my.ini"
            else:
                read_default_file = "/etc/my.cnf"
        if read_default_file:
            if not read_default_group:
                read_default_group = "client"
            cfg = Parser()
            cfg.read(os.path.expanduser(read_default_file))
            def _config(key, arg):
                if arg:
                    return arg
                try:
                    return cfg.get(read_default_group, key)
                except Exception:
                    return arg
            user = _config("user", user)
            password = _config("password", password)
            host = _config("host", host)
            database = _config("database", database)
            unix_socket = _config("socket", unix_socket)
            port = int(_config("port", port))
            bind_address = _config("bind-address", bind_address)
            charset = _config("default-character-set", charset)
            if not ssl:
                ssl = {}
            if isinstance(ssl, dict):
                for key in ["ca", "capath", "cert", "key", "cipher"]:
                    value = _config("ssl-" + key, ssl.get(key))
                    if value:
                        ssl[key] = value
        self.ssl = False
        if ssl:
            if not SSL_ENABLED:
                raise NotImplementedError("ssl module not found")
            self.ssl = True
            client_flag |= CLIENT.SSL
            self.ctx = self._create_ssl_ctx(ssl)
        self.host = host or "localhost"
        self.port = port or 3306
        self.user = user or DEFAULT_USER
        self.password = password or b""
        if isinstance(self.password, text_type):
            self.password = self.password.encode('latin1')
        self.db = database
        self.unix_socket = unix_socket
        self.bind_address = bind_address
        if not (0 < connect_timeout <= 31536000):
            raise ValueError("connect_timeout should be >0 and <=31536000")
        self.connect_timeout = connect_timeout or None
        if read_timeout is not None and read_timeout <= 0:
            raise ValueError("read_timeout should be >= 0")
        self._read_timeout = read_timeout
        if write_timeout is not None and write_timeout <= 0:
            raise ValueError("write_timeout should be >= 0")
        self._write_timeout = write_timeout
        if charset:
            self.charset = charset
            self.use_unicode = True
        else:
            self.charset = DEFAULT_CHARSET
            self.use_unicode = False
        if use_unicode is not None:
            self.use_unicode = use_unicode
        self.encoding = charset_by_name(self.charset).encoding
        client_flag |= CLIENT.CAPABILITIES
        if self.db:
            client_flag |= CLIENT.CONNECT_WITH_DB
        self.client_flag = client_flag
        self.cursorclass = cursorclass
        self._result = None
        self._affected_rows = 0
        self.host_info = "Not connected"
        # specified autocommit mode. None means use server default.
        self.autocommit_mode = autocommit
        if conv is None:
            conv = converters.conversions
        # Need for MySQLdb compatibility.
        self.encoders = {k: v for (k, v) in conv.items() if type(k) is not int}
        self.decoders = {k: v for (k, v) in conv.items() if type(k) is int}
        self.sql_mode = sql_mode
        self.init_command = init_command
        self.max_allowed_packet = max_allowed_packet
        self._auth_plugin_map = auth_plugin_map or {}
        self._binary_prefix = binary_prefix
        self.server_public_key = server_public_key
        self._connect_attrs = {
            '_client_name': 'pymysql',
            '_pid': str(os.getpid()),
            '_client_version': VERSION_STRING,
        }
        if program_name:
            self._connect_attrs["program_name"] = program_name
        if defer_connect:
            self._sock = None
        else:
            self.connect()
- Create a Cursor object to interact with the database:
class Cursor(object):
    """
    This is the object you use to interact with the database.
    Do not create an instance of a Cursor yourself. Call
    connections.Connection.cursor().
    See `Cursor <https://www.python.org/dev/peps/pep-0249/#cursor-objects>`_ in
    the specification.
    """
    #: Max statement size which :meth:`executemany` generates.
    #:
    #: Max size of allowed statement is max_allowed_packet - packet_header_size.
    #: Default value of max_allowed_packet is 1048576.
    max_stmt_length = 1024000
    _defer_warnings = False
    def __init__(self, connection):
        self.connection = connection
        self.description = None
        self.rownumber = 0
        self.rowcount = -1
        self.arraysize = 1
        self._executed = None
        self._result = None
        self._rows = None
        self._warnings_handled = False
- Run MySQL commands with the execute function
- Retrieve data with the fetch methods
- Submit changes with commit
- Close the database connection
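The steps above can be sketched as one function that walks through them in order. The name run_statement is hypothetical; the connect argument is expected to be a function such as pymysql.connect, passed in so that the sketch itself does not depend on any particular driver.

```python
def run_statement(connect, sql, params=None, **settings):
    """Connect, create a cursor, execute, fetch, commit, then close."""
    db = connect(**settings)              # create the Connection object
    try:
        cursor = db.cursor()              # create the Cursor object
        try:
            cursor.execute(sql, params)   # run the command
            rows = cursor.fetchall()      # fetch whatever the command returned
            db.commit()                   # submit any changes
            return rows
        finally:
            cursor.close()
    finally:
        db.close()                        # always close the connection
```

Called as run_statement(pymysql.connect, 'select * from sheet1', host='localhost', user='root', password='password', database='pymysql'), it would return the same rows that the fetchall example prints. The try/finally blocks are a design choice so that the connection is closed even when the statement raises an error.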
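Points to note:
- Interaction with the database goes through the cursor
- Opening, closing, and committing changes to the database go through db (the connection)
- Some operations move the cursor's current position, so keep track of where the cursor currently points
- Any change to the database must be submitted with commit before it takes effect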
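The last point can be tried without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module as a stand-in, on the assumption that it is close enough for this purpose because both drivers follow the same DB-API specification. It is not PyMySQL: sqlite3 uses ? placeholders where PyMySQL uses %s, and the sheet1 table here is recreated in memory.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table shaped like sheet1.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("create table sheet1 (id integer, name text)")
cursor.execute("insert into sheet1 (id, name) values (?, ?)", (101, "zhangsan"))
db.commit()  # the insert is now permanent

# Delete the row but do not commit.
cursor.execute("delete from sheet1 where id = ?", (101,))
cursor.execute("select count(*) from sheet1")
count_before_rollback = cursor.fetchone()[0]  # this connection already sees the delete

# Roll back instead of committing: the uncommitted delete is discarded.
db.rollback()
cursor.execute("select count(*) from sheet1")
count_after_rollback = cursor.fetchone()[0]

print(count_before_rollback, count_after_rollback)
db.close()
```

The delete is visible inside the same connection straight away, which matches the delete example above where the second select already lacks row 101, but the change only becomes permanent once commit is called.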