Reading the emysql source code

Note: the version used for testing was checked out from https://github.com/jkvor/emysql.git
That repository is no longer updated on GitHub.

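emysql is another commonly used Erlang MySQL database driver. Compared with erlang_mysql_driver, its code is more clearly structured. It is also easy to use: call emysql:start and emysql:add_pool first, then call emysql:fetch/execute to start running SQL statements.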

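There are many write-ups of the emysql source online, and the copy I downloaded turned out to differ from the one most of them analyze. That version has probably changed a lot since; mine was the latest at the time of writing. I originally meant only to read through the source, but testing turned up several bugs, unless I was using the library incorrectly.
What follows covers the emysql source and the testing process.
(Note: "caller process" below means the process that calls emysql:execute.)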

A brief look at emysql startup

1 Starting emysql
emysql:start brings up a supervision tree with two child processes, emysql_statements and emysql_conn_mgr.

init(_) ->
    {ok, {{one_for_one, 10, 10}, [
        {emysql_statements, {emysql_statements, start_link, []}, permanent, 5000, worker, [emysql_statements]},
        {emysql_conn_mgr, {emysql_conn_mgr, start_link, []}, permanent, 5000, worker, [emysql_conn_mgr]}
    ]}}.

2 Starting emysql_statements
emysql_statements does nothing special at startup; it only initializes its #state record. The record has two fields, statements and prepared, and both are stored as gb_trees.

3 Starting emysql_conn_mgr
The state of emysql_conn_mgr has two fields, pools and waiting. Any connection pools configured in the app file are added at startup; usually that list is [].

init([]) ->
    Pools = initialize_pools(),
    Pools1 = [emysql_conn:open_connections(Pool) || Pool <- Pools],
    {ok, #state{pools=Pools1}}.

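emysql_conn_mgr plays a role similar to mysql_dispatcher in erlang_mysql_driver: it manages multiple connection pools, each holding multiple connections. It also keeps track of the waiting pids, which are the caller processes waiting for a connection.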

Testing

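1 Test one
After that brief analysis I started testing. I spawned 100,000 processes at the same moment, each executing a select statement, mainly to see which processes take the load in that situation. In roughly the first 6 seconds a little over 20,000 processes got a result back; after that, every remaining process failed with connection_lock_timeout.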
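
A harness along the following lines is enough to run this kind of test. It is a sketch, not the code used for the numbers above: the module name load_sketch and the function run/2 are made up for illustration, and the query is replaced by an arbitrary zero-arity fun so that it runs without a database.

```erlang
-module(load_sketch).
-export([run/2]).

%% Spawn N workers that each call Fun() once, then tally the outcomes.
%% In the test above Fun would wrap the emysql query call; here it is any
%% zero-arity fun (an assumption made so the sketch is self-contained).
run(N, Fun) when is_integer(N), N >= 0, is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    [spawn(fun() -> Parent ! {Ref, outcome(Fun)} end) || _ <- lists:seq(1, N)],
    collect(Ref, N, 0, 0).

%% A worker that exits or crashes (for example with connection_lock_timeout)
%% is counted as failed instead of silently disappearing.
outcome(Fun) ->
    try Fun() of
        _ -> ok
    catch
        _:_ -> failed
    end.

%% Wait for exactly one report per worker and return {OkCount, FailedCount}.
collect(_Ref, 0, Ok, Failed) ->
    {Ok, Failed};
collect(Ref, Left, Ok, Failed) ->
    receive
        {Ref, ok}     -> collect(Ref, Left - 1, Ok + 1, Failed);
        {Ref, failed} -> collect(Ref, Left - 1, Ok, Failed + 1)
    end.
```

Calling load_sketch:run(100000, Fun) with a Fun that performs the query returns the number of workers that got a result and the number that failed.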

execute(PoolId, Query, Args, Timeout) when is_atom(PoolId) andalso (is_list(Query) orelse is_binary(Query)) andalso is_list(Args) andalso is_integer(Timeout) ->
    Connection = emysql_conn_mgr:wait_for_connection(PoolId),
    monitor_work(Connection, Timeout, {emysql_conn, execute, [Connection, Query, Args]});

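wait_for_connection blocks. Even if the Timeout passed to execute is greater than 5000, the effective timeout of execute is still bounded by wait_for_connection, because execute passes Timeout to monitor_work but not to wait_for_connection. monitor_work does return within Timeout, but wait_for_connection uses a timeout of its own.
This is why so many processes got a timeout error after 5 seconds: the timeout that wait_for_connection defines for itself is 5 seconds. Here is the source of wait_for_connection.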

wait_for_connection(PoolId) when is_atom(PoolId) ->
    %% try to lock a connection. if no connections are available then
    %% wait to be notified of the next available connection
    case lock_connection(PoolId) of
        unavailable ->
            gen_server:call(?MODULE, start_wait, infinity),
            receive
                {connection, Connection} -> Connection
            after lock_timeout() -> %% wait_for_connection's own timeout, normally 5 seconds
                exit(connection_lock_timeout)
            end;
        Connection ->
            Connection
    end.

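If a caller process does not get a Connection from its initial lock_connection call, it blocks until emysql_conn_mgr sends it one. If no connection arrives within lock_timeout(), it exits. With this code, a test that issues a large number of requests easily produces timeout errors, because the callers sit blocked in wait_for_connection, waiting for emysql_conn_mgr to hand over a connection.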
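
One way to make the caller's Timeout cover the whole call is to let the wait consume part of it and pass only the remainder on to the query. The following is a minimal sketch of that idea, not emysql code: the module wait_budget and the function wait_then_work/2 are hypothetical names, and only the receive part of wait_for_connection is modelled.

```erlang
-module(wait_budget).
-export([wait_then_work/2]).

%% Wait for a {connection, Conn} message, then call Work(Conn, Remaining),
%% where Remaining is what is left of the overall Timeout (milliseconds).
%% Hypothetical helper for illustration; emysql has no such function.
wait_then_work(Work, Timeout)
  when is_function(Work, 2), is_integer(Timeout), Timeout >= 0 ->
    Start = erlang:monotonic_time(millisecond),
    receive
        {connection, Conn} ->
            Elapsed = erlang:monotonic_time(millisecond) - Start,
            Work(Conn, max(0, Timeout - Elapsed))
    after Timeout ->
        %% same exit reason as the wait_for_connection code shown above
        exit(connection_lock_timeout)
    end.
```

With this shape a caller that asks for 10 seconds can spend up to 10 seconds waiting, and the query only gets whatever time is left.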
When emysql_conn_mgr receives a start_wait message, it stores the caller's pid in the waiting queue of its state.

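The problem in the test above is that the waiting queue in the state was never emptied. The caller processes had clearly died with timeout errors, yet emysql_conn_mgr never removed them from waiting. Why?
Look at pass_connection_to_waiting_pid, which runs when a caller process has finished with a connection. There are two cases: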

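1 No caller process is waiting: the connection is removed from the pool's locked tree and added to the pool's available queue.
2 Caller processes are waiting, namely the ones described above that called wait_for_connection, did not get a connection immediately, and blocked. In this case the connection is not put back into the pool but handed straight to a process from the waiting queue. Unfortunately, by the time erlang:process_info(Pid, current_function) is called, the caller process has already died, so the result is not {current_function,{emysql_conn_mgr,wait_for_connection,1}}.
At this point the dead pid should be removed from waiting, but it is not. The recursive call to pass_connection_to_waiting_pid passes the original State along, so when every waiting pid turns out to be dead, the recursion ends in the empty-queue branch and returns a State whose waiting field is unchanged. The invalid pids are never removed, and every later pass_connection has to walk through all of them again.
The fix is to change the recursive call to
pass_connection_to_waiting_pid(State#state{waiting=Waiting1}, Connection, Waiting1)
so that the shortened queue is carried along in State.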

pass_connection_to_waiting_pid(State, Connection, Waiting) ->
    %% check if any processes are waiting for a connection
    case queue:is_empty(Waiting) of
        true ->
            %% if no processes are waiting then unlock the connection
            case find_pool(Connection#connection.pool_id, State#state.pools, []) of
                {Pool, OtherPools} ->
                    %% find connection in locked tree
                    case gb_trees:lookup(Connection#connection.id, Pool#pool.locked) of
                        {value, Conn} ->
                            %% add it to the available queue and remove from locked tree
                            Pool1 = Pool#pool{
                                available = queue:in(Conn#connection{locked_at=undefined}, Pool#pool.available),
                                locked = gb_trees:delete_any(Connection#connection.id, Pool#pool.locked)
                            },
                            {ok, State#state{pools = [Pool1|OtherPools]}};
                        none ->
                            {{error, connection_not_found}, State}
                    end;
                undefined ->
                    {{error, pool_not_found}, State}
            end;
        false ->
            %% if the waiting queue is not empty then remove the head of
            %% the queue and check if that process is still waiting
            %% for a connection. If so, send the connection. Regardless,
            %% update the queue in state once the head has been removed.
            {{value, Pid}, Waiting1} = queue:out(Waiting),
            case erlang:process_info(Pid, current_function) of
                {current_function,{emysql_conn_mgr,wait_for_connection,1}} ->
                    erlang:send(Pid, {connection, Connection}),
                    {ok, State#state{waiting = Waiting1}};
                _ ->
                    %% This is the key point: State is passed on unchanged, so its waiting field never shrinks.
                    %% It should be pass_connection_to_waiting_pid(State#state{waiting=Waiting1}, Connection, Waiting1)
                    pass_connection_to_waiting_pid(State, Connection, Waiting1)
            end
    end.
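
The effect of the fix can be shown in isolation. The sketch below is an illustration, not emysql code: the module waiting_sketch and the function hand_over/2 are hypothetical, the pool bookkeeping is left out, and is_process_alive/1 stands in for the process_info(Pid, current_function) check used above. The point is that the queue returned to the caller no longer contains the dead pids that were skipped.

```erlang
-module(waiting_sketch).
-export([hand_over/2]).

%% Pop pids from the waiting queue until a live one is found and send it
%% the connection. Returns {Result, RemainingQueue}; pids that were popped
%% and found dead are absent from RemainingQueue.
hand_over(Connection, Waiting) ->
    case queue:out(Waiting) of
        {empty, Empty} ->
            %% nobody is waiting; the caller would now unlock the connection
            {no_waiter, Empty};
        {{value, Pid}, Rest} ->
            %% simplification: liveness check instead of current_function
            case is_process_alive(Pid) of
                true ->
                    Pid ! {connection, Connection},
                    {{sent, Pid}, Rest};
                false ->
                    %% drop the dead pid and keep looking in the shorter queue
                    hand_over(Connection, Rest)
            end
    end.
```

Because the remaining queue is returned on every path, the manager can store it back into its state no matter how the search ended.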

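2 Test two
Given the results of the first test, I changed the code so that a caller process waits for as long as it takes, until its query completes.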

wait_for_connection(PoolId) when is_atom(PoolId) ->
    %% try to lock a connection. if no connections are available then
    %% wait to be notified of the next available connection
    case lock_connection(PoolId) of
        unavailable ->
            gen_server:call(?MODULE, start_wait, infinity),
            receive
                {connection, Connection} -> Connection
            end;
        Connection ->
            Connection
    end.

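With this change wait_for_connection no longer has a timeout, so the only timeout left is the one in emysql:monitor_work. The caller process blocks in wait_for_connection until it gets a connection, then spawns a child process that uses the connection to execute the SQL.
But this exposed another problem: the gen_server:call(?MODULE, start_wait, infinity) message does not tell emysql_conn_mgr which pool the process is waiting on, and different pools may well connect to different databases. To check this I set up two databases with one connection pool each. The problem showed up as expected: in pass_connection_to_waiting_pid, emysql_conn_mgr hands the returned connection to a pid in waiting without checking which pool the connection belongs to.
The fix has to be made where the waiting pid is first added: the pid must be tagged with its pool. In other words, keep one waiting queue per pool. When a connection becomes free, look at its pool_id, take a pid from the matching waiting queue, and send the connection to that pid.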
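
A minimal sketch of such per-pool waiting queues follows. It is an assumption about how the idea could be structured, not a patch to emysql: the module pool_waiting and its functions are hypothetical, and it uses a map keyed by pool id, which needs an Erlang/OTP release with map support.

```erlang
-module(pool_waiting).
-export([new/0, enqueue/3, next_waiter/2]).

%% Waiting is a map from PoolId to a queue of pids waiting on that pool.
new() ->
    #{}.

%% Register Pid as waiting for a connection from PoolId.
enqueue(PoolId, Pid, Waiting) ->
    Q = maps:get(PoolId, Waiting, queue:new()),
    Waiting#{PoolId => queue:in(Pid, Q)}.

%% Take the next waiter for PoolId only. Waiters on other pools are never
%% returned, so a freed connection cannot reach a caller of another pool.
next_waiter(PoolId, Waiting) ->
    Q = maps:get(PoolId, Waiting, queue:new()),
    case queue:out(Q) of
        {empty, _} ->
            {none, Waiting};
        {{value, Pid}, Rest} ->
            {{ok, Pid}, Waiting#{PoolId => Rest}}
    end.
```

When a connection is released, the manager would call next_waiter with the connection's pool_id and fall back to unlocking the connection if the result is none.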
