Simple Multi-Core CPU Parallelism in Python

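Most modern CPUs have multiple cores. In Python, the multiprocessing package makes it fairly easy to split a computation across those cores so that the pieces run in parallel and the whole job finishes sooner.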

The two pieces of syntax you will use most often are these.

Getting the number of CPU cores:

n_cpu = multiprocessing.cpu_count()
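A small sketch of how the core count might be used in practice. Keep in mind that multiprocessing.cpu_count() reports logical CPUs, which can be more than the number of physical cores on machines with hyper-threading. The helper name usable_workers is hypothetical and only illustrates capping the worker count at the number of tasks.

```python
import multiprocessing
import os


def usable_workers(n_tasks):
    """Hypothetical helper: never start more workers than there are tasks."""
    n_cpu = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(n_cpu, n_tasks))


if __name__ == "__main__":
    print("logical CPUs:", multiprocessing.cpu_count())
    print("workers for 10 tasks:", usable_workers(10))
```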

Running a function in a separate process:

proc = multiprocessing.Process(target=single_run, args=(digits, "parallel"))
proc.start()
proc.join()

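Here, target is the function to run in the new process (pass the function itself, not the result of calling it), and args holds that function's arguments. Note that args must be a tuple.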
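For readers who want something they can run without any data files, here is a minimal sketch of the same start/join pattern. The worker sum_of_squares, the job labels, and the use of a multiprocessing.Queue to collect results are my own illustrative choices, not part of the t-SNE example that follows. Also note that a single argument still needs a tuple, e.g. args=(x,).

```python
import multiprocessing


def sum_of_squares(label, numbers, results):
    """Worker: compute a sum of squares and report it through the queue."""
    results.put((label, sum(x * x for x in numbers)))


def run_demo():
    results = multiprocessing.Queue()
    jobs = {"a": [1, 2, 3], "b": [4, 5, 6]}
    procs = [
        multiprocessing.Process(target=sum_of_squares, args=(label, nums, results))
        for label, nums in jobs.items()
    ]
    for proc in procs:
        proc.start()  # launch every process first so they run concurrently
    # Read one result per process before joining
    collected = dict(results.get() for _ in procs)
    for proc in procs:
        proc.join()  # wait for each process to exit
    return collected


if __name__ == "__main__":
    print(run_demo())
```

Starting all processes before joining any of them matters: calling start() and join() back to back inside one loop would run the jobs one after another.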

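The following simple example demonstrates the effect of CPU parallelism. It runs t-SNE dimensionality reduction separately on each of the 10 digit classes of the MNIST digits data and compares the running time of the parallel and serial versions.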

import numpy as np
import multiprocessing
from sklearn.manifold import TSNE
import time


path = "E:\\blog\\data\\MNIST50m\\"


def run_tsne(data):
    t_sne = TSNE(n_components=2, perplexity=30.0)
    Y = t_sne.fit_transform(data)
    return Y


def single_run(digits, fold="1by1"):
    # Embed each digit class in turn and save the result under path/<fold>/
    for digit in digits:
        print(str(digit) + " starting...")
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        X = np.loadtxt(path+str(digit)+".csv", dtype=float, delimiter=",")
        Y = run_tsne(X)
        np.savetxt(path+fold+"\\Y"+str(digit)+".csv", Y, fmt='%f', delimiter=",")
        print(str(digit) + " finished.")


def one_by_one():
    begin_time = time.time()
    digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # note: digit 0 is not part of the serial run
    # digits = [1, 2, 3, 4, 5, 6]  # used for the 6-digit comparison
    single_run(digits, "1by1")
    end_time = time.time()
    print("one by one time: ", end_time-begin_time)


def parallel():
    begin_time = time.time()
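    n = 10  # number of digits to process (digits 0 .. n-1)
    procs = []
    n_cpu = multiprocessing.cpu_count()
    # Contiguous blocks of this size; the last process also takes any remainder
    chunk_size = int(n/n_cpu)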

    for i in range(0, n_cpu):
        min_i = chunk_size * i

        if i < n_cpu-1:
            max_i = chunk_size * (i+1)
        else:
            max_i = n
        digits = list(range(min_i, max_i))
        procs.append(multiprocessing.Process(target=single_run, args=(digits, "parallel")))

    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

    end_time = time.time()
    print("parallel time: ", end_time-begin_time)


if __name__ == '__main__':
    # one_by_one()
    parallel()

The serial output is shown below. The run took a little over 500 seconds.

1 starting...
1 finished.
2 starting...
2 finished.
3 starting...
3 finished.
4 starting...
4 finished.
5 starting...
5 finished.
6 starting...
6 finished.
7 starting...
7 finished.
8 starting...
8 finished.
9 starting...
9 finished.
one by one time:  538.7096929550171

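On my six-core i5-9400F, the parallel output is as follows. It took a little over 300 seconds, which is somewhat faster, but the result is far from ideal. The log shows why: with n = 10 and six cores, chunk_size is int(10/6) = 1, so the first five processes each handle a single digit while the last process works through digits 5 to 9 one after another.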

4 starting...
3 starting...
0 starting...
5 starting...
2 starting...
1 starting...
0 finished.
2 finished.
4 finished.
3 finished.
5 finished.
6 starting...
1 finished.
6 finished.
7 starting...
7 finished.
8 starting...
8 finished.
9 starting...
9 finished.
parallel time:  339.75568318367004
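To make the imbalance concrete, the sketch below pulls the chunking arithmetic out of parallel() into a standalone function and prints the resulting assignment for 10 digits on 6 cores. The second function, round_robin_chunks, is not from the code above; it is one possible alternative that I am assuming here purely for comparison, and I have not timed it.

```python
def contiguous_chunks(n, n_workers):
    """Same split as parallel(): equal blocks, remainder goes to the last worker."""
    chunk_size = int(n / n_workers)
    chunks = []
    for i in range(n_workers):
        min_i = chunk_size * i
        max_i = chunk_size * (i + 1) if i < n_workers - 1 else n
        chunks.append(list(range(min_i, max_i)))
    return chunks


def round_robin_chunks(n, n_workers):
    """Alternative split (an assumption, not the original code): deal items out in turn."""
    return [list(range(i, n, n_workers)) for i in range(n_workers)]


if __name__ == "__main__":
    print(contiguous_chunks(10, 6))   # [[0], [1], [2], [3], [4], [5, 6, 7, 8, 9]]
    print(round_robin_chunks(10, 6))  # [[0, 6], [1, 7], [2, 8], [3, 9], [4], [5]]
```

With the contiguous split the busiest process gets five digits; with the round-robin split the busiest process gets two, so the longest chain of serial work is shorter.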

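To show the difference between parallel and serial execution more clearly, I reran both versions on just 6 digits, so that each core gets exactly one digit. This time the parallel version was roughly 4 times as fast as the serial one.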

Serial output for 6 digits:

1 starting...
1 finished.
2 starting...
2 finished.
3 starting...
3 finished.
4 starting...
4 finished.
5 starting...
5 finished.
6 starting...
6 finished.
one by one time:  357.5319800376892

Parallel output for 6 digits:

3 starting...
4 starting...
1 starting...
5 starting...
2 starting...
0 starting...
5 finished.
0 finished.
4 finished.
2 finished.
1 finished.
3 finished.
parallel time:  85.06037616729736
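The "roughly 4 times" figure can be checked directly from the two logged times. The sketch below computes the speedup and the parallel efficiency (speedup divided by core count); the function names are my own. I only apply it to the 6-digit runs, because the earlier serial log covers digits 1 to 9 while the parallel log covers 0 to 9, so those two times are not a like-for-like comparison.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, n_cores):
    """Fraction of the ideal n_cores-fold speedup that was actually achieved."""
    return speedup(serial_seconds, parallel_seconds) / n_cores


if __name__ == "__main__":
    # Times copied from the 6-digit logs above
    serial, par = 357.5319800376892, 85.06037616729736
    print(f"speedup: {speedup(serial, par):.2f}x")
    print(f"efficiency on 6 cores: {efficiency(serial, par, 6):.0%}")
```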

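In short, for some workloads multi-core CPU parallelism does speed things up, but the gain is limited: on a 6-core i5, for example, the speedup will not exceed 6x. If you want a much larger speedup, you still need GPU-based parallelism.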

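For GPU programming in Python, you can refer to the book "Python Parallel Programming Cookbook". As far as I know it is openly available, and an electronic copy should be fairly easy to find online. If you really cannot find it (or cannot be bothered to look), you can also contact me for a copy.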
