After reading some SO posts, I've come up with a way to use OpenCV in Python 3 with multiprocessing. I recommend doing this on Linux, because according to this post, processes created by fork share memory with their parent as long as the content is not changed. Here's a minimal example:
import cv2
import multiprocessing as mp
import numpy as np
import psutil

img = cv2.imread('test.tiff', cv2.IMREAD_ANYDEPTH)  # here I'm using an indexed 16-bit tiff as an example.
num_processes = 4
kernel_size = 11
tile_size = img.shape[0] // num_processes  # Assuming img.shape[0] is divisible by num_processes in this case

output = mp.Queue()

def mp_filter(x, output):
    print(psutil.virtual_memory())  # monitor memory usage
    output.put((x, cv2.GaussianBlur(img[tile_size * x:tile_size * (x + 1), :],
                                    (kernel_size, kernel_size), kernel_size / 5)))
    # note that you actually have to process a slightly larger block and leave out the border.

if __name__ == '__main__':
    processes = [mp.Process(target=mp_filter, args=(x, output)) for x in range(num_processes)]
    for p in processes:
        p.start()

    result = []
    for ii in range(num_processes):
        result.append(output.get(True))

    for p in processes:
        p.join()
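Since the queue yields tiles in whatever order the workers happen to finish, the index carried in each `(index, tile)` tuple is what lets you restore the original layout. A minimal sketch of that reassembly, using made-up tiles in place of the blurred slices:

```python
import numpy as np

# Hypothetical example: (index, tile) pairs as they might arrive from the
# queue, out of order. Each tile stands in for one blurred image slice.
tile_height, width = 2, 3
tiles = [(2, np.full((tile_height, width), 2, dtype=np.uint16)),
         (0, np.full((tile_height, width), 0, dtype=np.uint16)),
         (3, np.full((tile_height, width), 3, dtype=np.uint16)),
         (1, np.full((tile_height, width), 1, dtype=np.uint16))]

# Sort by tile index, then stack vertically to rebuild the full image.
tiles.sort(key=lambda pair: pair[0])
filtered = np.vstack([tile for _, tile in tiles])
print(filtered.shape)  # (8, 3)
```

The same `sort` + `vstack` step works on the `result` list collected above, since each entry is an `(x, blurred_tile)` tuple.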
Instead of using a Queue, another way to collect the results from the processes is to create a shared array through the multiprocessing module (ctypes has to be imported):

result = mp.Array(ctypes.c_uint16, img.shape[0]*img.shape[1], lock=False)
Then, assuming there is no overlap, each process can write to a different portion of the array. Creating a large mp.Array is surprisingly slow, however. This actually defies the purpose of speeding up the operation, so use it only when the added time is small compared to the total computation time. The array can be turned into a numpy array by:
result_np = np.frombuffer(result, dtype=ctypes.c_uint16)