A Question About Python Multithreading and CPU Load

Background

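At work I noticed the following with Python and Flask: when a request triggers a long-running task, I usually start one or more threads to process it asynchronously. However, no matter how many threads I start, the Python process that hosts them can saturate at most one CPU core.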

Reflection

I had seen this behavior many times over a long period, but never took the time to investigate it. That really should not have happened.

Findings

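Python (more precisely, the CPython interpreter) has a mechanism called the GIL (global interpreter lock), which ensures that only one thread executes Python code at any given moment. In other words, it puts a lock on the interpreter to avoid the various problems that arise in a multithreaded environment. It is a design trade-off. The Python glossary describes it as follows: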

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.
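
To see the effect described above, here is a minimal sketch that times the same pure-Python busy loop run sequentially and then in several threads. The function names (burn_cpu, time_sequential, time_threaded) and the loop sizes are made up for illustration and are not from any library.

```python
import threading
import time


def burn_cpu(n):
    """Pure-Python busy loop; the thread running it needs the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n, tasks):
    """Run `tasks` busy loops one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(tasks):
        burn_cpu(n)
    return time.perf_counter() - start


def time_threaded(n, tasks):
    """Run `tasks` busy loops in separate threads; return elapsed seconds."""
    threads = [threading.Thread(target=burn_cpu, args=(n,)) for _ in range(tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative sizes only; adjust for your machine.
    N, TASKS = 3_000_000, 4
    print("sequential: %.2fs" % time_sequential(N, TASKS))
    print("threaded:   %.2fs" % time_threaded(N, TASKS))
```

On a standard CPython build with the GIL, the two timings should come out roughly the same, because only one thread can execute Python bytecode at a time. Exact numbers depend on the machine and Python version.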

Getting around the limitation

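  1. Use libraries, or the relevant parts of your own code, implemented in C or a similar language, which can release the GIL
  2. Use a multiprocess model
  3. Use a third-party interpreter such as Jython (which only has 2.x releases)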
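
As an illustration of the first option, the standard-library hashlib module is implemented in C, and its documentation states that the GIL is released while hashing more than 2047 bytes of data at once. The sketch below hashes several buffers in a thread pool. The helper names (sha256_hex, hash_all_threaded) and the buffer sizes are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data):
    """Hash one buffer; the heavy lifting happens in C inside hashlib."""
    return hashlib.sha256(data).hexdigest()


def hash_all_threaded(buffers, max_workers=4):
    """Hash every buffer in a thread pool; digests come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(sha256_hex, buffers))


if __name__ == "__main__":
    # Hypothetical workload: four 32 MiB buffers of repeated bytes.
    buffers = [bytes([i]) * (32 * 1024 * 1024) for i in range(4)]
    for digest in hash_all_threaded(buffers):
        print(digest)
```

Because the hashing runs outside the GIL, threads like these can in principle keep more than one core busy. How much that helps depends on buffer size and on the machine.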

Multiprocess model

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

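def main():
    # The pool runs is_prime in separate worker processes, each with its own
    # interpreter and GIL, so the checks can use several CPU cores.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))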

if __name__ == '__main__':
    main()
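
For the request-handler scenario from the background section, a handler would more likely queue a single task and return immediately than map over a fixed list. The following is a rough sketch of that pattern using executor.submit, without any web framework. The start_task helper is hypothetical, and is_prime is rewritten here only to keep the block self-contained.

```python
import concurrent.futures
import math


def is_prime(n):
    """CPU-bound trial-division check, same idea as the example above."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def start_task(executor, n):
    """Hypothetical handler helper: queue the work and return a Future at once."""
    return executor.submit(is_prime, n)


def main():
    # One pool shared by all tasks; its default size is derived from the
    # machine's processor count.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = {start_task(executor, n): n
                   for n in (112272535095293, 115280095190773)}
        # Collect results in completion order, not submission order.
        for future in concurrent.futures.as_completed(futures):
            print('%d is prime: %s' % (futures[future], future.result()))


if __name__ == '__main__':
    main()
```

The function passed to submit and its arguments must be picklable, which is why is_prime is defined at module level. In a real web application the pool would normally be created once at startup rather than per request.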

References

  1. Glossary — Python 3.11.2 documentation
  2. concurrent.futures — Launching parallel tasks — Python 3.11.2 documentation