Background
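I noticed the following while working: in a Python + Flask setup, whenever a request triggers a long-running task, I usually start one or more threads to process it asynchronously. But no matter how many threads I start, the Python process they belong to never keeps more than one CPU core fully busy.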
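A minimal script to reproduce this observation, using plain `threading` instead of Flask (the names `burn_cpu` and `measure` are made up for illustration). It compares the process's total CPU time with wall-clock time. On a standard CPython build with the GIL, the ratio should stay around 1.0, i.e. roughly one core busy, however many threads are started. Exact numbers depend on the machine.

```python
import threading
import time


def burn_cpu(n: int) -> int:
    """Pure-Python CPU-bound loop; the thread holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(num_threads: int, work_per_thread: int) -> tuple[float, float]:
    """Run CPU-bound threads; return (wall seconds, process CPU seconds)."""
    threads = [
        threading.Thread(target=burn_cpu, args=(work_per_thread,))
        for _ in range(num_threads)
    ]
    wall_start = time.perf_counter()
    # process_time() sums user + system CPU time over all threads of the process.
    cpu_start = time.process_time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    wall, cpu = measure(num_threads=4, work_per_thread=2_000_000)
    # cpu / wall approximates how many cores were kept busy on average.
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s cores used ~ {cpu / wall:.2f}")
```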
Reflection
I had noticed this many times over a long period, but never took the time to investigate it. I really should have.
Findings
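CPython, the reference Python interpreter, has a mechanism called the GIL (global interpreter lock). It ensures that only one thread executes Python bytecode at any given moment: in effect, a single lock is placed around the whole interpreter to avoid the many problems that concurrent access would otherwise cause in a multi-threaded program. It is a deliberate design trade-off, accepting the loss of parallel execution of Python code in exchange for a simpler interpreter. The official Python glossary describes it as follows: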
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.
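The glossary's point that the GIL is released during I/O can be checked with a small sketch. `fake_io` and `run_io_threads` are hypothetical names, and `time.sleep` stands in for a blocking I/O call because it also releases the GIL while waiting. Four threads that each block for 0.5 s should finish in about 0.5 s in total, which is why threads still help for I/O-bound work even though they do not speed up CPU-bound Python code.

```python
import sys
import threading
import time


def fake_io(seconds: float) -> None:
    # time.sleep releases the GIL while waiting, like blocking I/O calls do.
    time.sleep(seconds)


def run_io_threads(num_threads: int, seconds: float) -> float:
    """Start threads that each block for `seconds`; return total wall time."""
    threads = [
        threading.Thread(target=fake_io, args=(seconds,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The thread switch interval (0.005 s by default): roughly how long a running
    # thread keeps the GIL before a waiting thread can force a hand-over.
    print("switch interval:", sys.getswitchinterval())
    elapsed = run_io_threads(num_threads=4, seconds=0.5)
    # The waits overlap, so this is close to 0.5 s rather than 4 * 0.5 = 2 s.
    print(f"4 sleeping threads took {elapsed:.2f}s")
```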
Getting around the limitation
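- Implement the library, or the relevant code, in C or a similar compiled language; native code can release the GIL while it does work that does not touch Python objects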
- Use a multi-process model: each process has its own interpreter and its own GIL
- Use a different interpreter without a GIL, such as Jython (which only has a 2.x version)
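As an illustration of the first point: the standard `hashlib` module is implemented in C, and its documentation states that it releases the GIL when hashing data larger than 2047 bytes. Under that assumption, hashing several large buffers in a thread pool can keep more than one core busy, unlike a pure-Python loop. `sha256_hex` and `hash_all` are illustrative names, and the actual speed-up (not measured here) depends on the number of cores.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data: bytes) -> str:
    # hashlib releases the GIL while hashing inputs larger than 2047 bytes,
    # so several of these calls can run on different cores at the same time.
    return hashlib.sha256(data).hexdigest()


def hash_all(blobs: list[bytes], workers: int = 4) -> list[str]:
    """Hash each blob in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, blobs))


if __name__ == "__main__":
    blobs = [os.urandom(16 * 1024 * 1024) for _ in range(4)]  # four 16 MiB buffers
    for digest in hash_all(blobs):
        print(digest)
```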
Multi-process model
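```python
# Adapted from the ProcessPoolExecutor example in the concurrent.futures documentation.
import concurrent.futures
import math

# Large numbers to test; checking each one is pure CPU work.
PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]


def is_prime(n):
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True


def main():
    # executor.map distributes the is_prime calls across worker processes.
    # Each worker has its own interpreter and GIL, so the checks can run on
    # several cores at once. By default there is one worker per processor.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # executor.map yields results in the same order as PRIMES.
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))


if __name__ == '__main__':
    # The guard is required: worker processes may re-import this module.
    main()
```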
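Applied to the Flask situation from the background section, the same module can be used without blocking the request: submit the job and return a `Future` right away. A minimal sketch with Flask left out; `count_primes_below` and `start_job` are hypothetical stand-ins for the real task and request handler, and in a real web app the pool would be created once per process and reused rather than per request.

```python
import concurrent.futures
import math


def count_primes_below(limit: int) -> int:
    """Hypothetical long-running CPU-bound job standing in for the real task."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    )


def start_job(
    pool: concurrent.futures.ProcessPoolExecutor, limit: int
) -> concurrent.futures.Future:
    """Hypothetical request-handler body: hand the job to a worker and return at once."""
    future = pool.submit(count_primes_below, limit)
    # The callback runs in a thread of this (parent) process once the worker finishes.
    future.add_done_callback(
        lambda f: print(f"count_primes_below({limit}) = {f.result()}")
    )
    return future


if __name__ == "__main__":
    # In a web app this pool would be created once and reused across requests.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        futures = [start_job(pool, n) for n in (50_000, 80_000)]
        # The caller could do other work here; the demo simply waits.
        concurrent.futures.wait(futures)
```

The task function and its arguments must be picklable, because `submit` sends them to another process; the callback itself stays in the parent process, so it can log or store the result there.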