Performance of System V Style Shared Memory Support in Python 3.8
In version 3.8, Python supports System V style shared memory. This support allows creation of memory segments that can be shared between Python processes and, consequently, help avoid (de)serialization costs when sharing data between processes.
This change introduces a new manager, multiprocessing.managers.SharedMemoryManager, that allows manager-based access to this shared memory capability. It also introduces a new module named multiprocessing.shared_memory, which contains the SharedMemory and ShareableList classes. While the former provides “raw” access to shared memory, the latter abstracts the shared memory as a Python list, with some limitations.
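To make the two classes concrete, here is a minimal single-process sketch. It is only an illustration: the function names are hypothetical, and the second handle in each function stands in for what another process would do when attaching to a segment by name. Per the Python documentation, a ShareableList cannot change its overall length (no append or insert), which is one of the limitations mentioned above.

```python
from multiprocessing import shared_memory


def demo_raw_shared_memory():
    """Write bytes through one handle and read them back through another."""
    owner = shared_memory.SharedMemory(create=True, size=16)
    try:
        owner.buf[:5] = b"hello"
        # Attach a second handle by name, as another process would.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[:5])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # release the segment once no one needs it


def demo_shareable_list():
    """Modify a ShareableList through a second handle attached by name."""
    owner = shared_memory.ShareableList([1, 2.5, True, "abc", b"xyz", None])
    try:
        peer = shared_memory.ShareableList(name=owner.shm.name)
        try:
            peer[0] = 42  # visible through `owner`, no pickling involved
            return list(owner)
        finally:
            peer.shm.close()
    finally:
        owner.shm.close()
        owner.shm.unlink()
```

In a real multi-process setup, only the segment name would be passed to the workers, and exactly one process would call unlink() after all others have closed their handles.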
Evaluation
To evaluate the performance gains from shared memory, I ran the following simple test: create a list of integers and double each integer in parallel by splitting the list into chunks and processing the chunks in parallel.
To understand the benefits, the test was executed with both vanilla (non-shareable) Python lists and ShareableList instances.
The test was executed on an 8-core, 64 GB RAM Linux box running Pop OS (Ubuntu) 19.10 and Python 3.8.0.
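The test described above can be sketched roughly as follows. This is not the exact code behind the measurements; the use of multiprocessing.Pool, the chunking scheme, and all function names (chunk_bounds, double_plain, double_shared, run_plain, run_shared) are assumptions made for illustration.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def chunk_bounds(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    step = max(1, -(-n // parts))  # ceiling division, at least 1
    return [(i, min(i + step, n)) for i in range(0, n, step)]


def double_plain(chunk):
    """Worker for vanilla lists: the chunk is pickled in, the result pickled out."""
    return [2 * x for x in chunk]


def double_shared(args):
    """Worker for ShareableList: attach by name and update the slice in place."""
    name, start, stop = args
    sl = shared_memory.ShareableList(name=name)
    try:
        for i in range(start, stop):
            sl[i] = 2 * sl[i]
    finally:
        sl.shm.close()


def run_plain(data, workers):
    """Double every element using plain lists sent to a process pool."""
    chunks = [data[a:b] for a, b in chunk_bounds(len(data), workers)]
    with mp.Pool(workers) as pool:
        parts = pool.map(double_plain, chunks)
    return [x for part in parts for x in part]


def run_shared(data, workers):
    """Double every element in a ShareableList shared with a process pool."""
    sl = shared_memory.ShareableList(data)
    try:
        tasks = [(sl.shm.name, a, b) for a, b in chunk_bounds(len(data), workers)]
        with mp.Pool(workers) as pool:
            pool.map(double_shared, tasks)
        return list(sl)
    finally:
        sl.shm.close()
        sl.shm.unlink()
```

To time one iteration, call run_plain(list(range(99)), 8) or run_shared(list(range(99)), 8) under an `if __name__ == "__main__":` guard and wrap the call with time.perf_counter(). Note that in the shared variant only the segment name and index bounds are pickled, while every element access goes through ShareableList indexing.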
Result and Observation
The runtime data below was averaged across six test runs.
Without shared memory
data 99 ints : 0.004216 secs per iteration
data 999 ints : 0.004034 secs per iteration
data 9,999 ints : 0.004522 secs per iteration
data 99,999 ints : 0.012110 secs per iteration
With shared memory
data 99 ints : 0.005805 secs per iteration
data 999 ints : 0.015656 secs per iteration
data 9,999 ints : 0.718858 secs per iteration
data 99,999 ints : 67.926538 secs per iteration
Clearly, the shared-memory version is slower at every size, and the gap widens rapidly: dividing the timings above gives slowdowns of roughly 1.4x, 3.9x, 159x, and 5,609x for 99, 999, 9,999, and 99,999 ints, respectively.
Even with bool, float, and string data values (and corresponding minor tweaks to the worker function), the test yields similar results.
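The slowdown factors can be recomputed directly from the timings listed above. The short snippet below does just that; the dictionary names are only illustrative.

```python
# Seconds per iteration, copied from the results listed above.
plain = {99: 0.004216, 999: 0.004034, 9_999: 0.004522, 99_999: 0.012110}
shared = {99: 0.005805, 999: 0.015656, 9_999: 0.718858, 99_999: 67.926538}

# Slowdown factor = shared-memory time / plain-list time for the same size.
slowdown = {n: shared[n] / plain[n] for n in plain}

for n, factor in slowdown.items():
    print(f"{n:>6,} ints: {factor:,.1f}x slower with shared memory")
```

The factor grows far faster than the input size, which suggests the cost per element in the shared-memory run is not constant.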
Questions
Besides the obvious question (is this evaluation flawed?), the observation raises three questions.
- Is (de)serialization of vanilla lists of primitive types optimized in Python?
- If so, this evaluation suggests ShareableList will not improve performance when used with vanilla lists of primitive types. However, ShareableList can only contain int, float, bool, string, bytes, and None values. So, under what circumstances will ShareableList improve performance?
- Also, in general, under what circumstances will the new shared memory support improve the performance?
Answering the above questions will help developers decide if the new shared memory support can benefit their apps.
Summary
The new shared memory feature (ShareableList in particular) in Python 3.8 is unlikely to improve performance off the shelf. A thorough evaluation of the circumstances under which this feature improves performance is needed to help developers decide whether to use it.
Note
Since this feature was included in Python, I assume a performance evaluation showed that it would improve performance. However, the above result suggests otherwise. So, if you think the above evaluation is flawed, please leave a comment explaining why. This will help me fix it!