TY - GEN
T1 - Parallelisation Approaches and Their Effect on LZO Compression Efficiency
AU - Demirel, Onur
AU - Ozsoy, Adnan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - This work presents a systematic evaluation of three parallelisation approaches for accelerating the Lempel-Ziv-Oberhumer (LZO) lossless compression algorithm: POSIX threads (pthreads) and OpenMP on multicore CPUs, and OpenMP target offloading to NVIDIA GPUs. Using an inter-task, chunk-based strategy, we benchmark seven heterogeneous datasets ranging from 0.8 MB text files to a 28.6 GB Wikipedia dump. Relative to a tuned serial baseline and averaged across datasets, pthreads achieves a compression speed-up of 3.38×, while CPU-based OpenMP attains 3.59×; in contrast, GPU offloading peaks at 1.05×, as transfer and kernel-launch overheads frequently offset the device’s massive concurrency. Scalability on CPUs plateaus beyond twenty threads, indicating memory-bandwidth contention, whereas several anomalies at low thread counts exhibit cache-driven super-linear behaviour. The study highlights the trade-offs between low-level thread management and directive-based approaches, and underscores the need for finer-grained intra-task parallelism and asynchronous data movement to unlock the GPU’s potential. All source code and experimental scripts are publicly released to foster reproducibility and further research.
KW - GPU offloading
KW - Lempel-Ziv-Oberhumer (LZO)
KW - Lossless compression
KW - OpenMP
KW - Parallelisation
KW - pthreads
UR - https://www.scopus.com/pages/publications/105023420247
U2 - 10.1007/978-3-032-04728-1_13
DO - 10.1007/978-3-032-04728-1_13
M3 - Conference contribution
AN - SCOPUS:105023420247
SN - 9783032047274
T3 - Lecture Notes in Networks and Systems
SP - 151
EP - 162
BT - The 6th Joint International Conference on AI, Big Data and Blockchain, AIBB 2025
A2 - Awan, Irfan
A2 - Younas, Muhammad
A2 - Ghinea, George
A2 - Grønli, Tor-Morten
A2 - Sen, Sevil
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th Joint International Conference on AI, Big Data, and Blockchain, AIBB 2025
Y2 - 19 August 2025 through 21 August 2025
ER -