Comparison based sorting for systems with multiple GPUs

Abstract

As a basic building block of many applications, sorting algorithms that run efficiently on modern machines are key to the performance of these applications. With the recent shift to using GPUs for general-purpose computing, researchers have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are designed to use all available GPUs in the system.


In this paper we present a high-performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single-GPU sorting algorithm. Then, a series of merge steps produces a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speedup of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it can sort two and four times more records, respectively, than sorting on one GPU.
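
The abstract does not spell out how the pivot is chosen, so the following is only a minimal CPU-side sketch of one plausible formulation, written in Python rather than CUDA. It assumes two already-sorted chunks of equal length, each standing in for the data held by one GPU. The names `select_pivot` and `merge_step` are hypothetical and not taken from the paper. A binary search finds a count `p` such that swapping the last `p` keys of the first chunk with the first `p` keys of the second leaves every key on the first device no larger than any key on the second. Both devices still hold the same number of keys and can finish with a purely local merge.

```python
import heapq


def select_pivot(a, b):
    """Return p such that swapping a[n-p:] with b[:p] separates the keys.

    Assumes a and b are sorted and have equal length n (an assumption of
    this sketch). The predicate a[n-1-p] > b[p] is true for small p and
    false for large p, so a binary search finds the first p where it fails.
    """
    assert len(a) == len(b)
    n = len(a)
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if a[n - 1 - mid] > b[mid]:
            lo = mid + 1  # still need to swap more than mid keys
        else:
            hi = mid
    return lo


def merge_step(a, b):
    """Simulate one merge step between two 'devices' holding a and b."""
    n = len(a)
    p = select_pivot(a, b)
    # After the swap, each side holds two sorted runs totalling n keys.
    left = list(heapq.merge(a[:n - p], b[:p]))
    right = list(heapq.merge(a[n - p:], b[p:]))
    return left, right


left, right = merge_step([1, 4, 7, 9], [2, 3, 5, 8])
print(left, right)
```

Because exactly `p` keys move in each direction, the chunk sizes never change, which is the even-distribution property the abstract attributes to the pivot selection. On real hardware the probes into the remote chunk would go over the inter-GPU link, which is presumably where the communication technique mentioned above comes in.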

