LOGO – GPU Accelerated Travelling Salesman Problem (TSP) Solver
Our previous posts about Travelling Salesman Problem resulted in a lot of feedback. for example, Dr Kamil Rocki (University of Tokyo) pointed us out to their high performance GPU-based TSP Solver “LOGO”. It is an approximate stochastic solver based on Iterative Local Search and 2-opt local search merhod. 2-opt search is performed in parallel on GPU. Despite limited shared memory resources, my implementation is able to solve arbitrarily large instances. The results show that in case of the larger instances, GPU (NVIDIA GTX 680) checks 25*10^9 2-opt swaps per seconds, meaning 700-800 GFLOP/s (2D distance calculation). To the best of authors knowledge it is the fastest GPU TSP algorithm concerning local search! Anyone faster out there?
Download source code (OSX, Linux) v. 0.2 (last update – 26/09/2012)
The electronic poster is to appear at the SC12 conference in November in Salt Lake City:
“Kamil Rocki, Reiji Suda, “High Performance GPU Accelerated TSP Solver” (Electronic Poster), The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC12), 10-16 November 2012, Salt Lake City, USA”
The algorithm is also described on author’s website and the previous paper:
Abstract
In this paper we are presenting high performance GPU implementations of the 2-opt and 3-opt local search algorithms used to solve the Traveling Salesman Problem. The main idea behind them is to take a route that crosses over itself and reorder it so that it does not. It is a very important local search technique. GPU usage greatly decreases the time needed to find the best edges to be swapped in a route. Our results show that at least 90% of the time during Iterated Local Search is spent on the local search itself. We used 13 TSPLIB problem instances with sizes ranging from 100 to 4461 cities for testing. Our results show that by using our GPU algorithm, the time needed to find optimal swaps can be decreased approximately 3 to 26 times compared to parallel CPU code using 32 cores. Additionally, we are pointing out the memory bandwidth limitation problem in current parallel architectures. We are showing that re-computing data is usually faster than reading it from memory in case of multi-core systems and we are proposing this approach as a solution.
Rocki, K.; Suda, R.; , “Accelerating 2-opt and 3-opt local search using GPU in the travelling salesman problem,” High Performance Computing and Simulation (HPCS), 2012 International Conference on , vol., no., pp.489-495, 2-6 July 2012 [doi: 10.1109/HPCSim.2012.6266963] [Free PDF].
Category: Articles, Computer Science, SC12, Software
-
http://twitter.com/john_hauck John Michael Hauck






