HPC for SKA1 Survey Central Signal Processing
June 2013

This article discusses the economic feasibility of applying COTS GPUs to the SKA1 CSP Survey.  The computing requirements are taken from the Feasibility Study White Paper dated 2013-0503.  Prerequisite training notes to read first are: Amdahl's Law, Arithmetic Intensity (AI), SKA, and Radio Interferometry.

Parallel Computing

o The SKA Survey telescope is designed to map the sky in spectral lines and continuum.  The telescope consists of 96 dishes (for seeing the same part of the sky from different locations on Earth).  Each dish produces 36 beams (for seeing different parts of the sky), and each beam consists of signals in 500 consecutive channels of 1 MHz width.  Each 1 MHz band goes into the CSP to be broken down into 512 finer channels of about 2 kHz, and signals in the same 2 kHz band from all 96 dishes are cross-correlated to produce the final visibilities for the SDP.  The two processes of channelization and cross-correlation are separated by a third process called cross connect.  This is needed because the signals coming out of the first process are organized in streams per beam and per dish, whereas the signals going into the second process must be organized in streams of the same frequency per beam across all dishes.  Cross connect has to be done with data switches outside the processor because of the huge data size.  As a result, the processing has to be split across two separate processors.
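The reorganization performed by cross connect can be illustrated with a small sketch. It is only an assumption-laden illustration, not the CSP design: the function names `stream_counts` and `cross_connect` are hypothetical, the stream counts are simple products of the figures quoted above, and the reordering is shown on a toy in-memory array rather than on network switches.

```python
# Figures quoted in the text above.
DISHES = 96
BEAMS_PER_DISH = 36
COARSE_CHANNELS = 500     # 1 MHz channels per beam
FINE_PER_COARSE = 512     # finer channels per 1 MHz band


def stream_counts():
    """Return (per-dish/per-beam streams, fine channels per beam,
    per-beam/per-frequency streams) as plain products of the figures above."""
    per_dish_beam_streams = DISHES * BEAMS_PER_DISH
    fine_channels_per_beam = COARSE_CHANNELS * FINE_PER_COARSE
    per_beam_freq_streams = BEAMS_PER_DISH * fine_channels_per_beam
    return per_dish_beam_streams, fine_channels_per_beam, per_beam_freq_streams


def cross_connect(samples):
    """Reorder samples[dish][beam][channel] into out[beam][channel][dish].

    Hypothetical in-memory stand-in for the switch-based cross connect.
    """
    n_dish = len(samples)
    n_beam = len(samples[0])
    n_chan = len(samples[0][0])
    return [
        [[samples[d][b][c] for d in range(n_dish)] for c in range(n_chan)]
        for b in range(n_beam)
    ]


if __name__ == "__main__":
    print(stream_counts())
    # Toy case: 3 dishes, 2 beams, 2 channels; each sample is labelled
    # with its (dish, beam, channel) origin so the reordering is visible.
    toy = [[[(d, b, c) for c in range(2)] for b in range(2)] for d in range(3)]
    print(cross_connect(toy)[1][0])
```

In the toy case, the entry for beam 1, channel 0 gathers the samples from all three dishes, which is the layout the cross-correlation stage needs.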

Channelization Processor

o The splitting into finer channels can be processed for each signal stream separately from the other streams, but all streams have to be coordinated.  The Amdahl limit is 100%, meaning the work is fully parallelizable and parallel computing is essential.  However, the AI of this process is very low, at about 5 FLOPs per byte.  Owing to the large data transfer rate, the process has to be handled by 250 GPU nodes, assuming that PCIe v4 (31.5 GB/s) memory transfer is the bottleneck rather than external data transfer.
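The node count above follows from a bandwidth-bound sizing argument, which can be sketched as follows. The aggregate data rate is not stated in this article, so the sketch only back-calculates the rate implied by 250 nodes at 31.5 GB/s; that figure is an upper bound and an assumption, and the function names are hypothetical.

```python
import math

PCIE_V4_GB_PER_S = 31.5  # per-node transfer rate quoted in the text


def amdahl_speedup(parallel_fraction, n_nodes):
    """Amdahl's Law: speedup for a given parallel fraction on n_nodes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_nodes)


def nodes_required(aggregate_gb_per_s, per_node_gb_per_s=PCIE_V4_GB_PER_S):
    """Nodes needed when per-node transfer bandwidth is the bottleneck."""
    return math.ceil(aggregate_gb_per_s / per_node_gb_per_s)


def implied_aggregate(n_nodes, per_node_gb_per_s=PCIE_V4_GB_PER_S):
    """Largest aggregate rate (GB/s) that n_nodes can absorb.

    ASSUMPTION: back-calculated from the node count; not a source figure.
    """
    return n_nodes * per_node_gb_per_s


if __name__ == "__main__":
    rate = implied_aggregate(250)
    print(rate, nodes_required(rate))
    # With a 100% parallel fraction the speedup grows linearly with nodes.
    print(amdahl_speedup(1.0, 250))
```

With a parallel fraction of 100%, Amdahl's Law places no ceiling on the speedup, so the node count is set entirely by the transfer bandwidth.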

Cross Correlation Processor

o The Amdahl limit is again 100%, whereas the AI is 47 FLOPs per byte, and 219 GPU nodes are required based on PCIe v4.
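As a rough illustration of what the cross-correlation stage computes, the sketch below forms the visibility for every dish pair in one beam and one fine channel by multiplying one dish's samples by the complex conjugate of the other's and summing over time. This is a minimal textbook formulation under stated assumptions, not the CSP implementation: the function names are hypothetical, and whether autocorrelations are also produced is not stated in this article.

```python
def n_baselines(n_dishes, include_autos=False):
    """Number of dish pairs; optionally add one autocorrelation per dish."""
    pairs = n_dishes * (n_dishes - 1) // 2
    return pairs + n_dishes if include_autos else pairs


def correlate_channel(voltages):
    """Cross-correlate one beam and one fine channel.

    voltages[dish] is a list of complex samples over time.
    Returns {(i, j): sum over t of v_i(t) * conj(v_j(t))} for i < j.
    """
    n = len(voltages)
    vis = {}
    for i in range(n):
        for j in range(i + 1, n):
            vis[(i, j)] = sum(
                a * b.conjugate() for a, b in zip(voltages[i], voltages[j])
            )
    return vis


if __name__ == "__main__":
    print(n_baselines(96), n_baselines(96, include_autos=True))
    # Two dishes receiving identical samples correlate to a real value.
    print(correlate_channel([[1j, 1 + 0j], [1j, 1 + 0j]]))
```

Because every pair of dishes has to be combined for every fine channel, the work per byte of input is higher here than in channelization, which is consistent with the higher AI quoted above.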

Nvidia Tesla Volta

o Conservatively, Tesla Volta has an AI of about 280.  This means that neither process will make good use of Volta's processing capacity.
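The mismatch can be quantified with a simple roofline-style estimate. The sketch assumes that the achievable fraction of peak compute is the ratio of the workload's AI to the device's AI, capped at 1; this is a simplification, and the function name is hypothetical. The three AI values are the ones quoted in this article.

```python
VOLTA_AI = 280.0          # device AI quoted in the text
CHANNELIZATION_AI = 5.0   # workload AI quoted in the text
CORRELATION_AI = 47.0     # workload AI quoted in the text


def compute_utilization(workload_ai, device_ai):
    """Fraction of peak compute usable when data transfer is the limit.

    ASSUMPTION: simple roofline model, utilization = min(1, AI ratio).
    """
    return min(1.0, workload_ai / device_ai)


if __name__ == "__main__":
    for name, ai in (("channelization", CHANNELIZATION_AI),
                     ("cross-correlation", CORRELATION_AI)):
        print(f"{name}: {compute_utilization(ai, VOLTA_AI):.1%}")
```

Under this model channelization would use roughly 1.8% of the device's peak compute and cross-correlation roughly 16.8%, with the rest idle while waiting on data transfer.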

END