Monday, 1 December 2014

CUDA + SDR





Not having blogged for a while, I thought I had better do an update.
I have a number of projects on the go; one of the latest is using
NVIDIA's CUDA to do Software Defined Radio (SDR). I know
there is nothing too original about this, but I wanted to learn how to
use CUDA and I thought SDR would be a good place to start.

In the first picture you can see a waterfall of the 2 m band in a
Qt application that uses OpenGL and CUDA.
There is not much to see, just some APRS. I am currently working
on the software digital down converters. Using the various memory
resources on the NVIDIA card needs careful planning to obtain
maximum acceleration. I am expecting to be able to have at
least 10 receiver channels running on the card. I have some other ideas
for this parallel computer, like PAPR reduction in DVB-T2.
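
For anyone wondering what a digital down converter actually does, here
is a minimal sketch in plain Python rather than CUDA. It is not the code
running on the card: the function name ddc, the boxcar (moving average)
filter and the test frequencies are all assumptions made for
illustration. The idea is to multiply by an oscillator to shift the
wanted channel to 0 Hz, low-pass filter, then keep one sample in every
few.

```python
import cmath
import math

def ddc(samples, sample_rate, centre_freq, decimation):
    """Shift centre_freq to 0 Hz, average blocks of samples, decimate.

    Hypothetical helper for illustration. The boxcar average is the
    crudest possible low-pass filter; a real receiver would use a
    properly designed FIR or CIC filter.
    """
    out = []
    # Work through the input in non-overlapping blocks of `decimation`.
    for start in range(0, len(samples) - decimation + 1, decimation):
        acc = 0j
        for n in range(start, start + decimation):
            # Numerically controlled oscillator: e^(-j*2*pi*f*n/fs)
            nco = cmath.exp(-2j * math.pi * centre_freq * n / sample_rate)
            acc += samples[n] * nco
        out.append(acc / decimation)
    return out

# Assumed test signal: a complex tone 100 kHz above the centre of a
# 1 MS/s capture, tuned to exactly and decimated by 10.
fs = 1_000_000.0
f0 = 100_000.0
tone = [cmath.exp(2j * math.pi * f0 * n / fs) for n in range(1000)]
baseband = ddc(tone, fs, f0, 10)
```

Each output sample depends only on its own block of input samples, so
the outer loop is the part that would map naturally onto one GPU thread
per output sample, and several receiver channels are just the same loop
with different values of centre_freq.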

The SDR I am using is an Ettus Research B200, which has a USB3
interface between itself and the PC. The B200 operates from about
50 MHz to 6 GHz with about a 32 MHz bandwidth. I put mine in a
Hammond 1455L1601 box as it comes without a case.

The graphics card I am using is a GTX 680 with 1536 CUDA cores. I am
planning to upgrade that to a GTX 690, which has twice as many cores.
The 690 is actually 2 GTX 680s on one card and appears as two
compute devices. Fortunately, these cards are aimed at the gamer market
and high-end gamers like to have the best gear, so the price of last-generation
cards on the used market is very reasonable. I found Gumtree to be
a better place to buy them than eBay.

NVIDIA have just announced their Pascal chips with NVLink, which
should provide a quantum leap in performance. Those cards are due to be
available in 2016. They stack memory and GPU wafers on top of one another and
interconnect them using through-silicon vias. They will also allow much faster
communication with the host CPU through a shared memory interface.
Even with PCIe v3 the global memory interface between the motherboard
and the GPU is the main bottleneck.

NVIDIA and partners like IBM are working hard to bring this technology
to other programming environments like Java and Python. They are also
providing application specific libraries for things like Deep Learning
Neural Networks. Maybe one day I will have a Neural Network to work
DX for me while I code.

Well that is it for now, back to my CUDA 6.5 programming / learning.

Wednesday, 10 September 2014

DATV-Express team win BATC Grant Dixon Award

I am pleased to announce that the DATV-Express team have won this year's BATC
Grant Dixon, G8CGK (SK) award for technical innovation.


Here are the links to the BATC talk

DATV-Express talk part 1
DATV-Express talk part 2


Sunday, 31 August 2014

August update

Not a lot to report. I finally managed to get the Verilog code to compile
into something other than a piece of wire. Unfortunately the design was
too big to fit on the FPGA I am using, so I am looking at how to simplify it.

I have my talk for the BATC convention on the 6/7th of September
completed. It is an update on the current software situation and some
speculation about what might be in any possible DATV-Express 2.

We are always looking to fill a need in the hobby; if there is no need
then we won't need to develop anything.

I am giving a simpler talk and a demonstration to the Worthing Club on
the 17th of September.

There is an upcoming article in the next QST magazine by Art WA8RMC,
and Ken W6HHC is giving a talk at the TAPR DCC on the 6/7th of Sept.

There is a lot going on in the world of DATV at the moment and it is
difficult to track all the new developments.

I am hoping to do some networking at the BATC convention, to find out what
others are doing and see where we can help. We don't want to duplicate what
others are doing.

If you attend any of my talks please say hello.

Tuesday, 29 July 2014

Winter projects stacking up

Diversity Receiver?
This is my winter project. I have been wanting to build a diversity SDR for quite
some time and finally I have the bits to make one.

To the left is a development board for an AD9253 quad 125 MS/sec 14 bit ADC.
The middle is an interposer board to adapt between the connectors used on the
ADC board and the FMC connector on the FPGA board. This board requires a
minor modification (to do with the framing clock signal). On the right is a Xilinx
SP601 evaluation board. The plan is to have a simple bit of code on the FPGA
to stitch together all the ICs, frame the data and send it over 1 Gbit Ethernet to the
PC for processing. The limiting factor is the 1 Gbit Ethernet (of course).
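
To put a number on that limit, here is the back-of-envelope sum as a few
lines of Python. It assumes all four channels run at the full 125 MS/sec
with the 14 bit samples packed tightly, and it ignores Ethernet framing
overhead, none of which is decided yet, so treat the result as a rough
lower bound.

```python
CHANNELS = 4
SAMPLE_RATE = 125_000_000      # samples/sec per channel (ADC maximum)
BITS_PER_SAMPLE = 14
LINK_RATE = 1_000_000_000      # 1 Gbit Ethernet, ignoring protocol overhead

# Raw data rate coming out of the ADC, in bits per second.
raw_rate = CHANNELS * SAMPLE_RATE * BITS_PER_SAMPLE

# Same again if each sample is padded to 16 bits for easier handling
# on the PC (an assumption, not a design decision).
padded_rate = CHANNELS * SAMPLE_RATE * 16

# Smallest whole-number reduction in data rate that fits on the link
# (ceiling division).
min_reduction = -(-raw_rate // LINK_RATE)
min_reduction_padded = -(-padded_rate // LINK_RATE)

print(raw_rate / 1e9, "Gbit/s raw, reduce by at least", min_reduction)
print(padded_rate / 1e9, "Gbit/s padded, reduce by at least", min_reduction_padded)
```

So the FPGA would have to throw away or decimate at least six sevenths
of the raw data before it reaches the Ethernet port.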

Using Xilinx FPGA tools is new to me (in the past I have used Altera), but I have
managed to write and deploy a simple program to the board. Xilinx have a different
way of doing things but the principles are the same.

Of course the FPGA board wants 5V and the AD9253 board 6V.

It will only be a 4 channel diversity receiver, but that is a start. Once I have verified
that it works at HF I plan to add VHF / UHF / SHF down converters to it.

Sunday, 20 July 2014

Parallel Processing CUDA OpenGL UHD USRP2

For a bit of light relief the last few days I have been immersed in
CUDA and OpenGL programming. My initial goal is to use the
USRP2 I have to digitise a large piece of spectrum and display
it inside the window of a Qt5 application. I will be using CUDA to
do the parallel processing and OpenGL to display the results.

So far I have managed to create an OpenGL widget that displays a
window in a Qt application, grab samples using UHD, process
them using the CUDA library and write some simple kernel code.
I need to learn a bit more about using OpenGL before I go any further
as I want to display the results as a waterfall and getting CUDA to
talk to OpenGL via the GLWidget does not look too easy to do.
Getting CUDA to share buffers with OpenGL is not difficult but adding
the extra complexity of the GLWidget means I start to stray off the
beaten path.
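
As a rough idea of the processing behind one line of a waterfall, here
is a sketch in plain Python. The real thing would use the CUDA FFT
library and an OpenGL texture; the function names fft and
waterfall_row, the -120 dB floor and the test tone are assumptions made
up for this example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def waterfall_row(samples, floor_db=-120.0):
    """Turn one block of complex samples into a row of dB values."""
    n = len(samples)
    spectrum = fft(samples)
    # Rotate so negative frequencies sit on the left of the display.
    shifted = spectrum[n // 2:] + spectrum[:n // 2]
    row = []
    for s in shifted:
        power = (abs(s) / n) ** 2
        if power > 0:
            row.append(max(floor_db, 10.0 * math.log10(power)))
        else:
            row.append(floor_db)
    return row

# Assumed test input: a unit-amplitude tone 8 bins above centre in a
# 64 point block.
block = [cmath.exp(2j * math.pi * 8 * n / 64) for n in range(64)]
row = waterfall_row(block)
peak_bin = row.index(max(row))
```

Each new row is then mapped to colours and scrolled into the display,
which is the part where sharing a buffer between CUDA and OpenGL saves
copying the data back to the host.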

The biggest problem has been installing the CUDA toolkit and more
especially the correct NVIDIA driver. The one that Ubuntu wants me
to install is not the right one. It has to be the latest one on the NVIDIA
website for CUDA 6.0 to work. I am using a GTX680 card for the GPUs.
I have also been looking at OpenCL but as I am using an NVIDIA card
I thought it better to use CUDA for the moment.

I bought the GTX680 a few years ago and for the same price I could get
something much more powerful now.

I notice it is not going to be long before Ubuntu 12.04 LTS is no longer
supported at which point I will have to consider upgrading. I have heard
upgrades never go smoothly so I am not looking forward to it.

I will post some more about Odroid in the next epistle.  

Friday, 11 July 2014

FFTs and optimisation

Changing the FFT from FFTW to av_fft made little or no difference
to the CPU load when running DVB-T. However, optimising the
RS encoder has taken a few percent off the CPU load.

I am using the Valgrind tools to profile the program. The highest CPU load
varies between the iFFT, the RS encoder and the interleaver. Currently
the interleaver is the biggest CPU drain and I can't find any obvious way
to optimise it.
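
DVB-T has more than one interleaver, and this post does not say which
one shows up in the profile. Purely as an illustration, here is a sketch
of the outer convolutional byte interleaver, which has 12 branches with
delay cells in multiples of 17 bytes. It is plain Python and not the
DATV-Express code; the class name ConvInterleaver and its arguments are
made up for the example, and it makes no attempt to be fast.

```python
from collections import deque

class ConvInterleaver:
    """Convolutional (Forney) byte interleaver, illustrative only.

    Branch j delays its bytes by j * depth visits to that branch. With
    deinterleave=True the branch delays are reversed so that the pair
    together give every byte the same total delay.
    """

    def __init__(self, branches=12, depth=17, deinterleave=False):
        self.branches = branches
        self.fifos = []
        for j in range(branches):
            cells = (branches - 1 - j) if deinterleave else j
            self.fifos.append(deque([0] * (cells * depth)))
        self.index = 0  # branch the next byte will be routed to

    def process(self, data):
        out = bytearray()
        for byte in data:
            fifo = self.fifos[self.index]
            if fifo:
                # Push the new byte, emit the oldest one in this branch.
                fifo.append(byte)
                out.append(fifo.popleft())
            else:
                # Zero-delay branch: straight through.
                out.append(byte)
            self.index = (self.index + 1) % self.branches
        return bytes(out)

# Round trip: the data comes back delayed by 11 * 17 * 12 bytes.
data = bytes(i % 256 for i in range(5000))
sent = ConvInterleaver().process(data)
received = ConvInterleaver(deinterleave=True).process(sent)
delay = 11 * 17 * 12
```

The work per byte is tiny but the memory accesses jump around between
12 separate buffers, which may be part of why it is hard to speed up.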

I have been able to get 2 MHz wide DVB-T to work on the Odroid U3+
but I had to reduce the oversampling which has caused a small amount
of aliasing to appear in the output spectrum. Work continues.

Tuesday, 8 July 2014

Odroid U3+

Not a very good picture, but what you are seeing is an Odroid U3+ in its
case sitting on a tangle of wires which is my desktop. The Odroid is made
by Hardkernel. It is a quad-core ARM device as used in Galaxy smartphones.
It is a lot faster than other ARM devices I have been experimenting with and
is well supported with an in-house magazine and hardware accelerators for the
graphics.

Currently with DATV-Express there are 2 main development strands. Firstly
I am adding support on the PC platform for analogue video capture and encoding
in software using libavcodec.

On the ARM platform I am trying to get the code fast enough to handle low
bandwidth DVB-T. I have moved from using FFTW to do the iFFT to
av_fft, which can be found in libavcodec. The new iFFT uses single precision
maths and on the ARM platform has special modules that use the NEON SIMD
instructions available on later ARM devices. The combination of these two
features means the code should run a lot faster.

Thanks to a suggestion by Ron W6RZ (who has ported my DVB-S2
implementation to GNURadio) I have managed to optimize part of the S2 code
to knock around 5% off the CPU load. I am sure further optimizations can be found.

Finally, I have been reading a book on OpenCL called "Heterogeneous Computing
with OpenCL". One of the chapters gives an example of using it to do real-time
graphics processing and it has sent my mind racing as to the possibilities.

OpenCL, for those that don't know, is a framework based on the C language that
allows you to harness both CPUs and GPUs (graphics cards) in a parallel
processing environment. So far I have installed the Intel OpenCL SDK and
compiled and run a few pieces of example code. I have an NVIDIA card on my
Linux machine and the NVIDIA drivers now include the bits needed to work
with OpenCL.

What I am thinking of doing is taking the video capture code I have already
written for DATV-Express and running it on the CPUs to capture up to 4 video
channels. The video frames would then be passed to the GPUs for manipulation and
back to the CPUs for MPEG compression using libavcodec. The resultant
transport stream would then be sent to DATV-Express. I can then get rid of the
old analogue video mixer / effects unit I have. The video mixer cost me
about the same amount as a dedicated PC would.
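
To give a flavour of the sort of manipulation a mixer does, here is a
crossfade between two frames sketched in plain Python. Nothing like
this is written yet; the function name crossfade, the 0 to 256 fade
scale and the three pixel test frames are all invented for the example.

```python
def crossfade(frame_a, frame_b, alpha):
    """Blend two equal-sized frames of 8 bit values.

    alpha runs from 0 (all frame_a) to 256 (all frame_b). Integer maths
    only, so the same expression would work unchanged in a GPU kernel.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must be the same size")
    if not 0 <= alpha <= 256:
        raise ValueError("alpha must be between 0 and 256")
    return bytes(
        (a * (256 - alpha) + b * alpha) >> 8
        for a, b in zip(frame_a, frame_b)
    )

# Assumed tiny test frames, three pixels each.
camera_1 = bytes([0, 100, 255])
camera_2 = bytes([255, 100, 0])
halfway = crossfade(camera_1, camera_2, 128)
```

Every output pixel depends only on the same pixel in the two inputs,
which is exactly the kind of job that spreads nicely across a GPU.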

Till next time!