Friday, 24 November 2017

CUDA DVB-S2 decoder marches on.

No pictures this time as videos of the receiver in action don't prove much.
I have now implemented version 2 of the LDPC decoder, based on a
research paper I found (Gronroos). I now appear to be getting a throughput
in excess of 300 Mbit/s. There are still some issues: the new decoder uses
8-bit metrics and it is a challenge to stop them underflowing on weak
signals. I have used the CUDA SIMD intrinsics, which use saturated
8-bit maths; this is a deviation from the original paper but seems to work.
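
As a rough illustration of why saturation matters for 8-bit metrics, here is a minimal sketch in plain Python (not the CUDA code from the decoder). It emulates a four-lane packed signed add with saturation, similar in spirit to CUDA's `__vaddss4` intrinsic, and compares it with an add that wraps around. The helper names are made up for the example, and the post does not say which intrinsics the decoder actually uses.

```python
def sat8(v):
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, v))

def unpack4(word):
    """Split a 32-bit word into four signed 8-bit lanes (lane 0 = low byte)."""
    lanes = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        lanes.append(b - 256 if b >= 128 else b)
    return lanes

def pack4(lanes):
    """Pack four signed 8-bit lanes back into one 32-bit word."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word

def add_sat4(a, b):
    """Per-lane signed add that clamps at the 8-bit limits."""
    return pack4([sat8(x + y) for x, y in zip(unpack4(a), unpack4(b))])

def add_wrap4(a, b):
    """Per-lane add that wraps modulo 256, shown only for comparison."""
    return pack4([(x + y) & 0xFF for x, y in zip(unpack4(a), unpack4(b))])
```

With wrap-around, two strongly positive metrics (100 + 100) come out negative, which flips the decoder's belief about the bit. With saturation the result just sticks at 127.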

The decoder works on the basis of handling 128 codewords in
parallel, which causes two problems: firstly the latency, and secondly the fact
that all 128 codewords have to use the same FEC. This is not an issue with
TV broadcasting but does cause some issues with the requirements of
Phase 4 Ground. Another problem is that the new algorithm uses Min-Sum
rather than Sum-Product to do the decoding, which loses some performance.
I did find another paper that applies a correction to Min-Sum to
get back some of the performance.
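
To make the difference concrete, here is a small Python sketch of a single check-node update done three ways: the exact Sum-Product rule, plain Min-Sum, and Min-Sum with a fixed offset subtracted. The offset form is only one common kind of correction and is an assumption here. The post does not say which correction the second paper uses, and the offset value of 0.5 in the usage note is arbitrary.

```python
import math

def check_sum_product(msgs):
    """Exact rule: out_i = 2*atanh(prod over j != i of tanh(m_j / 2))."""
    out = []
    for i in range(len(msgs)):
        p = 1.0
        for j, m in enumerate(msgs):
            if j != i:
                p *= math.tanh(m / 2.0)
        p = max(-0.999999999999, min(0.999999999999, p))  # keep atanh finite
        out.append(2.0 * math.atanh(p))
    return out

def check_min_sum(msgs, offset=0.0):
    """Min-Sum: product of the other signs times the smallest other magnitude,
    optionally reduced by a fixed offset (never below zero)."""
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        mag = max(min(abs(m) for m in others) - offset, 0.0)
        out.append(sign * mag)
    return out
```

For incoming messages `[-1.0, 2.0, 2.0, 2.0]`, plain Min-Sum gives the right signs but magnitudes that are too confident compared with Sum-Product. `check_min_sum(msgs, 0.5)` pulls them back towards the exact values.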

The next improvement I made was to use a table lookup approach to detect
BCH code words in error. This has resulted in a 5x increase in speed
of the BCH error detector. The bad code words are then further processed
using the Berlekamp-Massey algorithm, and up to 12 errors are corrected
per code word. The syndromes are calculated on the CPU.
I calculate the odd order syndromes by using log/alog tables of one of the
base polynomials. The even order syndromes are calculated by multiplying
the required odd order syndromes, for example s2 = GMULT(s1,s1)
(where GMULT is a multiply over the Galois field). I am thinking of
splitting the batches of 128 codewords up into sets of 64 or maybe 32
then processing them in separate CPU threads. I am not sure whether
the overhead of creating the threads will outweigh the gain from the
concurrency I can achieve (only experiment will tell).
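
The syndrome trick above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the toy field GF(16) with the primitive polynomial x^4 + x + 1 so the tables stay tiny, whereas the real DVB-S2 BCH codes work in a much larger field. The names `ALOG`, `LOG`, `gmult` and `syndrome` are chosen for the example.

```python
PRIM_POLY = 0b10011   # x^4 + x + 1, primitive over GF(2); toy field GF(16)
M = 4
N = (1 << M) - 1      # order of the multiplicative group, 15

# ALOG[i] = alpha^i and LOG[alpha^i] = i
ALOG = [0] * (2 * N)
LOG = [0] * (N + 1)
v = 1
for i in range(N):
    ALOG[i] = v
    LOG[v] = i
    v <<= 1
    if v & (1 << M):
        v ^= PRIM_POLY
for i in range(N, 2 * N):
    ALOG[i] = ALOG[i - N]   # doubled table avoids a modulo inside gmult

def gmult(a, b):
    """Multiply two field elements using the log/alog tables."""
    if a == 0 or b == 0:
        return 0
    return ALOG[LOG[a] + LOG[b]]

def syndrome(bits, j):
    """S_j = r(alpha^j), where bits[i] is the coefficient of x^i."""
    s = 0
    for i, bit in enumerate(bits):
        if bit:
            s ^= ALOG[(i * j) % N]
    return s
```

Because the received word has binary coefficients, squaring a syndrome gives the syndrome of twice the order, so `syndrome(bits, 2)` equals `gmult(s1, s1)` with `s1 = syndrome(bits, 1)`, and only the odd orders need the full evaluation.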

I have a lot more things to play with like using pinned memory for
device to host transfers and CUDA callbacks.

I am now decoding DVB-S2 at 800 kSymbols/sec. The limiting factor is no
longer the LDPC decoder but the Kalman-based adaptive equaliser I am using in
the front of the modem. Currently it uses floating point, so it can probably be sped up.
After equaliser training on the preamble I hope to switch to the far less
demanding LMS method, but as yet that doesn't work.
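
For readers who have not met LMS, here is a minimal Python sketch of the update it uses. It is not the equaliser from the modem. It trains against a known reference, and the two-tap test channel, the tap count and the step size `mu` are all assumptions picked for the example.

```python
import random

def qpsk_symbols(count, seed=1):
    """Unit-magnitude QPSK symbols from a seeded generator (test data only)."""
    rng = random.Random(seed)
    scale = 2 ** -0.5
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * scale
            for _ in range(count)]

def two_tap_channel(symbols, echo=0.3):
    """Toy channel: each symbol plus a weaker copy of the previous one."""
    prev = 0j
    out = []
    for s in symbols:
        out.append(s + echo * prev)
        prev = s
    return out

def lms_equalise(received, reference, num_taps=5, mu=0.05):
    """Complex FIR equaliser trained with w <- w + mu * error * conj(input).
    Returns the final taps and the error magnitude at every symbol."""
    w = [0j] * num_taps
    w[0] = 1 + 0j                    # start as a pass-through filter
    delay = [0j] * num_taps          # most recent sample first
    errors = []
    for r, d in zip(received, reference):
        delay = [r] + delay[:-1]
        y = sum(wk * xk for wk, xk in zip(w, delay))
        e = d - y
        w = [wk + mu * e * xk.conjugate() for wk, xk in zip(w, delay)]
        errors.append(abs(e))
    return w, errors
```

Each update costs one multiply-accumulate per tap, which is why LMS is so much cheaper than a Kalman (RLS style) update that has to maintain a matrix per step.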

So lots more interesting stuff to play with. I have had no formal education
in any of this; I have just learnt it from articles I have found on the internet.
So please excuse any mistakes I make.

For any of you who read this blog and are not Radio Amateurs, part of the ethos
of the Amateur Radio movement is the concept of self training in radio
communications. I hope I meet that challenge in this, my personal journey.

So till next time ....

Monday, 16 October 2017

Work progresses on my software decoder

Here is another quick video of my work on using CUDA to do DVB-S2 demodulation.
I have now reached the dizzy heights of 500 kilosymbols per second in real time
with 20 iterations of the LDPC decoder. There is still a long way to go and a lot to
learn about programming in CUDA.

Thursday, 5 October 2017

LDPC the power of iteration

Currently I am working on a DVB-S2 software decoder using an NVIDIA GPU
and the CUDA programming language. It is still very early days but I thought
I would share some of the initial results.

The type of FEC used in DVB-S2 is called Low Density Parity Check (LDPC).
It uses an iterative algorithm to do the decoding, in my case one called
belief propagation, also known as the sum-product algorithm.

Above is my software running in off-line test mode decoding a known
DVB-S2 frame. As can be seen the first time it runs with 4 iterations
the frame is unrecoverable. The next time it runs with 10 iterations
and while the baseband header can be decoded all the transport packets
in the body of the message are in error. Finally it is run again with 20
iterations; this time the whole frame can be decoded. Each run uses the
same noise sequence added to the frame.
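
The idea of iterating can be shown on a toy scale. The Python sketch below runs a flooding sum-product decoder on a (7,4) Hamming code, which is an assumption made purely to keep the example short. The DVB-S2 LDPC codes are 16200 or 64800 bits long and need far more iterations, as the runs above show. The function name and the convention that a positive LLR means "probably 0" are choices made for the example.

```python
import math

# Parity-check matrix of a (7,4) Hamming code, a toy stand-in for the
# very much larger DVB-S2 LDPC matrices.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def decode_sum_product(llr, max_iters):
    """Flooding sum-product decoder; llr[i] > 0 means bit i is probably 0.
    Returns (hard_bits, iterations_used)."""
    checks = [[i for i, h in enumerate(row) if h] for row in H]
    # c2v[c][i] is the message from check c to variable i, initially zero
    c2v = [{i: 0.0 for i in vs} for vs in checks]
    bits = [0 if v >= 0 else 1 for v in llr]
    for it in range(1, max_iters + 1):
        total = [llr[i] + sum(m.get(i, 0.0) for m in c2v)
                 for i in range(len(llr))]
        new = []
        for c, vs in enumerate(checks):
            msgs = {}
            for i in vs:
                p = 1.0
                for j in vs:
                    if j != i:
                        # variable-to-check message leaves out what c sent
                        p *= math.tanh((total[j] - c2v[c][j]) / 2.0)
                p = max(-0.999999999999, min(0.999999999999, p))
                msgs[i] = 2.0 * math.atanh(p)
            new.append(msgs)
        c2v = new
        total = [llr[i] + sum(m.get(i, 0.0) for m in c2v)
                 for i in range(len(llr))]
        bits = [0 if t >= 0 else 1 for t in total]
        if all(sum(bits[i] for i in vs) % 2 == 0 for vs in checks):
            return bits, it          # every parity check is satisfied
    return bits, max_iters
```

With the all-zero codeword and one weak, wrong input such as `[-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]`, zero iterations leaves the error in place while a single pass of message exchange is enough to correct it.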

Above is an example of the modem working in real time, receiving DVB-S2 transmitted
from DATV-Express and received on an ADALM-PLUTO. It is being decoded
totally in software using a GTX 980 Ti GPU. A huge amount of work still needs to be
done before this is of any practical use. The motivation for doing this work is to have
something that will support many of the features not found in commercial DBS
chip-sets and to give me a radio related project for learning CUDA programming on.
There are rumours that DVB-S2X VL_SNR modes may be used on future CUBESAT
Lunar and Martian missions. I agree not many people are going to want to use a
£600 GPU card when a £100 Minitiouner can be used instead, but I am not
just many people! The GPU card running flat out makes a nice room heater.

There is also a video over on my YouTube channel that shows some moving video
rather than just a frame capture (the software has changed a lot since I made that clip).

Thursday, 7 September 2017

Pluto SDR

I recently bought a couple of ADALM-PLUTO SDRs from Digikey.
I then modified the DATV-Express to work with them.

The video above is a short test session. The picture was captured by a Logitech
C920 camera, fed into vMix and then into DATV-Express. It was then transmitted
on 437 MHz using the Pluto SDR as 64-QAM 2 MHz DVB-T with FEC=7/8.
It was then received using a Hides UT100D and displayed on a PC using their
bundled software.

I will be presenting a 40 min talk on my efforts with the Pluto at the BATC CAT17
convention. The video will be uploaded to the BATC YouTube channel sometime
after the convention which is being held over the weekend of the 9/10 Sept 2017.

QPSK Analysis

 I have also included some shots from the Signal Hound BB60C analyser and Spike
software to give you an idea of what the spectrum looks like.

If you can't wait until CAT17 is uploaded then here is a link to my YouTube Channel

Wednesday, 5 July 2017

9 GHz local oscillator source and new test gear

I recently acquired an Analog Devices ADF41020 18 GHz PLL evaluation board.
It comes fitted with a Hittite HMC515 11.5 - 12.5 GHz VCO which after quite
some effort I replaced with a Hittite HMC510 8.45 - 9.55 GHz VCO. The new VCO
will in fact lock from 7.6 GHz to 10 GHz. I am getting about 3.5 dBm out of the board.
The output is a lot cleaner than the cheap YIG based oscillator I was using for a 9 GHz
LO. The board itself cost $150 and the replacement VCO $35. I had considerable
difficulty in soldering the part in place. I watched all the YouTube videos on how
to do this sort of thing but it didn't work out as easily as they said it would. The biggest
issue was the ground plane sucking away all the heat and making it almost impossible
to get the solder to flow with the hot air tool. Also, the no-clean flux would boil
away with the slightest amount of heat and lift the chip off the board.
Next time I should practise on some trashed boards first, but I am too impatient.
The Hittite VCO is the chip in the bottom right corner.
While the Hittite VCO is an expensive part it does also have a divide by 2 and
a divide by 4 output which could make it useful for a multi-band converter.
Still it makes a change doing hardware for once rather than software.

DVB-S2 displayed on an E4405B

I also recently bought yet another spectrum analyser for my lab. I felt I got it at
a really good price and it is a very clean example. It is only rated to 13.2 GHz.
I do have a much older HP 8952L that can be tricked into covering 24 GHz but
I have never needed to use the upper range.

Spike QPSK Analysis
Spike Real time OFDM analysis of a 2 MHz DVB-T signal

Finally I have had the opportunity to experiment with the Spike software
package that comes with the Signal Hound BB60C for carrying out modulation
analysis. The software is constantly evolving and upgrades are free, so hopefully
in time many more modulation analysis features will be added. I would like to see
COFDM added; it is on their roadmap but not in the near future.

I have a couple of other projects in progress at the moment but they will be
the subject of another blog.

Sunday, 26 February 2017

Digital Pre-Distortion Revisited

As I have mentioned before I have been interested in the digital pre-distortion of
Digital TV signals for some time. I have now finally got around to looking at
this problem again.

After a literature search I settled on using the polynomial filter model with memory
to do the pre-distortion. The plan is to use a LimeSDR to do the actual modulation and
a CUDA graphics card to do the maths.

The diagram above is actually quite clever: it uses the input z(n) and output y(n) of the PA
to train a filter to produce the inverse of the PA characteristic. Then it takes the signal to
be transmitted x(n) and passes it through this filter to pre-distort it. The PA then distorts
it back into a signal that looks like the original drive signal x(n).

The filter is not a normal FIR filter whose taps multiply and accumulate a series of
samples; instead the FIR filter is fed with a power series:
x(0), x(0)*|x(0)|, x(0)*|x(0)|^2, ..., x(0)*|x(0)|^N
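
A minimal Python sketch of that power-series input may help. It builds the terms x(n-m)*|x(n-m)|^k for each sample and applies one complex coefficient per term. The function names, the way the memory taps are laid out and the zero padding before the first sample are assumptions for illustration, not taken from the DPD design itself.

```python
def basis_row(x, n, order, memory):
    """Terms x(n-m) * |x(n-m)|**k for m = 0..memory and k = 0..order.
    Samples before the start of x are taken as zero."""
    row = []
    for m in range(memory + 1):
        s = x[n - m] if n - m >= 0 else 0j
        for k in range(order + 1):
            row.append(s * abs(s) ** k)
    return row

def apply_predistorter(x, coeffs, order, memory):
    """Filter output: dot product of the coefficients with each basis row."""
    out = []
    for n in range(len(x)):
        row = basis_row(x, n, order, memory)
        out.append(sum(c * t for c, t in zip(coeffs, row)))
    return out
```

With `order=2` and `memory=1` there are six coefficients. Setting the first to 1 and the rest to 0 gives a pass-through filter, which is a handy starting point before any training.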

As the PA's characteristic will change slowly over time it should only be necessary to
update the filter coefficients on an infrequent basis. This is good because it means the
very intensive Minimum Mean Square Error (MMSE) estimator does not need
to run all the time.

I plan to use QR decomposition followed by back substitution to estimate the filter
tap values. Fortunately clever people have already written a library to do this for me:
NVIDIA have a library called cuSolver that does it, and cuSolver in turn is based on
LAPACK. NVIDIA have very kindly given an example in Appendix C.1 of their toolkit
documentation showing exactly how to do this. The example has to be modified to
deal with complex numbers but that change is trivial.
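
To show what the two steps actually do, here is a small pure-Python sketch of a complex least-squares solve: a thin QR factorisation by modified Gram-Schmidt, then back substitution on the triangular system. It is a stand-in for illustration only and does not use or mirror the cuSolver API. The name `qr_solve` is made up, and the sketch assumes the matrix has full column rank.

```python
def qr_solve(a, b):
    """Least-squares solution of a @ x ~= b for complex data.
    a is a list of rows; assumes full column rank."""
    rows, cols = len(a), len(a[0])
    # q holds the columns of A and is orthonormalised in place
    q = [[complex(a[i][j]) for i in range(rows)] for j in range(cols)]
    r = [[0j] * cols for _ in range(cols)]
    for j in range(cols):
        for k in range(j):
            # project out the part of column j that lies along column k
            r[k][j] = sum(qk.conjugate() * qj for qk, qj in zip(q[k], q[j]))
            q[j] = [qj - r[k][j] * qk for qk, qj in zip(q[k], q[j])]
        norm = sum(abs(v) ** 2 for v in q[j]) ** 0.5
        r[j][j] = complex(norm)
        q[j] = [v / norm for v in q[j]]
    # y = Q^H b
    y = [sum(qv.conjugate() * bv for qv, bv in zip(q[j], b))
         for j in range(cols)]
    # back substitution: solve the upper-triangular R x = y, last row first
    x = [0j] * cols
    for j in range(cols - 1, -1, -1):
        acc = y[j] - sum(r[j][k] * x[k] for k in range(j + 1, cols))
        x[j] = acc / r[j][j]
    return x
```

In the DPD case each row of `a` would hold the power-series terms for one sample and `b` the signal the filter is being trained to reproduce, so `x` comes out as the tap values.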

It may well be better to run the MMSE estimator on the CPU using LAPACK, rather
than on the GPU as for small matrices the CPU is faster. It is only when the matrices
become large that the GPU excels.

The beauty of the LimeSDR is that because of its USB3 interface there is plenty of
bandwidth available. This is needed because the DPD needs to see about 7 times the
bandwidth of the transmitted signal if it is going to reduce the 3rd, 5th and 7th order
IMD products (shoulders to you DATVers).

Hopefully by doing all this on a PC host rather than using an FPGA I should be able to
get something running fairly quickly. Moving it to an FPGA can come later when I get
some idea of how well it works.

There are a lot of subtleties to this and I have just glossed over it as I didn't want people
to get too bogged down in the maths.

If you are interested in more detail please have a look here: Keysight DPD

Wish me luck!

Monday, 12 December 2016

There is no heat in my workshop (or how I remoted my IC7100 radio)

Icom RS-BA1 Remote Control Software

Icom IC7100 rear panel
 At the moment all my antennas come into a brick built workshop in my back garden.
The workshop has minimal heating so it does not get used during the winter months.
To remedy this situation I bought a used IC7100 on eBay from W&S. I already had
an Ethernet cable going from the house to the workshop, so it was fully networked.

I had an old Belkin Ethernet to USB hub which I connected to the workshop
network and then plugged the USB2 port on the IC7100 into it. Also connected I
have an LDG AT200PC. The AT200PC is no longer made but has an RS232
control port which was ideal for this application. The 2m/70 cms port on the IC7100
is connected to a Diamond diplexer which splits the output so it can connect to a
2m and a 70 cms Yagi. The HF port of the radio goes into the LDG which, as well as
being an antenna tuner, is also an antenna switch that I can control remotely, allowing
me to switch between an HF antenna and a 6m/4m beam. The ATU will be controlled
by a simple Windows dialog application I am writing (the app can currently only read
the version number of the ATU's firmware but that shows it has comms).

Initially the Belkin would not work with my Windows 10 machine
(it worked fine on Win7). After some Googling I found that while
Belkin don't support it on Windows 10 (their tech support said it could not be done),
it is possible to use it under Win10 by downloading updated drivers from the
chip manufacturer's website (Silex Technology).

Now using Icom's RS-BA1 remote software I can fully control my radio and use PC
connected headphones and microphone with it from the warmth of the main house.
To the RS-BA1 software the radio appears to be locally connected via USB2.

To rotate the VHF/UHF antennas I used my homebrew Arduino based rotator
controller. It appears as a command line application on my desktop and allows
me either to input a bearing or a QRA locator. I have blogged about the rotator
controller before.
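
For anyone curious how a locator turns into a beam heading, here is a minimal Python sketch. It is not the code from the rotator controller. It assumes the locator is a 4- or 6-character Maidenhead locator and that the initial great-circle bearing from the centre of one square to the centre of the other is good enough. The function names are made up for the example.

```python
import math

def locator_to_latlon(locator):
    """Centre of a 4- or 6-character Maidenhead locator as (lat, lon) degrees."""
    loc = locator.strip().upper()
    if len(loc) not in (4, 6):
        raise ValueError("expected a 4- or 6-character locator")
    lon = (ord(loc[0]) - ord("A")) * 20.0 - 180.0   # field: 20 deg wide
    lat = (ord(loc[1]) - ord("A")) * 10.0 - 90.0    # field: 10 deg high
    lon += int(loc[2]) * 2.0                        # square: 2 deg wide
    lat += int(loc[3]) * 1.0                        # square: 1 deg high
    if len(loc) == 6:
        # subsquare: 5 minutes of longitude by 2.5 minutes of latitude
        lon += (ord(loc[4]) - ord("A")) * (2.0 / 24.0) + (1.0 / 24.0)
        lat += (ord(loc[5]) - ord("A")) * (1.0 / 24.0) + (0.5 / 24.0)
    else:
        lon += 1.0
        lat += 0.5
    return lat, lon

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0
```

A heading then comes from something like `bearing_deg(*locator_to_latlon(my_locator), *locator_to_latlon(their_locator))`, where both locator variables are placeholders.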

I also have plans to add a multi-mode TNC to the set-up so I can use legacy modes
like Pactor and Amtor remotely. There are still some spare USB2 sockets left on the

Digital Beam Forming on 70 cms
70 cms Turnstile antenna
Above is the 70 cms turnstile antenna I bought from a company in Europe.
My current thoughts are to use 16 of these antennas connected to phase/time
locked Lime SDRs to produce a Digital Beam Forming demonstrator. This will
allow the tracking of multiple Cubesats / balloons simultaneously. Unlike a conventional
antenna, which can only point in one direction at a time, this can have multiple
receiver channels each pointing at a different satellite. It can also steer nulls in the
radiation patterns to null out interferers on each of the beams (it is only maths).
Ideally I would like many more than 16 antennas but there is a limit to the
space and money I have available for this project.
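
The "only maths" part can be sketched quite compactly. The Python below forms phase weights for a beam and evaluates the response to a plane wave from any direction. It assumes a uniform linear array with spacing given in wavelengths, which is a simplification chosen for the example. The actual layout of the turnstile array is not decided here, and null steering on interferers is not covered.

```python
import cmath
import math

def steering_weights(num_elements, spacing_wavelengths, steer_deg):
    """Conjugate phase weights for a uniform linear array, unit gain on target."""
    step = (2.0 * math.pi * spacing_wavelengths
            * math.sin(math.radians(steer_deg)))
    return [cmath.exp(-1j * step * n) / num_elements
            for n in range(num_elements)]

def array_response(weights, spacing_wavelengths, arrival_deg):
    """Complex beam output for a unit plane wave arriving from arrival_deg."""
    step = (2.0 * math.pi * spacing_wavelengths
            * math.sin(math.radians(arrival_deg)))
    return sum(w * cmath.exp(1j * step * n) for n, w in enumerate(weights))
```

Several weight sets can be applied to the same 16 sample streams at once, for instance one from `steering_weights(16, 0.5, 20.0)` and another from `steering_weights(16, 0.5, -30.0)`, which is what lets one array follow several satellites at the same time.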

The signal processing will be done using CUDA C and NVIDIA graphics cards
allowing for potentially hundreds of independent beams. I am happy to collaborate
with others on this project.

So that is it for this month. I hope you enjoyed my thoughts.