Friday, 24 November 2017

CUDA DVB-S2 decoder marches on.

No pictures this time as videos of the receiver in action don't prove much.
I have now implemented version 2 of the LDPC decoder, based on a
research paper I found (Gronroos). I now appear to be getting a throughput
in excess of 300 Mbit/s. There are still some issues: the new decoder uses
8 bit metrics and it is a challenge to get them not to underflow on weak
signals. I have used the CUDA SIMD intrinsics, which use saturated
8 bit maths; this is a deviation from the original paper but seems to work.
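
To show what saturated 8 bit maths buys, here is a minimal Python sketch
comparing a saturating add with an ordinary wrap-around add. It only models the
arithmetic for a single lane; on the GPU one packed SIMD intrinsic (for example
`__vaddss4`) does this for four bytes at once. The function names are mine, and
which intrinsics the decoder actually uses is not stated above.

```python
INT8_MIN, INT8_MAX = -128, 127


def sat_add8(a: int, b: int) -> int:
    """Add two signed 8 bit values, clamping to the int8 range."""
    return max(INT8_MIN, min(INT8_MAX, a + b))


def wrap_add8(a: int, b: int) -> int:
    """Add two signed 8 bit values with two's complement wrap-around."""
    return ((a + b + 128) & 0xFF) - 128


if __name__ == "__main__":
    # A strong positive metric stays strongly positive when saturated...
    print(sat_add8(100, 60))    # 127
    # ...but flips sign when the addition is allowed to wrap.
    print(wrap_add8(100, 60))   # -96
```

With wrap-around a very confident bit turns into a confident wrong bit, whereas
saturation only loses a little magnitude.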

The decoder works on the basis of handling 128 codewords in
parallel, which causes two problems: firstly the latency, and secondly the fact
that the 128 codewords have to use the same FEC. This is not an issue with
TV broadcasting but does cause some issues with the requirements of
Phase 4 Ground. Another problem is that the new algorithm uses Min-Sum
rather than Sum-Product to do the decoding, which loses some performance.
I did find another paper that applies a correction to Min-Sum to
get back some of the performance.
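
As a rough illustration of Min-Sum with a correction, here is a sketch of a
single check node update using integer metrics. The correction shown is the
common "normalized" variant, which scales the minimum magnitude down; the 3/4
factor and the function name are assumptions for the example, not values taken
from either paper.

```python
def min_sum_check_node(llrs, scale_num=3, scale_den=4):
    """Return the outgoing message on each edge of one check node.

    llrs holds the incoming integer metrics, one per connected bit.
    Each outgoing message uses every input except the one on its own edge:
    the sign is the product of the other signs and the magnitude is the
    smallest of the other magnitudes, scaled by scale_num / scale_den.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1
        for v in others:
            if v < 0:
                sign = -sign
        mag = min(abs(v) for v in others)
        mag = (mag * scale_num) // scale_den  # assumed correction factor
        out.append(sign * mag)
    return out


if __name__ == "__main__":
    print(min_sum_check_node([8, -4, 12, -20]))  # [3, -6, 3, -3]
```

A real decoder would track only the two smallest magnitudes per check node
rather than rescanning the inputs for every edge; the loop above is written for
clarity, not speed.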

The next improvement I made was to use a table lookup approach to detect
BCH codewords in error. This has resulted in a 5x increase in speed
of the BCH error detector. The bad codewords are then further processed
using the Berlekamp-Massey algorithm, and up to 12 errors are corrected
per codeword. The syndromes are calculated on the CPU.
I calculate the odd order syndromes by using log/antilog tables of one of the
base polynomials. The even order syndromes are calculated by multiplying
the required lower order syndrome by itself, for example s2 = GMULT(s1,s1)
(where GMULT is a multiply over the Galois field). I am thinking of
splitting the batches of 128 codewords up into sets of 64 or maybe 32
then processing them in separate CPU threads. I am not sure whether
the overhead of creating the threads will be greater than the concurrency
I can achieve (only experiment will tell).
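
The log/antilog table idea and the s2 = GMULT(s1,s1) shortcut can be sketched
in a few lines of Python. To keep the tables readable this uses the tiny field
GF(2^4) with the primitive polynomial x^4 + x + 1, which is far smaller than
the field a DVB-S2 BCH code needs; the field, the test pattern and all the
names below are assumptions for illustration only.

```python
M = 4                      # assumed toy field GF(2^4)
PRIM_POLY = 0b10011        # x^4 + x + 1
FIELD_SIZE = 1 << M
ORDER = FIELD_SIZE - 1     # number of non-zero field elements

# alog[i] = alpha^i, log[alpha^i] = i. The antilog table is stored twice
# over so that log[a] + log[b] can be used as an index without a modulo.
alog = [0] * (2 * ORDER)
log = [0] * FIELD_SIZE
x = 1
for i in range(ORDER):
    alog[i] = x
    log[x] = i
    x <<= 1
    if x & FIELD_SIZE:
        x ^= PRIM_POLY
for i in range(ORDER, 2 * ORDER):
    alog[i] = alog[i - ORDER]


def gmult(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return alog[log[a] + log[b]]


def syndrome(bits, j):
    """Evaluate the received polynomial at alpha^j.

    bits[i] is the coefficient of x^i in the received word.
    """
    s = 0
    for i, bit in enumerate(bits):
        if bit:
            s ^= alog[(i * j) % ORDER]
    return s


if __name__ == "__main__":
    received = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
    s1 = syndrome(received, 1)
    s3 = syndrome(received, 3)
    # Even order syndromes come from squaring, no second pass over the bits.
    s2 = gmult(s1, s1)
    s4 = gmult(s2, s2)
    s6 = gmult(s3, s3)
    print(s2 == syndrome(received, 2),
          s4 == syndrome(received, 4),
          s6 == syndrome(received, 6))
```

The squaring trick works for any binary received word because squaring is
linear in a field of characteristic 2, so r(alpha^2) = r(alpha)^2.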

I have a lot more things to play with like using pinned memory for
device to host transfers and CUDA callbacks.

I am now decoding DVB-S2 at 800 kSymbols/sec. The limiting factor is no
longer the LDPC but the Kalman based adaptive equaliser I am using in
the front end of the modem. Currently it uses floating point so can probably be sped up.
After equaliser training on the preamble I hope to switch to the far less
demanding LMS method, but as yet that doesn't work.
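
For reference, this is roughly what an LMS equaliser update looks like: a
textbook complex LMS sketch in Python, not the code from my modem. The tap
count, step size, QPSK training symbols and the simple two-tap channel are all
made up for the example.

```python
import random


def lms_step(weights, window, desired, mu):
    """One complex LMS update. Returns (output, error, new_weights)."""
    # Equaliser output: y = sum(conj(w_k) * x_k)
    y = sum(w.conjugate() * x for w, x in zip(weights, window))
    err = desired - y
    # Move each tap a small step in the direction that reduces |err|^2.
    new_weights = [w + mu * x * err.conjugate()
                   for w, x in zip(weights, window)]
    return y, err, new_weights


def train(num_symbols=2000, num_taps=5, mu=0.02, seed=1):
    """Train on known symbols and return the squared error per symbol."""
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    symbols = [rng.choice(qpsk) for _ in range(num_symbols)]
    # Hypothetical channel: each symbol plus 0.3 of the previous one.
    received = [s + 0.3 * (symbols[n - 1] if n > 0 else 0)
                for n, s in enumerate(symbols)]
    weights = [0j] * num_taps
    sq_errors = []
    for n in range(num_symbols):
        # Most recent sample first, zeros before the start of the signal.
        window = [received[n - k] if n - k >= 0 else 0j
                  for k in range(num_taps)]
        _, err, weights = lms_step(weights, window, symbols[n], mu)
        sq_errors.append(abs(err) ** 2)
    return sq_errors


if __name__ == "__main__":
    errs = train()
    print(sum(errs[:50]) / 50, sum(errs[-50:]) / 50)
```

Each update costs one multiply-accumulate per tap for the output and one per
tap for the weight change, which is why it is so much cheaper than a Kalman
style update.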

So lots more interesting stuff to play with. I have had no formal education
in any of this; I have just learnt it from articles I have found on the internet.
So please excuse any mistakes I make.

For those of you who read this blog and are not Radio Amateurs, part of the ethos
of the Amateur Radio movement is the concept of self training in radio
communications. I hope I meet that challenge in this, my personal journey.

So till next time ....
