Latency analysis of the raspberry camera
This post presents results of an analysis looking for the cause of latency in a Raspberry wifibroadcast FPV setup
The last bit a wifibroadcast FPV system would need to kill analog FPV would be better latency. The robustness of the transmission using wifibroadcast is already better, the image quality for sure, just the latency is a bit higher.
This post will not present a solution and will also not measure total system delay. It concentrates on the latencies in the TX PI.
For measuring latency easily you need something to measure that is driven by the same clock as the measurement. I decided to go with a simple LED that is driven by a GPIO of the Raspberry. The LED is observed by the camera. This setup can be seen in the following Figure:
I wrote a program that toggles the LED and outputs for each action a timestamp delivered by gettimeofday(). This allows me know the time of the LEDs actions relative to the PIs clock.
The latency between the image capture to a h264 image comprises capture and compress latency. Since this both happens hidden by raspivid, it cannot be divided easily. And this is the number we more or less “have to live with” until Broadcom opens up the GPU drivers.
I measured the latency using a modified version of raspivid. The compression engine packs the h264 data into NAL units. Think of them as h264 images that are prefixed by a header (0x00000001) and then written image after image to form the video stream. The modification I added to raspivid was that each NAL unit received a timestamp upon arrival. This timestamp was written right before the NAL header into the h264 stream.
I also wrote a program that was able to read my “timestamped” stream and convert it into single images that are attached with their corresponding timestamps.
For example, the LED toggling program gave me an output like this:
OFF 1441817299 404717 ON 1441817299 908483 OFF 1441817300 9102 ON 1441817300 509716 OFF 1441817300 610361 ON 1441817301 111039 OFF 1441817301 211695 ON 1441817301 717073 OFF 1441817301 817717 ON 1441817302 318342 OFF 1441817302 419034 ON 1441817302 919647 OFF 1441817303 20302 ON 1441817303 520965 OFF 1441817303 621692 ON 1441817304 122382 OFF 1441817304 223078 ON 1441817311 718719 OFF 1441817311 819685 ON 1441817312 320652 OFF 1441817312 421654
First column is the status of the LED, second column the seconds, third column the microseconds.
My h264 decoder then gave me something like this:
1441741267 946995 CNT: 0 Found nalu of size 950 (still 130063 bytes in buf) 1441741267 965983 CNT: 1 Found nalu of size 907 (still 129148 bytes in buf) 1441741267 983183 CNT: 2 Found nalu of size 1124 (still 128016 bytes in buf) 1441741268 3971 CNT: 3 Found nalu of size 1409 (still 126599 bytes in buf) 1441741268 27980 CNT: 4 Found nalu of size 3028 (still 123563 bytes in buf) 1441741268 51698 CNT: 5 Found nalu of size 7005 (still 116550 bytes in buf) 1441741268 68547 CNT: 6 Found nalu of size 9667 (still 106875 bytes in buf) 1441741268 89147 CNT: 7 Found nalu of size 10312 (still 96555 bytes in buf) 1441741268 109650 CNT: 8 Found nalu of size 19244 (still 77303 bytes in buf) 1441741268 138233 CNT: 9 Found nalu of size 19338 (still 57957 bytes in buf) 1441741268 160402 CNT: 10 Found nalu of size 31165 (still 26784 bytes in buf) 1441741268 172178 CNT: 11 Found nalu of size 19899 (still 6877 bytes in buf) 1441741268 195332 CNT: 12 Found nalu of size 25129 (still 105935 bytes in buf) 1441741268 213109 CNT: 13 Found nalu of size 24777 (still 81150 bytes in buf) 1441741268 236775 CNT: 14 Found nalu of size 24657 (still 56485 bytes in buf) 1441741268 259814 CNT: 15 Found nalu of size 24738 (still 31739 bytes in buf) 1441741268 274674 CNT: 16 Found nalu of size 24783 (still 6948 bytes in buf) 1441741268 300793 CNT: 17 Found nalu of size 24855 (still 106209 bytes in buf) 1441741268 314963 CNT: 18 Found nalu of size 18368 (still 87833 bytes in buf) 1441741268 339084 CNT: 19 Found nalu of size 17959 (still 69866 bytes in buf) 1441741268 365756 CNT: 20 Found nalu of size 17958 (still 51900 bytes in buf)
Where the first column represents the seconds and the second column represents the microseconds.
Since the LED was running at 2Hz and the camera at 48Hz I could directly relate the LED event to a specific video frame just by looking at the images (-> is the LED on or off?). This gave me two timestaps, the first of the LED event and the second of the capture of it.
The delay I got out of these was always in the range between 55ms and 75ms. The variation of 20ms makes sense since this is roughly our frame time. Depending on whether I captured the LED at the beginning (longer delay) or at the end of the exposure (shorter delay) the times vary.
Camera settings were: 48FPS, -g 24, -b 6000000, -profile high
I was wondering: Does the 55ms minimum latency come mostly from compression or from capturing? I looked through ways to capture directly and found some nice hack here: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=109137.
Here I did the same trick with the LED and was even a bit more lucky to capture this shot:
Notice that the LED is half-on half-off? This is due to the rolling shutter effect of the sensor. By accident I captured the LED right “in the middle” where it turned off. So the capture time of this shot is known to be exactly when the LED switched state (so here we do not have the 55-75ms jitter as in the case above). The delay between event and capture in this case was 38ms. Unfortunately, the camera runs in this hacky mode at 5Mpixel. So this is not exactly the 720p FPV scenario.
To validate my findings from above I also timestamped the hello_encode program included in Raspian. This gave me a compression latency of 10ms for a single 720p frame.
Although my different measurements are not completely comparable to each other I now have a rather clear view on the latencies:
Capture: 40ms Compression: 10ms FEC-Encoding+Transmission+Reception+FEC-Decoding+Display: Remaining ~50-100ms (to be confirmed)
One thing I did notice on my experiments with raspivid: It uses fwrite to output the h264 stream. Since this is usually buffered I noticed sometimes about 6KiB being stuck in the buffering. Now that size if far away from a whole frame size so it won’t cause one frame being stuck inside the pipeline. But nonetheless, it will probably cause a small delay.
Another thing I noticed is that the NALU header (0x00000001) is written by raspivid at the beginning of each new h264 frame. Since the decoder needs to wait until the next header to know the end of the frame, a latency of one frame is created (unnecessarily).
Maybe a variant of wifibroadcast that is directly integrated into raspivid would make sense. This could treat whole h264 frames as an atomic block, divide the frame into packages and transmit them. Right now the blocking is done at fixed intervals. This leads to the problem that additional data is stuck in the pipeline due to a partly filled block. I still need some time to think it over but there could be some potential here.