The first week of the Summer of Code began with building the core PRU firmware for data capture. I wrote firmware for both Programmable Real-Time Units (PRUs), which operate in tandem to sample the pins and transfer the samples directly to DDR memory. The code can be viewed at these links:
PRU0 assembly code
PRU1 assembly code
It makes good use of the XIN/XOUT broadside interface available for inter-PRU communication, which allows a chunk of sampled data [currently 8 registers = 16 samples = 32 bytes] to be moved from PRU1 to PRU0 in one clock cycle. PRU0 then writes the data into DDR memory in bursts of 32 bytes. Inter-PRU signalling is achieved through interrupts.
For buffer overflow/underflow detection, there is a global byte counter running in PRU1, which is moved along with the logic data “for free” via XOUT. PRU0 compares the received value of the counter with the value received at the previous interrupt, and if they differ by more than 32, at least one chunk was missed and there has been an overflow. Also, when the ARM core is signalled for data, an interrupt counter is incremented. That counter is compared to its previous value, and the delta again lets us determine whether an underflow has occurred.
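The counter comparison can be modelled on the host side in a few lines. This is only a sketch of the idea, not the PRU assembly itself: the function names are made up for illustration, and the 32-bit wrapping counter is an assumption (the post does not state the counter width).

```python
CHUNK_BYTES = 32            # one XOUT transfer: 8 registers x 4 bytes
COUNTER_MASK = 0xFFFFFFFF   # assumption: the byte counter is 32 bits wide and wraps

def counter_delta(prev_count, curr_count):
    """Bytes counted between two received counter values, allowing for wrap-around."""
    return (curr_count - prev_count) & COUNTER_MASK

def overflowed(prev_count, curr_count):
    """True if more than one chunk's worth of bytes passed between two transfers."""
    return counter_delta(prev_count, curr_count) > CHUNK_BYTES

def bytes_missed(prev_count, curr_count):
    """Number of bytes that were sampled but never received (0 if no overflow)."""
    return max(0, counter_delta(prev_count, curr_count) - CHUNK_BYTES)
```

For example, `overflowed(32, 64)` is `False` because exactly one chunk arrived, while `overflowed(32, 128)` is `True` and `bytes_missed(32, 128)` reports 64 bytes lost.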
This approach works fine for one-shot sampling, and I have been able to capture up to 4 MSamples of 12 pins at 100 MHz [40 ms]. The limit on the maximum sample rate is likely to be the hardware of the BeagleBone Black in this case: remember there’s a 47 pF capacitive load on the pins shared with HDMI, and this is yet to be tested with actual hardware.
Currently there’s only 8 MB of memory shared with the PRU for storing samples, because an issue with the UIO kernel driver prevents reserving more memory. The UIO driver will not be fixed; rather, the issue will be addressed in the remoteproc interface of the PRU with the kernel. Until then, there is a workaround: adding mem=448m to the boot command line in uEnv.txt reserves the upper 64 MB of memory for the PRU.
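To put the buffer sizes in perspective, here is a quick back-of-the-envelope calculation. It assumes 2 bytes per sample (from 16 samples = 32 bytes) and treats "MB" as binary megabytes; the function name is just for illustration.

```python
BYTES_PER_SAMPLE = 2      # 16 samples per 32-byte chunk
MIB = 1024 * 1024         # assumption: "MB" in the post means binary megabytes

def capture_capacity(buffer_bytes, sample_rate_hz):
    """Return (number of samples, capture time in ms) for a one-shot capture."""
    samples = buffer_bytes // BYTES_PER_SAMPLE
    duration_ms = samples / sample_rate_hz * 1000.0
    return samples, duration_ms

if __name__ == "__main__":
    for size_mib in (8, 64):
        samples, ms = capture_capacity(size_mib * MIB, 100_000_000)
        print(f"{size_mib} MiB: {samples} samples, {ms:.1f} ms at 100 MHz")
```

With these assumptions the 8 MB buffer holds 4194304 samples (about 41.9 ms at 100 MHz, matching the roughly 40 ms figure above), and the 64 MB workaround would stretch a one-shot capture to about 335.5 ms.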
Reducing the sample rate is just a matter of inserting more NOPs into the sample loop to pad out the cycles. However, the extra room between two sampling instructions turns out to provide cycles that could be used for performing RLE (which will be implemented very soon); this seems achievable at 50 MHz.
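The cycle budget behind this can be worked out from the 200 MHz PRU clock (one instruction per 5 ns cycle). The sketch below assumes the sampling instruction itself costs one cycle and ignores any other loop overhead, so the actual NOP counts in the firmware may differ; the helper names are hypothetical.

```python
PRU_CLOCK_HZ = 200_000_000   # PRU core clock on the AM335x

def cycles_per_sample(sample_rate_hz):
    """PRU cycles available per sample; the rate must divide the PRU clock evenly."""
    if sample_rate_hz <= 0 or PRU_CLOCK_HZ % sample_rate_hz != 0:
        raise ValueError("sample rate must be a positive divisor of the PRU clock")
    return PRU_CLOCK_HZ // sample_rate_hz

def spare_cycles(sample_rate_hz, sample_cost=1):
    """Cycles left per sample for NOPs or extra work such as RLE.

    sample_cost is an assumption: cycles spent on the sampling instruction itself.
    """
    spare = cycles_per_sample(sample_rate_hz) - sample_cost
    if spare < 0:
        raise ValueError("sample rate too high for the given per-sample cost")
    return spare
```

Under these assumptions, 100 MHz leaves 1 spare cycle per sample and 50 MHz leaves 3, which is the headroom that makes RLE look feasible at the lower rate.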
Overwriting previous chunks of data works fine as well, stress tested up to 500x overwrite. See here for an example run.
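Overwriting here means treating the sample memory as a ring: once the write position reaches the end of the buffer, it wraps back to the start. A minimal model of that bookkeeping, with hypothetical names and assuming the buffer size is a multiple of the 32-byte chunk:

```python
CHUNK_BYTES = 32   # size of one burst written to DDR

def chunk_slot(chunk_index, buffer_bytes):
    """Return (byte offset in buffer, number of completed passes) for the n-th chunk.

    chunk_index counts chunks from 0; the pass count says how many times the
    buffer has already been filled before this chunk is written.
    """
    if buffer_bytes <= 0 or buffer_bytes % CHUNK_BYTES != 0:
        raise ValueError("buffer size must be a positive multiple of the chunk size")
    byte_pos = chunk_index * CHUNK_BYTES
    return byte_pos % buffer_bytes, byte_pos // buffer_bytes
```

With an 8 MiB buffer there are 262144 chunks per pass, so `chunk_slot(262144, 8 * 1024 * 1024)` returns `(0, 1)`: the first chunk of the second pass lands back at offset 0.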
The number of samples collected is still limited only by the amount of memory available. The PRUs are quite fast and capable of sample rates like 100 MHz. The only bottleneck is holding the data and processing it before the memory runs out, and this is the target for the coming week.
The current test code will be adapted to the sigrok bindings once it is in good shape. The current stable code is available at the repo here, and the latest development is in the “prutest” branch here.
Next update due on June 4th.