This week I spent some time familiarizing myself with the libsigrok components and developed bindings for BeagleLogic in libsigrok. The development work walked me through non-blocking I/O, which I implemented in the BeagleLogic kernel module, only to realize that it had a major bug that caused the client to sleep uninterruptibly and froze my BeagleBone Black every time. The bug was identified and fixed, and after that non-blocking I/O worked as expected.
Memory Mapped Kernel Buffers
This was another optimization I made use of in the sigrok bindings. The sigrok library internally keeps logic data in packets, which are exchanged with the host application via pointers. I used this opportunity to exploit the mmap() functionality I designed into the kernel driver, so the sigrok implementation just passes an mmap()’ed pointer to the host application instead of read()-ing the data.
However, the driver still needs to be told which buffer is currently being read for poll() to work properly. For this reason I implemented dummy reads: when a NULL pointer is passed as the buffer to read(), the driver does not copy any data out, and just updates its internal read state so that poll() works fine.
The flow of data is like this:
The libsigrok bindings signal the BeagleLogic module to start the sampling operation, and start polling on the /dev/beaglelogic node using a GPollFD
Once the PRU fills up a data buffer, it sends an IRQ that wakes up the polling thread, and the beaglelogic_receive_data() callback in protocol.c executes
The callback sends a packet with the pointer to that segment of data required, and adjusts the internal offsets for the next callback
The process goes on till the sample limit has been reached. BeagleLogic also supports streaming but…
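The flow above can be modelled roughly as follows. This is a simplified Python sketch, not the libsigrok code: `iter_segments`, `segment_size` and `limit_bytes` are hypothetical names, and a plain `bytes` object stands in for the mmap()’ed kernel buffer.

```python
def iter_segments(buffer, segment_size, limit_bytes):
    """Yield zero-copy views of successive segments until the limit is hit.

    `buffer` stands in for the mmap()'ed kernel buffer; each yielded
    memoryview plays the role of the pointer handed to the host application.
    """
    if len(buffer) % segment_size != 0:
        raise ValueError("buffer must hold a whole number of segments")
    view = memoryview(buffer)
    offset = 0  # internal offset, adjusted after every callback
    sent = 0    # bytes handed over so far
    while sent < limit_bytes:
        # The final segment may be shorter if the limit is not a multiple
        length = min(segment_size, limit_bytes - sent)
        yield view[offset:offset + length]
        sent += length
        # Wrap around so the buffer is reused as a ring (assumption)
        offset = (offset + length) % len(buffer)
```

Because only views are handed out, no sample data is copied; the offsets are the only state that changes between callbacks.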
sigrok-cli on the BeagleBone Black
To test the bindings, I ran a sample operation on BeagleLogic through sigrok and it worked fine. I was able to decode a wave file played on my trusty audio player hardware 🙂 back to WAV right there on the BeagleBone Black, using the article on the sigrok blog as a reference. I got 6 seconds of a wave file directly from the dump. This took ~750 seconds to process from RAM on the BeagleBone Black.
I also tested sampling into a 256 MB buffer in the RAM
The first week was spent well in prototyping the PRU firmware. Problems with the memory management [Maximum of 8 MB of memory] and dismal memory copy speeds [10-20 MB/s] with UIO and libprussdrv prompted me to escalate the back-end to the remoteproc kernel driver. It took me a while before I started off based on the WS28xx lighting firmware example to build up the necessary kernel driver infrastructure for BeagleLogic, and it’s now close to the first set of stress tests and actual data capture for the PRU.
Going into the kernel driver, here is a summary of what’s coming up in the next update [the experiment is with dummy data, the PRUs aren’t engaged]:
The first week of the Summer of Code began with the build of the core PRU firmware for data capture. I coded up firmware for both the Programmable Real-Time Units, which operate in tandem to sample the pins and transfer the samples to the DDR memory directly. The code can be viewed at these links: PRU0 assembly code, PRU1 assembly code
It makes good use of the XIN/XOUT broadside interface available for inter-PRU communication, which allows a chunk of sampled data [currently 8 registers = 16 samples = 32 bytes] to be moved from PRU1 to PRU0 in one clock cycle. PRU0 then writes the data into the DDR memory in bursts of 32 bytes. Inter-PRU signalling is achieved through interrupts.
For buffer overflow/underflow detection, there is a global byte counter running in PRU1, which is moved along with the logic data “for free” via XOUT. PRU0 compares the received value of the counter with the value received at the previous interrupt, and if they differ by more than 32, there has been an overflow. Also, when the ARM core is signalled for data, an interrupt counter is incremented. The counter is compared to its previous value, and the delta again lets us determine whether an underflow has occurred.
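The counter comparison can be illustrated with a few lines of Python. This is only a model of the check, not the PRU assembly: the function names are made up for illustration, and the 32-bit counter width is an assumption (PRU registers are 32 bits wide).

```python
COUNTER_BITS = 32  # assumed width of the byte counter register
CHUNK_BYTES = 32   # bytes moved per transfer, as described above


def bytes_lost(prev_count, curr_count, chunk=CHUNK_BYTES, bits=COUNTER_BITS):
    """Return how many bytes were missed between two consecutive transfers.

    The subtraction is done modulo 2**bits so that a counter wrap-around
    is not mistaken for an overflow.
    """
    delta = (curr_count - prev_count) % (1 << bits)
    return max(0, delta - chunk)


def overflowed(prev_count, curr_count):
    """True when the counter advanced by more than one chunk."""
    return bytes_lost(prev_count, curr_count) > 0
```

For example, `overflowed(100, 196)` is true: the counter advanced by 96 bytes, so two chunks of 32 bytes never made it across.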
This approach works fine for one-shot sampling, and I have been able to capture up to 4 MSamples of 12 pins at 100 MHz [40 ms]. The practical limit on the sample rate is likely to be the hardware of the BeagleBone Black: remember there’s a 47 pF capacitive load on the HDMI shared pins, and the firmware is yet to be tested with actual hardware.
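The numbers above can be checked with a little arithmetic. The helpers below are illustrative only (the names are hypothetical), and they assume each sample is rounded up to whole bytes, which matches the 16 samples = 32 bytes figure for the XOUT transfer.

```python
import math


def bytes_per_sample(pins):
    """Round the pin count up to whole bytes (12 pins -> 2 bytes)."""
    return math.ceil(pins / 8)


def capture_time_ms(samples, rate_hz):
    """Duration of a one-shot capture in milliseconds."""
    return samples * 1000 / rate_hz


def capture_bytes(samples, pins):
    """Memory needed to hold the capture."""
    return samples * bytes_per_sample(pins)
```

With 4,000,000 samples at 100 MHz, `capture_time_ms` gives 40 ms, and `capture_bytes(4_000_000, 12)` gives 8,000,000 bytes, which is roughly the 8 MB of memory currently shared with the PRU.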
Currently there’s only 8 MB of memory shared with the PRU for storing samples, because an issue with the UIO kernel driver prevents reserving more memory. The UIO driver will not be fixed; rather, the issue will be addressed in the remoteproc interface of the PRU with the kernel. Until then, the workaround is to add mem=448m to the boot command line in uEnv.txt, which reserves the upper 64 MB of the memory for the PRU.
Reducing the sample rate is just a matter of inserting more NOPs into the sample loop to adjust the cycles. However, the extra room between two sampling instructions provides cycles that could be used for performing RLE (which will be implemented very soon); this seems achievable at 50 MHz.
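A rough way to see how many spare cycles each sample rate leaves is sketched below. It assumes the 200 MHz PRU core clock of the AM335x with single-cycle instructions, and a bare sampling loop of 2 cycles (inferred from the 100 MHz maximum, not taken from the firmware); the function names are hypothetical.

```python
PRU_CLOCK_HZ = 200_000_000  # AM335x PRU core clock


def cycles_per_sample(sample_rate_hz, clock_hz=PRU_CLOCK_HZ):
    """Number of PRU cycles available for each sample."""
    if sample_rate_hz <= 0 or clock_hz % sample_rate_hz != 0:
        raise ValueError("sample rate must divide the PRU clock evenly")
    return clock_hz // sample_rate_hz


def spare_cycles(sample_rate_hz, loop_cycles=2, clock_hz=PRU_CLOCK_HZ):
    """Cycles left per sample after the bare loop (assumed 2 cycles).

    These are the slots that would be filled with NOPs, or with useful
    work such as run-length encoding.
    """
    spare = cycles_per_sample(sample_rate_hz, clock_hz) - loop_cycles
    if spare < 0:
        raise ValueError("sample rate too high for this loop")
    return spare
```

Under these assumptions 100 MHz leaves no spare cycles, 50 MHz leaves 2 per sample and 10 MHz leaves 18.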
Overwriting previous chunks of data works fine as well, stress tested up to 500x overwrites. See here for an example run.
The number of samples collected is still limited only by the amount of memory available. The PRUs are quite fast and capable of sample rates like 100 MHz. The only bottleneck is holding the data and processing it before memory runs out, and this is the target for the coming week.
The current test code will be adapted to the sigrok bindings once it is in good shape. The current stable code is available at the repo here, and the latest development is in the “prutest” branch here.
This week I spent time understanding the remoteproc and UIO implementations of the PRU kernel drivers in the kernel source tree.
One of the important decisions this week was the confirmation of a UIO implementation for the PRU drivers. Although a remoteproc implementation is a better approach, the initial implementation of the core will be in UIO, and as the remoteproc infrastructure improves, BeagleLogic will eventually migrate. Since this would be a change to the core only, it would have minimal impact on the functionality.
All set for May 19th when the coding period commences.
It’s been almost a week since I began work on BeagleLogic after my finals, and with another 12 days remaining for the Community Bonding period, here’s an update on the things accomplished so far (mostly administrative):
beaglelogic.net is now up, currently as a mirror of this blog. In due course, the project will migrate to its own GitHub pages and website.
The GitHub repository for BeagleLogic has been created. This will carry the front-end HTML files mostly, with the actual libraries including sigrok, sigrokdecode and our BeagleLogic Server as submodules to it.
Set up a build environment for cross-compilation with the required libs and package config files copied in from the BeagleBone Black Debian System Image (23rd April beta release). Successfully built libsigrok, libsigrokdecode and sigrok-cli and ran them on the BeagleBone Black, where sigrok-cli decodes a bundled sample DS1307 I2C dump as shown below:
Created the presentation for my project as required by BeagleBoard.org. The draft slides can be viewed here. I will be recording, uploading and posting the video soon, and also updating it at the eLinux wiki page.
The cross compilation, though accomplished, has a few rough edges, as I initially had to edit the pkg-config descriptors for each dependency library for a non-standard prefix. I now use PKG_CONFIG_SYSROOT_DIR and compile with a standard ‘/usr’ prefix with untouched pkgconfig files from the BeagleBone Black, but it seems that a few modifications are needed in the build script to build smoothly. Once this is accomplished, I will publish complete build instructions and a cross-compilation script to build sigrok from source for the BeagleBone or BeagleBone Black running Debian Wheezy.
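As a rough illustration of the PKG_CONFIG_SYSROOT_DIR approach, the sketch below builds the environment that a build script could hand to pkg-config. The sysroot path, the function name and the exact list of pkgconfig directories are assumptions for illustration, not the layout of my actual build environment; PKG_CONFIG_SYSROOT_DIR and PKG_CONFIG_LIBDIR are standard pkg-config variables.

```python
import os
import posixpath


def cross_pkg_config_env(sysroot, triplet="arm-linux-gnueabihf"):
    """Return an environment that makes pkg-config resolve inside `sysroot`."""
    libdirs = [
        posixpath.join(sysroot, "usr/lib/pkgconfig"),
        posixpath.join(sysroot, "usr/lib", triplet, "pkgconfig"),
        posixpath.join(sysroot, "usr/share/pkgconfig"),
    ]
    env = dict(os.environ)
    # Prefix every -I/-L path that pkg-config reports with the sysroot
    env["PKG_CONFIG_SYSROOT_DIR"] = sysroot
    # Replace the default search path so host .pc files are never picked up
    env["PKG_CONFIG_LIBDIR"] = ":".join(libdirs)
    env.pop("PKG_CONFIG_PATH", None)
    return env


# Hypothetical usage:
#   env = cross_pkg_config_env("/opt/bbb-sysroot")
#   subprocess.run(["pkg-config", "--cflags", "glib-2.0"], env=env)
```

With this in place the .pc files copied from the BeagleBone Black can keep their ‘/usr’ prefix untouched.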
The next update, Update 0, will be on 14th May 2014; Update 1 will follow on the 21st, once coding begins on May 19th.
Stay connected for the latest updates on BeagleLogic.
[A brief introduction to Google Summer of Code [GSoC]: it’s an initiative by Google to encourage student participation in Open Source projects. Through this program, students work on projects and proposals under participating organizations for a period of 3 months, and Google pays a handsome stipend. The students are paired with experienced mentors from the organization who assist them and guide them during the period]
For the next 3 months, I will be working on building BeagleLogic, software that leverages the power of the Programmable Real-Time Unit [PRU] available on the BeagleBone Black (and the original BeagleBone) to turn it into a Logic Analyser that can help people understand and visualise the digital signals they are dealing with.
Matt Ranostay and Hunyue Yau from BeagleBoard.org will be my mentors for the project.
I will be working closely with the sigrok project, an open source signal analysis software suite that will act as the back-end of the project. It supports a large family of digital multimeters, oscilloscopes and logic analyzers, and also includes scriptable protocol decoding support. One of the first tasks in this project is to integrate support for the on-chip PRU under libsigrok; the skeleton driver has already been created in my fork of the library.
I’m looking forward to a great summer 🙂 and I would like to thank the BeagleBoard.org community for their acceptance. I would also like to thank Google for the wonderful opportunity it provides to us students to connect with the Open Source Community through its Summer of Code program, and I hope to gain experience that will help me in the future.
Future updates will be posted on this blog, or will be shared on a separate blog, if necessary.
For reference, and as a guideline for future GSoC applicants, I attach a copy of the proposal I submitted. The format of the proposal varies from organization to organization; yet the proposal would give a clear idea of the groundwork that is done in general before a proposal is submitted. The proposal has been subject to several revisions after feedback from the BeagleBoard.org community.
There are many examples available online of audio playback using the on-board CS43L22 audio DAC of the STM32F4Discovery. The official demo uses the USB Host functionality to read a raw audio file from a USB Flash Drive. Then there are examples that use SD cards in SPI mode. The STM32F407 microcontroller on the STM32F4Discovery does pack in an SDIO bus for a native interface with SD cards. However, it turns out that the CS43L22 connects to the STM32F407 using two signal pins, PC10 and PC12, that are required by the SDIO bus [SDIO_D2 and SDIO_CK]. Further, those SDIO signals are non-remappable. Therefore it seems it is not possible to get both the DAC and the SDIO interface working together.
I wondered if it was possible to physically remap the I2S3 pins that connect to the CS43L22, instead of the SDIO pins. A check of the datasheet confirmed that the I2S3 signals I2S3_CK (initially at PC10, also SDIO_D2) and I2S3_SD (initially at PC12, also SDIO_CK) may be remapped through software to appear instead at pins PB3 and PB5 respectively.
I also had the following alternatives in mind:
Using the on-board DAC on the STM32 (pins PA4 and PA5) with a headphone amplifier.
Now that the signals have been remapped, using an external audio DAC / codec to play back the audio. I felt this somehow defeated the purpose of having a complete audio DAC setup onboard.
Building a full custom design. This is a good long-term solution, and is in progress.
I also had an LCD connected via FSMC in my set-up, with the LCD RD pin mapped to FSMC_NOE (PD4), which also goes to the RESET pin on the CS43L22. I took care of this conflict by making the LCD write-only [the pin was configured as general push-pull, not AF mode].
I however decided to solve the problem in situ by modding my STM32F4Discovery and rewiring the I2S3 signals onto PB3 and PB5. Armed with the CS43L22 and STM32F4Discovery datasheets and following the traces on the bottom layer of the PCB, I was able to cut the traces and then reroute them onto PB3 and PB5. It was so simple, I wonder why it hadn’t been like that in the first place.
The F4 Discovery board undergoes the knife
However, getting the modified board to play back audio from the microSD card took longer than expected. This was due to my inability to perform simultaneous I2S and SDIO data transactions to enable streaming audio playback after the traces were cut and rerouted. I was able to play wave files, though, by sending the stream to the CS43L22 Audio DAC and waiting for the transfer to complete before requesting another chunk of audio data from the SD Card. This naturally led to significant stuttering in playback.
Debugging led me to the SDIO low level driver, which failed after the first frame was read and sent, with the error flag SDIO_STA_DCRCFAIL, meaning data corruption (CRC on received data failed). But how? The traces had been cut properly, and there was no sign of electrical continuity.
My instinct was to try out SDIO in 1-bit bus mode. Circular audio playback worked perfectly this time. Therefore it could have something to do with the cut traces (PC10 and PC12). The traces were quite close and one of the cut ends now carried a 48 MHz clock signal.
Next, I tried adding series resistors (100 ohm) on the SDIO lines going from the card to the board. I targeted PC10 and PC12 specifically, and the issue seemed to be resolved even in 4-bit mode. I was now able to play back streaming audio from the SD Card.
Through further trials, I found out that:
With 100 ohm resistors between the SD Card and both PC12 (SDIO_CK) and PC10 (SDIO_D2), circular audio playback works.
With 100 ohm between the SD Card and PC12 (SDIO_CK) but not on PC10 (SDIO_D2), it still works.
With 100 ohm between the SD Card and PC10 (SDIO_D2) but PC12 (SDIO_CK) connected directly to the SD card, playback works but locks up after a few seconds or so.
The SDIO_CK line needs suitable termination, especially with mine being a wire-wrap breakout board that puts everything together.
The sources will be made available soon.
The sources are now available at https://github.com/abhishek-kakkar/STM32F4SDIOAudio . The code is provided “as-is” as a reference that might be useful; compilation may break with the latest versions of ChibiOS and uGFX.
The STM32F4Discovery from STMicroelectronics is one of the more mature, extremely affordable, and yet capable development boards available in the market [I say mature because it has been around for quite a while; since Q4 2011]. The board is equipped with an STM32F407VGT6 ARM Cortex-M4 Core with embedded FPU running at 168 MHz, 1 MB of Flash and 192 KB of SRAM, and adds an accelerometer, a CS43L22 Stereo Audio DAC (with a headphone jack), a MEMS digital microphone and USB OTG support. The demos provided by ST demonstrate its audio playback and recording capabilities, which I was quite impressed with when I tested it for the first time.
The STM32F407 Microcontroller also packs in support for seamless LCD interfacing (via FSMC), SDIO, DCMI and Ethernet, hence I built a circuit to augment the STM32F4Discovery board capabilities by adding support for attachment of:
TFT LCD module with a touchscreen. These are cheaply available via eBay.
A microSD card [power to the card controllable via a PMOS switch].
Camera module with an 8-bit HSYNC/VSYNC/PCLK interface.
[Planned] Ethernet support, either via ENC28J60 or a DP83848 Ethernet PHY
The circuit was initially wired in Summer 2013 and recently modified to support audio playback alongside SDIO [see article here].
Here’s a pic of the board up and running:
A double-sided PTH dot matrix board is used for construction. The board was just the right size to accommodate the STM32F4Discovery board on one side and the LCD with its 40-pin connector on the other, making it a stacked 3-layer design. 0.1 mm magnet wire is used for all electrical connections.
The LCD side also has a 3.3V LDO for power distribution, a connector and charging circuit for a 1-cell Li-Ion battery. It also has a connector for a 10-DOF IMU for expansion.
Two 2.5 cm spacers and two 6 cm metal studs mounted on the 4 mounting holes on the PCB act as an inclined stand for the assembly to be kept on a table top.
The LCD used is a 3.5″ HVGA LCD display [320×480 pixels] with a 4-wire touchscreen, mounted on a breakout board. The module used in it is a TRULY TFT1P7134-E, which uses HFFS (High Fringe Field Switching) technology, which is somewhat similar to In-Plane Switching (IPS) and gives true-to-life colors. There is no color distortion noticed in the image, even when viewed at angles of almost 180 degrees from any direction. It uses the Renesas R61581 TFT controller, which is an enhancement of the ILITek ILI9481 originally used in such displays (the instruction set is nearly compatible).
It connects to the 16-bit FSMC bus on the STM32F407 Microcontroller, which allows the LCD to be accessed simply as external memory, and enables DMA usage for data transfer to save CPU utilization. The module reset pin is connected to the MCU reset pin. The backlight is connected through a PMOS to TIM5 on the STM32F407. The R61581 supports built-in LED backlight control, but I have disabled it and gone for direct control in order to support more compatible displays.
The LCD breakout board contains an XPT2046 (ADS7846 compatible) touchscreen controller that interfaces the 4-wire touch panel via the SPI bus to the microcontroller. The PENIRQ output provides feedback on screen touch.
Almost any LCD from eBay which comes with a 40-pin connector will be able to connect to this connector and can be supported with suitable changes to firmware.
The uGFX library is used to interface to the LCD and touchscreen. It also provides a widget toolkit for UI design. The R61581/ILI9481 drivers adapted for this board have been contributed by me to the uGFX project.
The microSD card connects to the SDIO bus on the STM32F407. 4-bit bus mode is supported, and performance is stable at 48 MHz operation after the recent addition of a 100 Ohm termination on SDIO_CK (the clock line).
FatFS is used on the software side for file operations on the card. Read speeds of up to 9MB/s (theoretical maximum is 12MB/s) have been achieved with large buffer sizes.
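As a back-of-the-envelope check on these figures: the raw SD bus bandwidth is the card clock times the bus width, divided by 8. In this formula a 4-bit bus gives 12 MB/s at 24 MHz and 24 MB/s at 48 MHz, so the 12 MB/s ceiling quoted above lines up with a 24 MHz card clock. That pairing is an inference from the arithmetic, not a measurement; the function name is made up for illustration, and command and CRC overhead are ignored.

```python
def sdio_raw_bandwidth_mb_s(clock_hz, bus_width_bits):
    """Raw bus bandwidth in MB/s, ignoring command and CRC overhead."""
    return clock_hz * bus_width_bits / 8 / 1_000_000
```

Against a 12 MB/s ceiling, the measured 9 MB/s corresponds to a bus efficiency of 75%.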
Card detection is currently not implemented, but will be taken care of in future hardware revisions.
The STM32F407 contains a Digital Camera Interface (DCMI) bus that captures data sent in by a digital camera in the 8-bit format with external/inframe synchronization. Here only the HW synchronization via HSYNC/VSYNC/PCLK is used. The connector on the board was designed to be compatible with an OV7670 camera module, like the one available here.
To drive the camera module, the XCLK is fed from PA8, which is a master clock output of the STM32F407. The MCO1 is configured to output a 16MHz clock using the HSI.
The control bus for the camera is SCCB, which does work with standard 400kHz I2C signals, with modifications in the way data is read, and delay between subsequent transfers.
I did try interfacing with an OV7670 camera module, but I was largely dissatisfied with the results: they were terribly off-color.
The board also has a connector for a 10 DOF IMU board available on eBay. It consists of an ADXL345 accelerometer, L3G4200D gyroscope, HMC5883L compass and BMP085 pressure sensor, everything accessible via the I2C port through pins PB6 and PB9.
ChibiOS differs from many other real-time kernels in that it offers a tightly integrated hardware abstraction layer (HAL), which is very well written and easy to use. The HAL provides, among other things, off-the-shelf support for SD cards via both SDIO and SPI and integration with the FatFS library, GPIO configuration, and asynchronous transfers on SPI, I2C and serial ports, with built-in support for DMA usage to offload transfers and save the CPU for computation. ChibiOS 2.6.3 is currently being used.
uGFX offers support for rich graphics and touch input. It also offers a GUI toolkit which can be used with a Windows/Linux simulator to develop GUI on embedded devices.
Both ChibiOS and uGFX are available under a dual license: free for non-commercial usage, with commercial licenses for use in production.
A custom board file was created off the reference STM32F4Discovery board.h and board.c.
Drivers have been written for the CS43L22 using the ChibiOS STM32 HAL. Since I2S support is currently not implemented in the STM32 HAL, I had to implement I2S configuration & circular transfer using DMA.
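The idea behind the circular transfer can be illustrated with a small simulation. This is a Python model of a two-half (ping-pong) buffer, not the ChibiOS driver code: the function name is hypothetical, and the "interrupt" is simply the point in the loop where one half has finished playing.

```python
def simulate_circular_playback(samples, half_size):
    """Return the sample stream a DAC would see from a ping-pong buffer."""
    samples = list(samples)
    buffer = [0] * (2 * half_size)  # one circular buffer, two halves
    pos = 0                         # read position in `samples`

    def refill(half):
        """Load the next chunk into one half, padding the tail with silence."""
        nonlocal pos
        chunk = samples[pos:pos + half_size]
        pos += len(chunk)
        chunk = chunk + [0] * (half_size - len(chunk))
        buffer[half * half_size:(half + 1) * half_size] = chunk

    # Prime both halves before the transfer starts
    refill(0)
    refill(1)

    played = []
    half = 0
    while len(played) < len(samples):
        # The DMA streams one half out to the DAC...
        played.extend(buffer[half * half_size:(half + 1) * half_size])
        # ...then the half/full-transfer interrupt fires, and the half that
        # was just played is refilled while the other half is streaming.
        refill(half)
        half ^= 1
    return played[:len(samples)]
```

Since the DMA never stops, the CPU only has to refill one half within the time it takes to play the other, which is what avoids the stutter of waiting for each transfer to finish.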
The software development takes place using an Eclipse-based development environment on Windows 8, with GNU Tools for ARM Embedded Processors. OpenOCD is used to debug in-circuit using the onboard ST-LINK/V2.
This summarizes the complete development platform.
Schematics will be coming soon.