How to Build the Z-Wave Bootloaders

You’ve finished designing a new PCB for your Z-Wave product and are now ready to start testing with your own custom firmware. Well, the first thing you need is a bootloader. The bootloader is a standalone application that handles upgrading the application firmware, among other things. Downloading a bootloader into a Silicon Labs devkit is easy: just click on “run”. But for custom PCBs you must build it from the source code. Two types of bootloaders are needed: one for End Devices, which receive the updated firmware image over the radio, and one for Controllers, which receive the image over the UART wires. These are similar, but there are a few important differences. This post is specifically for the Z-Wave 800 series and GSDK 4.4.1. Hopefully the process will be a little easier in a future release.

End Device Bootloader – OTA

Fortunately the End Device bootloader is easy with the release of GSDK 4.4.1 (Z-Wave 7.21.1). Plug your board into a ProKit (WSTK) via the Tag-Connect connector. The WSTK should show up in the Simplicity Studio v5 (SSv5) Launcher Perspective as a “custom board” with the EFR32ZG23 part number on your board; if it does not, click on Detect Target. Ensure the Debug Mode is set to Mini (or OUT for WSTK1). Select the debug adapter, then start the New Project Wizard via File->New. Make sure the latest SDK and GCC12 (not GCC10) are selected. Click on Z-Wave to filter the list and uncheck Solution Examples. Scroll down to “Bootloader – SoC Internal Storage (For Z-Wave Applications)” and select it. Click Next, then rename the project to something more meaningful that includes the chip (ZG23A) and the GSDK version (for example, _441 for 4.4.1). Build the project, which should complete without errors. The bootloader has a lot of security options, but I recommend using the defaults. If you have a complex device and need additional code space, you can relocate the OTA buffer to an off-chip serial flash chip, which frees up nearly 200K of FLASH but no additional RAM.

Controller Bootloader – OTW

This example uses a custom PCB and the EFR32ZG23A (mid-security) chip. I start with this combination as that is what most customers will start with. Using one of the devkits causes SSv5 to automagically “know” all sorts of things about the board and what GPIOs are wired to what and what other features are available. When you pick this chip, there are zero pre-built “demos” as all of the current devkits have B (high-security) parts on them.

Start the New Project Wizard via File->New. Ensure the IDE/Toolchain is GCC12 and not GCC10. Scroll down, click on “Bootloader – NCP UART XMODEM (for Z-Wave Applications)”, then click on Next. This will create a project called bootloader-storage-internal-single-zwave; append “ZG23A_441” to the project name to keep track of which chip and which GSDK (in this case 4.4.1) it is for. Click on Finish, then build the project. The build fails because SL_SERIAL_UART_PERIPHERAL_NO and several other UART-related symbols are not defined.

Clearly the UART needs to be configured, but which settings? A devkit project makes a good guide: plug in a Devkit and build the same example under a different name. That project builds just fine, so search it for SL_SERIAL_UART_PERIPHERAL_NO. In the devkit project, btl_uart_driver_cfg.h defines it for USART0, while the same file in the custom project just says “bootloader UART peripheral not configured”. In the devkit project the .slcp file has Platform->Bootloader->Drivers->Bootloader UART Driver configured, so that is the component to configure in the custom project.

RTS/CTS are not used, so they do not need to be configured. Configure the Bootloader UART Driver in the custom project with USART0, RX=PA09 and TX=PA08 (the pins used in the devkit project; use whichever pins your PCB routes to the host), and the project compiles. This should be the default, since you MUST use a UART for XMODEM. Maybe a future release will fix this! There are other configuration items under the Bootloader Core component, but generally these can remain at their defaults.

Conclusion

Bootloaders are critical for field-upgrading the firmware of any Z-Wave product, which is mandatory for certification. See the Bootloader User’s Guide (UG489) for more details on the many options available. The process to create the Z-Wave bootloaders is a bit more complicated than it should be, but I hope this guide will bring your Z-Wave product to market a little quicker. Let me know what you think by commenting below.

Team Z-Wave Development Using Git

Silicon Labs Simplicity Studio v5 (SSv5) has a steep learning curve, but once you’re up the curve it can accelerate IoT firmware development. However, sharing a project among several engineers isn’t as straightforward as it should be. Fortunately it is actually quite easy once you know the trick, which I explain below.

Step 1 – Create the Repo

Create the repository using Github or your own private server. Typically this is done via a browser which also sets various options up such as the language and the license. Once this has been created, copy the name of the repository to use in the next step.

Step 2 – Clone the Repo locally

Clone the repo onto your computer using the typical “git clone https://github.com/<gitusername>/<projectName>.git”. Choose a convenient folder on your computer. I recommend putting it under the SSv5 workspace folder, which will make finding it later a little easier.

Step 3 – add a .gitignore file

Create a .gitignore file at the top level of the repo to ignore the files you do not need under source code control. Use the lines below and add any other files or folders as needed. You may want to check in the .hex, .gbl, .map, and .axf files from the GNU* folder (git add -f overrides the ignore rule), or copy them to another folder, so you have the binary files in case building the project proves to be difficult. Note that I am NOT checking in the SDK, which is huge; Silabs keeps even quite old versions on GitHub and on their website. Thus you don’t need to keep a copy of the SDK on your local servers, but you can if you are that kind of person.

################ Silabs files to ignore #####################
# Ignore the Build Directory entirely
GNU*
# Other SSv5 files to ignore
.trash
.uceditor
.projectlinkstore
*.bak

Step 4 – Create the SSv5 Project

Create the SSv5 project within this folder. Typically this is done using the Project Wizard or selecting one of the sample applications. Be sure to locate the project within the repo folder.

Step 5 – Commit the Files

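Commit the files to the repo using the commands below. Use “git add .” rather than “git add *”, because in most shells the * wildcard skips hidden files such as .gitignore.

git add .
git commit -m "Initial check-in"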
git push

At this point you can either clone the repo into a different folder to see if it works or have a team member clone it onto their computer. Try building the project to see if there are any missing files.

Step 6 – Import the Newly Cloned Repo into SSv5

This is the tricky bit! We’re going to Import the project into SSv5 but the TRICK is to import it into the cloned repo folder. By default, SSv5 will make a COPY of the project when importing. The problem with that is that you then lose the connection to the git repo which is the whole point!

Use “File – Import” then browse to the cloned git repo folder. The project name should show up with a Simplicity Studio .sls file. Select this file by clicking on it then click Next.

Then the next screen pops up. Ensure the proper compiler is selected for the project! GCC10 or GCC12! These settings should come from the .sls so you shouldn’t need to change them.

Click on Next

THIS IS THE MOST IMPORTANT STEP! In the next screen, UNCHECK the “Use Default Location” button! Click on Browse and select the repo folder.

Click on Finish. Then check that the project builds properly.

Team members can now work collaboratively on the project and manage the check in/out and merging using standard git commands.

When the project is complete, be sure everything is checked in and the repo is clean, then in SSv5 right click on the project and select delete. But do not check the “delete project contents on disk” checkbox unless you want to also delete the local repo. This removes the project from the workspace in SSv5 but leaves the files where they are. You can clean up the files later.

The key to using git with SSv5 is to UNCHECK the Default Location button during the import. If you leave that checked, or browse to the wrong folder, SSv5 will make a COPY of all the files and you lose the connection to the git repo.

Installing UART Drivers in a Z-Wave Project

I have a simple problem: I have a sensor module with a simple UART interface that I need to connect to my Z-Wave project. The UART is used to configure the sensor, and then the sensor sends readings at regular intervals. Simple, right? Turns out the path is not so simple with Silicon Labs Simplicity Studio. Let’s go step by step through my journey to implement this interface on the Silicon Labs EFR32ZG23.

Typical UART Driver

Embedded engineers expect the UART driver to consist of the following functions:

  • UART_INIT(Baud, data_bits, stop_bits, parity, options)
    • Initialize the uart with the desired baud rate and options
  • UART_GETCHAR()
    • Return 1 character
  • UART_PUTCHAR(char)
    • Send 1 character
  • UART_PUTS(string) – nice to have
    • Put a string of characters – simply calls UART_PUTCHAR for each char in the string

Since most MCUs contain multiple UARTs, these basic functions need a pointer to the UART being accessed. Typically a pointer to the desired UART is added to each function, and then these are wrapped in macros with the UART number in the macro name, such as UART0_INIT and EUSART2_PUTCHAR. Extensions of these basic functions generally involve buffering data, blocking vs. non-blocking operation (i.e., polled or interrupt driven), and selecting GPIOs. Pretty standard stuff: easy to understand, easy to use, and does what you expect with minimal code and effort.
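
A minimal sketch of that interface, for illustration only. The uart_t handle, UART_Init, UART_PutChar, UART_GetChar, UART_Puts and the UART0_* macros are hypothetical names, not a Silicon Labs API. So that it runs on a PC, the "hardware" is a simulated loopback UART where every byte sent lands in the receive FIFO; on an MCU the function bodies would read and write the peripheral's registers instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical UART handle. A real driver would point at the peripheral's
 * register block; this one simulates a loopback with a 16-byte FIFO. */
typedef struct {
    uint32_t baud;
    uint8_t  data_bits;
    uint8_t  stop_bits;
    char     parity;            /* 'N', 'E' or 'O' */
    uint8_t  fifo[16];          /* simulated receive FIFO */
    unsigned head, tail, count;
} uart_t;

void UART_Init(uart_t *u, uint32_t baud, uint8_t data_bits,
               uint8_t stop_bits, char parity)
{
    u->baud = baud;
    u->data_bits = data_bits;
    u->stop_bits = stop_bits;
    u->parity = parity;
    u->head = u->tail = u->count = 0;
}

/* Send one character; returns false if the (simulated) FIFO is full. */
bool UART_PutChar(uart_t *u, uint8_t c)
{
    if (u->count == sizeof u->fifo) {
        return false;
    }
    u->fifo[u->head] = c;                  /* loopback: TX lands in the RX FIFO */
    u->head = (u->head + 1) % sizeof u->fifo;
    u->count++;
    return true;
}

/* Store one received character in *c; returns false if none is waiting. */
bool UART_GetChar(uart_t *u, uint8_t *c)
{
    if (u->count == 0) {
        return false;
    }
    *c = u->fifo[u->tail];
    u->tail = (u->tail + 1) % sizeof u->fifo;
    u->count--;
    return true;
}

/* Nice to have: call PutChar for each character in the string.
 * Returns how many characters were actually queued. */
unsigned UART_Puts(uart_t *u, const char *s)
{
    unsigned n = 0;
    while (*s != '\0' && UART_PutChar(u, (uint8_t)*s)) {
        s++;
        n++;
    }
    return n;
}

/* Per-instance wrappers with the UART number in the macro name. */
static uart_t uart0_instance;
#define UART0_INIT(b, d, s, p) UART_Init(&uart0_instance, (b), (d), (s), (p))
#define UART0_PUTCHAR(c)       UART_PutChar(&uart0_instance, (c))
#define UART0_GETCHAR(pc)      UART_GetChar(&uart0_instance, (pc))
#define UART0_PUTS(str)        UART_Puts(&uart0_instance, (str))
```

One design choice differs from the list above: GetChar returns a status and hands the character back through a pointer, so the caller can poll without blocking. For example, UART0_INIT(256000, 8, 1, 'N') followed by UART0_PUTS("Hi") queues two bytes that two UART0_GETCHAR calls then return.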

Note that these functions are independent of the hardware. The engineer does not need to read the manual (RTFM) to be able to use them. If any of the fancy features of the UART are needed, the engineer will RTFM and then write the appropriate registers with the appropriate values to enable the desired feature. Hopefully the manual is detailed enough for the engineer to get the desired function to work without a lot of trial and error (unfortunately this is rarely the case). But less than 1% of the engineers will ever need those fancy features which is why these simple functions are the foundation of most UART drivers.

Now follow my efforts to interface this simple UART sensor with an EFR32 with the expectation there will be an API similar to the Typical UART Driver described above.

Step 1: What do I need? – 5 minutes

RTFM of the sensor to find the basic UART interface requirements. The manual states it uses a baud rate of 256000, 1 stop bit and no parity. The manual was clearly written by someone whose first language is not English, as it also states “the sensor uses small-end format for serial communication”, which I’m assuming means little-endian. Could it mean the bit ordering of each byte is LSB first? A standard UART already sends every byte LSB first, so little-endian byte order for multi-byte values is the more likely meaning. The limited information in the manual means I’ll be doing some trial-and-error debugging. The manual is thin, only 23 pages, so it took only about five minutes to find this information.
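
If “small-end” does mean little-endian, multi-byte fields arrive least significant byte first. A minimal sketch of decoding such fields out of a receive buffer, assuming hypothetical 16-bit and 32-bit fields (the sensor's actual frame layout is not given here):

```c
#include <stdint.h>

/* Assemble a 16-bit value that was sent least significant byte first. */
uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Assemble a 32-bit value that was sent least significant byte first. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Building the value with shifts, rather than casting the byte buffer to a uint16_t or uint32_t pointer, gives the same result regardless of the MCU's own byte order and sidesteps alignment and strict-aliasing problems.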

Step 2: What does the EFR32 have? – 10 minutes

I’m using the Silicon Labs EFR32ZG23, which has three EUSARTs (Enhanced USART) and one USART. All four have lots of features, including SPI as well as normal UART functionality. The datasheet describes EUSART0 as being able to operate in lower power modes and USART0 (apparently not “Enhanced”) as having IrDA, I2S and SmartCard features. None of these extra features are needed for my application. EUSART0 and EUSART2 have limited GPIO connections, so I don’t want to use either of those; I want the maximum pinout flexibility offered by EUSART1. The EFR32ZG23 datasheet is 130 pages, but searching for “UART” (then “USART”, since “UART” has only 1 hit) got me this much information in about 10 minutes. Later I stumbled on the fact that the EUSARTs have a 16-byte hardware FIFO on both the transmit and receive sides, versus the USART, which has just two bytes. This is deep in the xG23 Reference Manual and not mentioned in the datasheet. A 16-byte FIFO means the EUSART is a winner! Why bother including the USART?
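
A quick back-of-the-envelope check of why the FIFO depth matters at the sensor's 256000 baud. Assuming 8 data bits (the manual excerpt only gives stop bits and parity), no parity and 1 stop bit, each byte is 10 bits on the wire (start + 8 data + stop), so bytes arrive at 25,600 per second. A sketch of the arithmetic:

```c
#include <stdint.h>

/* Bits on the wire per UART frame: start bit + data + parity + stop bits. */
unsigned frame_bits(unsigned data_bits, unsigned parity_bits, unsigned stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Microseconds (rounded down) for a continuous stream at 'baud' to fill an
 * empty receive FIFO of 'fifo_depth' bytes if nobody reads it. */
uint32_t fifo_fill_time_us(uint32_t baud, unsigned bits_per_frame, unsigned fifo_depth)
{
    return (uint32_t)((uint64_t)fifo_depth * bits_per_frame * 1000000u / baud);
}
```

With these numbers, fifo_fill_time_us(256000, 10, 16) is 625 µs while a 2-byte buffer fills in about 78 µs. That is roughly the budget for how long the receive interrupt can be held off by the radio stack and the RTOS before bytes start being lost.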

Step 3: Open Simplicity Studio and Explore APIs – 60 Minutes

Open Simplicity Studio v5 (SSv5), which is the IDE for developing and debugging code for the EFR32. I had already built a basic project starting with the Z-Wave SwitchOnOff sample app and was able to toggle an LED on my custom PCB. I had also created a bootloader, updated the SE, built and customized the sample app, joined a Z-Wave network, and spent time in the debugger getting other parts of the project working. Now it was time to talk to the sensor, so the first place to look is the .SLCP file: click on it, then Software Components, then search for “UART”. SSv5 gives a long list of drivers for various platforms, protocols, and applications. But all I want are the 4 functions listed above; why is this already so complex? There are no UART drivers for the Z-Wave protocol, which is unfortunate, as in the 500 series we did have the basic 4 functions prior to the move to SSv5.

The first entry is Application->Utility->Simple Communication Interface (UART). The description talks about it being used for NCP communication, but I’m not using NCP in this case, so I passed this one up. Next is Platform->Bootloader with four flavors of drivers, but they are part of the bootloader and not the application, so probably not what I need. Next is Platform->Driver->UART, which looks promising and which I’ll discuss in more detail shortly. Next is Services->Co-Processor Communication->secondary device->Driver, which again has a single-line description for a co-processor but no details of why I would use it, so I passed on this one too. Next is Third Party->Amazon FreeRTOS->Common I/O IoT UART; while I am using FreeRTOS, I am not using the Amazon flavor. The link in the description shows functions similar to the 4 I want but with somewhat more OS-ish sounding names. Finally there are Wi-SUN, Zigbee and Silicon Labs Matter drivers, each with just a 1-line description, and they seem to be specific to the protocol. They also seem to be specific to Silicon Labs DevKit boards, but I’m not using a DevKit; I have my own custom board, so these don’t seem to be usable either. This is a case of information overload, and simply sorting thru the options seems like a waste of time. Why are there so many flavors of this common API?

I had enabled DEBUGPRINT in the Z-Wave sample app so I was already aware of the IO Stream EUSART driver which is what the printf functions sprinkled thru the Z-Wave code use. With all of these options I spent about 1 hour reading just the section headings of the documentation looking for the simple function I need. Next step is to “install” a driver and try it out.

Step 4: UART Driver – IO Stream – 60 minutes

Since DEBUGPRINT uses the IO Stream driver, I assumed that would be the way to go for my use case, as it should already share most of the code. I clicked on + Add New Instances in SSv5, selected the desired baud rate, start/stop/parity, and named the instance to differentiate it from the “vcom” instance used by the DEBUGPRINT utilities. The documentation is online and versioned, so I browsed thru it looking for my 4 functions. There is an Init routine and an IRQ handler, but I wanted to avoid interrupts, at least until I had finished the trial-and-error of getting the sensor to send anything intelligible. I spent an hour trying to understand how this works without a PUTCHAR before I finally understood that only the “last stream initialized” “owns” the PRINTF functions. Since this driver uses DMA and interrupts, it is way more complicated than I need, and I don’t want to interfere with the DEBUGPRINT utilities. I uninstalled the IO Stream driver instance after wasting 60 minutes RTFMing and trying to follow the code.

Step 5: UART Driver – uartdrv – 6 hours

Going back to SSv5 and looking thru the options for UART drivers, I found Platform->Driver->UART->UARTDRV EUSART and installed it. SSv5 let me enter the baud rate, parity, stop bits and several other options. I picked EUSART1 since it is able to drive all GPIOs and connected the proper GPIOs from my schematic. SSv5 installed a bunch of files, mostly in the config folder, specifically sl_uartdrv_eusart_mod_config.h, which holds the baud rate and various other UART settings. The next step is to look thru the documentation for my 4 simple functions. Unfortunately there are 27 functions and none of them are obviously the ones I need. But I sallied forth, spent time reading the manual and looking thru the code, and eventually was able to add enough code to my project to receive at least a few bytes. But the bytes I received don’t match the values the sensor should be sending. The other problem is that the UARTDRV_Receive function appears to wait for the buffer I gave it to fill up before returning. But I need each byte returned to me so I can decode which command is being sent, and each one is a different size. These functions are way too complicated, the documentation is sparse and confusing, and the API doesn’t have usable functions for my (or any?) application. I spent the better part of a day trying to get this “driver” to work, but in the end it was taking me more time to figure out how to use it than if I just wrote my own.
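
Because the commands have different sizes, it is simpler to decode the stream one byte at a time than to wait for a fixed-size buffer to fill. A minimal sketch of such a parser, assuming a hypothetical frame layout of a 0xAA header byte, a command byte, a length byte, the payload and an 8-bit additive checksum. The real sensor's framing is not described here, so treat the format and every name below as placeholders:

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAME_HEADER 0xAAu   /* hypothetical start-of-frame byte */
#define MAX_PAYLOAD  32u

typedef enum { WAIT_HEADER, WAIT_CMD, WAIT_LEN, WAIT_PAYLOAD, WAIT_CHECKSUM } parse_state_t;

typedef struct {
    parse_state_t state;
    uint8_t cmd;
    uint8_t len;
    uint8_t count;                 /* payload bytes received so far */
    uint8_t sum;                   /* running 8-bit checksum of cmd, len, payload */
    uint8_t payload[MAX_PAYLOAD];
} frame_parser_t;

void parser_reset(frame_parser_t *p) { p->state = WAIT_HEADER; }

/* Feed one received byte. Returns true when a complete frame with a valid
 * checksum is available in *p; bad frames are silently discarded. */
bool parser_feed(frame_parser_t *p, uint8_t b)
{
    switch (p->state) {
    case WAIT_HEADER:
        if (b == FRAME_HEADER) {
            p->sum = 0;
            p->state = WAIT_CMD;
        }
        break;
    case WAIT_CMD:
        p->cmd = b;
        p->sum += b;
        p->state = WAIT_LEN;
        break;
    case WAIT_LEN:
        if (b > MAX_PAYLOAD) {       /* impossible length: resynchronize */
            p->state = WAIT_HEADER;
            break;
        }
        p->len = b;
        p->count = 0;
        p->sum += b;
        p->state = (b == 0) ? WAIT_CHECKSUM : WAIT_PAYLOAD;
        break;
    case WAIT_PAYLOAD:
        p->payload[p->count++] = b;
        p->sum += b;
        if (p->count == p->len) {
            p->state = WAIT_CHECKSUM;
        }
        break;
    case WAIT_CHECKSUM:
        p->state = WAIT_HEADER;
        return b == p->sum;
    }
    return false;
}
```

Each byte pulled from the receive FIFO goes to parser_feed(). Once it returns true, the command byte tells the application how to interpret the payload, however long that payload is.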

Step 6: Peripheral – EUSART – 3 hours

I went back to SSv5 and searched for USART instead of UART, and this time I found Platform->Peripheral->EUSART. This low-level API is already installed, I assume via the IO Stream for the DEBUGPRINT utilities. The documentation lists 23 functions in the API, but many of these are specific to modes I’m not using, like SPI or IrDA, so I can simply ignore those. There is an Init function plus EUSART_Rx and EUSART_Tx, which are basically GetChar and PutChar. The API does not set up the clocks or the GPIOs, but the examples explain how to do that. The documentation is clouded by the many features like 9-bit data and SPI, making it harder to decipher how to do basic UART operations.

Looking at the EUSART_UartInitHf function, it becomes immediately obvious that this code is trying to be all things for all applications on all Silabs chips. The code both doesn’t do enough and does way too much at the same time! For example, the code writes all the registers back to their default values, which is necessary if the UART is in an unknown state and is certainly the safe thing to do. However, this is rarely needed, as the application will power up the chip, set the UART configuration once, and then never change it again. Maybe the baud rate will change, but that function can write the necessary registers without needing everything to be set to the default. Not only is there a ton of code, but there are several good-sized structures wasting both FLASH and RAM. Embedded systems are defined by their limited resources, which include CPU time, FLASH and RAM. Drivers should be efficient and not waste these valuable resources. If an application needs to reset every register in the UART, then let the 0.001% of customers write their own! I gave up after spending another morning reading, trying stuff out, and trying in vain to follow the code. Too complicated!
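
The alternative to resetting every register is a read-modify-write of only the fields that matter, the same approach UART_Init uses for FRAMECFG. A sketch of the idea on a simulated 32-bit register; the field positions here are hypothetical and are not the EUSART's real bit layout:

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define DATABITS_SHIFT 0u
#define DATABITS_MASK  (0xFu << DATABITS_SHIFT)
#define PARITY_SHIFT   8u
#define PARITY_MASK    (0x3u << PARITY_SHIFT)
#define STOPBITS_SHIFT 12u
#define STOPBITS_MASK  (0x3u << STOPBITS_SHIFT)

/* Replace one field in a register value, leaving every other bit alone. */
static inline uint32_t set_field(uint32_t reg, uint32_t mask, uint32_t shift, uint32_t value)
{
    return (reg & ~mask) | ((value << shift) & mask);
}

/* On hardware 'reg' would point at the peripheral's frame-configuration
 * register: one read, three field updates, one write. */
void configure_frame(volatile uint32_t *reg, uint32_t databits, uint32_t parity, uint32_t stopbits)
{
    uint32_t v = *reg;
    v = set_field(v, DATABITS_MASK, DATABITS_SHIFT, databits);
    v = set_field(v, PARITY_MASK, PARITY_SHIFT, parity);
    v = set_field(v, STOPBITS_MASK, STOPBITS_SHIFT, stopbits);
    *reg = v;
}
```

Only the three fields change; whatever else was configured in the register is preserved, and no table of default values has to live in FLASH.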

Step 7: Write My Own UART Driver – 2 hours

I spent less than two hours writing my own UART driver. Why would I do that? Because it was easy, and it does exactly what I want with easy-to-understand code that anyone can follow. My driver is not universal and does not handle every option or error condition, but it does what 99% of applications need. The code is tiny and uses only a few dozen bytes of RAM, mostly the 32-byte receive FIFO. The more lines of code, the more bugs, and this code is small enough to check by inspection.

Most of those two hours were spent reading documentation and trying to find where the interrupt vectors are “registered”. Turns out startup_efr32zg23.c in the SDK assigns all the vectors to a weak Default_Handler. All I had to do to “register” the ISR is give the function the proper name and the compiler plugs it into the vector table for me.
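
A host-side illustration of the weak-alias mechanism, assuming GCC or Clang on an ELF target (the alias attribute is not supported on every platform, macOS for instance). The handler names mirror the startup file's pattern, but this Default_Handler just counts calls instead of looping forever, and the two-entry table is a stand-in for the real vector table:

```c
#include <stddef.h>

static volatile unsigned default_hits = 0;   /* demo instrumentation */

/* Catch-all handler (the real one in the startup file never returns). */
void Default_Handler(void) { default_hits++; }

/* Weak aliases: until another .c file defines a function with the same
 * name, these symbols resolve to Default_Handler. */
void EUSART1_RX_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
void EUSART1_TX_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));

typedef void (*vector_t)(void);

/* Simplified stand-in for the interrupt vector table. */
static const vector_t vector_table[] = {
    EUSART1_RX_IRQHandler,
    EUSART1_TX_IRQHandler,
};

/* Simulate the hardware dispatching interrupt number n (no bounds check). */
void simulate_irq(size_t n) { vector_table[n](); }

unsigned default_handler_hits(void) { return default_hits; }
```

Adding a strong void EUSART1_RX_IRQHandler(void) { ... } in any other .c file makes the linker use it instead of the alias, with no registration call needed.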

I did spend perhaps another couple of hours figuring out how to add an event to the ZAF_Event_Distributor, adding the event, and writing some additional code to search for the proper sensor data in the byte stream. The key is for the ISR to put an event into the FreeRTOS queue, which is then processed in the zaf_event_distributor_app_event_manager function in SwitchOnOff.c. I originally used ZAF_EventHelperEventEnqueueFromISR for this; the code below has been updated for GSDK 4.4.1 and uses zaf_event_distributor_enqueue_app_event instead.
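
One design consideration with this ISR-to-task handoff: the receive interrupt can fire many times before the application task runs, and every enqueue uses up a slot in the event queue. A minimal sketch of coalescing those into a single pending event, assuming a hypothetical enqueue_event() hook that stands in for the SDK call (here it only counts calls). On target, the flag would be written by exactly one ISR and one task:

```c
#include <stdbool.h>

static volatile bool rx_event_pending = false;
static unsigned events_enqueued = 0;          /* instrumentation for this sketch */

/* Hypothetical stand-in for the SDK's event-enqueue call. */
static void enqueue_event(void) { events_enqueued++; }

/* Called from the receive ISR after bytes are copied into the software FIFO. */
void rx_isr_notify(void)
{
    if (!rx_event_pending) {
        rx_event_pending = true;
        enqueue_event();
    }
}

/* Called by the application task when it handles the event. Clearing the
 * flag before draining the FIFO means bytes that arrive during the drain
 * raise a fresh event instead of sitting unnoticed. */
void rx_task_begin_drain(void)
{
    rx_event_pending = false;
    /* ...then drain with RxDepth()/GetChar() until the FIFO is empty... */
}

unsigned rx_events_enqueued(void) { return events_enqueued; }
```

The worst case is an extra wake-up that finds the FIFO already empty, which is harmless, while a burst of interrupts can no longer flood the queue.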

Below is UART_DRZ.c, which has the initialization, get-character and put-character functions (plus a receive-depth query) and an interrupt service routine that grabs the data out of the EUSART quickly and then lets the application know there is data available.

/*
* @file UART_DRZ.c
* @brief UART Driver for Silicon Labs EFR32ZG23 and related series 2 chips
*
* Created on: Jul 10, 2023
* Author: Eric Ryherd - DrZWave.blog
*
* Minimal drivers to initialize and setup the xG23 UARTs efficiently and provide simple functions for sending/receiving data.
* Assumes using the EUSARTs and not the USART which has limited functionality and only a 2 byte buffer vs 16 in the EUSART.
* Assumes high frequency mode (not operating in low-power modes with a low-frequency clock)
* This example just implements a set of drivers for EUSART1. The code is tiny so make copies for the others as needed.
*/

#include <em_cmu.h>
#include <zaf_event_distributor_soc.h> // this is new under GSDK 4.4.1
#include "UART_DRZ.h"
#include "events.h"

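// Rx Buffer and indices for EUSART1. Make copies for other EUSARTs.
// The indices are shared between the ISR and the application, so they are volatile.
static uint8_t RxFIFO1[RX_FIFO_DEPTH];
static volatile int RxFifoReadIndx1;
static volatile int RxFifoWriteIndx1;
//static uint32_t EUSART1_Status;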

/* UART_Init - basic initialization for the most common cases - works for all EUSARTs
* To enable fancy features, write the appropriate UART registers after calling this function.
*/
void UART_Init( EUSART_TypeDef *uart,             // EUSART1 - Pointer to one of the EUSARTs
                uint32_t baudrate,                // 0=enable Autobaud, 1-1,000,000 bits/sec
                EUSART_Databits_TypeDef databits, // eusartDataBits8=8 bits - must use the typedef!
                EUSART_Stopbits_TypeDef stopbits, // eusartStopbits1=1 bit - follow the typedef for other settings
                EUSART_Parity_TypeDef parity,     // eusartNoParity
                GPIO_Port_TypeDef TxPort,         // gpioPortA thru D - Note that EUSART0 and 2 have GPIO port limitations
                unsigned int TxPin,
                GPIO_Port_TypeDef RxPort,
                unsigned int RxPin)
{
  // Check for valid uart and assign uartnum
  int uartnum = EUSART0 == uart ? 0 :
                EUSART1 == uart ? 1 :
                EUSART2 == uart ? 2 : -1;
  EFM_ASSERT(uartnum >= 0);

  CMU_Clock_TypeDef clock = uartnum == 0 ? cmuClock_EUSART0 :
                            uartnum == 1 ? cmuClock_EUSART1 : cmuClock_EUSART2;

  if (uartnum >= 0) {
    // Configure the clocks
    if (0 == uartnum) {
      CMU_ClockSelectSet(clock, cmuSelect_EM01GRPCCLK); // EUSART0 requires special clock configuration
    } // EUSART1 and 2 use EM01GRPCCLK and changing it will cause VCOM to use the wrong baud rate.
    CMU_ClockEnable(clock, true);

    // Configure Frame Format
    uart->FRAMECFG = ((uart->FRAMECFG & ~(_EUSART_FRAMECFG_DATABITS_MASK | _EUSART_FRAMECFG_STOPBITS_MASK | _EUSART_FRAMECFG_PARITY_MASK))
                      | (uint32_t) (databits) // note that EUSART_xxxxxx_TypeDef puts these settings in the proper bit locations
                      | (uint32_t) (parity)
                      | (uint32_t) (stopbits));

    EUSART_Enable(uart, eusartEnable);

    if (baudrate == 0) {
      uart->CFG0 |= EUSART_CFG0_AUTOBAUDEN; // autobaud is enabled with baudrate=0 - note that 0x55 has to be received for autobaud to work
    } else {
      EUSART_BaudrateSet(uart, 0, baudrate); // checks various limits to ensure no overflow and handles oversampling
    }

    CMU_ClockEnable(cmuClock_GPIO, true); // Typically already enabled but just to be sure enable the GPIO clock anyway

    // Configure TX and RX GPIOs
    GPIO_PinModeSet(TxPort, TxPin, gpioModePushPull, 1);
    GPIO_PinModeSet(RxPort, RxPin, gpioModeInputPull, 1);
    GPIO->EUSARTROUTE[uartnum].ROUTEEN = GPIO_EUSART_ROUTEEN_TXPEN;
    GPIO->EUSARTROUTE[uartnum].TXROUTE = (TxPort << _GPIO_EUSART_TXROUTE_PORT_SHIFT)
                                       | (TxPin << _GPIO_EUSART_TXROUTE_PIN_SHIFT);
    GPIO->EUSARTROUTE[uartnum].RXROUTE = (RxPort << _GPIO_EUSART_RXROUTE_PORT_SHIFT)
                                       | (RxPin << _GPIO_EUSART_RXROUTE_PIN_SHIFT);
  }

  RxFifoReadIndx1 = 0; // TODO - expand to other EUSARTs as needed
  RxFifoWriteIndx1 = 0;

  // Enable Rx Interrupts (this example only implements the EUSART1 receive path)
  EUSART1->IEN_SET = EUSART_IEN_RXFL;
  NVIC_EnableIRQ(EUSART1_RX_IRQn);
}

/* EUSART1_RX_IRQHandler is the receive side interrupt handler for EUSART1.
* startup_efr32zg23.c defines each of the IRQs as a WEAK function aliased to Default_Handler, which is then placed in the interrupt vector table.
* Defining a function of the same name overrides the WEAK function and places this one in the vector table.
* Change this function name to match the EUSART you are using.
*
* This ISR pulls each byte out of the EUSART FIFO and places it into the software RxFIFO.
*/
void EUSART1_RX_IRQHandler(void) {
  uint8_t dat;
  uint32_t flags = EUSART1->IF;
  EUSART1->IF_CLR = flags;               // clear all interrupt flags
  NVIC_ClearPendingIRQ(EUSART1_RX_IRQn); // clear the NVIC Interrupt

  for (int i = 0; (EUSART_STATUS_RXFL & EUSART1->STATUS) && (i < 16); i++) { // Pull all bytes out of EUSART
    dat = EUSART1->RXDATA; // read 1 byte out of the hardware FIFO in the EUSART
    // Keep one slot free: a completely full ring would have WriteIndx == ReadIndx,
    // which EUSART1_RxDepth() could not tell apart from an empty one.
    if (EUSART1_RxDepth() < RX_FIFO_DEPTH - 1) { // is there room in the RxFifo?
      RxFIFO1[RxFifoWriteIndx1++] = dat;
      if (RxFifoWriteIndx1 >= RX_FIFO_DEPTH) {
        RxFifoWriteIndx1 = 0;
      }
    } else { // No room in the RxFIFO: this byte is dropped
      // TODO - report overflow
      break;
    }
    // TODO - add testing for error conditions here - like the FIFO is full... Set a bit and call an event
  }
  // TODO - check for error conditions
  // Tell the application there is data in RxFIFO.
  // Note: FreeRTOS requires the "FromISR" queue calls in interrupt context. If your SDK provides
  // an ISR-safe variant of this enqueue function, use it here.
  zaf_event_distributor_enqueue_app_event(EVENT_EUSART1_CHARACTER_RECEIVED);
}

// Return a byte from the RxFIFO - be sure there is one available by calling RxDepth first
uint8_t EUSART1_GetChar(void) {
  uint8_t rtn;
  rtn = RxFIFO1[RxFifoReadIndx1++];
  if (RxFifoReadIndx1 >= RX_FIFO_DEPTH) {
    RxFifoReadIndx1 = 0;
  }
  return(rtn);
}

// Put 1 character into the EUSART1 hardware Tx FIFO - nonblocking.
// Returns true if the byte was added, or false if the FIFO was full and the byte was not added.
bool EUSART1_PutChar(uint8_t dat) {
  bool rtn = false;
  if (EUSART1->STATUS & EUSART_STATUS_TXFL) {
    EUSART1->TXDATA = dat;
    rtn = true;
  }
  return(rtn);
}

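// Number of valid bytes in the RxFIFO - check this before calling GetChar, which does not test for an empty FIFO
int EUSART1_RxDepth(void) {
  int rtn;
  rtn = RxFifoWriteIndx1 - RxFifoReadIndx1; // bytes written by the ISR but not yet read
  if (rtn < 0) {
    rtn += RX_FIFO_DEPTH; // the write index has wrapped around the end of the buffer
  }
  return(rtn);
}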

The corresponding UART_DRZ.h file:

/*
* UART_DRZ.h
*
* Created on: Jul 10, 2023
* Author: eric
*/

#ifndef UART_DRZ_H_
#define UART_DRZ_H_

#include <em_eusart.h>
#include <em_gpio.h>

void UART_Init( EUSART_TypeDef *uart, // Pointer to one of the EUSARTs
uint32_t baudrate, // 0=enable Autobaud, 1-1,000,000 bits/sec
EUSART_Databits_TypeDef databits,
EUSART_Stopbits_TypeDef stopbits,
EUSART_Parity_TypeDef parity,
GPIO_Port_TypeDef TxPort,
unsigned int TxPin,
GPIO_Port_TypeDef RxPort,
unsigned int RxPin);

int EUSART1_RxDepth(void);
uint8_t EUSART1_GetChar(void);
bool EUSART1_PutChar(uint8_t dat);

// Rx FIFO depth in bytes - make it long enough to hold the longest expected message
#define RX_FIFO_DEPTH 32

#endif /* UART_DRZ_H_ */

The code above was updated 3/28/2024 to match the GSDK 4.4.1 release. I plan to release this code under github soon so it can be easily incorporated into any project and kept more up-to-date.

Conclusions

For all the talk of “Modular Code”, “Code ReUse”, “APIs” and of course AI generated code, embedded systems are unique due to their limited resources. Limited resources means you cannot throw generic “modular” code at a problem if it bloats the resulting application. Embedded Engineers are also a limited resource and in short supply. Reusing tested, well written, well documented code is a huge time saver. However, if the problem is simple, it may be more efficient to write it yourself. Certainly that was the case in this scenario.

I’ve mentioned before that embedded engineers are usually at least 2 weeks (if not 2 months!) late on a project from the very first day. By the time marketing, finance and management decide on the product features, fund it, and allocate engineering resources, the project is already behind schedule. Chip vendors need to make the engineer’s job easier by providing APIs that serve 90% of the needs without obfuscating the calls under a ton of features few people will ever use. The API needs to be intuitive and must not require hours of reading manuals or randomly trying stuff until it works. Hide the complexity and provide easy-to-use functions with concise but detailed documentation and a few examples (I love to cut and paste!).

On the hardware side, I have a rule of thumb that if a customer isn’t willing to pay an extra nickel per chip for a feature, then do not include it! The time it takes to spec, design, code, document, validate in simulation, validate the silicon, document again (due to inevitable changes), write silicon test programs, run extra test vectors on every chip, and finally develop training (WHEW!) far outweighs the brain-fart feature someone thought might be cool. When instancing multiple copies of a peripheral, they should all be the same. Then you can reuse all of the above, whereas if they are different you have to make special versions of everything, especially the documentation. Users will first assume they are all the same. They will strongly dislike the fact that one EUSART can route to all GPIOs whereas the others have limitations. Why is there a USART in the xG23 family? Why are the EUSARTs each just a little different? Why are there so many options in the EUSART? Does anyone use all these features? Will EVERYONE pay an extra nickel for the chip for features they don’t need? Obviously there are some features that are required and some that are expected, but there are a bunch that could be dropped.

Z-Wave Watchdog Timer Best Practices

Virtually all embedded systems must run 24 x 7 x 365 for many, many years without ever being rebooted. Since there is no one there to “press the reset button” if the device fails, the watchdog timer is there to do just that. The 500 series Z-Wave chips from Silicon Labs have a watchdog timer, and the example code provides a very minimal use of it. However, the minimal use in the example code is not sufficient to provide a robust watchdog for embedded Z-Wave devices. This post explains some rules and methods to code a robust watchdog timer.

Long-time embedded expert Jack Ganssle has a great article on watchdog timers. He describes the Clementine spacecraft, where a fault in the system caused the spacecraft to dump virtually all of its fuel, resulting in the loss of the mission. The lead software engineer had wanted a watchdog, but the designers decided not to include it. Jack’s example shows how important it is to spend at least some time coding a robust watchdog for our IoT devices. While our devices aren’t controlling multi-million dollar spacecraft, we are coding light switches that are hardwired into the wall and cannot be easily rebooted. Try telling the customer to go into the basement and toggle the power to his entire house to reboot the light switches!

What is a Watchdog?

A watchdog timer is a timer that runs constantly. Typically a complex combination of events resets (or “kicks”) the watchdog timer every now and then, usually every few milliseconds. If the combination of events ever gets stuck, the timer will continue to run. If the watchdog timer “times out”, the system is reset – basically the reset button is pushed! Your embedded system reboots and keeps on running. Generally no one even realizes it has rebooted (I’ll discuss that problem in more detail shortly).

This diagram shows the watchdog timer’s value, which is constantly counting up. Every time the watchdog is “kicked”, the counter is reset to zero. Somewhere in your code the ZW_WatchDogKick() routine is called to reset the watchdog timer. Sometimes this happens on a nice regular basis, sometimes at varying times, as shown by the level of the timer. The key is that the timeout threshold has to be longer than the longest gap between kicks under any normal operating condition. If a fault condition occurs, the timer keeps counting up until the threshold is reached and then the system is reset. When the watchdog timer fires, the Z-Wave chip goes thru a full reset just as if power had been removed and reapplied. Your embedded system is back up and running as if nothing had happened.
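To make the diagram concrete, here is a minimal host-side simulation of the counter described above. It is only a sketch: the names wdog_t, wdog_tick() and wdog_kick() and the tick-based threshold are illustrative assumptions, not the Z-Wave SDK API (on the chip, ZW_WatchDogKick() plays the role of wdog_kick() and the hardware does the counting).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a hardware watchdog: a counter that counts up on
 * every tick and forces a reset when it reaches the threshold. */
typedef struct {
    uint32_t count;      /* current timer value, always counting up */
    uint32_t threshold;  /* timeout: must exceed the longest normal kick gap */
    bool     fired;      /* true once the timeout is reached (chip would reset) */
} wdog_t;

static void wdog_init(wdog_t *w, uint32_t threshold) {
    w->count = 0;
    w->threshold = threshold;
    w->fired = false;
}

/* Called once per timer tick; returns true if the watchdog has fired. */
static bool wdog_tick(wdog_t *w) {
    if (!w->fired && ++w->count >= w->threshold) {
        w->fired = true;  /* real hardware: full reset, as if power-cycled */
    }
    return w->fired;
}

/* "Kicking" the dog simply clears the counter back to zero. */
static void wdog_kick(wdog_t *w) {
    w->count = 0;
}
```

With a threshold of 10 ticks, kicking every 5 ticks keeps the counter below the threshold forever; stop kicking and the watchdog fires on the 10th tick.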

SiLabs Sample Code = Minimal Watchdog

The SiLabs sample code has the following implementation of the watchdog:

BYTE ApplicationInitSW(ZW_NVM_STATUS nvmStatus) {
...
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogEnable();
#endif
} 

void ApplicationPoll(void){
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogKick();
#endif
}

The sample code follows the good practice of wrapping the watchdog code in #ifdef WATCHDOG_ENABLED so it can be easily enabled or disabled. Unfortunately, it blindly kicks the dog on every ApplicationPoll call without checking any other conditions. ApplicationPoll is called roughly every few hundred microseconds, and it keeps being called even while many fault conditions exist. With this implementation the only way the watchdog is going to fire is a catastrophic failure in which ApplicationPoll is no longer being called. While this is better than nothing, it won’t reset the system in many cases where the device has become unresponsive. This is where you come in: you have to add more conditions to the watchdog algorithm. It may be easy to just use what SiLabs provides, but for a robust product you really need to spend some time adding your own conditions.

A Better Watchdog Example

Writing good watchdog code requires some significant thought and testing. The possible sources of failure need to be discussed with members of the team and with other Z-Wave developers who are fighting the same fight (thus the need for this blog). I can provide a few guidelines to include in your analysis but this is not a complete solution. Only you know all the possible failure modes of your product and that requires some serious thought and analysis.

Mutex Gets Stuck

The most common failure I have seen is that the SiLabs-provided Application Framework (AF) mutex can get stuck. When the mutex is stuck, the device can usually still receive Z-Wave traffic but often can’t respond. Power cycling the device restores full operation, so this failure often goes unnoticed both in testing and in actual use.

What is the mutex, you ask? The mutex is a simple flag in the AF that prevents the code from overwriting the Send Buffer while a message is currently being sent over the radio. When a GET command comes in, the AF calls a command class handler to handle the GET and build a REPORT frame in memory. When ready to send the frame, the AF calls pTxBuf=GetResponseBuffer() to get a buffer for the radio to send. There is only one buffer, so if it is already in use you get a NULL pointer back and have to wait and send the frame later. In general this works fine as long as frames don’t come in too fast. But in a large network with lots of repeated and re-routed frames you will occasionally get a burst of GETs, and it is possible for the REPORTs to get cross-wired and leave the mutex locked for a frame that will never be sent. If the code then never releases the buffer, the mutex is stuck. The Application Framework code is known to lock the mutex occasionally, so you must code around this problem. The easiest solution to this rare event is to have the watchdog watch the mutex and simply reboot if it stays locked for too long.
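The single-buffer mutex can be modeled in a few lines. This is a simplified sketch, not the AF source: get_response_buffer(), free_response_buffer() and response_buffer_busy() are hypothetical stand-ins for GetResponseBuffer(), the AF’s release path and ActiveJobs(), and the real Application Framework implementation may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single transmit buffer guarded by a busy flag (the "mutex"). */
static uint8_t tx_buffer[64];
static bool    tx_busy = false;

/* Returns the buffer and marks it busy, or NULL if it is already in use,
 * in which case the caller must retry the REPORT later. */
static uint8_t *get_response_buffer(void) {
    if (tx_busy) {
        return NULL;
    }
    tx_busy = true;
    return tx_buffer;
}

/* Must run when the frame is sent or abandoned. If this call is ever
 * skipped, tx_busy stays true and every later request gets NULL: the
 * device still hears GETs but can no longer answer them. */
static void free_response_buffer(void) {
    tx_busy = false;
}

/* Plays the role of ActiveJobs(): true while the buffer is in use. */
static bool response_buffer_busy(void) {
    return tx_busy;
}
```

The failure mode in the text is simply a missing free_response_buffer() call: nothing in this design ever clears the flag on its own, which is why the watchdog has to watch it.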

My solution is to have a counter in ApplicationPoll that counts up once per second whenever ActiveJobs() is true (in SDK 6.81.xx it’s now called ZAF_mutex_isActive()). ActiveJobs() is true whenever a buffer is in use and false when all the buffers are free. There are actually two buffers: one for response frames (REPORTs sent as a result of a GET) and a second for request frames (unsolicited notifications).

Application Specific Reasons

Beyond the mutex, you must think long and hard about application-specific failure conditions. The most obvious is a device that has not received or sent a frame in 25 hours. Most hubs poll a device at least a couple of times per day to make sure it is still alive, so if there has been no traffic in a day, maybe something is stuck and a reboot is in order. Plus, if nothing has happened in a day, then probably no one will notice the reboot (which only takes 1.5 seconds). You do have to be careful that some other part of the application isn’t affected by the reboot. For example, if you are a light switch that by default turns the light off on a reboot, then people will be really annoyed when the light randomly turns off because the hub hasn’t polled it in a day. There are lots of potential checks you can make here, but every application has different requirements, so you will have to think hard about all the possible conditions for your specific case.

Sample good watchdog:

E_APPLICATION_STATE ApplicationPoll( E_PROTOCOL_STATE bProtocolState ) {
...
if (ActiveJobs()) {              // Mutex buffer is busy
    if (OneSecondTimer) ActiveJobsCounter++;  // Once/sec increment
} else {
    ActiveJobsCounter=0;         // When buffer is free clear counter
}
...
if ((ActiveJobsCounter<30) &&       // Mutex isn't stuck 
    (LastCommsHours<25) &&          // Got a frame in the last 24 hrs
    ApplicationSpecificReasons) {   // Other reasons
    ZW_WatchDogKick();              // Everything is OK so reset WDOG
}
...                                 // Rest of ApplicationPoll
}

The example code above does have a major hole: if the counters stop counting for some reason, they never reach their limits and the watchdog will never fire! That is easy to check for in ApplicationPoll, and if ApplicationPoll itself stops running, then the watchdog is no longer being kicked, so it will reset the chip anyway.
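Here is one way to close that hole, sketched as portable C rather than SDK code. It keeps the 30-second mutex limit and 25-hour silence limit from the sample and adds a heartbeat: if the one-second tick stops advancing for an implausibly large number of polls, the dog is no longer kicked. The struct, the function names and MAX_POLLS_PER_TICK are hypothetical; the limit assumes ApplicationPoll runs every few hundred microseconds, so 100000 polls is on the order of tens of seconds and should be tuned for your build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits from the sample above; MAX_POLLS_PER_TICK is an assumption. */
#define MUTEX_STUCK_SECONDS 30u
#define MAX_SILENT_HOURS    25u
#define MAX_POLLS_PER_TICK  100000u  /* far more polls than fit in one second */

typedef struct {
    uint16_t active_jobs_counter;  /* seconds the send buffer has stayed busy */
    uint16_t last_comms_hours;     /* hours since last RF frame; cleared elsewhere on RX/TX */
    uint32_t seconds_ticks;        /* free-running count of one-second ticks */
    uint32_t last_seen_ticks;      /* seconds_ticks value seen at the previous poll */
    uint32_t polls_since_tick;     /* polls since seconds_ticks last changed */
} wdog_state_t;

/* Call once per second with the ActiveJobs()-style buffer state. */
static void wdog_one_second(wdog_state_t *s, bool buffer_busy) {
    s->seconds_ticks++;
    if (buffer_busy) {
        if (s->active_jobs_counter < UINT16_MAX) s->active_jobs_counter++;
    } else {
        s->active_jobs_counter = 0;  /* buffer free: clear the stuck timer */
    }
}

/* Call from the poll loop; returns true only when it is safe to kick. */
static bool wdog_should_kick(wdog_state_t *s, bool app_ok) {
    /* Heartbeat: if the one-second tick stops, the counters freeze below
     * their limits, so stop kicking after too many polls without a tick. */
    if (s->seconds_ticks != s->last_seen_ticks) {
        s->last_seen_ticks = s->seconds_ticks;
        s->polls_since_tick = 0;
    } else if (s->polls_since_tick < UINT32_MAX) {
        s->polls_since_tick++;
    }
    return s->polls_since_tick < MAX_POLLS_PER_TICK &&
           s->active_jobs_counter < MUTEX_STUCK_SECONDS &&
           s->last_comms_hours < MAX_SILENT_HOURS &&
           app_ok;
}
```

On the chip, wdog_one_second() would run where the sample increments ActiveJobsCounter, and ZW_WatchDogKick() would be called only when wdog_should_kick() returns true.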

Doesn’t Work If Not Tested

The old coding adage (proven totally true by me many, many times) goes “If the code hasn’t been tested, it doesn’t work”. The same applies to your watchdog code. So how do you test the watchdog? The first thing to do is to log the number of times the watchdog has triggered. This has to be stored in NVM since RAM is lost when you reboot. Fortunately ApplicationInitHW is called with the bWakeupReason parameter, which equals ZW_WAKEUP_WATCHDOG when the watchdog fired. Note that ApplicationInitHW usually just stores bWakeupReason, and ApplicationInitSW checks it later because the NVM isn’t available yet in InitHW.

BYTE ApplicationInitSW(...) {
BYTE i;
...
// wakeupReason is the copy of bWakeupReason saved in ApplicationInitHW
if (wakeupReason==ZW_WAKEUP_WATCHDOG) { // Increment WDOG counter with max 255
    i=MemoryGetByte((WORD)&EEOFFSET_NumberWatchDogResets_far);
    if (i<255) MemoryPutByte((WORD)&EEOFFSET_NumberWatchDogResets_far, i+1);
}
...
}
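The store-then-check pattern can be exercised on a PC with a simulated NVM. In this sketch the array sim_nvm, the constants WAKEUP_WATCHDOG_SIM and NVM_WDOG_RESETS_OFFSET, and the two init functions are hypothetical stand-ins for the real NVM, ZW_WAKEUP_WATCHDOG and the SDK callbacks; the point is the saturating increment that stops at 255 instead of wrapping back to 0.

```c
#include <stdint.h>

#define WAKEUP_WATCHDOG_SIM    1u  /* hypothetical stand-in for ZW_WAKEUP_WATCHDOG */
#define NVM_WDOG_RESETS_OFFSET 0u  /* hypothetical NVM location of the counter */

static uint8_t sim_nvm[16];          /* simulated NVM: survives "reboots" in this model */
static uint8_t saved_wakeup_reason;  /* RAM copy; NVM is not usable yet in InitHW */

/* Like ApplicationInitHW: only remember why we booted. */
static void app_init_hw(uint8_t wakeup_reason) {
    saved_wakeup_reason = wakeup_reason;
}

/* Like ApplicationInitSW: NVM is available now, so update the counter. */
static void app_init_sw(void) {
    if (saved_wakeup_reason == WAKEUP_WATCHDOG_SIM) {
        uint8_t n = sim_nvm[NVM_WDOG_RESETS_OFFSET];
        if (n < 255u) {  /* saturate at 255 rather than wrap to 0 */
            sim_nvm[NVM_WDOG_RESETS_OFFSET] = (uint8_t)(n + 1u);
        }
    }
}
```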

Use a Configuration Command Class parameter to read or update this value for testing purposes. I also like to put in a small block of code wrapped in #ifdef WATCHDOG_TESTING_ENABLED that, upon receiving a BASIC_SET with a value of 0xDE (not a valid value), calls GetResponseBuffer() and never releases it. That locks up the mutex, and about 30 seconds later the chip should reboot. If it doesn’t, then you have a bug in the watchdog code! You can test all the branches in your watchdog code with various values of BASIC_SET.
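A host-side sketch of that test hook follows. Only the 0xDE value comes from the text; the 0xDD “silent radio” value, handle_basic_set(), the simulated flags and seconds_until_kicks_stop() are hypothetical. The simulation reports when the per-second checks stop allowing a kick; the actual reboot follows one hardware watchdog timeout later.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_TESTING_ENABLED      /* defined here so the sketch exercises the hook */

#define WDOG_TEST_LOCK_MUTEX 0xDEu    /* from the text: not a valid BASIC_SET value */
#define WDOG_TEST_SILENCE_RF 0xDDu    /* hypothetical second fault to inject */

static bool    mutex_busy;            /* simulated response-buffer flag */
static uint8_t last_comms_hours;      /* simulated hours since the last frame */

static void handle_basic_set(uint8_t value) {
#ifdef WATCHDOG_TESTING_ENABLED
    switch (value) {
    case WDOG_TEST_LOCK_MUTEX: mutex_busy = true;     return; /* grab buffer, never free */
    case WDOG_TEST_SILENCE_RF: last_comms_hours = 25; return; /* pretend a silent day */
    default: break;
    }
#endif
    (void)value;                      /* normal BASIC_SET handling would go here */
}

/* Runs the once-per-second checks; returns the second at which the dog is
 * no longer kicked, or -1 if it is kicked for all max_seconds. */
static int seconds_until_kicks_stop(int max_seconds) {
    uint16_t busy_seconds = 0;
    for (int t = 1; t <= max_seconds; t++) {
        busy_seconds = mutex_busy ? (uint16_t)(busy_seconds + 1u) : 0u;
        if (busy_seconds >= 30u || last_comms_hours >= 25u) {
            return t;
        }
    }
    return -1;
}
```

Each injected value exercises one branch of the watchdog condition, which is the point of testing with several BASIC_SET values.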

When to Enable Watchdog

Perhaps a better question is when NOT to enable the watchdog since ALL production builds absolutely must have the watchdog enabled! My recommendation is to disable the watchdog during development. You want the chip to lock up if you have a bug. The watchdog is really good at masking major bugs since things just keep on working. If the device locks up, then you know something is wrong and you need to chase it down. If you power cycle and the device is fine again, IT IS NOT FINE! You have a bug in your code! During production testing I usually turn the watchdog back on but I also have the testing scripts check the watchdog counter and if it increments then the test fails.

Watchdog Best Practices for Z-Wave Developers

  1. Disable Watchdog during development using #defines
  2. Only kick the watchdog when everything is healthy
    1. Kicking every ApplicationPoll is INSUFFICIENT
    2. Check whether ActiveJobs() (aka the mutex) is stuck
    3. Check other conditions within your product
  3. Check that the RF has received something every X minutes or hours
  4. Have a way to test the Watchdog during development
  5. Store the number of Watchdog resets in NVM and retrieve them via a configuration parameter

 

10 Questions when Reviewing Embedded Code

Design News posted a great article “10 Questions to Consider When Reviewing Code” and I’m just posting the list here. Follow the link for the full article with the details behind each question.

  1. Does the Program build without warnings?
  2. Are there any blocking functions?
  3. Are there any potential infinite loops?
  4. Should this function parameter be a const?
  5. Is the code’s cyclomatic complexity less than 10?
  6. Has extern been limited with a liberal use of static?
  7. Do all if…else if… conditionals end with an else? And all switch statements have a default?
  8. Are assertions and/or input/output checks present?
  9. Are header guards present?
  10. Is floating point mathematics being used?

My personal pet peeve is #3: I am constantly reviewing code that uses WHILE loops waiting for a hardware bit to change state. But what if the hardware bit is broken? Then the device is DEAD. Always have some sort of timeout, and use a FOR loop with a maximum count instead of an open-ended WHILE loop. At least the code will move on and won’t be dead. Maybe it won’t work properly because of the broken hardware, but at least the device can limp along.
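A minimal sketch of the bounded-loop pattern: instead of `while (!(REG & MASK));`, poll a fixed number of times and report failure. The function name and the idea of an iteration budget are illustrative; size max_tries from the peripheral’s datasheet timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wait for (*reg & mask) to become nonzero, but give up after max_tries
 * reads so a broken peripheral cannot hang the device forever. */
static bool wait_for_bit(volatile const uint32_t *reg, uint32_t mask,
                         uint32_t max_tries) {
    for (uint32_t i = 0; i < max_tries; i++) {
        if (*reg & mask) {
            return true;   /* bit set: hardware responded */
        }
    }
    return false;          /* timed out: caller logs, recovers and moves on */
}
```

The caller decides what “limping along” means: retry later, skip the operation, or record a fault counter that can be read back for diagnosis.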

Seven Habits of Highly Effective Z-Wave Networks for Consumers

You have a Smart Home using Z-Wave as a wireless technology for all these Internet of Things (IoT) devices to communicate with each other. But maybe things are not working quite as well as you expect. You press a button on your phone and 1… 2… 3… and then finally a light comes on or maybe it doesn’t come on at all! Another common problem is when a battery powered sensor was updating the temperature last week and this week it just doesn’t seem to be sending updates anymore or at best sporadically. As a Z-Wave expert I’ve built and rebuilt hundreds of Z-Wave networks and have come up with a few habits to make Z-Wave networks more reliable.

1. Minimize Polling

This is probably THE number one mistake new users of Z-Wave make. They figure Z-Wave is a high-speed network, so they can just poll a light switch every 3 seconds and then react to any change in the switch. Z-Wave, like most other wireless networks, works best when the channel is mostly quiet and available. If the network is busy, every device that needs to send a message has to wait its turn and then compete (and often collide) with all that polling traffic. Collisions slow everything down just like rubber-necking on the highway.

Polling used to be the only way to get around a patent that fortunately expired in February 2016. The patent forced many light switch manufacturers not to send a message when you flipped the switch. Several manufacturers found ways to get around this or they licensed the patent. But now that the patent has expired, you can get light switches that do send a report immediately when their state has changed.

So the primary way to minimize polling is to replace the few devices in your Smart Home that trigger an event (or SmartApp or Magic or whatever your hub calls it) with ones that instantly send an update. If you have some older switches whose state changes you don’t need to know about instantly, you can still poll them, but no more than once every few minutes. Remember that if you have 60 Z-Wave devices and you poll each one once/min, then you are polling once/second and the network is hammered! So only poll a couple of nodes!
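The arithmetic behind that warning generalizes: with N devices each polled once every T seconds, the hub generates N/T polls per second, and every poll costs at least two frames (the GET and the REPORT) before counting any routing hops or retries.

```latex
R = \frac{N}{T}, \qquad N = 60,\; T = 60\ \text{s}
\;\Rightarrow\; R = 1\ \text{poll/s} \;\ge\; 2\ \text{frames/s}
```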

2. Have enough devices to create a mesh

I can’t tell you how many people I’ve worked with who had a door lock and a hub and nothing else, maybe a battery-powered thermostat. And they wondered why the connection to the lock was unreliable when the hub was at the far end of the building! Z-Wave relies on Always-On (110VAC-powered) nodes to build a “mesh” network. The mesh is the key to Z-Wave reliability. Every Always-On node acts as a repeater and is able to forward a message from one node to another in the mesh. Battery-powered devices like door locks and battery-powered thermostats cannot forward messages; only the Always-On nodes can.

Solution: If some devices are not reliable, add more Always-On devices. Add a Z-Wave repeater or any Always-On device like a lamp dimmer. Even if you don’t use the lamp dimmer, it will act as a repeater and improve the network. I leave a few lamp switches that I use for my Christmas lights plugged in year-round because they sit at the periphery of my home and help the Z-Wave network.

Distance between nodes is not the only criterion for adding more nodes to a network. Z-Wave radio signals may bounce off metal objects like mirrors or appliances, and these reflections can leave two nodes that are only a few feet apart completely unable to talk to each other. Adding more nodes to the mesh provides alternate routes to nodes that might otherwise be in a dead zone where the reflections cancel out the radio signals.

3. Place the hub in a central location

Putting the hub in a corner of the basement might be convenient, but it’s a terrible idea for Z-Wave. The hub is the most important node in the network and should have the best location possible. While Z-Wave is a mesh network and can route or hop thru other nodes in the mesh, each hop adds significant delay and chokes up the network with more traffic. Ideally the hub should reach 90% of the nodes in your Smart Home without relying on routing. If the hub has Wi-Fi, then putting it in a central location is easy; you just need a wall outlet to plug it in. I have my hub hung off the back of a TV cabinet in roughly the middle of the first floor of my home.

4. Heal the Network

Once a Z-Wave network is built, it has to be “healed” so every node can use all the other nodes in the network to route messages. This healing process can take anywhere from many minutes to hours depending on the size of the network. When you first build a Z-Wave network, the first node added only knows that the hub is in the network. When you add a second node, the hub knows that both nodes are in the network, but the first node has no idea that node 2 is there unless you heal the network. So any time you add a node, you need to heal at least a few nodes in the network, if not the entire network. Be cautious with the healing process: it uses 100% of the Z-Wave bandwidth while it runs, and every FLiRS node (such as a door lock) will be woken up at least once, which drains its batteries. Generally, only heal when nodes have been added or removed or if there seems to be a problem in the network.

Z-Wave is able to self-heal automatically. Z-Wave nodes will try various routes to get their message thru if at first it doesn’t succeed.  The node will remember the Last Working Route and try that one first for the next message. But if the nodes have no idea there are other nodes in the network they have no way of knowing what routes to try so at least one full heal of the network is required.

HomeSeer

HomeSeer has several platforms, so the precise method might be slightly different than shown here. From the web interface home page, select the menu Plug-Ins->Z-Wave->Controller Management, then select the Action “Fully Optimize a Network”. The network-wide heal will take some time depending on the size of the network.

SmartThings

The SmartThings user interface is thru their app, which makes finding the network heal a bit of a challenge. Start from the dashboard and click on the three lines in the upper left corner. Your hub should be the first choice in the menu that slides out; click on your hub. A new menu comes up; click on the last choice, “Z-Wave Utilities”. The last choice on the next menu that slides in is “Repair Z-Wave Network”, so click on it and then click on “Start Z-Wave Network Repair”. The repair will take from minutes to over an hour depending on the size of your network.

Vera

Vera has several versions of their UI, but each of them has a similar menu structure, so these instructions should work on any version. The Vera version shown here is UI7. Use a PC to log into GetVera.com and select your hub. From the Dashboard, select Settings->Z-Wave Settings and then click on the Advanced tab. At the bottom of the Advanced tab is the GO button to run “Update Node Neighbors”. Depending on the size of the Z-Wave network, this process will take several minutes to over an hour.

5. If a device doesn’t pair, first exclude it, then include it

You’ve taken the brand-new Z-Wave IoT widget out of the box and tried to pair it (the Z-Wave term is “inclusion”), but it just won’t include! Arrrghhh! The first thing to try is to exclude the node and then try including it again. Any hub can “reset” or exclude a Z-Wave device, even if that device was previously connected to another network. Some manufacturers occasionally fail to exclude the device during testing, so the device may still be connected to their test network. Or you may have inadvertently included the device, but the inclusion process failed somehow and the hub is confused. Excluding the node should reset it to the factory-fresh state. Newer Z-Wave Plus devices (which carry the Z-Wave Plus logo) are required to have a way to reset them to factory defaults using just the device itself. Every device is different, so you’ll have to refer to the device manual to perform a factory reset, but if all else fails this should make the device ready to pair. Naturally, having the hub physically close to the device being paired will also help, though most devices can be paired from a distance.

Secure devices like door locks are particularly challenging to pair. First the secure device has to join the Z-Wave network, then the AES-128 encryption keys have to be exchanged and if that process fails (which it does on occasion), then you have to exclude and try the inclusion process all over again. Secure devices definitely want to be within a few feet of the hub during inclusion to ensure reliable and speedy Z-Wave communication.

6. Battery life and how to maximize it

When a battery powered Z-Wave device wakes up and turns on its radio, it uses 10,000 times more battery power than when it’s asleep. So the entire trick to making batteries last is to minimize the amount of time the device is awake. Some devices naturally have other battery draining activities mostly involving motors to throw a deadbolt or raise a window shade. Obviously any motor will use a lot more battery power than the Z-Wave radio but the radio will play a significant role in battery life.
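A rough way to see why awake time dominates: let d be the fraction of time the device is awake and take the 10,000x ratio above, ignoring motors and other loads. The average current is then

```latex
\bar{I} = d\,I_{\text{awake}} + (1-d)\,I_{\text{sleep}}
        = \left(10^{4}d + 1 - d\right) I_{\text{sleep}}
```

With d = 1% this comes to about 101 times the sleep current, and even d = 0.01% (under 9 seconds awake per day) roughly doubles the average draw to about 2 times the sleep current. Battery life scales roughly with 1/Ī, so every extra wake-up counts.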

When a battery powered device is added to a Z-Wave network the hub should do two things:

  1. Assign the Association Group 1 NodeID to the hub
    1. Association Group 1 is the “LifeLine” in Z-Wave and devices use this lifeline to send all sensor data and alerts to this node
    2. All hubs are required to assign Group 1 but double check this assignment
  2. Set the Wake Up Interval to no more than once per hour and ideally only a few times per day
    1. Every hub assigns the WakeUpInterval differently and largely handles it behind the scenes so this may be difficult to verify or change
    2. If the device is waking up every few minutes and sends a sensor reading then its battery life isn’t going to be more than a few weeks
    3. The battery level of the device is usually reported at the WakeUpInterval rate

Many sensors have other Association Groups or Configuration Parameters that will let you specify the frequency of sensor readings. Realize that the more often the sensors report in, the shorter the battery life.

7. Dead nodes in your controller

One of the big problems in Z-Wave network maintenance is eliminating “dead” nodes. When a device fails or for whatever reason is no longer in use, it needs to be removed from the controller. If it remains in the controller, then the controller will occasionally try to route thru this dead node, resulting in delays in delivering messages. Eventually the self-healing aspects of Z-Wave will make this less likely, but various devices will still occasionally attempt to route thru it. Since the node is dead, that wastes valuable Z-Wave bandwidth and potentially the battery power of sleeping devices. Occasionally running a heal on the network will remove the node from the other nodes’ routing tables, but it will remain in the controller’s tables. It is best to completely remove this dead node. Each hub has a different method for removing dead nodes, usually found in an advanced Z-Wave menu.

Following these guidelines will help make your Z-Wave experience more robust. If you have more questions, please feel free to reach out via email to drzwave at expresscontrols.com.