Z-Wave Virtual Academy

Z-Wave Virtual Webinar Wednesdays at Noon Eastern US time

Doctor Z-Wave will be giving a hands-on live demo of getting started using Z-Wave with Simplicity Studio on Wednesday June 17. This is a live demo with just a couple of slides so you don’t want to miss it. The session is a short roughly 30 minutes with time for Q&A afterward. I will show you some simple things on setting up Simplicity to make your life easier when getting started. If you can’t make it, it will be recorded and available via the Alliance web site.

There are lots of other topics for Webinar Wednesdays:

Webinar Wednesday Schedule*: *This schedule will be updated regularly on the Z-Wave Alliance website as the series progresses
May 27, 2020  
  Manufacturing During a Global Pandemic: Insight & Strategy from Companies Who Are Coping Hosted by: Avi Rosenthal – Bluesalve Partners  
June 3, 2020   Social Distance Sales for Uncertain Times: Tips & Insight for Integrators Hosted by: Jeremy McLerran – Qolsys
June 10, 2020   Residential Smart Lock Market: Trends, Use-Cases & Opportunities Hosted by: Colin DePree – Salto Systems
June 17, 2020 Z-Wave 700 Series: Getting Started Hosted by: Eric Ryherd – Silicon Labs
June 24, 2020  
  Feature of Leedarson Z-Wave 700 Series Security Products Hosted by: Vincent Zhu & Michael Bailey Smith – Leedarson
 

Fast GPIO Wake Up in Z-Wave 700 Series

The Silicon Labs EFR32 family of IoT microcontrollers are very flexible and can do a ton of cool stuff. However, along with all that flexibility comes a lot of complexity. With that complexity are default settings that work fine for many applications but in some cases you want to dig into the details to come up with an optimal solution. In this post I’ll show how to speed up the wake up time for the Z-Wave ZGM130S chip from a GPIO.

But first – a caveat: This post applies to Z-Wave SDK 7.13.x. Future releases of the SDK may have different methods for sleep/wake and thus may require a different solution.

The Problem

Frequently Listening Routing Slaves (FLiRS) devices like door locks and many thermostats spend most of their time in Energy Mode 2 (EM2) to conserve battery power. Once per second they wake up briefly and listen for a Beam from an always-on device. If there is a beam, the FLiRS device will wakeup and receive the Z-Wave command. This allows battery powered devices to use very little power but still be able to respond to a Z-Wave command within one second. FLiRS devices use more battery power than fully sleeping devices like most sensors which use Hibernate Sleep mode (EM4). To wake every second the ZGM130 has to wake quickly and go right back to sleep to minimize power. The problem with EM4 is that it takes a few tens of milliseconds to wake up as the entire CPU and RAM have to be initialized as they were powered down to save power. For a FLiRS device, it’s more efficient to keep RAM powered but in a low-power state and resume quickly to go right back to sleep if there is no beam. Typically the ZGM130 can wake up in about 500 microseconds from EM2. But in many cases this is still too long of a time to stay awake if there are other interrupts such as UARTs or other sensors.

The scope shot above shows the processing that takes place by default on the ZGM130S. In this case I am using a WSTK to drive the SPI pins of another WSTK running the DoorLockKeyPad sample application. The chip is in EM2 at the start of the trace. When SPISEL signal goes low, the chip wakes up. But it is running on the HFRCO oscillator which is not accurate enough to run the radio but it is stable and usable in just a few microseconds. Thus, the SPI clock and data is captured in the USART using this clock. However, by default the Interrupt Service Routine is blocked waiting for the HFXO to stabilize. The 39MHz HFXO crystal oscillator has the accuracy required for the radio.

The question is what’s going on during this 500usec? The answer is the CPU is just waiting for the HFXO to stabilize. Can we use this time to do some other work? Fortunately, the answer is YES! The challenge is that it takes some understanding and some code which I’ll describe below.

The Solution

There are three functions that do the majority of the sleep processing. These are provided in source code so you can read the code but you should not change it. Instead you’ll provide a callback function to do your processing while the chip is waking up.

Simplified Sleep Processing Code:

  1. SLEEP_Sleep in sleep.c: The main function called to enter sleep
    1. CORE_ENTER_CRITICAL – PRIMASK=1 mask interrupts
    2. DO-WHILE loop
      1. Call enterEMx() – this is where the chip sleeps
      2. Call restoreCallback (return 0 to wake, 1 to sleep)
    1. Call EMU_Restore – waits for HFXO to be ready ~500us
    2. CORE_EXIT_CRITICAL – ISRs will now run
  2. enterEMx() in sleep.c:
    1. sleepCallback called
    2. Call EMU_EnterEM[1-4]
    3. wakeupCallback after returning from EMU_EnterEMx
  3. EMU_EnterEM2 in em_emu.c:
    1. Scales voltage down
    2. Call EMU_EM23PresleepHook()
    3. __WFI – Wait-For-Interrupt instruction – ZGM130 sleeps here
    4. Call EMU_EM23PostsleepHook() ~ 17usec after wakeup
    5. Voltage Scale restored which takes ~20us

The code is in sleep.c in the SDK which has a lot more detail but at a high level this is what you need to know. The important part to understand here is where the “hooks” are and how to use them.

  • Use Sleep_initEx() to assign:
    • sleepCallback – called just before sleeping
    • restoreCallback – Return 0 to wake, 1 to sleep
    • wakeupCallback – called after waking
    • Sleep_initEx() input is a pointer to a structure with the three callbacks or NULL if not used
  • Define the function:
    • EMU_EM23PresleepHook()
    • EMU_EM23PostsleepHook()
    • These are both WEAK functions with nothing in them so if you define them then the compiler will install them

The two EMU_EM23* weak functions are run immediately before/after the Wait-For-Interrupt (WFI) instruction which is where the CPU sleeps. These are very low level functions and while you can use them I recommend using the callbacks from Sleep_initEx().

The SLEEP_initEx() function is the one we want to use and in particular the restoreCallback. The comments around the restoreCallback function talk about restoring the clocks but if the function returns a 0 the chip will wake up and if it returns a 1 then it will immediately go back to sleep which is what we want! You can use the other two hooks if you want but the restoreCallback is the key one since it will immediately put the chip back to sleep if everything is idle.

The key to using ANY of these function is that you CANNOT call ANY FreeRTOS functions! You cannot send any Z-Wave frames or call any Z-Wave function as they all require the RTOS. At this point in the wakeup processing the RTOS is not running! All you can do in these routines is to capture data and quickly decide if everything is idle and to go back to sleep. If there is more processing needed, then return 0 and wait for the event in the RTOS and process the data there. You also don’t want to spend too much time in these routines as it may interfere with the timing of the RTOS. A hundred microseconds is probably fine but longer you should wait for the HFXO.

In ApplicationInit() you will call Sleep_initEx() like this:

const SLEEP_Init_t sleepinit = {NULL, NULL, CheckSPI};
...
ZW_APPLICATION_STATUS ApplicationInit(EResetReason_t eResetReason) {
...
SLEEP_InitEx(&sleepinit); // call checkSPI() upon wakeup from EM2.
...
}
...
uint32_t CheckSPI(SLEEP_EnergyMode_t emode) { 
	uint32_t retval=0; // wake up by default
	if (GPIO_IntGetEnabled() & 0x0000AAAA) { // Check SPI
		GPIO_ODD_IRQHandler(); // service the GPIO interrupt
		// wait for all the bytes to come in and compute checksum 
		NVIC->ICPR[0] = NVIC->ICPR[0]; //clear NVIC pending interrupts
		if (!SPIDataError && !IsWakeupCausedByRtccTimeout())	{
			 retval=1; // go back to sleep!
		}
	}
	return(retval); // 0=wakeup, 1=sleep
}

Recall that every second the FLiRS device has to check for a Z-Wave beam which is triggered by the RTCC timer. Thus the check for IsWakeupCausedByRtccTimer ensures that the beaming still works.

This scope shot shows the wake up processing of the ZGM130S:

  1. SPISEL_N SPI chip select signal goes low triggering a GPIO_ODD interrupt
    1. The chip wakes up, the HFRCO begins oscillating
  2. HFRCO begins oscillating in a few microseconds
    1. Once HFRCO is running, the peripherals are functional
    2. SPI data can begin shifting once the HFRCO is running
    3. The default HFRCO frequency is 19MHz but can be increased
    4. Higher frequencies for HFRCO also may need more wait states for the CPU and will use more power
  3. The WFI instruction that put the CPU to sleep is exited here
    1. EMU_EM23PostSleepHook function is called if defined
    2. After returning from PostSleepHook, the VSCALE is returned to full power which takes about 10usec
    3. It is best to wait for the voltage to be powered up to ensure all logic is running at optimal speeds
  4. EMU_EnterEM2 is exited and restoreCallback is called if initialized
    1. This is the function where the ISR should be called to process data
    2. If the data says things are idle and want to go back to sleep, return 1
    3. If more analysis is needed, then return 0
    4. Carefully clear the interrupt bits
      1. First clear the peripheral Interrupt Flags
      2. Then clear the NVIC Interrupt pending register
        1. NVIC->ICPR[n]=NVIC->ICPR[n] where n is 0-1 depending on your interrupt
    5. Make sure there aren’t other reasons to wake up fully
      1. !IsWakeupCausedByRtccTimeout() is the 1s FLiRS interrupt
      2. There may be other reasons to wake up which is application dependent
  5. In this example the SPI data is being fetched from the USART at each toggle of the GPIO
    1. The final toggle shows that the checksum was computed and the data is idle so go back to sleep
  6. The chip returns back to sleep in a few more microseconds
    1. Total processing time of this interrupt is less than 200usec which is a fraction of the time just waiting for the HFXO to stabilize
    2. Much of that time is receiving and processing the SPI data
    3. It is possible to sleep in under 50usec if the check for idle is quicker

If your peripheral processing will take significantly less than 500usec, then it may be more efficient to process the data using the HFRCO and not wait for the HFXO to power up. But if your application needs more processing, then you are probably better off waiting. Each application must make their own calculations to determine the most efficient path.

What About Sleeping Devices?

Fully sleeping devices (EM4 also known as RSS – Routing Sleeping Slaves) have entirely different wake/sleep processing. For sleeping slaves the processor and RAM have to be re-initialized and the chip essentially boots out of reset. All that initialization takes quite a bit of time – a few tens of milliseconds. If your device needs to do a lot of frequent checking of a sensor, then it might make more sense to force it to stay in EM2 by setting a Power Lock to PM_TYPE_PERIPHERAL. For more details on power locks see INS14259 section 7.6. Deciding which way to go is application specific so you have to make the calculations or measurements to find the right balance for your project.

This is a complex posting but I hope I’ve made it clear enough to enable you to optimize your application firmware. Let me know what you think by leaving a comment below.

How to Upgrade Your 700 Series Project from SDK 7.12 to 7.13

This is a very specific posting for Z-Wave developers and specifically for those developing with the new 700 series chips. If you’re not a 700 series developer you can probably stop reading…

I have posted details on upgrading from the 7.12 to the 7.13 Software Developers Kit at this Knowledge Based Article on the Silicon Labs web site: https://www.silabs.com/community/wireless/z-wave/knowledge-base.entry.html/2020/03/30/upgrading_700_seriesprojectfrom7122to7133-VZrM

Z-Wave SDK 7.13.3 released last week with a number of important stability improvements – you want to upgrade your 700 series project to this release!

  • Several stability improvements to prevent lockups in certain corner cases
  • RSSI reporting corrections (both 500 and 700)
  • Improved timing for routed acks and fixed sticky Last Working Routes
  • OTA Firmware Activate support delaying rebooting into the new firmware until all units have been downloaded
  • Details are found in SRN14629.pdf which is included in the Simplicity Studio release: SDK Documentation->End Device->SRN14629 Z-Wave 700 SDK 7.13.x

Z-Wave Watchdog Timer Best Practices

WatchDogVirtually all embedded systems must run 24 x 7 x 365 x many many years without ever being rebooted. Since there is no one there to “press the reset button” if the device fails, the watchdog timer is there to do just that. The 500 series Z-Wave chips from Silicon Labs have a watchdog timer and the example code provides a very minimal use of the watchdog timer. However, the minimal use in the example code is not sufficient to provide a robust watchdog for embedded Z-Wave devices. This post explains some rules and methods to code a robust watchdog timer.

Long time embedded expert Jack Ganssle has a great article on Watchdog timers. He describes the use of a watchdog timer on the Clementine spacecraft where a fault in the system caused the spacecraft to dump virtually of its fuel resulting in the loss of the mission. The lead software engineer had wanted a watchdog but the designers decided not to include it. Jacks example shows how important it is to spend at least some time coding a robust watchdog for our IoT devices. While our devices aren’t controlling multi-million dollar spacecraft, we are coding light switches that are hardwired into the wall and cannot be easily rebooted. Try telling the customer to go into the basement and toggle the power to his entire house to reboot the light switches!

What is a Watchdog?

A watchdog timer is a timer that runs constantly. Typically a complex combination of events resets (or “kicks”) the watchdog timer every now and then, usually every few milliseconds. If the combination of events ever gets stuck, the timer will continue to run. If the watchdog timer “times out”, the system is reset – basically the reset button is pushed! Your embedded system reboots and keeps on running. Generally no one even realizes it has rebooted (I’ll discuss that problem in more detail shortly).

WatchdogTimerThis diagram shows the Watchdog timers value which is constantly counting up. Every time the Watchdog is “kicked”, the counter is reset to zero. Somewhere in your code the ZW_WatchDogKick() routine is called which resets the watchdog timer. Sometimes this reset condition happens on a nice regular basis, sometimes it happens at varying times as shown by the level of the timer. The key is the timeout threshold has to be longer than any normal operating condition. If a fault condition occurs, the timer keeps on counting up until the threshold is reached and then the system is reset. When the watchdog timer fires, the Z-Wave chip goes thru a full reset just as if power had been removed and reapplied. Your embedded system is back up and running as if nothing had happened.

SiLabs Sample Code = Minimal Watchdog

The SiLabs sample code has the following implementation of the watchdog:

BYTE ApplicationInitSW(ZW_NVM_STATUS nvmStatus) {
...
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogEnable();
#endif
} 

void ApplicationPoll(void){
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogKick();
#endif
}

The sample code has the good implementation practice of putting the Watchdog code inside #defines so it can be easily enabled/disabled. Unfortunately it blindly kicks the dog every ApplicationPoll without checking any other conditions. ApplicationPoll is called roughly every few hundred microseconds and a lot of fault conditions can exist and ApplicationPoll will still be called. With this implementation the only way the watchdog is going to fire is if there is a catastrophic failure and ApplicationPoll is no longer being called. While this implementation is better than nothing, it won’t reset the system in many cases where the device has become unresponsive. This is where you come in, you have to add more code to the watchdog algorithm. It may be easy to just use what SiLabs provides, but for a robust product you really need to spend some time adding your own conditions to the watchdog algorithm.

A Better Watch Dog Example

Writing good watchdog code requires some significant thought and testing. The possible sources of failure need to be discussed with members of the team and with other Z-Wave developers who are fighting the same fight (thus the need for this blog). I can provide a few guidelines to include in your analysis but this is not a complete solution. Only you know all the possible failure modes of your product and that requires some serious thought and analysis.

Mutex Gets Stuck

The most common failure I have seen is the fact that the SiLabs provided Application Framework (AF) mutex can get stuck. When the mutex is stuck, it most often results in the device still able to receive Z-Wave traffic but often can’t respond. If the device is power cycled, then it returns to full operation. So often this failure goes unnoticed both in testing and in actual use.

What is the mutex you ask? The mutex is a simple flag in the AF that prevents the code from overwriting the Send Buffer while a message is currently being sent over the radio. When a GET command comes in, the AF will call a command class handler to handle the GET and build a REPORT frame in memory. When ready to send the frame, the AF will call pTxBuf=GetResponseBuffer() to get a buffer for the radio to send. There is only one buffer so if the buffer is already in use, you get a NULL pointer back and will have to wait and send the frame later.  This in general works fine as long as frames don’t come in too fast. But in a large network with lots of repeated and re-routed frames you will occasionally get a bunch of GETs quickly and it is possible for the REPORTs to get cross wired and end up locking up the mutex for a frame that will never be sent. If the code then doesn’t properly release the buffer, the mutex is stuck. The Application Framework code is known to lock the mutex occasionally so you must code around this problem. The easiest solution to this rare event is to ensure the watchdog is watching the mutex and simply reboot if it gets stuck for too long.

My solution is to have a counter that counts up once per second in ApplicationPoll anytime ActiveJobs() is true (in SDK 6.81.xx its now called ZAF_mutex_isActive()). ActiveJobs is true anytime a buffer is in use and false when all the buffers are free. There are actually two buffers, one for response frames (REPORTs sent as a result of a GET) and a second buffer for request frames (unsolicited notifications).

Application Specific Reasons

Beyond the mutex you must think long and hard about application specific failure conditions. The most obvious is that the device has not received or sent a frame in 25 hours. Most hubs will poll a device at least a couple of times per day to make sure it is still alive. So if there has been no traffic in a day, maybe something is stuck and a reboot is in order. Plus if nothing has happened in a day then probably no one will notice the reboot (which only takes 1.5 seconds). You do have to be careful that some other part of the application isn’t impacted as a result of the reboot. For example, if you are a light switch and by default you turn the light off on a reboot, then people will be really annoyed if the light randomly turns off because your hub hasn’t polled it in day. There are lots of potential checks you can make here but every application will have different requirements so you will have to think hard about all the possible conditions for your specific case.

Sample good watchdog:

E_APPLICATION_STATE ApplicationPoll( E_PROTOCOL_STATE bProtocolState ) {
...
if (ActiveJobs()) {              // Mutex buffer is busy
    if (OneSecondTimer) ActiveJobsCounter++;  // Once/sec increment
} else {
    ActiveJobsCounter=0;         // When buffer is free clear counter
}
...
if ((ActiveJobsCounter<30) &&       // Mutex isn't stuck 
    (LastCommsHours<25) &&          // Got a frame in the last 24 hrs
    ApplicationSpecificReasons) {   // Other reasons
    ZW_WatchDogKick();              // Everything is OK so reset WDOG
}

In the example code above we do have a major issue in that if the counters stop counting for some reason, the watchdog will never fire! But that’s easy to check for in ApplicationPoll and if ApplicationPoll itself isn’t running then the WatchDog is no longer being kicked so it will reset.

Doesn’t Work If Not Tested

The old coding adage (proven totally true by me many many times) goes “If the code hasn’t been tested, it doesn’t work”. Same thing applies to your Watchdog code. So how do you test the watchdog? The first thing to do is to log the number of times the watchdog has triggered. This has to be stored in NVM since RAM will be lost when you reboot. Fortunately ApplicationInitHW is called with the bWakeupReason parameter which lets you know the watchdog fired when equal to ZW_WAKEUP_WATCHDOG. Note that usually ApplicationInitHW just stores the bWakeupReason and later in ApplicationInitSW we check it as the NVM isn’t available in InitHW.

ApplicationInitSW(...) {
...
if (wakeupReason==ZW_WAKEUP_WATCHDOG) { // Increment WDOG counter with max 255
    i=MemoryGetByte((WORD)&EEOFFSET_NumberWatchDogResets_far);
    if (i<255) MemoryPutByte((WORD)&EEOFFSET_NumberWatchDogResets_far, i+1);
}

Use a Configuration Command Class parameter to read or update this value for testing purposes. I also like to put in a small block of code wrapped in #ifdef WATCHDOG_TESTING_ENABLED that upon receiving a BASIC_SET with a value of 0xDE (not a valid value) calls GetResponseBuffer() which locks up the mutex and in 30 seconds the chip should reboot. If not, then you have a bug in the watchdog code! You can test all the branches in your watchdog code with various values of a BASIC_SET.

When to Enable Watchdog

Perhaps a better question is when NOT to enable the watchdog since ALL production builds absolutely must have the watchdog enabled! My recommendation is to disable the watchdog during development. You want the chip to lock up if you have a bug. The watchdog is really good at masking major bugs since things just keep on working. If the device locks up, then you know something is wrong and you need to chase it down. If you power cycle and the device is fine again, IT IS NOT FINE! You have a bug in your code! During production testing I usually turn the watchdog back on but I also have the testing scripts check the watchdog counter and if it increments then the test fails.

Watchdog Best Practices for Z-Wave Developers

  1. Disable Watchdog during development using #defines
  2. Only kick the watchdog when everything is idle
    1. Kicking every ApplicationPoll is INSUFFICIENT
    2. Check the ActiveJobs() being stuck (aka Mutex)
    3. Check other conditions within your product
  3. Check that the RF has received something every X minutes or hours
  4. Have a way to test the Watchdog during development
  5. Store the number of Watchdog resets in NVM and retrieve them via a configuration parameter

 

8051 8-bit CPU Coding Tricks & Tips

The death of the 8-bit CPU has been prognosticated for over a decade but these old tiny workhorses keep going and going and going. Z-Wave is currently based on the venerable Intel 8051 CPU but we’re about to get an upgrade to a 32-bit ARM CPU via the 700 series which is due out later this year. In this posting I’ll give a few tips on coding in C for the 8051 to typically improve speed of execution.

I’ve been writing code for 8-bit CPUs since the 1980s and I designed a few in the 1990s. I also designed a couple of 32-bit RISC CPUs in the 1990s and early 2000s. I often had to squeeze the code just a little harder to get an operation to happen just fast enough to meet a system requirement. This meant that I either had to code in assembly or often just coding C in a slightly different way would convince the compiler to generate the most optimal assembly code for me.

8-bit CPU Architecture

I could go thru a long discussion of the block diagram of the 8051 CPU but instead I’ll just cut to the  chase – the 8051 is not at all “C friendly”. The 8051 was architected back in the bad ol’ days of assembly programming – especially for embedded systems where resources were in very short supply (think 1K ROM and 64 bytes of RAM). Thus the architecture has all sorts of funny things that a C compiler can’t take advantage of efficiently. The PSR flags, the D pointer, the single accumulator, memory mapped registers and the SFR registers are all parts of the 8051 that make it special but unfortunately are just not efficient in C. The folks at Keil have worked really hard to make their compiler as efficient as possible and to make access to these non-C hardware resources available without too much effort. But at the end of the day, a C compiler really wants a nice simple block of 32-bit registers to perform integer arithmetic on and a simple flat memory – in other words, a RISC CPU.

In the Z-Wave world, the 8051 in the 500 series chips is a 32 MHz 8051 with 128K bytes of FLASH and 16K bytes of RAM. “But wait” you say, “the 8051 is an 8-bit CPU with a 16-bit Program Counter, how can it address 128K bytes?”. The answer is Bank Switching which is just plain crazy but I won’t get onto that soapbox in this posting. The Z-Wave code from Silicon Labs is delivered as pre-compiled C libraries that the application developer links into their code. The application code is written in C and the Keil compiler does a fine job of squeezing a reasonable amount of code into the tiny 8051. Fortunately in IoT, the task to be performed is usually pretty simple – turn a light on or off by activating a relay connected to a GPIO so we don’t need a multi-Gigahertz CPU with gobs of RAM.

The Silicon Labs SDK comes with a number of sample applications and a bunch of “helper” routines all written in C. Thus, a Z-Wave application can generally be written completely in C, compiled using the Keil compiler and it’ll generally squeeze into the limited code space as long as you don’t try to do too much on this little CPU. But it seems there are always some little things that I just need to do a little faster. The following tips are just a few of the simple tricks I’ve learned from decades of embedded coding.

The Fastest 8051 Loop

What is the fastest 8051 loop you can execute in C?

The most common loop is a simple FOR loop:

for (iter=0;iter<16;iter++) {
...
}

Which is easy to read and does the job just fine – if we don’t need to do it very fast. Often the innermost loop is executed a lot and squeezing just a few instructions out of the code will significantly improve the performance of the routine.

The for loop above compiles into the following assembly code:

B02:0xA04D E4 CLR A
B02:0xA04E FF MOV R7,A
B02:0xA04F 0F INC R7                    ; TOP OF LOOP
B02:0xA050 EF MOV A,R7
B02:0xA051 B410FB CJNE A,#demodState(0x10),B02:A04F

Which as you can see is quite a few instructions and even longer when you consider that each instruction is at least 4 CPU clocks. The Compare and Jump if Not Equal (CJNE) is a 3 byte instruction which requires many clock cycles to execute. If we can squeeze a few instructions out of the loop then it will run much faster. Obviously we could code in assembly but this exercise is to show that HOW you code in C can make a big difference in the performance. An alternative coding of the for loop above is:

iter=16;
do {
...
} while (--iter);

With this slight change in the coding style the while loop turns into a single byte DJNZ opcode as shown below so this is the fastest and most efficient looping structure when targeting the 8051.

B02:0xA055 7F10 MOV R7,#demodState(0x10)
B02:0xA057 DFFE DJNZ R7,B02:A057         ; single instruction loop

The real trick to improving the performance of your code is to see what assembly instructions the code compiles into.

How to See What Your C Code Turns Into

You can’t improve the performance of your code unless you can measure what the performance is and where all the time is being spent. A classic method of measuring the performance is to set a GPIO pin high or low during specific routines. Then use an oscilloscope to observe how long the routine takes. Other options involve printing out markers out a UART or using a hardware timer to measure the duration of a routine.

KeilDebug1A simple method to observe the instructions our C code generates for our little 8051 CPU is to use the simulator built into the Keil C compiler. By default the Keil IDE does not have a simulator when using the Silicon Labs sample projects so you have to assign one. Right click on the project then select “options” then click on the “Debug” tab. Then enter “s8051” into the CPU DLL box as shown here. You can now click on Debug->Start/Stop Debug Session or press <CTL>F5. This will enter the Keil Simulator for the 8051. One thing to understand is that this simulator does NOT understand the bank switching of the Silicon Labs version of the 8051 used in the 500 series. Unfortunately you can’t debug code much using this simulator as any bank switching doesn’t work. But it will work well enough to debug small snippets of code and of course to see what you code turns into.

KeilDebug2Once the debugger opens in the IDE, click on the line of code you are interested in and the Disassembly window (use View->Disassembly if its not visible) will take you right to the line of code you are interested in as shown here. Note that the C source code is mixed in as comments in the assembly code. This helps guide you to match the C code to the assembly code which can be a little convoluted depending on the optimization the compiler has applied. You can see here that the Do-While has turned into our desired DJNZ single instruction loop.

Don’t rely strictly on the instruction count to guide your C coding style. One slow-down I often see in the Silicon Labs code is they call a subroutine, that calls a subroutine, that calls (several more) subroutines which finally just returns a value. While this may be structured C coding it is very slow in an 8051. Each subroutine call pushes and pops a number of registers and parameters on the stack and each of those takes several clock cycles to perform. Ideally a C macro is used to specify a value or a register which becomes just a single instruction fetch of the value from a register or memory instead of all this pushing and popping.

Unrolling Loops

Another easy speedup in C is to unroll short loops. A classic situation for unrolling loops is when emulating a serial protocol like I2C with a GPIO. Since the loop is typically only 8 passes you can often significantly improve performance by unrolling the loop – often taking the I2C bit rate from under 10Kbps to nearly 100Kbps.

for (bitnum=0x80; bitnum!=0;bitnum>>1) {
    I2C_SEND_BIT(data & bitnum);  // macro to set SDA then toggle SCL
}

When unrolled turns into:

    I2C_SEND_BIT(data & 0x80);
    I2C_SEND_BIT(data & 0x40);
    I2C_SEND_BIT(data & 0x20);
    I2C_SEND_BIT(data & 0x10);
    I2C_SEND_BIT(data & 0x08);
    I2C_SEND_BIT(data & 0x04);
    I2C_SEND_BIT(data & 0x02);
    I2C_SEND_BIT(data & 0x01);

Which works well for short fixed length loops but obviously won’t work in every case. The slow down with the FOR loop involves reloading the accumulator and the D pointer with various constants and the iteration value. The inline version doesn’t have any of these nor are there any delays from looping. This technique also works well on 32-bit RISC processors.

Conclusion

There are a lot more tricks when coding for small 8-bit CPUs. But I’m hoping I won’t have to bother with them for much longer and the dominance of the 32-bit CPU will finally crush the 8-bit out of existence. The price of silicon continues to drop and 32-bit CPUs are often as cheap or even cheaper than these ancient 8-bit boat anchors. The modern CPUs also come with advanced debuggers unlike the 8051 which has… wait for it… printf or worse yet simply toggling an IO. Ugh.

 

 

Z-Wave Summit 2017 at Jasco in Oklahoma City

“IoT Device Testing Best Practices” by Eric Ryherd

Summit1Click HERE to see the entire presentation including my notes. If you are a Z-Wave Alliance member a video of the presentation is usually posted on the members only section of their web site. The main takeaways from my presentation are:

  • Have a written test plan
  • Use the Compliance Test Tool (CTT) as the START of your test plan
  • Vary the environmental conditions during testing
  • Test using real world applications
  • Test using complex Z-Wave networks with routing and marginal RF links
  • Test with other hubs and devices
  • Automate testing using tools like the ZWP500
  • Code firmware with failure in mind
  • Utilize the WatchDog timer built into the Z-Wave chip

ZWaveSummit2017aThe presentation goes into detail on each of these topics so I won’t duplicate the information here. I also go thru several failures of devices I’ve been working with. You learn more from failures than you do when everything just works. Feel free to comment and let me know what topic you’d like to see for next years summit.

Z-Wave Summit Notes

One of the main purposes of the summit is to learn what’s new in Z-Wave and what Sigma is planning for the future. The most important news at this year’s summit is SmartStart. The goal for SmartStart is to simplify the user experience of installing a new device on a Z-Wave network. The concept is that a customer will open the package for a device, plug it in, the hub is already waiting for the device to be joined and the device just shows up on their phone without having to press a button or enter the 5 digit pin code. This is a “game changer” as Sigma pointed out many times during the summit. Typically a user has to put their hub into inclusion mode, read the product manual to determine the proper button press sequence to put the device into inclusion mode, wait for the inclusion to go thru, write down the NodeID number, with an S2 device they have to read the teeny-tiny 5 digit PIN code printed on the product (or scan the QR code) and then MAYBE the device is properly included. Or more often, they have to exclude and retry the process all over again a couple of times. SmartStart as you can see will make the user experience much easier to get started with Z-Wave.

SmartStart enables “pre-kitting” where a customer buys a hub and several devices as a kit. The hub and the devices in the kit are all scanned at the distribution warehouse and are all white listed on the hub web site. When the customer plugs all the devices in, they automatically join and all just magically show up ready to be used without the frustration of trying to get all the devices connected together. Unfortunately there are no devices that support SmartStart and there are no hubs that support it either – yet. We’ll get over that eventually but I suspect it’ll take a year before any significant numbers of SmartStart supported devices show up on Amazon.

SmartStart is enabled in the SDK release 6.81 which occurred during the summit. There are some other handy features in this release. The main new feature (after SmartStart) is the ability to send a multi-cast FLiR beam. One problem with FLiR devices is that they are all sleeping devices and briefly wake up once per second to see if someone wants to talk to them. Prior to 6.81 you had to wake up the devices one at a time and each one would take more than one second to wake up. If you have battery powered window shades like I do, there is a noticeable delay as the shades start moving one at a time instead of all together. Both the shades and the remote (or hub) will need to be upgraded to 6.81 before we can use this new feature. That means it’ll be again probably another year before this feature is widely available, but it’ll get there eventually.

There are rumors that Sigma will be announcing a new generation of the Z-Wave transceiver chip in early 2018. I am hoping it will will finally include the upgrade from an 8-bit 8051 CPU to a more capable 32-bit ARM CPU.  The current 500 series relies on the ancient 8051 with very limited debugging capabilities which significantly slows firmware development. With an ARM CPU developers like Express Controls will find it easier to hire engineers who can code and debug firmware and thus we’ll be able to bring more Z-Wave products to market in less time.

A new web site, Z-WavePublic.com, has been populated with the Z-Wave documentation as well as images for the Beagle Bone Black and Raspberry Pi loaded with Sigmas Z/IP and Z-Ware. With one of these boards and a USB Z-Stick anyone can start developing with Z-Wave without having to sign a license agreement. Nice way to get started with Z-Wave for you DIY nerds out there. There were many other presentations on Security S2, Certification, The CIT, Z/IP, HomeKit and many other topics on the technical track of the summit. The marketing track had a different set of presentations so I recommend sending both a technical person and a marketing person to the summit.

Summit isn’t all work, work, work

IMG_20170927_090355The Summit isn’t all work all day though the days are long and tiring. Tuesday evening was a reception at Coles Garden which is a beautiful event venue. Unfortunately it was raining so we couldn’t wander thru the gardens much but Mitch, the Alliance Chairman, kept us entertained.

Wednesday evening was the Members Night at the Cowboy museum. Oil profits made a lot of wealthy Oklahomans who were able to make sizable donations to this huge museum. There is a lot more to see than we had time to explore so I’d recommend spending more time here if anyone is visiting Oklahoma City. Lots of food and drink made for an ideal networking environment with your fellow Z-Wave developers.

 

10 Questions when Reviewing Embedded Code

Design News posted a great article “10 Questions to Consider When Reviewing Code” and I’m just posting the list here. Follow the link for the full article with the details behind each question.

  1. Does the Program build without warnings?
  2. Are there any blocking functions?
  3. Are there any potential infinite loops?
  4. Should this function parameter be a const?
  5. Is the code’s cyclomatic complexity less than 10?
  6. Has extern been limited with a liberal use of static?
  7. Do all if…else if… conditionals end with an else? And all switch statements have a default?
  8. Are assertions and/or input/output checks present?
  9. Are header guards present?
  10. Is floating point mathematics being used?

My personal pet peeve is #3 – I am constantly reviewing that uses WHILE loops waiting for a hardware bit to change state. But what if the hardware bit is broken? Then the device is DEAD. Always have some sort of timeout and use a FOR loop instead of a WHILE loop. At least the code will move on and won’t be dead. Maybe it won’t work properly because of the broken hardware but at least the device can limp along.