Z-Wave Alliance is Now an SDO

What does an SDO mean you might ask? An SDO is a Standards Development Organization and the Z-Wave Alliance has now legally become a non-profit SDO. What this means to you is that Silicon Labs no longer control the progress of Z-Wave, the members of the SDO now control it. Read more details about the SDO in the Z-Wave Alliance Press Release.

There are seven founding members: Alarm.com, Assa Abloy, Leedarson, Ring, Silicon Labs, StratIS, and Qolsys. If you’re employed by one of these companies, join a working group and make your ideas known! There are six different membership levels with varying “voting rights” and costs so your organization can choose a level based on interest and budget.

How will this impact you and your IoT device development? In the short term probably not much, early next year however, expect to see the first Z-Wave product pass thru the new certification requirements based on the specifications produced by the SDO. Longer term this is all part of Z-Wave becoming an open standard with more silicon providers and software stack provides implementing new features all to make Z-Wave last for years to come.

The goal is to make Z-Wave THE sub-GigaHertz radio standard for IoT devices. Z-Wave is simple, low power, doesn’t require a lot of FLASH/RAM (IE: it runs on cheap MCUs) and most of all interoperable all the way back to devices released over two decades ago. Sub-GigaHertz means the radio passes thru walls and travels longer distances with less interference than the 2.4GHz protocols.

I want to remind everyone to register for the Works With Virtual Conference coming up in just a few weeks! Click below to check it out – HEY it’s FREE!

Register for the Works With Conference by clicking here. Learn how to integrate into Amazon, Google, Apple an other IoT ecosystems.

Works With is much more than just Z-Wave. All the key eco-system players are there explaining in detail how to be a part of their world. This is a technical conference so don’t miss it. I’ll be giving a demo of the latest version of Simplicity Studio V5 and how to quickly build, debug and certify Z-Wave applications.

Z-Wave Works With Amazon, Google, Samsung, Apple, Comcast Virtual Conference

Silicon Labs is hosting what was intended to be an in-person conference in Austin Texas but is now a virtual online conference on IoT ecosystems – the Works With Smart Home Developer Event September 9-10. The best part is it is now FREE to attend any of the in-depth technical sessions and you don’t have to wear a mask. The downside is that we don’t get to experience all that great music down in Austin – well, there’s always next year!

Virtual IoT Works With EcoSystems from Google, Amazon, Apple for Z-Wave development engineers
https://workswith.silabs.com/

I am hosting the Z-Wave track and will be making several presentations including a detailed look at Silicon Labs latest release of Simplicity Studio V5 which just came out yesterday. We’ll also have presentations on developing Z-Wave Smart Hubs and Z-Wave Certification. I’ll also be describing some IoT failures – you learn more from your failures than your successes. We have speakers and engineers from all of the ecosystem partners, not just Silicon Labs folks. Learn from the experts from across the industry!

What is Works With 2020? The smart home developer’s virtual event where you will have the opportunity to interact with our ecosystem partners from Amazon, Google, Samsung, and Z-Wave to connect devices, platforms and protocols and be able to immerse yourself in keynotes, a panel discussion on Project CHIP, hands-on, and technical sessions led by smart home engineers who are building the latest advanced IoT devices. The Works With event is live, all-online, free of charge, and you can join from anywhere around the world.

Works With Z-Wave Apple, Google, Amazon, Samsung IoT SmartHome conference 2020

Click here to Register Today and feel free to forward to the rest of your team.

Here’s an overview of what you won’t want to miss:

Specialized Engineer-Led Tracks – Educational sessions and technical training designed for engineers, executives, developers, business development and product managers.

Hands-On Workshops More than 12 workshops and hands-on sessions to give you experience, knowledge and confidence to develop and accelerate smart home development.  

One-on-One Developer Meetings – Schedule a meeting with Silicon Labs or an ecosystem partner to get 1:1 technical guidance.

Join me in September and learn how to smoothly get your IoT device plugged into any and all of the ecosystem partners. Register today, it’s totally free and you can join from anywhere in the world. See you September!

How Much FLASH/RAM Am I Using?

One of the most common questions in embedded programming is “How much FLASH/RAM am I using?” or more precisely, “How much do I have left before I run out?” or even “How much do I have to squeeze my code to fit in the available space?” Yikes! Very often the code size quickly fills to fit the available space and then you start struggling to fit all the features in your product. This problem afflicts the Z-Wave 700 series just as much as any other IoT development. I’ll give you a few hints on tools to measure the code size and figure out where the bloat is and options to squeeze a little more code in.

ZGM130S Resources

The first step is to understand how much FLASH/RAM we have in the Z-Wave ZGM130S. Open the datasheet and we see there is 512K FLASH and 64K RAM. Seems like a TON! But wait, a closer look at the datasheet and there is a note that only 64KB FLASH is available for the application and 8KB RAM. That’s not a lot for a complex IoT device like a thermostat with an OLED screen but is plenty for a simple on/off light switch. Like any engineering trade off, the chip balances the available resources to match the most common use cases.

The Z-Wave stack isn’t huge so fortunately there is sufficient space available for most applications. However, the stack developers have reserved most of the the FLASH and RAM space for future upgrades. There is no easy to use tool that precisely measures how much code space is being used for the stack versus the application. In this post I’ll give you some tools to see how close you are to the total and then subtract a typical sample application size to find the amount your application is using. INS14259 section 5.1 gives the typical FLASH usage for the Z-Wave sample applications.

Half of FLASH (256K) is reserved for the Over-The-Air (OTA) firmware image. This block of flash is used when the firmware is updated and the data is stored here temporarily until the signature is checked and the code can be decrypted. Once that test has passed then the code is copied down into the normal FLASH space and the chip reboots into the new firmware version. If you need a lot more than 64K of FLASH you can consider moving the OTA storage from the upper half of the ZGM130S to an external serial FLASH. This is supported in the Silicon Labs Gecko Bootloader but requires some coding to free up all that space. This also requires hardware support for the external FLASH chip. So if you think you’re going to be short on code space, I highly recommend adding a serial FLASH chip even if you don’t use it right away. I plan to describe the OTA to external FLASH process in a future blog posting so stay tuned.

ARM Tools

Before starting with code size analysis be sure you are working with the “release” build and not the debug build. Click on Project->Build Configurations->Set Active and select the Release build. Then build the project. The debug build uses minimal optimization and has tons of ASSERT and PRINTF code in it which invalidates the code size analysis.

ARM eabi-size

When you compile a Z-Wave project it will run the arm-none-eabi-size -A <project.axf> command which prints out an obscure listing of the sizes of various FLASH segments. The DoorLockKeyPad sample application produces the following:

DoorLockKeyPad.axf  :
section             size        addr
.nvm3App           12288      475136
.simee             36864      487424
.text             168760           0
_cc_handlers         120      168760
.ARM.exidx             8      168880
.data               1132   536870916
.bss               28956   536872048
.heap               3072   536901008
.stack_dummy        1024   536901008
.ARM.attributes       46           0
.comment             126           0
.reset_info            4   536870912
.debug_frame        1120           0
.stabstr             333           0
Total             253853
  • What does all this mean?
  • FLASH = .text + .data
    • .text = code which lives and runs out of on-chip FLASH
    • .data = initialized variables
      • IE: int myvar=12345; results in 12345 being stored in FLASH and then copied to RAM on power up
      • Thus .data uses both FLASH and RAM
    • The other 2 segments are in FLASH space but subtract from the total available
    • .nvmApp = Application non-volatile memory
    • .simee = SDK non-volatile memory
  • RAM = .bss + .data
    • .bss = Variables not explicitly initialized
      • gcc normally zeroes on power up
    • .data = initialized variables
    • .heap = heap used for dynamic memory allocation
    • .stack = the stack for pushing return addresses, function parameters and other things
  • The other segments can be largely ignored
  • The available FLASH is 256K minus the .simee and .nvmApp=256K-12K-36k=208K
  • The available RAM is 64K minus the heap/stack=64K-3K-1K=60K
  • Thus:
  • FLASH=168760+1132 = 169,892 bytes = 80% utilized
  • RAM=28956+1132 = 30,088 bytes = 49% utilized

You can see that the SDK code and the application are all mashed together without a way to identify how much the application is using. But at least you know when you are running out. Note that each release of the SDK will change the amount of flash used by the SDK code and possibly the ZAF. Note that the ZAF is considered part of the Application code.

Commander Flash Map

Another easy way to check how much FLASH is being utilized is to use Commander to display a map of FLASH. Start commander and connect to the DUT then use Device Info->Flash Map to get a chart like this one:

ARM eabi-nm

If you want to know which functions and variables are the biggest chunks of FLASH/RAM usage use the nm command: arm-none-eabi-nm <project.axf> --print-size --size-sort -l | tail -30

Address  Size   Type Symbol
00018c84 00000444 t process_event
0001c760 00000454 T IsMyExploreFrame
000172a4 00000454 T TransportService_ApplicationCommandHandler
000185aa 000004d2 T S2_application_command_handler
0001de00 000004e4 T crypto_scalarmult_curve25519
0001098c 0000054c T IsMyFrame
00017ee4 00000590 t S2_fsm_post_event
00010318 00000674 T IsMyFrame3ch
20006c14 00000708 B channelHoppingBuffer
000138a0 000007e8 T CommandHandler
00021960 00000888 T FRC_IRQHandler
00011790 00000890 T ReceiveHandler
2000628c 000008ac B the_context
20007590 00000c00 N __HeapBase
00019788 00000e04 T mbedtls_internal_sha1_process
00026f68 000019cc T RAILINT_0cdb976df793f6799e20dfa42e2be4c6
00074000 00003000 b nvm3AppStorage
00077000 00009000 B __nvm3Base
00077000 00009000 B nvm3Storage

The third column need a little decoding: T/t=.text (FLASH), B/b=.bss (RAM) D/d=.data (both FLASH and RAM)

You can also tell if it’s FLASH or RAM by the address – FLASH starts at 0 and RAM starts at 0x20000000. Starting from the bottom of the list above you can see that the NVM3Storage is 36K which is naturally the largest block of FLASH. Followed by the 12K of NVM3 Application storage. From there the sizes drop fairly quickly but you can guess the function based on the name. RAILINT is a bunch of Hardware Abstraction Layer (HAL) code. mbedtls is the Security S2 encryption functions. The HEAP is the largest single block of RAM followed by “the_context” which is a fairly large structure the ZAF and the SDK use to store the security and routing information.

Now that you can see the heavy users you can see if there is something amiss. Perhaps a buffer can be reused instead of using unique buffers for various functions. Look carefully for any unused functions in your source code. GCC often will leave “dead” code in place because it can’t tell if you’re using it as a dynamic callback function so to be safe it leaves the code in there. Thus, review your code and make sure you don’t have dead functions or variables or entire buffers that are never used.

The most common method to squeeze more code in is to try various options in the GCC compiler. The more recent versions of GCC have added Link Time Optimization (LTO) which can significantly reduce the code size (claims are up to 20%!). Simplicity Studio is moving to newer versions of GCC later this year so more of these options will be available. Worst case is to refactor your code to make it more efficient or drop features.

Other Tools

There are other tools like Puncover and Bloaty which can help with managing code size growth. I haven’t personally tried these but they seem like they would help. If you use a tool that helps manage code/RAM let me know in the comments below. We all need help in squeezing into the available space which is never enough!

Z-Wave Virtual Academy

Z-Wave Virtual Webinar Wednesdays at Noon Eastern US time

Doctor Z-Wave will be giving a hands-on live demo of getting started using Z-Wave with Simplicity Studio on Wednesday June 17. This is a live demo with just a couple of slides so you don’t want to miss it. The session is a short roughly 30 minutes with time for Q&A afterward. I will show you some simple things on setting up Simplicity to make your life easier when getting started. If you can’t make it, it will be recorded and available via the Alliance web site.

There are lots of other topics for Webinar Wednesdays:

Webinar Wednesday Schedule*: *This schedule will be updated regularly on the Z-Wave Alliance website as the series progresses
May 27, 2020  
  Manufacturing During a Global Pandemic: Insight & Strategy from Companies Who Are Coping Hosted by: Avi Rosenthal – Bluesalve Partners  
June 3, 2020   Social Distance Sales for Uncertain Times: Tips & Insight for Integrators Hosted by: Jeremy McLerran – Qolsys
June 10, 2020   Residential Smart Lock Market: Trends, Use-Cases & Opportunities Hosted by: Colin DePree – Salto Systems
June 17, 2020 Z-Wave 700 Series: Getting Started Hosted by: Eric Ryherd – Silicon Labs
June 24, 2020  
  Feature of Leedarson Z-Wave 700 Series Security Products Hosted by: Vincent Zhu & Michael Bailey Smith – Leedarson
 

How to Upgrade Your 700 Series Project from SDK 7.12 to 7.13

This is a very specific posting for Z-Wave developers and specifically for those developing with the new 700 series chips. If you’re not a 700 series developer you can probably stop reading…

I have posted details on upgrading from the 7.12 to the 7.13 Software Developers Kit at this Knowledge Based Article on the Silicon Labs web site: https://www.silabs.com/community/wireless/z-wave/knowledge-base.entry.html/2020/03/30/upgrading_700_seriesprojectfrom7122to7133-VZrM

Z-Wave SDK 7.13.3 released last week with a number of important stability improvements – you want to upgrade your 700 series project to this release!

  • Several stability improvements to prevent lockups in certain corner cases
  • RSSI reporting corrections (both 500 and 700)
  • Improved timing for routed acks and fixed sticky Last Working Routes
  • OTA Firmware Activate support delaying rebooting into the new firmware until all units have been downloaded
  • Details are found in SRN14629.pdf which is included in the Simplicity Studio release: SDK Documentation->End Device->SRN14629 Z-Wave 700 SDK 7.13.x

700 Series Debug Header

We’re all trying to make the Smart Home products smaller and less visible. Using coin cells instead of bulky cylindrical batteries significantly reduces the size of many products. The challenge with making products smaller is that the area available on the PCB for a debug header is in short supply.

With the 500 series I usually used a 0.1″ spacing 12 pin header from the ZDP03A programmer to the target board. The header was normally not installed in the final product but for debug purposes the solid thru-hole connector meant I would reliably program a device the first time and every time.

However, many customers I’ve worked with want to use less PCB real estate which means they come up with a custom set of test points. Typically a jig with spring loaded pins are used to contact to the PCB or more often wires are soldered to the PCB. The problem with this solution is that the jig is large, expensive and fragile. Soldering a cable to a board often results in a fragile connection where the cable can easily break a pin and not be immediately obvious. I’ve spent far too much time trying to figure out why I could program the part a minute ago but now I can’t only to realize the cable has a loose wire.

Nasty unreliable hand-soldered fragile Z-Wave debugging cable
Unreliable Z-Wave programming cable

500 Series Header

My recommendation for the 500 series is to use a full size 12 pin 0.1″ spacing connector for programming and debug. Either SMT or thru-hole is fine but either way you have a solid, reliable, portable connection. While this worked OK with the 500 series which typically used large cylindrical batteries, the 700 series often uses coin cells which doesn’t have the real estate for a full size connector.

Reliable 500 series header = 15x5mm

700 Series Debug Header

Fortunately Silicon Labs has an even better solution for the 700 series – use a 0.05″ spacing 10 pin SMT header. The Mini Simplicity Debug connector is described in AN958. If you have a little room then use the standard SMT header which is 6x6mm. If you are very tight on real estate then put down the pads for the thru-hole version of the connector but hand-solder the thru-hole header to the pads. Using just the pads results in a header only 3x6mm. You can’t tell me you can’t come up with 18sqmm to make the PCB debug reliable!

Either solution requires only a small amount of space on a single side of the PCB. Usually the header pads can be under a coin cell since during debug a power supply is used instead of the battery. This same header can be used for production programming using a jig to contact to the pads. Having a standard and reliable connection to the PCB will save you time during debug and on the production floor.

Reliable 700 series header = 6x6mm

Conclusion

No matter how tempted you might be to come up with your own cable/connector/test points, DON’T DO IT! Use the standard Mini Simplicity connector to save you so many headaches during debug. A solid, reliable debug connection is an absolute must otherwise you risk spinning your wheels chasing ghosts that are caused by a flakey connector. Take it from me, I’ve tried to debug just way too many of these over the years and it is not fun.

Z-Wave Summit 2019

ZWaveSummit2019

If you missed the Spring Z-Wave summit in Amsterdam, you won’t want to miss the fall summit in Austin Texas this fall! The Z-Wave summit is a great place to meet other Z-Wave developers and hear about the latest technology and marketing advances Z-Wave has to offer. The summit is a three day event that is packed with valuable information and learning for both developers and marketing people. The first day is an evening networking get together, the 2nd day is mostly roadmap presentations and information about how Z-Wave is doing and where it is going. The final day splits into two tracks with developers learning about details of the Z-Wave technology and marketing folks learning how to leverage the Alliance resources and all the events coming up.

Join us October 2-4, 2019 for the Z-Wave Fall Summit 2019 in Austin. Hot topic will be the 700-series! The 3-day event features a keynote & reception on the evening of the 2nd by 2-days of informative and insightful Developer’s Forum Technical Track and a Business / Marketing Track and a member networking evening event on the 3rd. All members are invited to attend!

 Date: 10/2/2019 to 10/4/2019
When:

10/2/19 – 5pm-9pm – Opening Reception

10/3/2019 – 9am-5pm – Developers/Marketing forums and Evening Reception

10/4/2019 – 9am-4pm -Keynote & forums – UnPlugFest

Where: W Hotel
200 Lavaca Street
Austin, Texas  78701
United States

Notes from EU Summit in May

The EU summit was right on the heels of the 700 series general release so it’s been a very busy time for everyone. The Z-Wave roadmap has major releases every 6 months and minor between them. Silicon Labs plans on a long life and improving volume shipments for Z-Wave and is investing in the technology to realize these gains. Attendance at the summit was up again with attendees from around the world.PANO_20190509_160327.vr

Certification costs recently went up but adding frequencies (countries/regions) is now just paperwork and free. The all important Certification Test Tool (CTT) was recently updated to version 2.8.4. If you are heading to certification with your new whiz-bang product you’ll want to test it with the latest version of all the scripts. The CTT is actively being improved so future releases will be even more powerful.

All 700 series Certifications must be Z-WavePlus V2. The logo remains the same but V2 raises the bar on several fronts. The main upgrade for V2 is that devices largely advertise their capabilities without the need for Hubs to write custom drivers for each and every device. Specifically, Configuration Command Class now requires the name/default/min/max values as well as some text describing the function of the parameter to be returned via a GET command.  However, 500 series are not required to support V2 but are recommend if you have the code space for it. For Hubs the bar has been raised significantly with Security S2, SmartStart and many interoperability improvements that will require some resources to achieve certification.

Expect the RF range minimums to go up from the current 40m (132′)  with the improved radio of the 700 series. The range requirement hasn’t changed yet but there are lots of discussions on the topic. Most of the devices I’ve worked on (over 3 dozen) have consistently achieved over 100m of line-of-sight RF range so I expect the minimum to probably double. This does make testing a bit more difficult as finding a location with 100m of open space can be a bit of a challenge, especially in winter!

Presentations

The presentations at the EU summit (and most of the previous ones) can be viewed from the Alliance web site. You need to be a member to gain access to them. My presentation (along with Axel Brugger) was a deep dive on using the 700 series and getting familiar with Simplicity Studio for developing end devices. Hans Kroner gave a similar presentation on developing gateways using ZIPGW/ZWare. While you can view the presentation online I highly recommend attending in person so you can ask questions and get straight answers right there on the spot.

Fun and Games

The members night was at The Beach which is an indoor beach games facility with lots of crazy stuff. Great food, Great music, Great games made for an enjoyable evening after a long day of stuffing information in your brain!

IMG_20190509_175126

 

The US Summit was held in Austin Texas Oct 2-4

The US summit was held at the Silicon Labs office and at the nearby W Hotel in Austin Texas. Another excellent turnout with lots of informative sessions from technical training to marketing by leveraging the Z-Wave Alliance.

This is just a fraction of the turnout at the US Summit in Oct 2019

Interoperables are BACK

Mitch Klein received a Lifetime Achievement award from CEDIA a few months back. As the leader of the Interoperables band we had quite the show all to ourselves along with plenty of food, drink and excellent conversation. By far the most beneficial part of the Summit is the chance to talk to your fellow Z-Wave developers, marketers, executives and enthusiasts.

The Z-Wave band “Interoperables” are back with a wide repertoire of classic rock

Zniffer File HowTo

Z-Wave developers have a handy tool for debugging firmware and Z-Wave network issues called the Zniffer. The Zniffer consists of two parts, the first is a USB dongle with special firmware and the second is the Windows program. You can’t buy just a Zniffer USB dongle (they come as part of some of the developers kits) but you can make one out of a standard UZB. You can even make a SuperZniffer as described in my previous blog posting. The Zniffer program is included in the Simplicity Studio IDE tools for developing Z-Wave products.

Zniffer traces are INVALUABLE when submiting a support case to the Silicon Labs Z-Wave support web site. I am an Field Applications Engineer so I often review Zniffer traces captured by developers who have questions or are reporting bugs. The problem is that many times I get a support case that says “Zniffer trace attached – what is problem?” and the Zniffer trace is several hundred megabytes with dozens of Z-Wave networks and maybe one hundred Z-Wave nodes captured across days of time. Talk about the proverbial needle in a haystack! So I am asking everyone to follow a few rules BEFORE attaching a Zniffer trace to a support case.

Zniffer File Rules

Before attaching a Zniffer file for Z-Wave support to review, include the following:

  1. The HomeID of the network with the problem
  2. The NodeID of the Z-Wave node that demonstrates the problem
  3. The line number or the date/time of the where the problem occurred (or a range)
  4. The Security Keys of the Z-Wave network
  5. A clear and concise description of the problem, what should have happened, what didn’t happen, what you believe is wrong

ZnifferHowTo1

HomeID

The HomeID of a Z-Wave network is a 4 byte, eight digit hexadecimal number that uniquely identifies a single Z-Wave network. Only devices with the same HomeID can talk to each other. In a development environment there are often dozens or even hundreds of Z-Wave networks in range. Remember the Zniffer captures every network in the air. Please do not filter the HomeID when saving out the Zniffer file as there may be critical interactions with other network or even noise that will be filtered out if you save only the matching HomeID. We can always filter by HomeID when displaying the network on our PC but we can’t see the data if its not in the file.

NodeID

The NodeID of the node that is displaying the issue has to be identified. You might have dozens of nodes in the network who are all talking at once so we need to know which one is the one with the problem. Please include details of the device as well such as what type it is (binary switch, thermostat, sensor, battery powered, etc) . Ideally if you can include the device within the zniffer file that will tell us just about everything we need to know as the NIF will be exchanged and the interview will take place.

Date/Time

Each transaction in the Zniffer trace is identified by a line number on the left side or the date/time. Indicating the line number or date/time or a range of these will help us navigate the potentially huge Zniffer file and quickly zoom in on the problem. Wading thru days of Zniffer data to finally find the interesting bit is just wasting our time and yours.

Security Keys

If you are working with Secure devices you MUST include the security keys. Without the security keys the data is encrypted and it is all just meaningless ones and zeroes and we can’t help you. Now that all devices are required to be secure, the key file is critical. The Zniffer trace has to include the SPAN table update as without the SPAN table we again cannot decrypt the message. The easiest way to be sure the SPAN is included is to add the device-under-test (DUT) to the network while capturing the Zniffer trace. The other option is to power cycle the DUT which will usually cause the DUT and the controller to exchange Nonces to resynchronize the SPAN table and we can once again decrypt the messages in the Zniffer.

To extract the security keys, join the PC Controller to the Z-Wave network. Be sure to enable all levels of security by providing the S2 DSK of the PC Controller. Once joined to the network, the keys can be saved to a file using the procedure below:

ZnifferHowTo2

The filename is the HomeID.txt which in the case above is FFE5B5C9.txt and contains:

9F;C592557B5F99DDC9BDD12D0D926BAFE5;1
9F;31944FE8F8DE2330E79741313A949190;1
9F;18FD847446AFD7E410B2BCF8912BC632;1
98;C65D55C44FB2156635CA07A48D362AD3;1

To decyrpt the messages in the Zniffer, just click on Load Keys and enter the directory for the file. Then all the messages are decrypted and we can help you solve your problem.

Pilgrimage to the Z-Wave Homeland

I finally made the trip to the home of Z-Wave, Copenhagen Denmark. Z-Wave began as Zensyszensys-logo back in 2001. My journey with Z-Wave began shortly after that in 2003 when I was disgusted with my highly unreliable X10 home automation experiments. I just couldn’t get that X10 junk to work! It was cheap, but it wasn’t worth my time and frustration so I was looking around for other technologies that would be reliable. I experimented with several custom baked wireless solutions but quickly realized that wireless is really hard and complicated. Z-Wave caught my eye because it was a real mesh network and actually worked. From there I have continued to be impressed by the technology improvements always with full backward compatibility and wide choice of fully interoperable products from many manufacturers.

My purpose is to meet the engineering team and to learn in much greater depth the details of Z-Wave and especially the new 700 series. We have an intense group of smart engineers working diligently on the many aspects of a wireless system as complex as Z-Wave. One team is busy with the gateway specific parts of the protocol and the Z/IP Gateway and Z-Ware code. Another team is working on the protocol and solving very complex issues that we find are happening in the real world. The support team (of which I am part) helps customers get their products to market quickly by answer their questions and providing training. And of course there are the marking and sales folks who make sure you all know about the benefits of Z-Wave.

ZWaveCPHThe Z-Wave team in Copenhagen resides in this modest building. Danes love to bicycle to work or take the excellent train/bus system. Only a few travel via car unlike those of us in the US who just love sitting in traffic for hours. The food in Copenhagen is wonderful with plenty of international choices as well as the Danish favorites. My hotel room is more of a spaceship pod than a boring room with the Danish penchant for efficient minimalism. The view of the windmills in the distance and the quaint classic European architecture are beautiful.

Z-Wave EU Summit

The other purpose of my visit across the pond is to attend and speak at the Z-Wave EU Summit. If you are a Z-Wave developer, I highly recommend attending as you’ll learn about the latest Z-Wave technology and best practices to build robust IoT products. We have two technical tracks in addition to the marketing track. Meeting with your fellow Z-Wave developers to share your experiences and learn from theirs is the main value of the summit. The only way to get that experience is to attend in person. See my other postings about last years EU summit and the US summits to get a feel of what goes on.

The summit is next week in Amsterdam. Click HERE for more details on the summit.

z-wave_spring_summit_19

 

 

Z-Wave Watchdog Timer Best Practices

WatchDogVirtually all embedded systems must run 24 x 7 x 365 x many many years without ever being rebooted. Since there is no one there to “press the reset button” if the device fails, the watchdog timer is there to do just that. The 500 series Z-Wave chips from Silicon Labs have a watchdog timer and the example code provides a very minimal use of the watchdog timer. However, the minimal use in the example code is not sufficient to provide a robust watchdog for embedded Z-Wave devices. This post explains some rules and methods to code a robust watchdog timer.

Long time embedded expert Jack Ganssle has a great article on Watchdog timers. He describes the use of a watchdog timer on the Clementine spacecraft where a fault in the system caused the spacecraft to dump virtually of its fuel resulting in the loss of the mission. The lead software engineer had wanted a watchdog but the designers decided not to include it. Jacks example shows how important it is to spend at least some time coding a robust watchdog for our IoT devices. While our devices aren’t controlling multi-million dollar spacecraft, we are coding light switches that are hardwired into the wall and cannot be easily rebooted. Try telling the customer to go into the basement and toggle the power to his entire house to reboot the light switches!

What is a Watchdog?

A watchdog timer is a timer that runs constantly. Typically a complex combination of events resets (or “kicks”) the watchdog timer every now and then, usually every few milliseconds. If the combination of events ever gets stuck, the timer will continue to run. If the watchdog timer “times out”, the system is reset – basically the reset button is pushed! Your embedded system reboots and keeps on running. Generally no one even realizes it has rebooted (I’ll discuss that problem in more detail shortly).

WatchdogTimerThis diagram shows the Watchdog timers value which is constantly counting up. Every time the Watchdog is “kicked”, the counter is reset to zero. Somewhere in your code the ZW_WatchDogKick() routine is called which resets the watchdog timer. Sometimes this reset condition happens on a nice regular basis, sometimes it happens at varying times as shown by the level of the timer. The key is the timeout threshold has to be longer than any normal operating condition. If a fault condition occurs, the timer keeps on counting up until the threshold is reached and then the system is reset. When the watchdog timer fires, the Z-Wave chip goes thru a full reset just as if power had been removed and reapplied. Your embedded system is back up and running as if nothing had happened.

SiLabs Sample Code = Minimal Watchdog

The SiLabs sample code has the following implementation of the watchdog:

BYTE ApplicationInitSW(ZW_NVM_STATUS nvmStatus) {
...
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogEnable();
#endif
} 

void ApplicationPoll(void){
#ifdef WATCHDOG_ENABLED
 ZW_WatchDogKick();
#endif
}

The sample code has the good implementation practice of putting the Watchdog code inside #defines so it can be easily enabled/disabled. Unfortunately it blindly kicks the dog every ApplicationPoll without checking any other conditions. ApplicationPoll is called roughly every few hundred microseconds and a lot of fault conditions can exist and ApplicationPoll will still be called. With this implementation the only way the watchdog is going to fire is if there is a catastrophic failure and ApplicationPoll is no longer being called. While this implementation is better than nothing, it won’t reset the system in many cases where the device has become unresponsive. This is where you come in, you have to add more code to the watchdog algorithm. It may be easy to just use what SiLabs provides, but for a robust product you really need to spend some time adding your own conditions to the watchdog algorithm.

A Better Watch Dog Example

Writing good watchdog code requires some significant thought and testing. The possible sources of failure need to be discussed with members of the team and with other Z-Wave developers who are fighting the same fight (thus the need for this blog). I can provide a few guidelines to include in your analysis but this is not a complete solution. Only you know all the possible failure modes of your product and that requires some serious thought and analysis.

Mutex Gets Stuck

The most common failure I have seen is the fact that the SiLabs provided Application Framework (AF) mutex can get stuck. When the mutex is stuck, it most often results in the device still able to receive Z-Wave traffic but often can’t respond. If the device is power cycled, then it returns to full operation. So often this failure goes unnoticed both in testing and in actual use.

What is the mutex you ask? The mutex is a simple flag in the AF that prevents the code from overwriting the Send Buffer while a message is currently being sent over the radio. When a GET command comes in, the AF will call a command class handler to handle the GET and build a REPORT frame in memory. When ready to send the frame, the AF will call pTxBuf=GetResponseBuffer() to get a buffer for the radio to send. There is only one buffer so if the buffer is already in use, you get a NULL pointer back and will have to wait and send the frame later.  This in general works fine as long as frames don’t come in too fast. But in a large network with lots of repeated and re-routed frames you will occasionally get a bunch of GETs quickly and it is possible for the REPORTs to get cross wired and end up locking up the mutex for a frame that will never be sent. If the code then doesn’t properly release the buffer, the mutex is stuck. The Application Framework code is known to lock the mutex occasionally so you must code around this problem. The easiest solution to this rare event is to ensure the watchdog is watching the mutex and simply reboot if it gets stuck for too long.

My solution is to have a counter that counts up once per second in ApplicationPoll anytime ActiveJobs() is true (in SDK 6.81.xx its now called ZAF_mutex_isActive()). ActiveJobs is true anytime a buffer is in use and false when all the buffers are free. There are actually two buffers, one for response frames (REPORTs sent as a result of a GET) and a second buffer for request frames (unsolicited notifications).

Application Specific Reasons

Beyond the mutex you must think long and hard about application specific failure conditions. The most obvious is that the device has not received or sent a frame in 25 hours. Most hubs will poll a device at least a couple of times per day to make sure it is still alive. So if there has been no traffic in a day, maybe something is stuck and a reboot is in order. Plus if nothing has happened in a day then probably no one will notice the reboot (which only takes 1.5 seconds). You do have to be careful that some other part of the application isn’t impacted as a result of the reboot. For example, if you are a light switch and by default you turn the light off on a reboot, then people will be really annoyed if the light randomly turns off because your hub hasn’t polled it in day. There are lots of potential checks you can make here but every application will have different requirements so you will have to think hard about all the possible conditions for your specific case.

Sample good watchdog:

E_APPLICATION_STATE ApplicationPoll( E_PROTOCOL_STATE bProtocolState ) {
...
if (ActiveJobs()) {              // Mutex buffer is busy
    if (OneSecondTimer) ActiveJobsCounter++;  // Once/sec increment
} else {
    ActiveJobsCounter=0;         // When buffer is free clear counter
}
...
if ((ActiveJobsCounter<30) &&       // Mutex isn't stuck 
    (LastCommsHours<25) &&          // Got a frame in the last 24 hrs
    ApplicationSpecificReasons) {   // Other reasons
    ZW_WatchDogKick();              // Everything is OK so reset WDOG
}

In the example code above we do have a major issue in that if the counters stop counting for some reason, the watchdog will never fire! But that’s easy to check for in ApplicationPoll and if ApplicationPoll itself isn’t running then the WatchDog is no longer being kicked so it will reset.

Doesn’t Work If Not Tested

The old coding adage (proven totally true by me many many times) goes “If the code hasn’t been tested, it doesn’t work”. Same thing applies to your Watchdog code. So how do you test the watchdog? The first thing to do is to log the number of times the watchdog has triggered. This has to be stored in NVM since RAM will be lost when you reboot. Fortunately ApplicationInitHW is called with the bWakeupReason parameter which lets you know the watchdog fired when equal to ZW_WAKEUP_WATCHDOG. Note that usually ApplicationInitHW just stores the bWakeupReason and later in ApplicationInitSW we check it as the NVM isn’t available in InitHW.

ApplicationInitSW(...) {
...
if (wakeupReason==ZW_WAKEUP_WATCHDOG) { // Increment WDOG counter with max 255
    i=MemoryGetByte((WORD)&EEOFFSET_NumberWatchDogResets_far);
    if (i<255) MemoryPutByte((WORD)&EEOFFSET_NumberWatchDogResets_far, i+1);
}

Use a Configuration Command Class parameter to read or update this value for testing purposes. I also like to put in a small block of code wrapped in #ifdef WATCHDOG_TESTING_ENABLED that upon receiving a BASIC_SET with a value of 0xDE (not a valid value) calls GetResponseBuffer() which locks up the mutex and in 30 seconds the chip should reboot. If not, then you have a bug in the watchdog code! You can test all the branches in your watchdog code with various values of a BASIC_SET.

When to Enable Watchdog

Perhaps a better question is when NOT to enable the watchdog since ALL production builds absolutely must have the watchdog enabled! My recommendation is to disable the watchdog during development. You want the chip to lock up if you have a bug. The watchdog is really good at masking major bugs since things just keep on working. If the device locks up, then you know something is wrong and you need to chase it down. If you power cycle and the device is fine again, IT IS NOT FINE! You have a bug in your code! During production testing I usually turn the watchdog back on but I also have the testing scripts check the watchdog counter and if it increments then the test fails.

Watchdog Best Practices for Z-Wave Developers

  1. Disable Watchdog during development using #defines
  2. Only kick the watchdog when everything is idle
    1. Kicking every ApplicationPoll is INSUFFICIENT
    2. Check the ActiveJobs() being stuck (aka Mutex)
    3. Check other conditions within your product
  3. Check that the RF has received something every X minutes or hours
  4. Have a way to test the Watchdog during development
  5. Store the number of Watchdog resets in NVM and retrieve them via a configuration parameter