Z-Wave is a wireless mesh protocol with over two decades of real-world learning built into the latest version. While the other new wireless protocols are still writing the specification for their mesh network, Z-Wave has learned a thing or two over the past twenty years. Z-Wave is a Source Routing protocol where the Primary Controller of the network keeps track of the best paths thru the network from any node to any other node.
Z-Wave limits the number of hops thru the mesh to four to bound the routing calculations to something an inexpensive microprocessor can handle. Even with this limit, the number of routing combinations explodes as the network grows beyond a few dozen nodes. The trick is to pick the optimal set of routes to get from one node to the next. This is where the two decades of learning have proven to be the key to Z-Wave’s robust delivery.
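To get a feel for how quickly the combinations grow, here is a rough back-of-the-envelope sketch. It assumes (unrealistically) that every node can hear every other node, so it is only an upper bound, and it treats the four-hop limit as at most four repeaters between source and destination. The function name is made up for this illustration.

```python
from math import perm

MAX_REPEATERS = 4  # the four-hop limit, treated here as up to four repeaters


def count_candidate_routes(network_size: int) -> int:
    """Upper bound on the routes between one source/destination pair.

    Assumes every node is in radio range of every other node, so any
    ordered choice of distinct repeaters counts as a possible route.
    """
    if network_size < 2:
        raise ValueError("need at least a source and a destination")
    # Every node except the source and the destination could repeat.
    possible_repeaters = network_size - 2
    # k = 0 is the direct route, k = 4 uses the maximum number of repeaters.
    return sum(perm(possible_repeaters, k) for k in range(MAX_REPEATERS + 1))


for size in (10, 30):
    print(size, "nodes:", count_candidate_routes(size), "candidate routes")
```

Under these assumptions a 10 node network has 2,081 candidate routes for a single pair of nodes, and a 30 node network has 511,841. A real mesh has far fewer usable routes because most nodes cannot hear each other, but the growth is the reason the route selection has to be smart.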
Source Routing Introduction
The 500 series Appl. Prg. Guide section 3.4 describes the “routing principles” used in Z-Wave. While this is a 500 series document, the 700 series uses the same algorithm with a few minor enhancements. The key to source routing is that the Primary Controller (PC) calculates the route from Node A to Node B. The nodes along the way do not need to know anything about the routing; they just follow the route in the packet header determined by the PC. When an end node (Node A) needs to talk to the PC or any other node (Node B), the PC will send the end node four routes to get from Node A to Node B. As a final backup route, Node A can send out an Explorer Frame asking all nodes within radio range if they can help get the message to Node B. If a node is able to help and the message is delivered, this route becomes what is known as the Last Working Route (LWR). Node A will then use the LWR whenever it needs to talk to Node B.
There are a total of five routes stored in any node to get to any other node. Note that routes are calculated and stored only if a node is Associated with another node. Since most nodes usually only talk to the PC (Associated via the Lifeline – Association Group 1), that is the only set of routes they store. The primary controller has the full network topology but still follows the same basic algorithm when sending a message to a node. The five routes are held in a list for each destination. If a message is delivered successfully, that route is moved to the top of the list and is called the Last Working Route (LWR). The LWR will be used from now on until it fails for some reason. RF communication is fraught with failures and they will happen occasionally, so the LWR often changes over time. When the LWR fails, the list is pushed down and once a working route is found, it is placed at the top of the list as the new LWR.
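The list behavior described above can be sketched as a small toy model. This is not the protocol's actual code; the class name, the tuple representation of a route, and the `delivers` callback (which stands in for a real transmission attempt) are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Route = Tuple[int, ...]  # repeater NodeIDs in order; () means a direct route


class RouteList:
    """Toy model of the per-destination route list (at most five routes)."""

    MAX_ROUTES = 5

    def __init__(self, routes: List[Route]) -> None:
        self.routes = list(routes)[: self.MAX_ROUTES]

    def send(self, delivers: Callable[[Route], bool]) -> Optional[Route]:
        """Try routes from the top of the list; promote the first that works.

        `delivers` is a hypothetical stand-in for transmitting the frame
        and waiting for the ACK.
        """
        for route in list(self.routes):
            if delivers(route):
                # The working route moves to the top and becomes the LWR;
                # the routes that just failed are pushed down.
                self.routes.remove(route)
                self.routes.insert(0, route)
                return route
        return None  # every stored route failed


routes = RouteList([(5,), (7, 8), ()])
routes.send(lambda route: route == (7, 8))  # only the route via 7 and 8 works
print(routes.routes[0])  # the route via 7 and 8 is now the LWR
```

The important property is that a route which just worked is tried first next time, which is the "once you find a working route, keep using it" behavior.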
Application Priority Routes
Application Priority Routes (APR) are special routes the Application can assign to a node to get messages from Node A to Node B. They are called “Application” Priority Routes because the protocol never assigns APRs, only the APPLICATION can assign APRs. Typically the application is the software that is talking directly to the PC – a Hub application like SmartThings or Hubitat or one of the many other Hub applications. The protocol assumes that someone smarter than it (meaning an expensive powerful CPU with tons of memory) can figure out a better route from A to B than it can. The protocol places the APR at the top of the 5 routes in the list and always keeps it there. Even ahead of the LWR. While this gives the application a great deal of power, it also means the application can make a mess of routing and inadvertently cause a lot of latency. Large Z-Wave networks tend to have dynamic routing which is why the LWR has been the key to the routing algorithm – Once you find a working route, keep using it!
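As a rough illustration of how an APR changes the order in which routes are tried, here is a toy model. The class and method names are invented for this sketch and are not part of any Z-Wave API; a route is modeled as a tuple of repeater NodeIDs, with an empty tuple meaning a direct route.

```python
from typing import List, Optional, Tuple

Route = Tuple[int, ...]  # repeater NodeIDs in order; () means a direct route


class PinnedRouteList:
    """Toy model: an APR, if assigned, is always tried before the LWR."""

    def __init__(self, routes: List[Route], apr: Optional[Route] = None) -> None:
        self.apr = apr
        # Keep the APR out of the normal list so it is not tried twice.
        self.routes = [route for route in routes if route != apr]

    def attempt_order(self) -> List[Route]:
        # Compare against None: the direct route () is a valid APR but is falsy.
        pinned = [self.apr] if self.apr is not None else []
        return pinned + self.routes

    def record_success(self, route: Route) -> None:
        if route == self.apr:
            return  # the APR is already first
        if route in self.routes:
            self.routes.remove(route)
        self.routes.insert(0, route)  # new LWR, but still behind the APR


plist = PinnedRouteList([(5,), (7, 8)], apr=())
plist.record_success((7, 8))  # the direct APR failed, the route via 7 and 8 worked
print(plist.attempt_order())  # the direct APR is still tried first
```

This also shows the downside: if the pinned route is bad, every message pays for a failed attempt before the LWR gets a chance.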
I generally don’t recommend using APRs since the routing tends to be dynamic and it is often best to let the protocol find the best route. However, adding Direct Route APRs, where the node talks back to the Hub directly rather than routing thru other nodes, can reduce latency. This sometimes solves the problem where the LWR gets stuck with a multi-hop route when the Hub could reach the node directly. A direct route is the fastest way to deliver messages, and multi-hop messages often have a noticeable delay. When a motion sensor detects motion in a dark room, speed and low latency are central to turning on the light quickly and maintaining a high WAF.
Using the PC Controller to Assign APRs
The PC Controller has a section called “Setup Route” which has a number of ways of setting up various routes.
There are 5 different types of routes that the PCC can set up:
| # | Route type | Description | Function (ID) |
|---|---|---|---|
| 1 | Return Route | Assigns 4 controller-computed routes between 2 nodes | ZW_AssignReturnRoute() (0x46) |
| 2 | Priority Return Route | Assigns an Application Priority Route between 2 nodes | ZW_AssignPriorityRoute() (0x4F) |
| 3 | Set Priority Route | Assigns an Application Priority Route from the controller to a node | ZW_SetPriorityRoute() (0x93) |
| 4 | SUC Return Route | Assigns 4 controller-computed routes from the end node to the controller | ZW_AssignSUCReturnRoute() (0x51) |
| 5 | Priority SUC Return Route | Assigns an Application Priority Route from the end node to the controller (SUC) | ZW_AssignPrioritySUCReturnRoute() (0x58) |
1. Return Route
Return Route assigns four routes to the source node (left) to reach the destination node (right). Anytime an Association is made from one node to another, a Return Route MUST be assigned so the source knows how to reach the destination. The most common application is a motion sensor turning on a light without going thru the hub. For example, a motion sensor (Node 10) is associated with the light (Node 20), and then a call to ZW_AssignReturnRoute(10, 20, SessionID) will send four messages to node 10 with four different routes to get to node 20. In this case the Application does NOT specify the route to be used but lets the Primary Controller calculate the best 4 routes. The source node can still use Explorer Frames to find a route if all four fail. During inclusion a controller should always assign return routes to the end node back to the PC so the end node has routes for any unsolicited messages (or use the SUC Return Route below). If the network topology changes significantly (nodes added or removed), then all the return routes of every node in the network should be reassigned to ensure the optimal route is used.
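On the Hub side, this rule boils down to some simple bookkeeping: remember every Association, assign a return route when it is created, and reassign them all when the topology changes. The sketch below is one hypothetical way to structure that. The class is invented for illustration, and the `assign_return_route` callback merely stands in for whatever your Hub uses to invoke ZW_AssignReturnRoute.

```python
from typing import Callable, Set, Tuple


class AssociationTracker:
    """Remembers associations so return routes can be assigned and reassigned."""

    def __init__(self, assign_return_route: Callable[[int, int], None]) -> None:
        # Hypothetical callback: (source NodeID, destination NodeID) -> None
        self._assign = assign_return_route
        self._pairs: Set[Tuple[int, int]] = set()

    def associate(self, source: int, destination: int) -> None:
        """Record a new Association and assign its return routes right away."""
        self._pairs.add((source, destination))
        self._assign(source, destination)

    def node_removed(self, node_id: int) -> None:
        """Forget associations involving the removed node, then reassign the rest."""
        self._pairs = {pair for pair in self._pairs if node_id not in pair}
        self.reassign_all()

    def reassign_all(self) -> None:
        """Call after any significant topology change (nodes added or removed)."""
        for source, destination in sorted(self._pairs):
            self._assign(source, destination)


tracker = AssociationTracker(lambda src, dst: print(f"assign return route {src} -> {dst}"))
tracker.associate(10, 20)  # motion sensor 10 controls light 20
tracker.associate(10, 1)   # Lifeline back to the controller (NodeID 1 assumed)
tracker.node_removed(20)   # light removed: reassign whatever is left
```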
2. Priority Return Route
Priority Return Route is used to assign an Application Priority Route between two nodes. The only time I recommend using this command is to assign a direct (no repeaters) priority route back to the controller, assuming the node is within direct range of the controller. It is too easy to mess up the routing with this command so in general I do not recommend using it.
3. Get/Set Priority Route
Get or Set the Application Priority Route (APR) the primary controller uses to reach a node. Since the node will use the same route to return the ACK this will become the LWR for the end node so both sides will use this route first. Note that this route is not set at the end node, only the controller will use this route. If the end node needs to send a message to the controller it will use this route if it is the LWR otherwise it will use one of its own assigned routes. Note that you can set the speed in this command. Be careful not to blindly set the speed to 100kbps. If the nodes in the path are older or the destination is a FLiRS device then they may only support 40kbps. Old 100 series nodes can only do 9.6kbps but they can still be part of the mesh. Note that you can GET the priority route (0x92) with this command if one has been assigned. If a Priority Route has not been assigned then the current LWR is returned.
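One way to avoid blindly picking 100kbps is to take the slowest maximum speed among the nodes involved. The sketch below assumes you already know each node's maximum speed (for example from its node information); the function name and the way speeds are passed in are assumptions for illustration only.

```python
from typing import Iterable

SUPPORTED_SPEEDS_KBPS = (9.6, 40, 100)


def safe_route_speed(max_speeds_kbps: Iterable[float]) -> float:
    """Return the fastest speed that every node on the path supports.

    Pass the maximum speed of each repeater and of the destination.
    For a FLiRS destination that only supports 40kbps, pass 40.
    """
    speeds = list(max_speeds_kbps)
    if not speeds:
        raise ValueError("need at least the destination's maximum speed")
    for speed in speeds:
        if speed not in SUPPORTED_SPEEDS_KBPS:
            raise ValueError(f"unexpected speed: {speed}")
    # A route can only run as fast as its slowest node.
    return min(speeds)


# Two 100kbps repeaters and a FLiRS lock that only supports 40kbps
print(safe_route_speed([100, 100, 40]))
```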
The only application of Set Priority Route I recommend is to force nodes close to the controller to always try direct communication first. In this case, you would Set Priority Route with all zeroes in the route. This tends to make scenes that turn on a lot of lights run quickly so there is less popcorn effect. If a scene with a lot of lighting nodes fails to deliver to one of the nodes, the PC searches thru routes to find a new one. That routed route then becomes the LWR, and the controller will continue to use it until it fails for some reason. By assigning a Priority direct route the controller will always try the direct route first. Since 700 series devices usually have excellent RF, if the controller is in the same room or at least on the same floor as the lights it is controlling, then the direct routes will minimize the popcorn delay. However, if the lights are not in direct range, it will just delay everything, making the popcorn worse! So be careful in assigning APRs! Don’t make things worse.
The example above shows how to assign an APR direct route to Node 2. The function call for this would be: ZW_SetPriorityRoute(2, 0, 0, 0, 0, 3); Every time the PC sends a message to node 2 it will try this direct route first. If that fails to ACK, it will use the LWR and then the other return routes it has calculated.
The example above shows an extreme example where we force routing to be the maximum number of hops of four. This is a handy way to test your product with a lot of routing! A zniffer trace of a message looks like:
The function call for this would be: ZW_SetPriorityRoute(6, 5, 4, 3, 2, 3); The PC will always use this route first to send a message to node 6. If it fails, it will try the LWR, then the other return routes, and finally an Explorer Frame.
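Since it is easy to get these arguments wrong, a Hub application may want a small helper that pads the repeater list with zeroes and rejects obviously bad routes before making the call. The sketch below is a hypothetical helper, not part of any SDK. It simply produces the argument order used in the two calls above and passes the final speed value thru unchanged.

```python
from typing import Sequence, Tuple

MAX_REPEATERS = 4  # a route can hold at most four repeaters


def priority_route_args(
    destination: int, repeaters: Sequence[int], speed: int
) -> Tuple[int, ...]:
    """Build (destination, r1, r2, r3, r4, speed) with unused slots set to 0."""
    if len(repeaters) > MAX_REPEATERS:
        raise ValueError("a route can use at most four repeaters")
    if any(node_id <= 0 for node_id in repeaters):
        raise ValueError("repeater NodeIDs must be positive; 0 marks an unused slot")
    if len(set(repeaters)) != len(repeaters):
        raise ValueError("a repeater appears more than once in the route")
    if destination in repeaters:
        raise ValueError("the destination cannot also be a repeater")
    padded = list(repeaters) + [0] * (MAX_REPEATERS - len(repeaters))
    return (destination, *padded, speed)


print(priority_route_args(2, [], 3))            # direct route to node 2
print(priority_route_args(6, [5, 4, 3, 2], 3))  # four repeaters to node 6
```

The two calls reproduce the argument lists of the direct route and the four-repeater route shown above.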
4. SUC Return Route
The SUC Return Route is a shorter version of the Assign Return Route (1. above) which simply sets the Destination NodeID to be the SUC which in most cases is the Primary Controller.
5. Priority SUC Return Route
The Priority SUC Return Route is again a short version of the Assign Priority Return Route (2. above) which automatically sets the Destination NodeID to be the SUC. It is generally easier to simply use the normal Return Route commands (1. and 2. above) and fill in the Destination NodeID as the PC (which is usually the SUC) than to use these two commands.
The techniques explained here are not intended for general Z-Wave users but instead for the Hub developers and end-device developers. Since these are low-level commands and not something a user typically has access to, you’ll have to pressure your Hub developer to follow these recommendations.
Hub developers MUST assign return routes ANY time an Association is made between two nodes, especially back to the Hub immediately after inclusion and assignment of the Lifeline. If the network topology changes, such as when a node is added or removed, it may be necessary to reassign ALL of the routes to all nodes to take advantage of the new routes or eliminate nodes that no longer exist. Be careful assigning Priority routes, especially if a node in a Priority Route is removed from the network. If a now non-existent NodeID is in an APR, the node will try really hard using the APR with the missing node before finally giving up and using the LWR. This will result in annoying delays in delivering commands or status updates. Z-Wave will still deliver the message, but only after you’ve banged your shin into the coffee table in the dark because the motion sensor is still trying to send thru the missing NodeID in the Application Priority Route.
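A Hub that assigns APRs can guard against this by keeping its own record of them and checking that record whenever a node is removed. The sketch below assumes the Hub stores each APR as a list of repeater NodeIDs keyed by (source, destination), with 0 marking an unused slot. The function and data layout are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def find_stale_priority_routes(
    priority_routes: Dict[Tuple[int, int], Sequence[int]],
    existing_nodes: Iterable[int],
) -> List[Tuple[int, int]]:
    """Return the (source, destination) pairs whose APR uses a missing node."""
    alive = set(existing_nodes)
    stale = []
    for (source, destination), repeaters in sorted(priority_routes.items()):
        hops = [node_id for node_id in repeaters if node_id != 0]  # skip unused slots
        if any(node_id not in alive for node_id in (source, destination, *hops)):
            stale.append((source, destination))
    return stale


aprs = {
    (10, 1): [5, 0, 0, 0],  # motion sensor 10 reaches the controller via node 5
    (11, 1): [0, 0, 0, 0],  # direct route
    (12, 1): [5, 7, 0, 0],  # routed via nodes 5 and 7
}
# Node 7 has been removed from the network
print(find_stale_priority_routes(aprs, existing_nodes={1, 5, 10, 11, 12}))
```

Any pair this reports should have its APR removed or replaced before the missing node starts costing you latency.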
Priority Routes are important in connection with UL 1023 Household Burglar Alarm Systems.
The Hubitat folks are saying that secondary controllers are being dropped from certification going forward, will that change how any of the above works?
Secondary controllers are not being dropped from certification. Secondary controllers are a complex issue.
But relative to Hubitat’s comment, what they are talking about is that the Hubitat controller is no longer REQUIRED to be a secondary controller, which brings with it a ton of complexity and things they are not prepared to support. So life is much simpler for them now that they are not required to support being a secondary controller.
THANK YOU for writing this blog post. I recently deployed a large z-wave network at a commercial property (about 100 GE / Jasco In-Wall Switches, Dimmers, Motion Switches, Motion Dimmers) with 2 buildings that have metal exterior walls and roof. They are separated by an outdoor corridor / walkway that’s 5-6 feet wide. I am using Home Assistant and Z-Wave JS to control the z-wave network using an Aeotec USB Z-Stick Gen 5+. I was having all sorts of problems with latency across the z-wave network. I discovered that I am able to move the USB stick over to my Windows machine and use the PC Controller to troubleshoot the network, but I didn’t quite understand the Routing component. This blog helped solve my problem!! No more latency issues!!
Thank you for the posts, very informative. Are you available for consulting work? I would like to ask you a few questions regarding z-wave mesh networks and I am happy to pay you for your time. Please email me if you are interested. Thanks
No, I am not available for consulting work, but my job is to help customers use Z-Wave effectively so I’m happy to host a 30 min call anytime to discuss your situation.