Portal hangs. Why?

  • Portal hangs. Why?

    I am running Portal 2.4.36 on WinXP Pro, ver 2002, SP3. I am using a SNAPstick on a USB expansion port, self powered. The rest of the SNAP network consists entirely of (4) RF200 and (6) SM200 nodes. Data from the SM200 nodes is mcast to the RF200 nodes, which in turn do an rpc(portalAddr, "logEvent", msg).

    These RPCs arrive at Portal at an average of 2.4 msgs/sec (total from all RF200 nodes). Each message consists of 12 ASCII bytes. After a variable amount of time, 4-12 hours, the Portal eventLog stops: no new messages, no scrolling. Using a sniffer on another computer, I see that the RF200 nodes are still sending RPCs to Portal. In Windows Task Manager, the Portal.exe process is still present, but no activity is indicated.

    I can ping any of the RF200 nodes from Portal to restart the eventLog.

    Do you have any idea what is happening? Thanks in advance!
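For a sense of scale, the figures above (2.4 msgs/sec total, 12 ASCII bytes each, a hang after 4 to 12 hours) can be turned into rough totals. This is only arithmetic on the numbers in this post; it ignores packet overhead and does not say anything about why the log stops.

```python
RATE_MSGS_PER_S = 2.4   # total from all RF200 nodes, as stated in the post
BYTES_PER_MSG = 12      # ASCII payload only; packet overhead is not counted


def totals(hours):
    """Return (message count, payload bytes) accumulated over the given hours."""
    msgs = round(RATE_MSGS_PER_S * hours * 3600)
    return msgs, msgs * BYTES_PER_MSG


for hours in (4, 12):
    msgs, payload = totals(hours)
    print("%2d h: %d messages, %d payload bytes" % (hours, msgs, payload))
```

So the event log would have taken in somewhere between roughly 35,000 and 104,000 messages by the time it stops.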

  • #2
    When you say a "SnapStick", are you talking about the integrated USB keyfob, or a USB PaddleBoard with an RF200?

    What version of firmware is loaded on your RF200s and SM200s?

    Are you using encryption?

    Are the RF200s just acting like repeaters/aggregators for the SM200s, or do the RF200s also generate messages of their own independently besides forwarding the SM200s' info?

    Is that 2.4 msgs/sec/RF200? Or just 2.4 msgs/sec total?

    Are you using any scheme to try and prevent remote nodes from transmitting simultaneously?

    Do the remote-nodes self-initiate the messages, or is Portal or your Snap-Bridge requesting the info sequentially through the remote-nodes?

    • #3
      Re: Portal hangs. Why?

      jpwyatt: Thanks for the quick reply!

      The SNAPstick is the USB keyfob.
      The RF200s are all on firmware 2.4.32, with the exception of one node, which has 2.4.32 with AES-128.
      I am not using encryption.
      The SM200s came off the shelf from Digikey and are built into a custom circuit. They have firmware version 2.4.22 with AES-128 (out of date). I cannot easily update the firmware on the SM200s because of that custom circuit.
      The RF200s are acting like aggregators for the SM200s; they just pass the SM200 messages on to Portal. The 2.4 msgs/sec is the total from all RF200s. To help reduce simultaneous transmits on the RF200s, I have used saveNvParam(28, 1) and saveNvParam(19, 1) to reduce hops. On the SM200s, nothing.

      The SM200 nodes self-initiate the mcast msgs to the RF200s. All nodes and the SNAPstick are reachable without hops from Portal.

      Thanks in advance!
      Last edited by cfpops; 08-11-2014, 08:39 AM. Reason: forgot something

      • #4
        A few ideas...

        2.4.34 is the newest firmware at the moment. It includes a few important bug fixes versus 2.4.32, such as a fix for a radio-transmission lockup. 2.4.22 is *very* old firmware, with a variety of possible issues. I do understand that, as 2.4.22 does not support Over-The-Air (OTA) upgrades, it might be difficult to upgrade the SM200s. However, if you intend to eventually roll out your solution in a large-quantity deployment, I would *strongly* recommend finding some way to get all nodes on 2.4.34 prior to deployment.

        However, since your sniffer is still showing transmissions coming from the RF200s, I'm hesitant to say that this is the root cause of the issue that you're seeing at the moment.

        An ID28 (Mesh Routing Maximum Hop Limit) of "1" is too low, I believe. In fact, I'm not sure why you would ever be getting anything from the RF200s. The RF200 data-path is one-hop to the Snap-Bridge, followed by a second-hop to Portal. In addition, I always recommend adding at least one or two "bonus" hops to any limit like this, so an ID28 of "3" on the RF200s would be the lowest I would recommend.

        An ID19 (Unicast Retries) of "1" is possibly ok, but it's my understanding that this value actually gets decremented on the initial transmission. So an ID19 of "1" ends up being just the initial transmission, with no retries at all. You may want to try setting this to "2", just to see if it is something with the initial transmission being lost.

        I always recommend that IDs 16, 17, and 18 be set to "True" on every node in the network, even your Snap-Bridge. ID18 already defaults to True, but 16 and 17 start at False. These apply more to multicast messages than unicast, but it's still almost always a good idea. Please reference page 98 of the Snap Reference Manual.
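The NV Param suggestions so far (IDs 16, 17 and 18 set to True, ID19 tried at 2, ID28 at 3 or higher) can be collected into a small checklist. The following is a minimal sketch in plain Python, not a SNAPpy script: it only compares a node's current values against those suggestions and prints the saveNvParam calls that would be needed. The helper name `pending_changes` and the example values are illustrative; the starting values for IDs 19 and 28 come from post #3, and the 16/17/18 values are the defaults mentioned above.

```python
# Values suggested in this post. ID28 is treated as a floor ("3 would be
# the lowest I would recommend"); the others are treated as exact targets.
EXACT = {16: True, 17: True, 18: True, 19: 2}
MINIMUM = {28: 3}


def pending_changes(current):
    """Return {param_id: new_value} for every ID that does not meet the suggestion."""
    changes = {}
    for pid, value in EXACT.items():
        if current.get(pid) != value:
            changes[pid] = value
    for pid, floor in MINIMUM.items():
        if current.get(pid, 0) < floor:
            changes[pid] = floor
    return changes


# Example: an RF200 with IDs 19 and 28 set to 1, everything else at defaults.
rf200_now = {16: False, 17: False, 18: True, 19: 1, 28: 1}
for pid, value in sorted(pending_changes(rf200_now).items()):
    print("saveNvParam(%d, %r)" % (pid, value))
```

For the example node this lists IDs 16, 17, 19 and 28 as needing a change and leaves ID18 alone.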

        Bit 8 of ID11 (Feature Bits) enables a 2nd CRC on the data within the packet. It's my opinion that this should always be enabled on every node in the network as well. Be careful when enabling, though, as nodes with the 2nd CRC can't talk to nodes that don't have the 2nd CRC and vice versa. In addition, you'll need to change Portal via File -> Preferences -> "Append CRCs to RPCs" checkbox. You'll need to develop the total ID11 values yourself based on the bitfield that you're wanting. If all you want to do is leave it at the default and enable the 2nd CRC, then look at the current ID11 value on each node, and just add Bit 8 to what you're already seeing to determine the new value.
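"Adding Bit 8" to the current ID11 value is easiest to do as a bitwise OR, which also leaves the value unchanged if the bit is already set. A minimal sketch, assuming bits are numbered from 0 so that Bit 8 corresponds to 0x0100 (worth confirming against the Feature Bits table in the reference manual); the starting value 0x001F is a made-up example, not a known default:

```python
SECOND_CRC_BIT = 1 << 8  # 0x0100, assuming bit numbering starts at 0


def enable_second_crc(current_id11):
    """Return the ID11 value with the 2nd-CRC bit set and all other bits untouched."""
    return current_id11 | SECOND_CRC_BIT


current = 0x001F  # hypothetical value read back from a node
print(hex(enable_second_crc(current)))  # 0x11f
```

The result is the value to write back to ID11 on that node, and the same change has to be made on every node (plus the Portal preference) before they can talk to each other again.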

        If all your other NV Params are set at their defaults, then that should be relatively fine... assuming your mesh itself is solid.

        When in Portal, what are the Link Quality and Trace Routes to each of the RF200s in the network? Do all RF200s connect directly back to the Snap-Bridge? Do they hop the messages over some of the other RF200s? Are the Link Qualities low or variable? You'll need to do several "pings" to each node and note the Link Quality and how it changes to get a rough "average". Does the LQ ever drop into the single digits?
        Last edited by jpwyatt; 08-11-2014, 09:05 AM.

        • #5
          Re: Portal hangs. Why?

          Thanks again for your quick reply.
          I will upgrade to version 2.4.34 on the RF200s.

          Upgrading the SM200s is problematic right now. I must create a fixture to access the UART and do the firmware upgrade through an RS-232 port on my ancient laptop. The root problem is that the SM200s from the distributor have been sitting in inventory and have obsolete firmware. It would be fantastic if they could be upgraded before shipping. We do anticipate high volume usage so we will be sure that we have the latest hardware and firmware.

          Thanks for the good discussion on NvParams. I will experiment with IDs 11, 16-19, 28 as you suggested. All the other NvParams are at default except IDs 5 & 6 which are set at 1 for the RF200s and 4 for the SM200s.

          The LQs for all the nodes are stable in the current test procedure as the nodes are all fixed in location. I have not yet checked Trace Routes. The RF200s, I believe, connect directly back to the SNAPstick. From the sniffer data, I see no hopping.

          I have attached a snapshot (LOL no pun intended) of the Portal network config including a bit of a node, M203. All the Mxxx are SM200s and all the Fxxx are RF200s.

          Thanks again, in advance!
          Attached Files

          • #6
            Might I suggest changing your Link Quality display to Percentage rather than dB? At least for my own benefit.

            The problem for me with using dB is that I always forget that there's a missing "-" there. So, actually, higher values are lower Link Qualities. Best I remember, it's something like 95dB (really -95dB) is 0%, while 17dB (really -17dB) is 100%.

            In other words, those 84dB listings I see in your screen shot are pretty low Link Qualities. If there was some sort of random RF interference added to that environment, I wouldn't be surprised if your link to that node entirely went down.
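To put a number on that, here is a rough conversion using the two endpoints recalled above (95 dB as 0%, 17 dB as 100%). It assumes a straight-line mapping between them, which is a guess for illustration; Portal's own percentage display may be computed differently.

```python
def lq_percent(lq_db, worst_db=95.0, best_db=17.0):
    """Map a Link Quality reading (shown without its minus sign) to 0-100%.

    Assumes a linear scale between the two recalled endpoints.
    """
    pct = (worst_db - lq_db) / (worst_db - best_db) * 100.0
    return max(0.0, min(100.0, pct))


print(round(lq_percent(84), 1))  # 14.1
```

Under that assumption, an 84 dB reading works out to roughly 14%.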

            Depending on the location of your Sniffer, your Sniffer might be able to detect transmissions from your RF200s, while the SnapStick/Portal would not be able to do so.

            But, yes, please let me know how your own investigations and NV Param modifications go.

            • #7
              Re: Portal hangs. Why?

              It is hard to think in -dBm ;>) See attached for % version. Caution: these % values may not correspond to the prior dBm values because I re-pinged the network!

              I will try some of your suggestions as time permits and get back to you.
              In the meantime, have you heard of anyone else experiencing Portal log freeze-ups? It is as if the Portal process refuses input after a certain point. Can you reproduce this problem?

              Again, many thanks!
              Attached Files

              • #8
                Also, under File -> Preferences, increase the Data Logger Limits and check "Refresh node information automatically". You may also want to change the "Node Views" drop-down from "Active Nodes" to "All Nodes".

                • #9
                  Re: Portal hangs. Why?

                  Thanks for the tips. Will do.

                  • #10
                    PS: You can always write a SNAP Connect application and do whatever logging you want. Just remember that it can't share a bridge node with Portal at the same time. (http://forums.synapse-wireless.com/showthread.php?t=9)

                    • #11
                      Re: Portal hangs. Why?

                      The link in your last post takes me to the Synapse Support Forums Latest Releases page. Was there a specific thread you suggested I read?
                      Here is the link I followed:
                      http://forums.synapse-wireless.com/showthread.php?t=9

                      • #12
                        We did change the NV params as you suggested. We now have our first application in the field and we are still experiencing the Portal hang-ups; otherwise things seem to be running well. I still do not have a way of easily upgrading the firmware on the SM200s ('mobile nodes' in our lingo). Our new server is a Win 7.0 based system. We have a way to inspect the Portal process running under Windows using https://technet.microsoft.com/en-us/...rnals/bb896653 (a great tool!). While the event logger is hung, the portal.exe process continues to run and to be serviced by the Win kernel through Ready, Running, Wait:WrResource and other states. All it takes to get our logger or the Portal Event Logger running again is to issue a command to any of the nodes that show up in the Network Configuration (e.g. Node F001, ping).

                        Is there a 'back door' into the portal.exe process that would let an external "deadman" Python program (1) detect whether the event logger is hung and (2) send a command to any of our nodes to un-hang it?

                        We currently go through a semi-auto method with man-in-the-loop every hour or so during a normal day. HELP!
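The detection half of such a "deadman" program could look something like the sketch below. It assumes the logger writes to a file whose modification time advances with every event (at 2.4 msgs/sec, a full minute of silence would be suspicious). Everything here is hypothetical: the log path, the thresholds, and especially `nudge`, which is only a placeholder callback. How an external program would actually make Portal issue a ping is exactly the open question in this post, and the sketch does not answer it.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Seconds elapsed since the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_hung(path, max_quiet_s=60.0, now=None):
    """True if the log file has not changed for longer than max_quiet_s."""
    return seconds_since_update(path, now) > max_quiet_s


def watch(path, nudge, max_quiet_s=60.0, poll_s=10.0, cycles=None):
    """Poll the log file; call nudge() whenever it looks stalled.

    `nudge` is a placeholder for whatever action un-hangs the logger.
    `cycles` limits the number of polls (None means run forever).
    """
    done = 0
    while cycles is None or done < cycles:
        if is_hung(path, max_quiet_s):
            nudge()
        time.sleep(poll_s)
        done += 1
```

A caller would run something like `watch(log_path, nudge=send_ping)`, where `log_path` and `send_ping` are supplied by the site. Note that on some systems a file held open for appending may not update its modification time promptly, so checking the file size instead may be more reliable.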
