Diagnostics Viewer

Diagnostics Viewer is a tool you can use to troubleshoot network communications problems and related network errors.

Starting Diagnostics Viewer

  1. Start Management Console and log in.
  2. Click Tools > System > Diagnostics Viewer.

TIP: For instructions on using filtering, sorting, column selection, and pin/unpin to customize the Diagnostics Viewer display, see Customizing and navigating interface displays.

Navigation pane

Diagnostics information is grouped as follows:

  • Service Diagnostics: Contains diagnostics information for certain services (ION Network Router Service, ION Site Service and ION Log Inserter Service).
  • Communication Diagnostics: Contains diagnostics information for the sites, hardware devices, and software nodes.

Select an item in the navigation pane to display its diagnostics information.

If you add a new device to the system while Diagnostics Viewer is open, you can refresh the tree view to display the new device by collapsing and then expanding the root node of the tree.

Diagnostics Information pane

The diagnostics information pane displays detailed data about the state of your power monitoring system and devices.

Service Diagnostics

Service Diagnostics records communication problems and similar events occurring with the product's software components.

Communication Server diagnostics

Information about the communications server is arranged in these tabs:

  • Console Messages lists all ION Network Router Service and ION Site Service console messages for the current session.

    TIP: The blank area below the Description column header is a dynamic filter field. Type the wildcard character (*) in front of the text you want to search (for example, to display only messages prefixed with WARNING, type *warning). The diagnostics information pane automatically displays only those records that match the text you typed in the box.

  • Connection Status displays the current status of the software components connected to Network Router.
  • Tree States displays the ION tree status of all nodes (hardware devices and software nodes).

Log Inserter diagnostics

The Log Inserter diagnostics information pane is split into two sections. The top section (Select Nodes pane) contains the available nodes, while the bottom section contains the node details.

Select nodes to display

In the Select Nodes pane, select the check box beside a node to display its diagnostics information. Clear the check box to hide that node’s diagnostics information.

TIP: If there are many nodes and you want to display only a few of them, right-click the Select Nodes area then click Clear All. Select only the nodes you want to display. To display all the nodes again, right-click the Select Nodes area and click Select All.

Node details

The node details are organized in these tabs:

  • Node Information provides diagnostics associated with the communication status of each selected node. If the Log Inserter is not configured to gather data from a given node, it does not appear in the list in the Node column. If the Log Inserter is configured to automatically gather information for a node, but that node has not yet been processed, it does not initially appear in the list. Once information becomes available, the node appears (if it has been selected).
  • Node Performance provides per-node performance summary information.
  • Log Performance provides performance information on a per log basis.

The following table summarizes the columns on the Node Information tab:

Column Description
Node The name of the device, VIP, or Log Inserter.
DeviceType The device type of the associated node that is returned by the device itself. The Log Inserter uses this to detect device swap outs.
SerialNumber The serial number of the device that is returned by the device. The Log Inserter uses this to detect device swap outs.
Configured Polling Interval (s) The requested polling interval in effect. It can be configured either from the log upload control or from the custom Windows Registry value. All of the nodes for which polling is disabled are identified with Polling Disabled in this column.
Average Update Interval (s) A weighted average of the time between polled results for the device. The most recent interval accounts for 20% of the value, and the previous average accounts for the remaining 80%. If the most recent interval deviates from the average by more than 30 seconds, the old average is discarded and the most recent interval is used instead. By default, the expected value for devices that support logs is the Configured Polling Interval (s) value. The expected value for devices that do not support logs is 60 seconds.
If the Log Inserter is selected but is not configured to collect data from its System Log Controller, it still appears in the diagnostics and shows 300s for Average Update Interval. Initially this value is n/a.
Time Since Update (s) The time in seconds since the last communication with the node. This time includes polling updates, record uploads, and configuration loads.
CommStatus Can be one of the following values:

  • alive – The node is communicating.
  • late – If a response to a polling program is not received within 3 minutes, the Log Inserter sends a ping. If the ping is not answered within 10 minutes, the communication status is set to late and another ping is sent. The system continues pinging every 10 minutes until a response is received.
  • expired – If a ping returns before the response for any preceding request, the original request was lost. The request is abandoned and the communication status is set to expired. A request can be lost if a destination Site Server or VIP is shut down. The state changes from expired when the device responds to a request.
  • timeout – A request to the device timed out. The device is not communicating.
  • site not connected – The site is currently not connected.
  • cannot send – An unrecoverable error. The Log Inserter cannot send a program to the communications subsystem. The Log Inserter shuts down if Network Router is not running. Restart the system.
  • invalid password – The password entered for this device in Management Console is invalid.
  • password changed – The password for this device has been changed. Update the password for the device in Management Console.
  • site not responding – The connection is unexpectedly broken during communication with the device.
  • device disabled – The device or its site is not enabled. Note that Log Inserter automatically removes this node from this list if the node had been detected by automatic means.
  • does not exist – The device is not registered in the system. In auto-mode, the device eventually disappears from the list unless it is referenced remotely by a VIP.
  • pending – No responses have been processed.
  • nack'd – The request was not acknowledged. This could mean that the Site Server hosting the device is not running.
  • validating – Treemon reported that the device is not responding. A signal is sent to Treemon to validate the state. This state clears once Treemon (via Validator) establishes communications with the device.
Comments Under steady-state conditions, this is blank. While the Log Inserter attempts to upload configuration information, this can contain a string value indicating that the Tree is in use by another client. This indicates that the Log Inserter cannot process the device until the aforementioned client releases it. If the client is ION Designer, it is not released until the node is closed in Designer or Designer is closed. If the client's name ends with -not-clean, the node is currently being evaluated by Treemon/Validator.
AggregateSetupCount The aggregate setup count of the device. The Log Inserter uses this to detect configuration changes.
RequestedIONs The number of ION registers, modules, and/or managers that have been requested from the tree. The Log Inserter needs to upload configuration information to determine which logs need to be processed, which labels to use for measurement mapping and source resolution, and which labels to use for event causes and effects.
The Log Inserter retrieves the currently cached tree from Treemon, populating it as needed by communicating directly with the device. The tree is locked for the duration of this process, which prevents Designer from opening the tree.
The value can be one of the following:

  • none – No configuration information is currently required. This is typical in a steady-state condition.
  • cache – Only the currently cached configuration is required. This is typically seen at startup.
  • A number – The Log Inserter needs specific information and that number of ION objects has been explicitly requested.
RequestStatus The status of the tree request. Can be one of the following values:
  • ready – The Log Inserter does not require any configuration information.
  • requesting – The Log Inserter requires configuration information and is in the process of gathering it. The value in Request Update Time indicates how long it has been processing this request.
  • retrying – A previous tree request was not successful. (See the Comments column for the reason.) The request is retried, as shown by the value in Request Update Time. The amount of wait time before retrying a request depends on the nature of the unsuccessful tree request:
    • Tree in use by another node – 10 seconds.
    • Tree dirty – 10 seconds.
    • Not responding – 60 seconds.
    • Tree request timed out after 10 minutes – 5 minutes.
    • Comm error – 10 seconds.
    • Other errors – 5 minutes.
  • blocked – The Log Inserter requires configuration information but all available resources are in use. By default, the Log Inserter can simultaneously request only up to 2 trees per site and 6 trees in total. The Request Update Time value indicates how long the request has been pending.
  • processing – The Log Inserter has received the requested ION objects and is processing them. The Request Update Time value indicates how long this request has been processed, including the time during the "requesting" state.
  • abandoned – Similar to the retrying status, except that part of the configuration information was received successfully before a later part of the request failed. The Log Inserter recovers when it retries the request.
Request Update Time (s) The elapsed time in seconds for the current tree request. Its meaning depends on the RequestStatus value, as described for each status above.
pID The program ID of the program used to poll the current position counters. The Log Inserter now performs its own polling, and as a result, the entry in this column is not used for diagnostic purposes.
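
The weighting described for the Average Update Interval (s) column can be illustrated with a short sketch. This is not the Log Inserter's actual code. The function name, its parameters, and the treatment of the initial n/a value as "no previous average" are assumptions made for illustration. Only the 20% weight and the 30-second reset rule come from the column description.

```python
def update_average(previous_avg, latest_interval, weight=0.2, reset_threshold=30.0):
    """Return the new weighted average update interval, in seconds.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # No average yet (displayed as n/a): start from the latest interval.
    if previous_avg is None:
        return latest_interval
    # A deviation of more than 30 s discards the old average entirely.
    if abs(latest_interval - previous_avg) > reset_threshold:
        return latest_interval
    # Otherwise the latest interval contributes 20% and the old average 80%.
    return weight * latest_interval + (1.0 - weight) * previous_avg


avg = None
for interval in (60.0, 70.0, 65.0, 120.0):
    avg = update_average(avg, interval)
    print(round(avg, 2))
```

In this sketch the last interval (120 s) is more than 30 s away from the running average, so the average restarts at 120 rather than drifting toward it slowly.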

The following table summarizes the columns on the Node Performance tab:

Column Description
Node The name of the device, VIP, or Log Inserter.
Responding Indicates whether or not the node is responding. For a VIP, this includes all external nodes connected, directly or indirectly, to the input of a Recorder. The responding state is used to determine whether or not the download of the log is caught up.
All Logs Polling Disabled Indicates if log upload is disabled for all recorders on the device. A Yes in the column indicates that log upload is disabled.
TotalLogs The total number of Data Recorders, Waveform Recorders, Event Log Controllers, and System Log Controllers on the node from which the Log Inserter is configured to collect data. Note that when these modules are detected automatically, this number may change as the Log Inserter gathers configuration information.
PendingRecords The total number of records that the Log Inserter has requested from the node but has not yet received.
OutstandingRecords The total number of records not yet uploaded based on the last read position counter on the device and the position of the last uploaded record, taking into account the maximum depth of each log.
ProcessedRecords The number of records that have been inserted into the database. Note that a record typically corresponds to a number of DataLog entries. The term "record" refers to records at the device level.
Generated Rec. per sec An estimate of the number of new records being generated per second.
Retrieved Rec. per sec An estimate of the number of records being uploaded per second.
Avg Retrieval Time (s) The average round-trip time in seconds taken to retrieve a record from a device.
Avg Processing Time (s) The average time in seconds necessary to insert a record into the database.
RestoredLogs The total number of logs that the Log Inserter is configured to gather information for.
ManagedLogs The number of logs counted in RestoredLogs that are being monitored by an enabled Log Acquisition Module (LAM).
ConfiguredLogs The number of logs counted in RestoredLogs that are Recorders with source inputs, Event Log Controllers, or System Log Controllers.
ConfirmedLogs The number of logs counted in RestoredLogs for which the current configuration is known.
NumCaughtUp The number of logs counted in RestoredLogs for which the node is responding and there are no records outstanding or pending.
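
As a rough illustration of how OutstandingRecords could be derived from the two position values and the log depth, consider the sketch below. The function names and the exact way the depth is applied are assumptions, not the product's documented algorithm. The sketch assumes that each log behaves like a circular buffer, so no more than one full log depth of records can still be retrieved.

```python
def outstanding_records(device_position, last_uploaded_position, log_depth):
    """Estimate the records still to be uploaded for one log (hypothetical helper)."""
    # How far the device's position counter is ahead of the last uploaded record.
    behind = max(device_position - last_uploaded_position, 0)
    # Assumption: records older than the log depth have been overwritten,
    # so the outstanding count is capped at the depth of the log.
    return min(behind, log_depth)


def node_outstanding(logs):
    """Sum the per-log estimates for a node.

    `logs` is an iterable of (device_position, last_uploaded_position, log_depth).
    """
    return sum(outstanding_records(*log) for log in logs)


print(node_outstanding([(1500, 1400, 1000), (5000, 1000, 1000)]))
```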

The following table summarizes the columns on the Log Performance tab:

Column Description
Node The name of the device, VIP, or Log Inserter in question.
LogHandle The handle of the Log Register or Event Log Register for this Node.
Responding Indicates whether or not the node is responding. For a VIP, this includes all external nodes connected, directly or indirectly, to the input of a Recorder. This state is used to determine whether or not it is caught up.
Polling Disabled Indicates whether the individual recorder is excluded from polling requests. A Yes in the column indicates that the recorder is excluded.
PendingRecords The total number of records that the Log Inserter has requested from the node but has not yet received. This number includes event records that have been uploaded but are cached internally pending configuration information necessary to complete the processing of the cause and/or effect ION objects.
OutstandingRecords The total number of records not yet uploaded based on the last read position counter on the device and the position of the last uploaded record, taking into account the maximum depth of each log.
ProcessedRecords The number of records that have been inserted into the database. Note that a record typically corresponds to a number of DataLog entries. In this context, "record" refers to records at the device level.
Generated Rec. per sec An estimate of the number of new records being generated per second.
Retrieved Rec. per sec An estimate of the number of records being uploaded per second.
Avg Retrieval Time (s) The average round-trip time in seconds taken to retrieve a record from a device.
Avg Processing Time (s) The average time in seconds necessary to insert a record into the database.
Restored This is always yes. If the log is not "restored", it does not appear in the list.
Managed An enabled Log Acquisition Module (LAM) is monitoring this log.
Configured The log is a Recorder that has source inputs or it is an Event Log Controller or a System Log Controller.
Confirmed The latest configuration for the log has been uploaded. For a VIP Recorder that references external devices, directly or indirectly, the configuration information includes information from the external device.
CaughtUp The node is communicating, the current configuration is known, and there are no outstanding or pending records. For a VIP, any device on which the log depends for information must also be responding.

Alarm Service

Alarm Service provides the status of alarms that you configure and enable in the Software Alarms application.

The information is organized in a grid. The column labels indicate the type of information provided, such as Rule Name, Alarm Name, Alarm Status, and so on. See the Software Alarms Help (accessible from the Software Alarms application) for further information about configuring alarms for multiple sources and measurements.

Log Pipeline Service

The Log Pipeline Service diagnostics provides information on the Log Subsystem Data Pipeline. It shows a variety of statistics on log collection and insertion performance.

Log Inserter writes log data into a message queue instead of writing it to SQL Server directly. Another process (the Log Subsystem Router Service) reads the messages from the queue and writes the data to SQL Server.

Previously, the Log Inserter waited for each database write to complete before processing the next piece of data. This effectively limited the rate of data insertion to what SQL Server could handle. Writing to the message queue (MSMQ) is much faster: MSMQ can store messages faster than the Log Inserter can retrieve them from the devices. However, the performance of SQL Server has not changed, which means that data can accumulate in the queue faster than it can be inserted into the database.

The message queues have a limited storage capacity, and allowing them to become full results in failure modes that are difficult to handle automatically. To avoid this, the size of the inbound data message queue is monitored, and writes are prevented when it contains more than a set number of bytes. When the queue reaches the specified capacity, no further messages are accepted from the Log Inserter and its write thread is put on hold until the queue has dropped below a specified capacity. This ensures that the Log Inserter never treats data as written when that data could be lost because the queue was over capacity.
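
The hold-and-resume behavior can be pictured as a gate with two thresholds. The sketch below is illustrative only. The class name, the method name, and the use of two separate thresholds (one to start holding, a lower one to resume) are assumptions. The text above says only that writes stop above a set size and resume once the queue drops below a specified capacity.

```python
class QueueGate:
    """Hypothetical write gate for a size-limited message queue."""

    def __init__(self, hold_above_bytes, resume_below_bytes):
        if resume_below_bytes > hold_above_bytes:
            raise ValueError("resume threshold must not exceed hold threshold")
        self.hold_above_bytes = hold_above_bytes
        self.resume_below_bytes = resume_below_bytes
        self.on_hold = False

    def may_write(self, queue_bytes):
        """Return True if the writer may enqueue, given the current queue size."""
        if self.on_hold:
            # Stay on hold until the queue has drained below the resume level.
            if queue_bytes < self.resume_below_bytes:
                self.on_hold = False
        elif queue_bytes > self.hold_above_bytes:
            # Queue is over capacity: put the write thread on hold.
            self.on_hold = True
        return not self.on_hold


gate = QueueGate(hold_above_bytes=1000, resume_below_bytes=600)
for size in (500, 1200, 800, 500):
    print(size, gate.may_write(size))
```

Using a lower resume threshold keeps the writer from toggling rapidly between held and released when the queue size hovers near the limit. Whether the product uses one threshold or two is not stated here.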

The following table summarizes the columns on the Pipeline Status tab:

Column Description
Name Identifies the message queue (Primary or Secondary) and the type of data being tracked.
Duration Total time that the diagnostics have been counted. In practice this value will be identical for all rows.
Message Count Number of messages that have been processed from this queue since the service was started.
Messages Per Minute Number of messages that have been processed in the last minute.
Messages Per Minute Average Average of the messages per minute over the last hour (60 samples).
Messages Per Minute Max Maximum number of messages per minute over the last hour (60 samples).
Messages Per Second Number of messages that have been processed in the last second.
Messages Per Second Average Average of the messages per second over the last minute (60 samples).
Messages Per Second Max Maximum number of messages per second over the last minute (60 samples).
Processing Time Milliseconds Average Average time taken to process each message (milliseconds).
Processing Time Milliseconds Max Maximum message processing time (milliseconds).
Time Since Last Elapsed time since a message was last processed.
Start Time Utc Time in UTC at which the service was started.
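
The per-minute and per-second statistics above are each computed over a window of 60 samples. A minimal sketch of such a window follows. The class and method names are hypothetical, and it is an assumption that the oldest sample is simply dropped when a new one arrives.

```python
from collections import deque


class RateWindow:
    """Hypothetical fixed-size window of rate samples (60 by default)."""

    def __init__(self, size=60):
        # A deque with maxlen discards the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def add(self, count):
        """Record the number of messages processed in the latest interval."""
        self.samples.append(count)

    def latest(self):
        return self.samples[-1] if self.samples else 0

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def maximum(self):
        return max(self.samples, default=0)


window = RateWindow(size=3)  # small window to keep the example short
for count in (10, 20, 30, 40):
    window.add(count)
print(window.latest(), window.average(), window.maximum())
```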

The following table summarizes the columns on the Message Queues tab:

Column Description
Identity Identity of the message queue.
Queue Type The type of queue: Data or Control.
Message Count Number of messages currently in the queue.
Message Kilobytes Size of messages currently in the queue (kilobytes).
Maximum Kilobytes Maximum size allowed for the queue. Note that a value of 4294967295.00 (4 GB) is the maximum amount of data that the entire MSMQ service can support. This effectively indicates that no queue-specific limit has been defined.

The Message Queues tab allows you to inspect messages flowing through the pipeline. Enabling message capture increases the load on the pipeline, so do not leave it enabled indefinitely.

The controls on the top select which messages are written to the table. This filtering only occurs as the messages arrive at the tab; once a message is displayed in the table it will not be removed when the Message Type or Source Filtering fields are changed. Rows already in the table may be filtered by the fields below the header.

The viewer buffers messages as they arrive to avoid locking up the receiving thread. If the buffer fills (which indicates that messages are arriving faster than the viewer can process them) a message indicating how many messages were missed will be written to the table.

Communications Diagnostics

Communications Diagnostics provides diagnostics information for sites and devices connected to the workstation.

Site overview

Diagnostics information for the sites is contained in these tabs:

  • Device Summary displays communications statistics for each site.
  • NetUser Status displays the number of ION programs currently in the ION Network Router Service queue (awaiting processing) and the total number of ION programs already processed.

NOTE: Requests and responses transmitted between the software components are referred to as “ION programs”.

Site/Device Diagnostics

Diagnostics information for sites and devices is summarized in these tabs:

  • Communication Status displays error rates and connection statistics for the selected site or device. The following information is available from the Communication Status tab:

    Column Description
    Node The device (or software node) name.
    Requests The number of communications requests transmitted to the meter.
    Responses The number of successful responses received.
    Request Ratio The number of requests sent to the device to fulfil the last client request. The value is always 1 for ION devices but it varies for Modbus devices.
    Total Errors The total number of communication errors.
    Total Err Rate (%) The ratio of Total Errors to Requests.
    Sliding Err Rate (%) The error rate in the last 100 requests. This can indicate a trend in communications performance.
    Time Util (%) The percentage of the communication channel utilized (serial line or Ethernet) on the site.
    Avg Resp Time (s) Average time in seconds for the meter to respond.
    Last Resp Time (s) The last response time, in seconds.
    Timeouts The number of timeouts. A timeout occurs when no data is received in response to a request.
    Bad CRC The number of bad packets received, that is, those that do not pass the error-detection checksum.
    Incompl. Frm The number of incomplete packets received, that is, those that did not have all the expected bytes.
    Broken Conn. The number of times the connection to the meters on a site was lost.
    Bad Frames The number of received packets that had an internal error.
    HW Errors Number of errors reported by the computer’s communication hardware.
    Misc Errors Number of other errors that do not fit any of the above descriptions.
  • Site Status displays site statistics such as connection status and totals.
  • Polling Status displays the number of programs currently in the ION Site Service queue (awaiting processing) and the total number of programs already processed.
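
To make the difference between Total Err Rate (%) and Sliding Err Rate (%) concrete, the sketch below tracks both for a stream of requests. The class and method names are hypothetical, and how the product actually maintains the last-100-requests window is an assumption.

```python
from collections import deque


class ErrorRates:
    """Hypothetical tracker for the total and sliding error rates."""

    def __init__(self, window=100):
        self.requests = 0
        self.total_errors = 0
        # Only the outcomes of the most recent `window` requests are kept.
        self.recent = deque(maxlen=window)

    def record(self, is_error):
        """Record the outcome of one request."""
        self.requests += 1
        self.total_errors += int(is_error)
        self.recent.append(bool(is_error))

    def total_err_rate(self):
        """Errors as a percentage of all requests."""
        return 100.0 * self.total_errors / self.requests if self.requests else 0.0

    def sliding_err_rate(self):
        """Errors as a percentage of the requests in the window."""
        return 100.0 * sum(self.recent) / len(self.recent) if self.recent else 0.0


rates = ErrorRates()
for _ in range(900):          # a long run of successful requests
    rates.record(False)
for i in range(100):          # then every fifth request fails
    rates.record(i % 5 == 0)
print(rates.total_err_rate(), rates.sliding_err_rate())
```

Here the total rate stays low because of the long error-free history, while the sliding rate reflects the recent degradation. This is why the sliding value is the better indicator of a trend.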

Communication Status vs. Site Status

This section explains the difference between the statistics provided on the Communication Status tab and those on the Site Status tab.

“Total Errors” in the Communication Status tab is an ION Site Service derived statistic, while “Bad Responses” in the Site Status tab is a client derived statistic.

To explain this difference, consider a situation where a direct site is experiencing timeouts. Communication with the device is attempted according to two parameters: Connect Attempts (an advanced site property) and Maximum Attempts Multiple (an advanced device property). The product of these two values is the number of attempts made to re-establish communications with the device before it is flagged as offline.

For instance, if Connect Attempts is set to 1 and Maximum Attempts Multiple is set to 3, the device will go offline after 3 attempts (that is, 1 x 3).

The “Total Errors” statistic increases by one every time ION Site Service detects a timeout. However, the “Bad Responses” statistic only increments every time a response is sent back to a client.

Using the previous example, consider the case where four timeouts occurred and the device went offline. In this case, “Total Errors” increases by four, while “Bad Responses” only increases by one. If only two timeouts occurred, “Total Errors” would increase by two, while “Bad Responses” would not change.
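
The two counters in this example can be modeled with a small simulation. This is a sketch under stated assumptions, not the ION Site Service logic. It assumes that an error is sent back to the client once the maximum number of attempts is reached, and that the attempt counter then starts again. The function and parameter names are hypothetical.

```python
def simulate_timeouts(timeouts, connect_attempts=1, max_attempts_multiple=3):
    """Return (total_errors, bad_responses) after consecutive timeouts."""
    max_attempts = connect_attempts * max_attempts_multiple
    total_errors = 0
    bad_responses = 0
    current_attempt = 0
    for _ in range(timeouts):
        # Every timeout detected by the site service counts as an error.
        total_errors += 1
        current_attempt += 1
        if current_attempt >= max_attempts:
            # Attempts exhausted: the device goes offline and one error
            # response is returned to the client (assumed behavior).
            bad_responses += 1
            current_attempt = 0
    return total_errors, bad_responses


print(simulate_timeouts(4))
print(simulate_timeouts(2))
```

With the settings from the example (1 x 3 attempts), four timeouts give four total errors and one bad response, and two timeouts give two total errors and no bad response.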

The following information is available from the Site Status tab:

Column Description
Node The device (or software node) name.
Status The device communication status.
Current Attempt The current number of repeated attempts to communicate with the device.
Max Attempts The maximum number of attempts before flagging the device as offline (Timed-out).
Offline Count The total number of times the device went offline.
Bad Responses The total number of errors sent back to the clients, such as to Vista.
Last Response The time when the last response was received.
Last Attempt The last time that a request was sent to the device.
RT Data Reqs The total number of requests to the device sent by the Real Time Data Service.
TreeMon Reqs The total number of requests to the device sent by the TreeMon service.
VISTA Reqs The total number of one-shot requests to the device sent by a Vista client (for example, control and label requests).
LogInserter Reqs The total number of requests to the device sent by the LogInserter service.
IONSERVICE Reqs The total number of requests to the device sent by ION real-time services.

Note that the last five columns on the Site Status tab are dynamic. That is, the columns are only shown when requests were sent to the device from a Power Monitoring Expert service or client.

Additional commands

The following sections describe additional display options and shortcut menus available in Diagnostics Viewer.

Diagnostic Details

In the tabs on the diagnostics information pane, double-click a row to display its Diagnostic Details screen. This displays the diagnostic information for the selected item only.

Use the Previous and Next buttons to view the details of other rows in that tab of the diagnostics information pane.

To copy information to the clipboard, select the rows you want to copy, then press CTRL+C.

Diagnostics Information pane shortcut menu options

Right-click the diagnostics information pane to display a shortcut menu. The following table lists all the commands available (though not all panes in Diagnostics Viewer provide all the commands listed):

Right-click Option Description
Update Refreshes the information in the diagnostic table.
Reset Resets the information in the diagnostic table (not available in the Communications Server Diagnostics display).
Copy All Copies all selected information to the clipboard.
Auto Scroll Enabled by default, this option is only available in the Console Messages tab of the Communications Server Diagnostics display. This option automatically scrolls to and selects the latest console message. Clear this option to disable scrolling (that is, to select and view an older console message without jumping to the latest one when Diagnostics Viewer refreshes).
Options Displays the Options dialog where you can change the diagnostics refresh rate. Note that changing the refresh rate can affect the product's performance.