PO to PME ETL job performance
NOTE: The following settings do not represent a recommendation for production environments because of the many variables involved in determining them.
They simply describe an in-house test system that was used to demonstrate the effect of these settings in a test environment.
They may be used as starting points for the application engineer when determining how to configure jobs in the field.
The application engineer should determine appropriate settings for each job based on observations of job execution time and other factors.
Testing Environment and Setup
Power Operation – Server 2012 with 4 GB of RAM, 2 processors at 3.46 GHz
- Added CM4000 meters, each with 70 trend tags logging at 15-minute intervals.
- For the 35K trend tags: 500 CM4000 meters
- For the 105K trend tags: 1500 CM4000 meters.
All the CM4000 meters were in memory mode. Other than logging the trend tags, the SCADA project was not doing anything else.
Power Monitoring Expert – Server 2012 with 4 GB of RAM, 2 processors at 3.46 GHz
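As a quick check on those numbers, the tag counts and daily record volumes follow directly from the meter counts, the 70 trend tags per meter, and the 15-minute logging interval. A minimal Python sketch of that arithmetic (purely illustrative; not part of any ETL configuration):

```python
# Back-of-the-envelope sizing for the test environments described above.
TAGS_PER_METER = 70                       # trend tags logged per CM4000 meter
RECORDS_PER_TAG_PER_DAY = 24 * 60 // 15   # 15-minute interval -> 96 records/day

for meters in (500, 1500):
    tags = meters * TAGS_PER_METER
    rows_per_day = tags * RECORDS_PER_TAG_PER_DAY
    print(f"{meters} meters -> {tags:,} trend tags -> {rows_per_day:,} rows/day")

# 500 meters  -> 35,000 trend tags  -> 3,360,000 rows/day
# 1500 meters -> 105,000 trend tags -> 10,080,000 rows/day
```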
Test execution
In these tests, the number of requests was set to cover 1 day's worth of data. For 35,000 tags, that equals 3,360,000 rows of data. For 105,000 tags, that equals 10,080,000 rows of data.
The ETL task for 35,000 tags was configured to make 35 requests with each request containing up to 100,000 records.
The ETL task for 105,000 tags was configured to make 30 requests with each request containing up to 1 million records.
Due to the 4 GB of RAM available on the virtual machines, the maximum number of records per SQL insert had to be set to 10,000. With a higher setting, SQL insert performance could degrade, and the ETL task would stop and write a message to its log files.
For these tests, the ETL job was run again immediately after it finished. The 35,000 tag test took 2 to 3 runs to catch up, and the 105,000 tag test took 6 to 7 runs.
Using a value of 1 hour for the 'sleep time between executions' job setting, it would take 2 to 3 hours to catch up in the 35K tag scenario and 6 to 7 hours in the 105K tag scenario.
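The request counts and catch-up times in these tests follow from dividing the daily row volume by the per-request record size and multiplying the number of runs by the sleep interval. A rough Python sketch of that reasoning (illustrative arithmetic only; the function name is not an ETL API):

```python
import math

def requests_needed(rows_per_day: int, records_per_request: int) -> int:
    """Minimum number of data requests needed to cover one day of trend data."""
    return math.ceil(rows_per_day / records_per_request)

# 35,000 tags: 3,360,000 rows/day at 100,000 records per request
print(requests_needed(3_360_000, 100_000))     # 34 (the test used 35 requests)

# 105,000 tags: 10,080,000 rows/day at 1,000,000 records per request
print(requests_needed(10_080_000, 1_000_000))  # 11 (the test used 30 requests)

# With a 1-hour 'sleep time between executions', catching up takes roughly
# (number of runs) x 1 hour: 2 to 3 hours for 35K tags, 6 to 7 hours for 105K tags.
```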
NOTE: The grouping tests that were conducted were not done under load. 2.5 GB of RAM was dedicated to SQL Server. If other tasks were occurring on the server, then it is very likely the ETL job would take longer to execute. Since systems vary so much, use the settings listed here as a starting point, not as a recommendation. Application engineers should calibrate PO to PME ETL performance on each system based on observations such as job execution time and other factors.
Test 1 Grouping Settings – 35,000 tags (recorded every 15 minutes for 3,360,000 records per day):
Setting | Value
PO Extract Task > Data Request Record Size | 100,000
PO Extract Task > Maximum Number of Requests | 35
PME Load Task > Enable Limit on Records per Insert | True
PME Load Task > Maximum records per insert | 10,000
Advanced > Grouping Options > Max Data Request Per Group | 7
Advanced > Grouping Options > Max Groups Per Job Run | 5
Test 2 Grouping Settings – 105,000 tags (recorded every 15 minutes for 10,080,000 records per day):
Setting | Value
PO Extract Task > Data Request Record Size | 1,000,000
PO Extract Task > Maximum Number of Requests | 30
PME Load Task > Enable Limit on Records per Insert | True
PME Load Task > Maximum records per insert | 10,000
Advanced > Grouping Options > Max Data Request Per Group | 6
Advanced > Grouping Options > Max Groups Per Job Run | 5
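One way to sanity-check grouping settings like those in the tables above is to confirm that the requests in one job run can cover a full day of data and that the groups cover all configured requests. A hedged Python sketch of that check (the function and parameter names are illustrative, not part of the ETL tool):

```python
def check_grouping(rows_per_day, record_size, max_requests,
                   requests_per_group, groups_per_run):
    """Sanity-check that one job run can cover one day of trend data."""
    request_capacity = record_size * max_requests          # records one run can extract
    grouped_requests = requests_per_group * groups_per_run # requests the groups cover
    print(f"capacity per run: {request_capacity:,} records for {rows_per_day:,} rows/day "
          f"({'OK' if request_capacity >= rows_per_day else 'too small'})")
    print(f"requests covered by grouping: {grouped_requests} of {max_requests}")

# Test 1: 35,000 tags
check_grouping(3_360_000, 100_000, 35, requests_per_group=7, groups_per_run=5)
# Test 2: 105,000 tags
check_grouping(10_080_000, 1_000_000, 30, requests_per_group=6, groups_per_run=5)
```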
Recommendations
In general, running ETL jobs on servers with more RAM can have a positive effect on performance.
The time between executions can be set accordingly. To run the ETL tasks more frequently, set the time between job executions to a lower value. However, this can lead to the ETL task running and requesting data when no new data is available in Power Operation.
Background information
Internally, the ETL job processes all available data for each device-topic pair before moving on to the next pair. This operation is based on the position counter for each device-topic pair and the maximum number of records per request.
Example: If a device-topic pair records data every 15 minutes, it logs approximately 70,000 records every 2 years:
4 records/hour * 24 hours/day * 365 days/year * 2 years = 70,080 records
When running the job for the first time, every device-topic pair starts from the default position of 2 years before the job creation time. This default can be changed via the 'Initial Position Counter Time Stamp' task setting.
Assuming the job is configured with 1 million records per request and only 1 request, then 14 device-topic pairs will fit into the 1 million records each time the job runs.
(1,000,000 / 70,080 = 14.27)
If you increase the total number of records allowed per job run to 3,000,000 (for example, 3 requests of 1 million records each), then 42 device-topic pairs (each having 2 years' worth of data) could be extracted each time the job runs. As the job progresses forward (closer to the current time), the expected number of records per device-topic pair gets smaller, and thus the number of pairs that fit into 1,000,000 records increases.
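The pairs-per-run arithmetic above can be sketched in a few lines of Python (illustrative only; the constants come from the example in this section):

```python
# Records per device-topic pair, assuming 2 years of 15-minute data still to extract.
RECORDS_PER_PAIR = 4 * 24 * 365 * 2        # = 70,080 records per pair

def pairs_per_run(records_per_request: int, requests_per_run: int = 1) -> int:
    """Whole device-topic pairs that fit into one job run."""
    return (records_per_request * requests_per_run) // RECORDS_PER_PAIR

print(pairs_per_run(1_000_000))        # 14 pairs with a single 1,000,000-record request
print(pairs_per_run(1_000_000, 3))     # 42 pairs with 3,000,000 records per run
```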
So for the first few runs, it is advantageous to configure the job to run more often than it will once it catches up to the current time. For example, when running the job as a service, set 'sleep time between executions' to 30 seconds or lower, if appropriate for the PO installation. Once the job progresses closer to the current time for all device-topic pairs, the 'sleep time between executions' can be set back to 900 seconds (15 minutes).
You can tell how far along the job is by checking the Positions tab when editing a job. Timestamps for each pair are listed there.
The appropriate choice for 'sleep time between executions' during the initial runs will depend on how much data is in the system and other variables. If there is less than 2 years of data available in the system, it helps to move the 'Initial Position Counter Time Stamp' task setting forward in time. This means that more device-topic pairs will fit into the allotted 1,000,000 records per request.
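To see the effect of moving the 'Initial Position Counter Time Stamp' forward, the same arithmetic can be repeated with a shorter history per pair. A small illustrative sketch (the history depths are assumed values, not recommendations):

```python
# Moving the 'Initial Position Counter Time Stamp' forward shrinks the backlog per pair,
# so more pairs fit into each 1,000,000-record request.
RECORDS_PER_DAY_PER_PAIR = 4 * 24      # 15-minute logging -> 96 records/day

for history_days in (730, 365, 90, 30):
    records_per_pair = history_days * RECORDS_PER_DAY_PER_PAIR
    pairs = 1_000_000 // records_per_pair
    print(f"{history_days:>4} days of history -> {records_per_pair:,} records/pair "
          f"-> {pairs} pairs per request")
```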