Method to find data gaps in many tags (PIOLEDB Provider)


Situation: A disturbance on your control system causes a data outage. This could produce one large data gap in a single tag, gaps across many tags, or intermittent data collection issues impacting different tags. Your job is to quickly analyze the extent of data loss across a large set of tags.

Here is a PI OLEDB Provider query you can execute in PI SQL Commander to do all the heavy lifting. You will need to tweak some minor parameters, such as the date range, the pointsource you want to analyze, and the type of bad value you want to look for (in this case Comm Failure).

Hints:

  1. Keep the time window as short as possible and the list of tags you are analyzing as small as possible. In the example below, approximately 20K tags were analyzed over a 5-hour period and the query execution time was approximately 30 minutes. Make sure your timeout setting in PI SQL Commander is set appropriately to accommodate the longer execution time.
  2. Look for **UPDATE ME** markers for parameters you will likely need to update for your use case.
  3. If there were no bad values, there will be no results.
  4. Tags that had NO good values during the window used will also not be returned in the list, so make sure the window you choose starts just before the bad data happened and ends just after a return to good status for accurate results. You can determine this by looking at high level interface health points.
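Before committing to a long run, it can help to sanity-check how many tags your pointsource filter actually matches. A quick sketch (the pointsource and exclusion patterns are placeholders, exactly as in the main query below):

```sql
-- Count the tags the heavy query will touch
SELECT COUNT(*) AS TagCount
FROM pipoint..pipoint2
WHERE pointsource LIKE 'YourPointSource%' --***UPDATE ME***
  AND tag NOT LIKE 'excluded tags%'       --***UPDATE ME*** tags to exclude
  AND tag NOT LIKE 'sy.st%'               --***UPDATE ME*** tags to exclude
```

If the count is much larger than expected, tighten the filters before running the full analysis.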

 

Disclaimer: You should test this in a dev environment first.

Here she is:

-- This query retrieves and analyzes communication failures (status = -313) between events for specified tags.
-- It calculates the following metrics per tag:
-- 1. MaxTimeGap(sec): The longest time gap between consecutive communication failure events (status = -313), measured in seconds.
-- 2. AvgTimeGap(sec): The average time gap between consecutive communication failure events (status = -313), measured in seconds.
-- 3. AddedGaps(sec): The total accumulated time of all gaps between communication failure events, calculated by multiplying AvgTimeGap by the number of gaps.
-- 4. TotBadVals: The number of communication failures (status = -313). Other bad values (e.g., Unit Down) are excluded.
-- 5. TotGoodVals: The number of good values (status = 0), i.e., the count of successful communications during the specified period.
-- 6. FirstBadVal: The timestamp of the first communication failure event within the specified period.
-- 7. LastBadVal: The timestamp of the last communication failure event within the specified period.
-- 8. FirstGoodAfterLastBadVal: The timestamp of the first good value (non-comm failure) after the last communication failure event.
-- The results are ordered by the total accumulated gap time, the longest gap, and the number of bad values, all in descending order, followed by tag in ascending order.

-- Notes:
-- (1) Running this against 20K tags for a 5-hour time period took approximately 33 minutes.
-- (2) Adding +1s to the NextEvent function ensures that the next good timestamp is selected without impacting calculation values.
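As a concrete illustration of note (2), you can run NextEvent against a single tag to see the raw pairing of each comm-fail event with the event that follows it (a sketch; 'YourTag' is a placeholder, and the time range should match your analysis window):

```sql
-- For each comm-fail event, NextEvent(tag, time + '1s') returns the timestamp of
-- the next archived event at least 1 second later (typically the recovery value),
-- so (NextEvent(...) - time) is the duration of that individual gap.
SELECT tag,
       time AS BadValTime,
       NextEvent(tag, time + '1s') AS NextEventTime
FROM piarchive..picomp2
WHERE tag = 'YourTag' --***UPDATE ME*** placeholder single tag
  AND status = -313
  AND time BETWEEN '9/19/2024 2:00 PM' AND '9/19/2024 3:00 PM' --***UPDATE ME***
```

Without the +1s offset, NextEvent could return the bad event's own timestamp; the offset forces it past the current event without materially changing the gap calculation.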


SELECT stat1.*, stat2.TotGoodVals 
FROM (
    SELECT 
        tag,
        (hour(Max(NextEvent(c1.tag, c1.time + '1s') - time)) * 60 * 60) 
        + (minute(Max(NextEvent(c1.tag, c1.time + '1s') - time)) * 60) 
        + (second(Max(NextEvent(c1.tag, c1.time + '1s') - time))) AS "MaxTimeGap(sec)",

        (hour(Avg(NextEvent(c1.tag, c1.time + '1s') - time)) * 60 * 60) 
        + (minute(Avg(NextEvent(c1.tag, c1.time + '1s') - time)) * 60) 
        + (second(Avg(NextEvent(c1.tag, c1.time + '1s') - time))) AS "AvgTimeGap(sec)",

        -- Parenthesized so the full average gap (hours + minutes + seconds) is multiplied by the event count
        (((hour(Avg(NextEvent(c1.tag, c1.time + '1s') - time)) * 60 * 60) 
        + (minute(Avg(NextEvent(c1.tag, c1.time + '1s') - time)) * 60) 
        + (second(Avg(NextEvent(c1.tag, c1.time + '1s') - time)))) * Count(*)) AS "AddedGaps(sec)",
        min(c1.time) as "FirstBadVal", 
        max(c1.time) as "LastBadVal", 
        NextEvent(c1.tag, max(c1.time + '1s')) as "FirstGoodAfterLastBadVal",
        Count(*) AS "TotBadVals"

    FROM (
        SELECT *
        FROM piarchive..picomp2
        WHERE tag IN (
            SELECT tag 
            FROM pipoint..pipoint2 
            WHERE pointsource LIKE 'YourPointSource%' --***UPDATE ME***
              AND tag NOT LIKE 'excluded tags%' --***UPDATE ME*** tags to exclude
              AND tag NOT LIKE 'sy.st%' --***UPDATE ME*** tags to exclude
        )
        AND status = -313 -- Comm Fail (change this to whatever bad value you need, or simply use status <> 0)
        AND time BETWEEN '9/19/2024 2:00 PM' AND '9/19/2024 3:00 PM' --***UPDATE ME*** your time range, keep it tight
    ) AS c1
    GROUP BY tag
) AS stat1
INNER JOIN (
    SELECT tag, 
           Count(*) AS "TotGoodVals"
    FROM piarchive..picomp2
    WHERE tag IN (
        SELECT tag 
        FROM pipoint..pipoint2 
            WHERE pointsource LIKE 'YourPointSource%'  --***UPDATE ME***
              AND tag NOT LIKE 'excluded tags%'  --***UPDATE ME*** tags to exclude
              AND tag NOT LIKE 'sy.st%'   --***UPDATE ME*** tags to exclude
    )
    AND status = 0
    AND time BETWEEN '9/19/2024 2:00 PM' AND '9/19/2024 3:00 PM'  --***UPDATE ME*** your time range, must match previous time range
    GROUP BY tag
) AS stat2
ON stat1.tag = stat2.tag

ORDER BY "AddedGaps(sec)" DESC, "MaxTimeGap(sec)" DESC, "TotBadVals" DESC, stat1.tag ASC;
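If you want to catch any bad value rather than only Comm Fail, the inline comment above suggests widening the filter. The only change is the status predicate in the c1 subquery (a sketch):

```sql
-- In the c1 subquery, replace the Comm Fail filter:
--     AND status = -313
-- with a catch-all for any non-good status:
    AND status <> 0
-- TotBadVals then counts every bad value (Comm Fail, Unit Down, etc.),
-- and the gap metrics span all bad statuses rather than Comm Fail alone.
```

Note that the comment header at the top of the query would then overstate its specificity, since "communication failures" becomes "any bad value."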