Finding the most efficient way to handle large data request.

I’m looking for the best way to handle a large data request. By large, I am given a file containing about 200,000 to 300,000 different meters for up towards 150 various nonconsecutive days. I might have X meters for day 1, Y meters for day 2…, N meters for day 150. Each meter in the request will be repeated for different dates up 30 days per meter.

Our PI system consist 3 million meters scattered across 6 archive servers. Each meter will only be on one of the six archives.

I wrote a C# program using the PI SDK to handle this large request. Currently, I read the import file and then create a data structure based on date. I will then spawn a process for each date giving the list of meters and the specific date as input to the process; i.e. 150 processes. (I have a time delay of five minutes between every fifth process I spawn.) Each process will make a connection to the 6 archive servers, retrieve the requested data, and write out the meters and their data for the date specified to a file. Thus, I will end up with a file for each specific date in the data request. 

This program is working but I’m wondering if there is a better method to handle such a large data request? 

 

Parents
  • If you were using AF SDK in a .NET Framework 4.8 application, it could be slightly faster than PI SDK.

     

    Other than sharing really big numbers with us, you really haven't offered much to help with. You have roughly 500K PIPoints, aka meters, spread out among the 6 PIServers. My first question is how to you query them? If someone enters a Tag name, aka Meter number, do you automatically know which PIServer to query, or do you have to query each of the 6 PIServers hoping for a valid PIPoint to be returned?

     

    Obviously the app would be more efficient if you know which meter is on a given PIPoint on a given PIServer. This suggests having a file that links meters to PIServer and PIPoint. If you are using something like this, then the app would also be faster if you stored the PointID along with the Tag name. Searches by PointID are much faster than by Tag name, and since you have 500K+ plus tags on a server, this could help a lot.

     

    How much data exists for a given meter on a given day? Is it a total daily usage reading, hourly readings, etc? If it is a single daily value per day per meter, this is easy enough to work with. If it is hourly readings, then the large data request only gets larger.

     

    If you know which meter resides on which PIServer, then you can make bulk PIPointList (many meters) requests to that one PIServer. If it was me, I would therefore make my first pass through an input file segregate my meters into which PIServer they belong to. That may yield up to 6 lists of PIServers + PIPoints.

     

    Keep in mind a single data request can only return 1.5 million recorded values. Pretending you have only one value per meter for one day, I would advise against requesting 500K meters in a single data request. Instead, I would chunk my requests in pages of 10K PIPoints to a server.

     

    I do compliment you on being cautious about dealing with large numbers. Pausing between 5 minutes every 5th process may be overly generous, but again GOOD FOR YOU for at least being concerned about not choking your PIServers. But if you are only reading from the PIServer, you could shorten the timeout a wee bit.

     

    WIth timeouts programmed, etc., you seem to have realistic expectations that this will be a long running process. There could be a few other things to do to make it somewhat faster, but I would not recommend trying to push it too far.

  • Thank you for your reply. I guess I should have been a little more specific. Yes, I am using the AF SDK in a .Net Framework 4.8 console app. Yes, I actually do know which PI server each PI Point is on. The PI server is maintained as an AF Attribute, so it is a quick API call to retrieve the PI server.

    The process I use is as follows, I load the meters ID into a data table, with a column for PI server name. I iterate through the table of meters using the PI Server attribute to find the PI Point which is added to one PIPointList. I will then make one call to retrieve the data using PIPointList.RecordedValues. Breaking up the PIPointList by server does sound like a good idea. I will try that.

    The meters will have either 24 values (hourly) or 96 values (quarterly) per day.

    Again, thank you.

Reply
  • Thank you for your reply. I guess I should have been a little more specific. Yes, I am using the AF SDK in a .Net Framework 4.8 console app. Yes, I actually do know which PI server each PI Point is on. The PI server is maintained as an AF Attribute, so it is a quick API call to retrieve the PI server.

    The process I use is as follows, I load the meters ID into a data table, with a column for PI server name. I iterate through the table of meters using the PI Server attribute to find the PI Point which is added to one PIPointList. I will then make one call to retrieve the data using PIPointList.RecordedValues. Breaking up the PIPointList by server does sound like a good idea. I will try that.

    The meters will have either 24 values (hourly) or 96 values (quarterly) per day.

    Again, thank you.

Children
  • I would plan on worst case scenario of 96 values (quarter hour) per day per meter. If you query 10K meters in one PIPointList.RecordedValues call, then you will receive 960K values. I would suggest not using more than 10K in a page of PIPointList, or you may consider dropping it down to 5K.

     

    Good that you are using AFAttributes, as when the PIPoint data reference is created, there is a slow lookup to PIServer and PIPoint via names, but once found, the PIServer.ID and PIPoint.ID are persisted back to an internal ConfigString. After that initial create data reference call, then finding the PIPoint(s) should be fast since it uses the ID's.

     

    I trust you are using other bulk calls such as AFAttributeList.GetPIPoint.

     

    Rather than one-big timeout every 5 batches, you may want to consider a smaller timeout after every batch. As I say, give the PIServer "a chance to catch its breath" and share access for other services and users.