Weird behavior processing IEnumerable<AFEventFrame>

I have some code where I search for some event frames, then try to process the results.

The issue I have is that processing the IEnumerable results is very, very slow. It consistently takes my application time from 5 seconds to 45 seconds.

 

Here is the code:

        public void LoadClosedTasks(CurrentShift currentShift)
        {
            //CodeTimer LoadClosedTasksTimer = new CodeTimer();
            if (currentShift == null) return;
            List<AFSearchTokenBase> tokens = AppliedSearchTokens.ToList();                  //Copy the list
            tokens.RemoveAll(x => x.Filter == AFSearchFilter.InProgress);                   //Remove tokens for "In Progress"
            tokens.Add(new AFSearchValueToken("|COMPLETED_SHIFT_ID", currentShift.ShiftID.ToString()));
            IEnumerable<AFEventFrame> RetrievedFrames = null;
            using (AFEventFrameSearch search = new AFEventFrameSearch(LogDatabase, "baseSearch", tokens))
            {
                //DateTime searchStart = DateTime.Now;
                RetrievedFrames = search.FindObjects(0, true, 0);
            }
            ConcurrentBag<OperatorTask> createdFrames = new ConcurrentBag<OperatorTask>();  //Add to concurrent bag to avoid multi-thread issues.

            return;

            Parallel.ForEach(RetrievedFrames, frame =>
                {
                    //OperatorTask frameInstance = new OperatorTask();
                    //frameInstance.PopulateFromFrame(MainForm, frame, this);
                    //createdFrames.Add(frameInstance);

                });
            }

For info, this routine is called using an await Task.Run.

You can see where I place the "return" statement in the code, to bypass the Parallel.ForEach() block. With return there, the application loads (doing lots of other stuff) in 4.5 seconds. If I remove the return statement, the application takes 45 seconds. And this is with the code body in the Parallel.ForEach block commented out, as it shows above! It is not the code in the block that is slow, because there is no code running and I still get the problem.

I also tried a synchronous ForEach across the collection. Same issue.

This suggests to me that the simple act of enumerating the returned search is taking 40 seconds. (Timing it confirms that).

 

So... how many elements is it iterating?

 

Glad you asked. The search results are returning exactly 1 result!

 

Any idea on what is happening here? Search.FindObjects does have LoadFull set to true, but switching it to false doesn't help. Setting the page size doesn't help. I really am at a loss on how to fix this.

  • Hello ​ ,

    Looks like an interesting problem. What's happening in line 14?

    Can you try the ForEach, not a Parallel.ForEach, inside the using? Like it is done in the example shared with AFSearch(T).FindObjects Method.

    You can also try if a trace identifies which activity the time is spent on: How do I use the AFGetTrace utility?

  • Hi Gregor.

     

    Line 14 is creating a concurrentbag. The Parallel.ForEach processes the event frame into a class instance, and adds the resulting instance to a collection. I use a ConcurrentBag, so that the Parallel.ForEach doesn't run into issues with multiple threads all trying to add to the collection at the same time. The ConcurrentBag is designed to allow that.

    I did try a standard ForEach. No difference. In fact, in the debugger, in the Locals window, trying to expand the RetrievedFrames variable, I get a timeout. I have tried it inside and outside the using - moving it outside the using was something I was trying to see if it helped.

    Thanks for the AFGetTrace details. Unfortunately, it requires admin privileges, and I'm not allowed them on my laptop. I might try to get one of the IT staff to help with that later....

    In the interim, I've cleaned-out the existing event frames. There were about 20,000 of them. I wouldn't expect the number to cause issues (should it?), but figured if it was due to there being so many, and this one being so old, that maybe?

    Note - other searches (for in progress frames) worked fine.

  • Hi ​ ,

    Event Frames are stored in the PIFD, the AF Server's backend SQL database. The performance of relational tables suffers the amount of records stored. Yes, the amount of Event Frames has an impact and there is limits. In the past the total limit of Event Frames was way lower than today. Since then, improvements were made to handle Event Frames better in the SQL backend and, if I recall this correctly, the Search namespace has been added to AF SDK. Maybe the problematic Event Frame originates from earlier days and maybe the migration didn't go well at some point. Would you be able to delete and re-create the problematic Event Frame? Does this eventually speed up operation?

     

  • You have obfuscated so much of your code that its difficult to focus on the problem. For example, you have an AppliedSearchTokens property that offers NO clues as to its contents. Does it filter on an event frame template? Does it filter on an element?

     

    Does attribute |COMPLETED_SHIFT_ID use a data reference? Some times, never, always?

     

    Performance is not determined by how many event frames are in your result set. It's about how many event frames are sent to the local host performing any client-side filtering.

     

    The RetrievedFrames property should be defined as a List or IList<AFEventFrame>, and the search loop should have

     

     RetrievedFrames = search.FindObjects(0, true, 0).ToList();

     

     

  • HI - yes, I've deleted the problematic event frames, and the problem has disappeared... for now! Will monitor if it comes back.

  • Hi Rick,

    AppliedSearchTokens is just a general list of tokens common to searches for a specific asset. There are three:

    1.) AFSearchFilterToken on AfSearchFilter.TemplateName

    2.) AFSearchValueToken on "|SITE" attribute (name data type)

    3.) AFSearchValueToken on "|Completed_Shift_ID", which is the GUID of an EventFrame (EventFrame.ID)

     

    Why do you recommend converting RetrievedFrames to a List or IList? The search.FindObjects method returns IEnumerable<AFEventFrame>; all I want to do is enumerate over it. List is derived from IEnumerable, and I don't need any of the additional properties that List gives me.

  • You did not answer my question about "|Completed_Shift_ID", Does it use a data reference? You say it is a Guid of an event frame. What relation, if any, does the "|Completed_Shift_ID" have to the found event frame's ID property?

     

    Does the "|SITE" attribute have a data reference?

     

    An AFSearch is a disposable object. It applies server-side filters against ALL event frames in the PIFD, and may send event frames to the client for additional client-side filtering. It also is a forward-only iteration. All of that means that if you want to persist found event frames beyond the scope of the disposable search, then you need to persist to the some appropriate object, and IEnumerable<AFEventFrame> is NOT such an object. That way, when you want to process over the found event frames, you do not have to take a lengthy search again, but rather have them readily available in a nice memory structure such as a List<AFEventFrame>. As for the List properties you do not need, may I be so bold as to suggest that you can ignore them?

     

    And, no, List is not derived from IEnumerable. Rather, List implements the IEnumerable interface by providing concrete code for the interface method signatures.

     

  • |Completed_Shift_ID is a string value type, with no data reference. |Site is a String Builder data reference, specifically to %..\..\Element%. All other attributes are String Builders. I understood that the FullLoad parameters means that these are evaluated for the EventFrames returned.

    I understand about disposable objects. Disposing of the Seach object does not dispose the results returned by it. And even if it did, IList derives from IEnumerable - they both store references to the underlying objects, not copies of those objects.

    As I said in my first reply, I tried processing the returned event frames both inside and outside the using. No difference in result, nor performance.

     

  • Good day again, Andrew. I am not trying to be rude, but you do seem to have a fundamental misunderstanding regarding .NET as well as how AFSearch operates. What you call "weird behavior" can be explained. Whether or not you accept the explanation depends upon your willingness to learn and your desire to improve the performance.

     

    IEnumerable<T> is an interface. It declares properties and method signatures for any class that implements the interface. In .NET Framework, an interface lacks any code. Thus, List<T> is NOT derived from IEnumerable<T>. Rather, List<T> implements its own specific code to adhere to the interface signatures required by the IEnumerable<T> interface.

     

    AFSearch classes produce a forward-only collection. The FindObjects method, or specifically the AFEventFrameSearch.FindObjects method does NOT actually return AFEventFrame items. Rather it returns a type of IEnumerable<AFEventFrame>, which potentially can return an enumerable list of the objects found. I say potentially because that returned type must be enumerated in order to bring the individual AFEventFrame items into memory. One way to achieve this is by using a foreach. Another way is to use the LINQ ToList() extension method.

     

    By all means you can keep RetrievedFrames as IEnumerable. But you still need to change your code to:

     

    IEnumerable<AFEventFrame> RetrievedFrames = null;
    using (AFEventFrameSearch search = new AFEventFrameSearch(LogDatabase, "baseSearch", tokens))
    {
        RetrievedFrames = search.FindObjects(0, true, 0).ToList();
    } // search object is disposed and any server-side cache is disposed.

    As of now, you are not using server-side caching. None the less, the variable called search is disposed. The found items have been enumerated into memory and saved. This is the critical step you have been missing.

     

    Now let's look at PERFORMANCE.

     

    Your search query will perform server-side filtering on (1) the template name and (2) |Completed_Shift_ID because |Completed_Shift_ID  is a static attribute, i.e. does not use a data reference.

     

    Quick definitions: a closed event frame is one where the EndTime is assigned, and an open or in-progress event frame is one in which the EndTime is not assigned. Technically, the EndTime always has a value; an open, blank, or seemingly unassigned EndTime is actually equal to AFTime.MaxValue.

     

    For event frames that are NOT in-progress, server-side filtering may be used with "|SITE", but only IF you have captured values. Note value capturing for closed event frames is the default.

     

    If you have not captured values OR the event frame is in-progress, then "|SITE" will be filtered client-side. In which case, a superset of event frames is sent to the client PC, where the AFSearch applies the client-side filters to reduce this superset to the final set, which you can enumerate over INSIDE of your AFSearch block.

     

    You have not shared any particulars about how many total event frames are in your PI System, or how many are used by the requested template. It would be nice to know how many you are truly expecting to be output, as well as a rough guess as to how many are to be sent for client-side filtering. That is, it would be helpful to know how big the superset will be versus the final set.

     

    This help is taken from the REMARKS section from the online help:

     

    <quote>

    To optimize getting items from several pages of search results, a search can be cached by setting the CacheTimeout to a non-zero value. If you will only be getting items from the first page, then it is best to leave the cache disabled. The cache is disabled by default. 

    </quote>

     

    There is a lack of clarity regarding "first page". The number of RPC calls, aka trips to the server, is not determined by the first page of final results, that is event frames that have passed the client-side filter. Rather, it is the first page of the superset of results that have passed the server-side filter. If you think it the search requires an additional trip to fetch more items, then you should employ server-side caching for a nice performance boost. If you think the superset will be less than a full page (default is 1000 items), then you do not need server-side caching.

     

    If you do need server-side caching, this code can be used.

    IEnumerable<AFEventFrame> RetrievedFrames = null;
    using (AFEventFrameSearch search = new AFEventFrameSearch(LogDatabase, "baseSearch", tokens))
    {
        search.CacheTimeout = TimeSpan.FromMinutes(10);
        RetrievedFrames = search.FindObjects(0, true, 0).ToList();
    } // search object is disposed and any server-side cache is disposed.

    I give a very generous cache timeout, but that is okay. Whether your search takes 2 seconds or 2 minutes, the cache will be immediately disposed of thanks to the closing brace on line 6.

     

    So I would suggest you use ToList() and you may want to experiment with using a cache timeout.

     

    Beyond that, there are potential performance benefits to using a "skinny search", that is to say using a FindObjectFields overload. But this would only be worth pursuing if (A) none of the event frames passing server-side filters are in-progress and (B) you have captured values. If neither of those conditions are met, then the skinny search will resort to being a fat one.

     

  • Thank you for your detailed reply.

    You are correct. I don't understand how the AFSearch operates, that is why I asked this question.

    The purpose of this community is to share information, build an understand, learn from others and help others in a respectful way. I am clearly wanting to improve performance, and I am clearly wanting to learn, because I have "solved" the problem by deleting the event frames, but am still wanting to understand the root-cause of the issue, in the hope that if it reoccurs, I can fix it, or ideally prevent that recurrence.

    As I understand it, you are recommending that I use ToList(), on the basis that it has the effect of forcing the items to be enumerated, thereby loading them into client-side memory.

    My point the execution of the was taking about 100 ms, with one result returned.

    I was observing that when enumerating the IEnumerable collection it would take 40 seconds to execute. This was whether I used ForEach, Parallel.ForEach or whether I was doing this inside or outside of the Using block, or whether I was using fullLoad=true or fullLoad=false, or even when I tried to look at the RetrievedFrames object in the locals window (that actually timed-out, without displaying the result). Even search.GetTotalCount() was taking 40 seconds to return. Calling search.GetTotalCount() followed by ForEach across RetrievedFrames (within the using block) was taking twice as long - 80 seconds. While I didn't try it, I would expect that calling ToList() would force the enumeration and also take 40 seconds.

    ​So I appreciate that I can optimise, and I appreciate that I can filter better and use server-side filtering, and I now understand that forcing the enumeration loads the results into memory (which I was doing....), but regardless of this, I don't see why enumerating or loading a single result into memory should take 40 seconds.

    In terms of metrics, we have ~215,000 event frames on our AF Server. This database ​had about 8,000 when I was running into this issue. I since deleted all the event frames with this specific template (which was about 6,000), and the problem has gone away.

    For the results, this query is looking at tasks that the operators complete on a given shift. The Shift itself is an event (one parent Event Frame per shift) with around 10 event frames for tasks that they are expected to complete (child event frames) per shift. These start out as InProgress event frames, and are closed when the tasks are complete (or closed by the server if they are not completed within a certain time period). The search and subsequent enumeration for the InProgress event frames was taking a couple of hundred milliseconds. The search for the closed frames (in this case, only one) was taking ~100 ms. Enumerating the returned result was taking 40 seconds.