Performance of Skip and Take in Linq to Objects

Note: This is not relevant for Linq to Sql or Entity framework, where Skip and Take are transalted into their SQL equivalents.

This week I had to write some code where some 500.000 objects in a collection had to chopped up in pieces of 100 objects. My first thought was to use Skip and Take in a construction like this:

int count = collection.Count();
for(int i=0; i<count; i+=100)
{
  var x = collection.Skip(i).Take(i);
  //do stuff with x
}

But it turns out this does not perform at all. I wanted to understand why, and I also wanted to understand how bad the problem is.

First I made a micro benchmark. Normally I don’t micro benchbark very much, as I often find the small differences you find doesn’t make a difference in the big picture. But then again: sometimes it does. In my case the collection is a List<T> so an alternative is to use GetRange(index, Count). I decided to benchmark this for collection sizes from 1000 item to 1000000 items. The Y-axis below shows number of milliseconds to iterate through the collection. The X-axis is number of items in the collection. “do stuff with x” in my test is a simple x.ToList() in order to simulate that all items in x is enumerated. The lower graph shows time elapsed when GetRange is used.

image

Beside a few bad measurement points we se that the GetRange graph is constant at 1-3 milliseconds, while the time to itereate through the complete collection rises with something like n2.

A collection size of 500.000 elements takes about 13 seconds to iterate throgh. Double the size and it takes more than 45 seconds.

I then did a second experiment where I kept the collection size constant, but changed how many items to Take() at a time. Note that the X-axis is logarithmic in this case.

image

It is clear that the more we Take() at a time, the fewer time we have to call Skip(), and the faster it goes.

Why is Skip() so slow and why is GetRange() for List<T> so fast?

Well looking at the MS reference code shows that everytime you invoke Skip() it will have to iterate you collection from the beginning in order to skip the number of elements you desire, which gives a loop within a loop (n2 behaviour).

GetRange() on the other hand can use the knowledge that a list is used, and do some highly optimized copying of the desired elements to new collection. It doesn’t have to iterate to get to the first item to copy since the collection is in one continuus piece of memory, so a simple calculation can find the memory location to start copying from.

Conclusion: For large collections, don’t use Skip and Take. Find another way to iterate through your collection and divide it.

Advertisements

About Lund

Owner of iCodeIT, a software consulting company. I am primarily working the .NET development and architecture.
This entry was posted in C#, Linq and tagged , , , . Bookmark the permalink.

One Response to Performance of Skip and Take in Linq to Objects

  1. Pingback: LINQ and counting | iCodeIT

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s