Fastest way of finding a row index in a DataView

alanjfpaterson

I have been spending a little time in fixing some bugs in the SourceGrid, which led on to making a few changes in the grid 'wrapper', which includes a method to find the row index of a specified data row in the grid's bound data view. The way this works is literally to iterate each row, compare it with the DataRow you are searching for until you find the correct one. So very realistically it is possible that you might need to try every row in the grid.

Now here is the thing - the way that our validation code works means that every time you tab away from a control in the 'details panel' we have to 'find' the row index of that row all over again because if you edit a field that is sorted (say by the sort header), the row may now be in a new position and we need to make the highlight follow the data.

So fairly obviously, 999 times out of 1000 the row does not move - but sometimes it does. (Some screens are actually always sorted, so they are more likely to have moving rows).

So to put it another way, every time we tab off a control in the details panel we potentially have to iterate through every row in the grid - even if the row position hasn't changed.

I became concerned that when there are a 'large number' of rows in the grid, the time taken might become noticeable. So I added a new parameter to the method to find the row index. I named this parameter AHintRowToTryFirst, which pretty much does what it says. We can supply a hint row - the row index that we know the row was at previously - so if it hasn't moved we strike gold at the first attempt! Only if that fails do we have to start looking through all the rows.

We then started to think if there was a smarter way of finding the row than what we can call a 'linear search'. ADO.NET does provide methods to 'Find' a DataRow or get the 'IndexOf' a row - but these apply to DataTables. The Find requires primary key values, and basically is a SQL Select statement. IndexOf is quick because every DataRow has an internal table _rowID column. So we can use these DataTable methods but ADO.NET resolutely refuses to provide a mechanism to get the rowID in a DataView. Once you have put your data into a SQL View you no longer get to know anything about a row's position.

But what you can do is to convert your DataView to a DataTable using the DataView.ToTable() method. This creates a new DataTable with rows in the same order as the current (sorted or not) view. Now you can use the table methods to find the row index. I call this a ViewToTable method. The steps are:
a. Create a Table from the View
b. Apply the same primary key columns from the View to the new table
c. Get the primary key values from the search row
d. Call IndexOf(Find(searchForDataRow))

I would suspect that a, c and d are quite quick - but the most time is in b, because that will force indexes to be created on the primary key(s). But having taken the hit for that, item (d) will be quick.

This then led to some discussion that maybe the linear search is quicker for a small number of rows whereas the ViewToTable search could be quicker for a large number of rows. So nothing for it but to run some tests ....

I devised a series of tests that
a. Used tables (and views) with varying row counts
b. Run same test using each of the two search methods
c. Run each test searching for the first row and the last row

My underlying data table had 2 primary key columns, which were both strings of about 30 characters. The difference between the rows was all down to the final three characters in the string. The data rows had 2 additional integer columns that were not primary key columns

The data table was re-created in memory for every test so that no caching was possible between tests (but the data table was always the same for a given row count).

I recorded the time to create the table (just for interest) and the time taken to find the row index - as well as the found index itself because obviously that had to be correct!!

So here are the results from one set as the first task the PC had to do after boot-up.
The first block of tests used linear search, the second block used ViewToTable search

Test: 10 row test using linear search on first record [10 rows] : Create took 0.00 ms, found at 0, took 0.0000 ms

Test: 10 row test using linear search on last record [10 rows] : Create took 0.00 ms, found at 9, took 0.0000 ms

Test: 100 row test using linear search on first record [100 rows] : Create took 15.60 ms, found at 0, took 0.0000 ms

Test: 100 row test using linear search on last record [100 rows] : Create took 0.00 ms, found at 99, took 0.0000 ms

Test: 1000 row test using linear search on first record [1000 rows] : Create took 31.20 ms, found at 0, took 0.0000 ms

Test: 1000 row test using linear search on last record [1000 rows] : Create took 15.60 ms, found at 999, took 0.0000 ms

Test: 10000 row test using linear search for first record [10000 rows] : Create took 187.20 ms, found at 0, took 0.0000 ms

Test: 10000 row test using linear search for last record [10000 rows] : Create took 156.00 ms, found at 9999, took 0.0000 ms

Test: 25000 row test using linear search for first record [17578 rows] : Create took 312.00 ms, found at 0, took 0.0000 ms

Test: 25000 row test using linear search for last record [17578 rows] : Create took 280.80 ms, found at 17577, took 15.6000 ms

Test: 10 row test using table search for first record [10 rows] : Create took 0.00 ms, found at 0, took 0.0000 ms

Test: 10 row test using table search for last record [10 rows] : Create took 0.00 ms, found at 9, took 0.0000 ms

Test: 100 row test using table search for first record [100 rows] : Create took 0.00 ms, found at 0, took 0.0000 ms

Test: 100 row test using table search for last record [100 rows] : Create took 0.00 ms, found at 99, took 0.0000 ms

Test: 1000 row test using table search for first record [1000 rows] : Create took 15.60 ms, found at 0, took 31.2001 ms

Test: 1000 row test using table search for last record [1000 rows] : Create took 15.60 ms, found at 999, took 0.0000 ms

Test: 10000 row test using table search for first record [10000 rows] : Create took 156.00 ms, found at 0, took 78.0001 ms

Test: 10000 row test using table search for last record [10000 rows] : Create took 163.60 ms, found at 9999, took 66.8001 ms

Test: 25000 row test using table search for first record [17578 rows] : Create took 280.80 ms, found at 0, took 156.0002 ms

Test: 25000 row test using table search for last record [17578 rows] : Create took 280.80 ms, found at 17577, took 140.4002 ms

Note: My algorithm could only create 17578 rows maximum. The 25000 was the number of rows I asked for - but I couldn't actually reach that number.

Discussion of Results
Well!

The various create times suggest that I genuinely did not suffer from a caching problem. Creating similar row counts did take similar times.
At 1000 rows the ViewToTable method started to show signs of strain (at the millisecond level) from index creation. By the time we get to 10000 rows the search is taking about 70 milliseconds and it doesn't matter whether we are searching for the first or last row.
The linear search is always the fastest - finding the last row in 10,000 still takes microseconds. Actually running the test multiple times produces slightly varying results, but I never saw more than 2 milliseconds for 10000 rows.

I did find this outcome a little surprising and I was left wondering whether the way the test was coded was inadvertently favouring the linear search. I concluded that in a way the code did favour linear search - but that actually our OpenPetra code is coded the same as the test, which is why linear search is in fact the method of choice. Let me explain ...

When we do a linear search there is just one line inside the for/next loop:
if (ViewToSearch[Counter].Row == RowToFind)

We supply RowToFind and it gets compared with the Row property of the indexer into the View. I had imagined that this comparison would involve testing each column (which is in an ItemArray property) of each of the two data rows. That would be time consuming. The only way that I can see that we can get the speed that we do is if what is being compared is the reference location of the two (i.e. the memory location). So I now envisage the DataView being a linked list of memory references to data objects. The linear search thus becomes iterating through a linked list comparing one 32/64 bit number until we find a match. That would indeed be the quickest way.

So now I needed to think about how my code could have produced that effect ....

When I created my last row (say) I have the typical line: DataRow lastRow = DataTable.NewRow(). I then populate lastRow, add it to the table and keep a reference to it which becomes the searchRow. I initially assumed that DataTable.Add(lastRow) would clone a copy of my row, but actually we now have to believe that the Add method just adds a reference to it - so now in COM terms there are two references to the data object. Once in the DataView these DataRows can be linked into a sorted list in a variety of ways but each object still refers to the same memory location it always had. But in this scenario, finding the index or an item in a view would be very quick in a linear search.

As an aside, the ViewToTable method has no chance of using this technique. It has to do a SQL Select statement to find our row by its primary keys - and construct an index to do that.

So now the question remains. Does the test I constructed 'cheat' in any way? When we use OP will we get the same fast result from a linear search?

The answer, I believe, is yes. We use exactly the same
if (ViewToSearch[Counter].Row == RowToFind)
as in my test (I copied the code). And here's the thing - the ViewToSearch is the DataBoundView that is bound to the grid. And what is RowToFind? That is a reference to our variable FPreviouslySelectedDataRow. Where does that reference come from? The same DataBoundView on which we call the method IndexToDataSourceRow - so our RowToFind reference comes from the DataView in exactly the same way as my test RowToFind does.

Conclusions
We should not worry about using the simple linear search method to find a row index into a view. The way that row references are associated with a view makes this always the fastest way of finding the row index and even when the grid has ten thousand rows the time taken is of the order of 2 milliseconds.

PS If anyone is interested to share my Console Application that runs the tests, I can pass it on but I doubt it is worth having in the project source repository.