Sql deferred aggregation table pattern

I encountered a business requirement to track a teams points in a trivia contest on a social networking site. A team is defined as everyone who accepts your friend request, so an individual can have hundreds/thousands friends (no explicit limit). Each person’s has an implied team that encompasses all of their friends, so the tally of my team’s points is the sum of all points earned by myself and all of my friends.

We love stats that update for each page view rather than every N minutes. We want to show “my team’s ranking” to every person on each page view, but this query is rather intensive when you have a lot of people covering a lot of teams and they are each answering a trivia question every 2 or 3 seconds.

To make it more difficult everyone wants all high point earners to be on their team, so this creates i/o hotspots in the database. If I am the top point earner and 1000 people have me marked as their friend, this means I potentially have 1000+ people that are trying to read my point sum at a very high frequency.

One option is to just allow the readers to join the friends table and the points table, but this is a big operation and encounters a lot of locking as the points are continually updating.

Another option is to have an indexed view (or aggregation table maintained by triggers etc) that maintains the point sum on a per user basis. I have a lot less reads per query then, but I still have a lot of contention from locking.

To go down to even fewer pages read I can make a TeamPoints indexed view, this makes it a simple lookup to get my team’s points and compare it to everyone elses, but this has tons of contention as well. The root issue with this approach is when *I* earn one point the query updates N rows in the TeamPoints table/view, where N is the number of users that have me listed as a friend.

After thinking on the issue a bit it occurred to me that I could perhaps maintain an aggregate roll up table of each team’s points. To alleviate the locking contention I tested some scenarios of index / base table locking and came to similar conclusions as the blog entry Using Indexes to Bypass Locks.

I as a user want to see my own team’s points increment with each answer I submit (each page view). The solution for this is the aggregation table maintains the point sum for all of my friends but not my own points. I get my non-realtime friend’s points from a covering index, and then sum in my own realtime points. So I see my team’s points increase with every page view but my friend’s points are only summed in every N interval by a background process.

I posted the following example code in a forum response to the above blog entry.

I think the blog article fails to bring up the important point that the update *will* take a lock on the index if the update changes any of the columns that the index covers. Here is an example that demonstrates this:

create table t1 (id int, lastAgg int, pending int, currentAgg as lastAgg+pending)
insert t1 values (1, 20, 2)
insert t1 values (2, 30, 3)
insert t1 values (3, 40, 4)
--create clustered index t1_cidx_id_agg on t1(id)
create unique nonclustered index t1_idx_id_lastAgg on t1(id) include (lastAgg)
/*in session 1 update a column not in the covering index*/
begin tran
update t1 set pending = 1 where id = 2;

/*in session 2 run the following, using index hint since table and data is compact so query plan may otherwise use the clustered index and invalidate the test*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: not blocked by session 1*/

/*rollback or commit the previous session before starting the next test*/
/*in session 1 update the column that is covered by the index*/
begin tran
    update t1 set lastAgg = 21 where id = 2;
/*in session 2 run the following*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: session 2 is blocked by session 1*/

This table/index pattern can be used to implement a deferred update aggregation table. All of the readers that can afford a time lag query the lastAgg column. All readers that require up to date info query the currentAgg column and take the hit that they will be blocked by concurrent writers. All of the updates write to the pending column. A scheduled task or other background process occassionally goes through the table and does: set lastAgg=lastAgg+pending, pending=0, dirty=0.

Side note 1: Note that it does not matter if the base table has a clustered index or is a heap. 

Side note 2: I don’t know if this always holds true, but I observe that if your query does an update/set on a column that is covered by the index but does *not* actually change the value (i.e. set to value 3 and current value is already 3), then the index is not locked.

Appropriate Use of Indexes FTW! (for the win)


One Response to “Sql deferred aggregation table pattern”

  1. Tal Olier Says:

    Nice one 😉
    My comments can be found at the article follow up blog post at: http://blogs.mssqltips.com/forums/t/581.aspx?View=Flat

    “Using Indexes to Bypass Locks” is an article/tip and not a blog post.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: