Archive for May, 2008

Manage foam when brewing with Aeropress

May 2, 2008

I love my Aeropress. LOVE. I used to drink my coffee black when brewing with a drip brewer, but now drip coffee tastes bad in comparison and requires milk and Splenda. I love Aeropress coffee straight in its concentrated form (pseudo espresso). My drink of choice used to be straight espresso, but I now prefer the smoother (yet still strong as a rhino) brew of the Aeropress.

It was driving me crazy that there was always a huge head of foam when brewing coffee with this method. The vessel has a pretty low volume to start with, so I find it very limiting not to be able to fill the thing with coffee and water.

After adding the coffee grounds, add just enough hot water to wet them. Stir for a moment (maybe 10 seconds). During this time the grounds all become wet and degas. Since they are not submerged in water, the gas can freely escape. Next, fill with water as normal and enjoy your non-frothy brew!

SQL deferred aggregation table pattern

May 2, 2008

I encountered a business requirement to track a team's points in a trivia contest on a social networking site. A team is defined as everyone who accepts your friend request, so an individual can have hundreds or thousands of friends (no explicit limit). Each person has an implied team that encompasses all of their friends, so the tally of my team's points is the sum of all points earned by myself and all of my friends.

We love stats that update for each page view rather than every N minutes. We want to show “my team’s ranking” to every person on each page view, but this query is rather intensive when you have a lot of people covering a lot of teams and they are each answering a trivia question every 2 or 3 seconds.

To make it more difficult, everyone wants all of the high point earners on their team, and this creates I/O hotspots in the database. If I am the top point earner and 1000 people have me marked as their friend, then potentially 1000+ people are trying to read my point sum at a very high frequency.

One option is to just let the readers join the friends table to the points table, but this is a big operation and encounters a lot of locking as the points are continually updated.
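For concreteness, the naive per-page-view query might look something like this (the Friends and PointEvents table names and columns are hypothetical, not from a real schema):

```sql
/* Hypothetical schema: Friends(userId, friendId), PointEvents(userId, points).
   Naive read path: on every page view, sum my points plus all of my
   friends' points. Every reader touches every friend's point rows. */
select sum(p.points) as teamPoints
from dbo.PointEvents p
where p.userId = @me
   or p.userId in (select f.friendId
                   from dbo.Friends f
                   where f.userId = @me);
```

Every page view scans the point rows of the whole team, which is exactly where the locking and hotspot pain comes from.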

Another option is to have an indexed view (or an aggregation table maintained by triggers, etc.) that maintains the point sum on a per-user basis. That means far fewer reads per query, but there is still a lot of contention from locking.
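A sketch of the per-user sum as a SQL Server indexed view, assuming the same hypothetical PointEvents table with a non-nullable points column (indexed views require SCHEMABINDING and a COUNT_BIG(*) alongside any SUM):

```sql
/* Hypothetical per-user aggregate: one materialized row per user. */
create view dbo.UserPoints with schemabinding as
select userId,
       sum(points)  as totalPoints,
       count_big(*) as rowCnt  -- required by indexed-view rules
from dbo.PointEvents
group by userId;

create unique clustered index ux_UserPoints on dbo.UserPoints (userId);
```

Readers now fetch one row per team member instead of one row per point event, but writers still lock the very rows the hot readers want.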

To reduce the pages read even further I can make a TeamPoints indexed view. This makes it a simple lookup to get my team's points and compare them to everyone else's, but it has tons of contention as well. The root issue with this approach is that when *I* earn one point, the query updates N rows in the TeamPoints table/view, where N is the number of users that have me listed as a friend.
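A sketch of that TeamPoints view under the same hypothetical schema, which makes the write amplification visible:

```sql
/* Hypothetical team-level aggregate: one row per user, summing the points
   of everyone that user lists as a friend. When a popular user earns a
   point, SQL Server must update one row here per follower. */
create view dbo.TeamPoints with schemabinding as
select f.userId,
       sum(e.points) as teamPoints,
       count_big(*)  as rowCnt
from dbo.Friends f
join dbo.PointEvents e on e.userId = f.friendId
group by f.userId;

create unique clustered index ux_TeamPoints on dbo.TeamPoints (userId);
```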

After thinking on the issue a bit, it occurred to me that I could maintain an aggregate roll-up table of each team's points. To alleviate the locking contention I tested some scenarios of index and base-table locking and came to similar conclusions as the blog entry Using Indexes to Bypass Locks.

As a user, I want to see my own team's points increment with each answer I submit (each page view). The solution is for the aggregation table to maintain the point sum for all of my friends but not my own points. I get my friends' non-realtime points from a covering index, then sum in my own realtime points. So I see my team's points increase with every page view, while my friends' points are only folded in every N minutes by a background process.
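Combining this with the t1 table and covering index from the example code further down, the read path might look like the following (the PointEvents table holding my own realtime points is hypothetical):

```sql
/* Hypothetical read path: my friends' lagged sum comes from the covering
   index (no blocking on concurrent pending-column writers), and my own
   realtime points are summed live from a hypothetical PointEvents table. */
declare @me int = 2;

select t.lastAgg
     + (select isnull(sum(p.points), 0)
        from dbo.PointEvents p
        where p.userId = @me) as myTeamPoints
from t1 t with (index = t1_idx_id_lastAgg)
where t.id = @me;
```

The only realtime rows touched are my own, so the hot-reader contention on popular users' rows goes away.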

I posted the following example code in a forum response to the above blog entry.

I think the blog article fails to bring up the important point that the update *will* take a lock on the index if the update changes any of the columns that the index covers. Here is an example that demonstrates this:

create table t1 (id int, lastAgg int, pending int, currentAgg as lastAgg + pending);
insert t1 (id, lastAgg, pending) values (1, 20, 2);
insert t1 (id, lastAgg, pending) values (2, 30, 3);
insert t1 (id, lastAgg, pending) values (3, 40, 4);
--create clustered index t1_cidx_id_agg on t1(id)
create unique nonclustered index t1_idx_id_lastAgg on t1(id) include (lastAgg);
/*in session 1 update a column not in the covering index*/
begin tran
update t1 set pending = 1 where id = 2;

/*in session 2 run the following, using index hint since table and data is compact so query plan may otherwise use the clustered index and invalidate the test*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: not blocked by session 1*/

/*rollback or commit the previous session before starting the next test*/
/*in session 1 update the column that is covered by the index*/
begin tran
    update t1 set lastAgg = 21 where id = 2;
/*in session 2 run the following*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: session 2 is blocked by session 1*/

This table/index pattern can be used to implement a deferred-update aggregation table. All readers that can afford a time lag query the lastAgg column. All readers that require up-to-date info query the currentAgg column and accept that they may be blocked by concurrent writers. All updates write to the pending column. A scheduled task or other background process occasionally goes through the table and does: set lastAgg = lastAgg + pending, pending = 0.
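Against the example t1 table, that background refresh might be as simple as (the schedule and the WHERE filter are my assumptions, not prescribed by the pattern):

```sql
/* Background refresh, run every N minutes by a scheduled job:
   fold pending points into the lagged aggregate. Only rows with
   pending activity are touched, so covering-index readers are
   only briefly blocked on rows whose lastAgg actually changes. */
update t1
set lastAgg = lastAgg + pending,
    pending = 0
where pending <> 0;
```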

Side note 1: Note that it does not matter whether the base table has a clustered index or is a heap.

Side note 2: I don’t know if this always holds true, but I observe that if your update sets a column covered by the index to the value it already has (i.e. sets it to 3 when the current value is already 3), then the index is not locked.

Appropriate Use of Indexes FTW! (for the win)