Archive for the ‘SQL’ Category

Automatic creation of xml schema (xsd) for a sql server table

March 26, 2010

At times I need to create an xml schema for a table in the db, such as when exposing an xml interface sproc. I usually lean on the built-in functionality of sql server for generating these; check out the following code.

–Create an example table

create table dbo.Person (PersonId int identity primary key, FirstName nvarchar(100) not null, LastName nvarchar(100) not null, Token uniqueidentifier)

–Generate an xsd under the namespace of myTargetNamespace. If you don’t specify a namespace, you will get an autonomically increasing namespace. I find this of little value, so I prefer to choose something specific.

select top 0 * from dbo.Person as Person for xml auto, xmlschema('myTargetNamespace')

–Or you can add the elements directive, if you prefer element rather than attribute centric.

select top 0 * from dbo.Person as Person for xml auto, elements, xmlschema('myTargetNamespace')

–Perhaps you would prefer that the schema was under no namespace? You can retrieve this schema to an xml variable, and then use xml dml to remove the targetNamespace attribute.

declare @myXsd xml
set @myXsd = (select top 0 * from dbo.Person as Person for xml auto, elements, xmlschema('myTargetNamespace'))
set @myXsd.modify('delete /xs:schema[1]/@targetNamespace')
select @myXsd

–ok, now the schema is looking pretty good. Let’s pop it into a schema collection (we could do so with the @myXsd above directly, but I’ll put the text here so you can see it)

create xml schema collection personXsd  as '<xsd:schema xmlns:schema="myTargetNamespace" xmlns:xsd="<a href=""></a>" xmlns:sqltypes="<a href=""></a>" elementFormDefault="qualified">
<xsd:import namespace="<a href=""></a>" schemaLocation="<a href=""></a>" />
<xsd:element name="Person">
<xsd:element name="PersonId" type="sqltypes:int" />
<xsd:element name="FirstName">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="100" />
<xsd:element name="LastName">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="100" />
<xsd:element name="Token" type="sqltypes:uniqueidentifier" minOccurs="0" />

–and here is an example xml instance

declare @person xml(personXsd)
set @person = '<Person>

Sql deferred aggregation table pattern

May 2, 2008

I encountered a business requirement to track a teams points in a trivia contest on a social networking site. A team is defined as everyone who accepts your friend request, so an individual can have hundreds/thousands friends (no explicit limit). Each person’s has an implied team that encompasses all of their friends, so the tally of my team’s points is the sum of all points earned by myself and all of my friends.

We love stats that update for each page view rather than every N minutes. We want to show “my team’s ranking” to every person on each page view, but this query is rather intensive when you have a lot of people covering a lot of teams and they are each answering a trivia question every 2 or 3 seconds.

To make it more difficult everyone wants all high point earners to be on their team, so this creates i/o hotspots in the database. If I am the top point earner and 1000 people have me marked as their friend, this means I potentially have 1000+ people that are trying to read my point sum at a very high frequency.

One option is to just allow the readers to join the friends table and the points table, but this is a big operation and encounters a lot of locking as the points are continually updating.

Another option is to have an indexed view (or aggregation table maintained by triggers etc) that maintains the point sum on a per user basis. I have a lot less reads per query then, but I still have a lot of contention from locking.

To go down to even fewer pages read I can make a TeamPoints indexed view, this makes it a simple lookup to get my team’s points and compare it to everyone elses, but this has tons of contention as well. The root issue with this approach is when *I* earn one point the query updates N rows in the TeamPoints table/view, where N is the number of users that have me listed as a friend.

After thinking on the issue a bit it occurred to me that I could perhaps maintain an aggregate roll up table of each team’s points. To alleviate the locking contention I tested some scenarios of index / base table locking and came to similar conclusions as the blog entry Using Indexes to Bypass Locks.

I as a user want to see my own team’s points increment with each answer I submit (each page view). The solution for this is the aggregation table maintains the point sum for all of my friends but not my own points. I get my non-realtime friend’s points from a covering index, and then sum in my own realtime points. So I see my team’s points increase with every page view but my friend’s points are only summed in every N interval by a background process.

I posted the following example code in a forum response to the above blog entry.

I think the blog article fails to bring up the important point that the update *will* take a lock on the index if the update changes any of the columns that the index covers. Here is an example that demonstrates this:

create table t1 (id int, lastAgg int, pending int, currentAgg as lastAgg+pending)
insert t1 values (1, 20, 2)
insert t1 values (2, 30, 3)
insert t1 values (3, 40, 4)
--create clustered index t1_cidx_id_agg on t1(id)
create unique nonclustered index t1_idx_id_lastAgg on t1(id) include (lastAgg)
/*in session 1 update a column not in the covering index*/
begin tran
update t1 set pending = 1 where id = 2;

/*in session 2 run the following, using index hint since table and data is compact so query plan may otherwise use the clustered index and invalidate the test*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: not blocked by session 1*/

/*rollback or commit the previous session before starting the next test*/
/*in session 1 update the column that is covered by the index*/
begin tran
    update t1 set lastAgg = 21 where id = 2;
/*in session 2 run the following*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: session 2 is blocked by session 1*/

This table/index pattern can be used to implement a deferred update aggregation table. All of the readers that can afford a time lag query the lastAgg column. All readers that require up to date info query the currentAgg column and take the hit that they will be blocked by concurrent writers. All of the updates write to the pending column. A scheduled task or other background process occassionally goes through the table and does: set lastAgg=lastAgg+pending, pending=0, dirty=0.

Side note 1: Note that it does not matter if the base table has a clustered index or is a heap. 

Side note 2: I don’t know if this always holds true, but I observe that if your query does an update/set on a column that is covered by the index but does *not* actually change the value (i.e. set to value 3 and current value is already 3), then the index is not locked.

Appropriate Use of Indexes FTW! (for the win)