Automatic creation of an XML schema (XSD) for a SQL Server table

March 26, 2010

At times I need to create an XML schema for a table in the database, such as when exposing a stored procedure with an XML interface. I usually lean on SQL Server's built-in functionality for generating these; check out the following code.

--Create an example table

create table dbo.Person (
 PersonId int identity primary key
,FirstName nvarchar(100) not null
,LastName nvarchar(100) not null
,Token uniqueidentifier
)

--Generate an xsd under the namespace myTargetNamespace. If you don't specify a namespace, you get an automatically incrementing namespace. I find this of little value, so I prefer to choose something specific.

select top 0 * from dbo.Person as Person for xml auto, xmlschema('myTargetNamespace')

--Or you can add the elements directive, if you prefer element-centric rather than attribute-centric output.

select top 0 * from dbo.Person as Person for xml auto, elements, xmlschema('myTargetNamespace')

--Perhaps you would prefer that the schema was under no namespace? You can capture the schema in an xml variable and then use XML DML to remove the targetNamespace attribute.

declare @myXsd xml
set @myXsd = (select top 0 * from dbo.Person as Person for xml auto, elements, xmlschema('myTargetNamespace'))
set @myXsd.modify('delete /xs:schema[1]/@targetNamespace')
select @myXsd
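The same attribute removal can be done outside the database as well; here is a quick sketch using Python's standard-library ElementTree (illustrative only, mirroring the XML DML delete above):

```python
# Strip the targetNamespace attribute from a generated schema, the same
# effect as the xml.modify('delete ...') call above.
import xml.etree.ElementTree as ET

xsd = ('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
       'targetNamespace="myTargetNamespace"/>')
root = ET.fromstring(xsd)
root.attrib.pop('targetNamespace', None)  # remove the attribute if present
print(ET.tostring(root, encoding='unicode'))
```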

--ok, now the schema is looking pretty good. Let's pop it into a schema collection (we could do so with @myXsd above directly, but I'll put the text here so you can see it)

create xml schema collection personXsd as '<xsd:schema xmlns:schema="myTargetNamespace" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:element name="Person">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="PersonId" type="sqltypes:int" />
<xsd:element name="FirstName">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="100" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="LastName">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="100" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="Token" type="sqltypes:uniqueidentifier" minOccurs="0" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'

--and here is an example xml instance

declare @person xml(personXsd)
set @person = '<Person>
<PersonId>1</PersonId>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</Person>'
select @person

MSSQL: specifying targetNamespace with FOR XML, XMLSCHEMA

January 23, 2010

You have probably found that although you can specify the namespace to use when serializing to xml with FOR XML, there is no option for specifying the namespace of a generated xmlschema. Instead you get an auto-incrementing URI per user session, e.g. urn:schemas-microsoft-com:sql:SqlRowSet7

You can however assign that generated schema to an xml variable, and then simply use XML DML to replace the value of the targetNamespace attribute. Here is a simple example, enjoy:

declare @x xml, @ns varchar(1024)
set @ns = 'newNamespace'
set @x = (
 select *
 from (
  select top 0 1 as intId
 ) as t for xml auto, xmlschema
)
set @x.modify('declare namespace xsd = "http://www.w3.org/2001/XMLSchema";
replace value of /xsd:schema[1]/@targetNamespace with sql:variable("@ns")')
select @x

Using SQL Server indexed views for time averaged aggregations

June 16, 2009

I have found that many website games, such as quiz systems, need to report "most popular in the last 24 hours" or similar time aggregations. For these high read/write ratios it is usually a good trade-off to create an indexed view to maintain some aggregations over time. Sometimes the time slice can be midnight-based 24-hour periods; in other circumstances you need greater granularity.

Here is a simple example of maintaining an aggregated view that is addressable by a given day and minute.

create table timeSeries (
timeSeriesId int identity primary key clustered
,updateDate datetime not null
,payload float not null
)
insert timeSeries values ('2009-06-16 12:00:00', rand())
insert timeSeries values ('2009-06-16 12:00:59', rand())
insert timeSeries values ('2009-06-16 12:01:00', rand())
insert timeSeries values ('2009-06-16 12:59:00', rand())
insert timeSeries values ('2009-06-16 01:00:00', rand())
insert timeSeries values ('2009-06-16 1:30:00', rand())
insert timeSeries values ('2009-06-16 23:59:00', rand())
insert timeSeries values ('2009-06-17 00:01:00', rand())
insert timeSeries values ('2009-06-17 00:01:30', rand())

create view timeSeriesByMinute_IV with schemabinding as
select
dayBucket = datediff(day, 0, updateDate)
,minuteBucket = datediff(minute, 0, (updateDate - datediff(day, 0, updateDate)))
,payloadSum = sum(payLoad)
,numRows = count_big(*)
from dbo.timeSeries
group by
datediff(day, 0, updateDate)
,datediff(minute, 0, (updateDate - datediff(day, 0, updateDate)))

create unique clustered index CU_timeSeriesByMinute_IV on timeSeriesByMinute_IV (dayBucket, minuteBucket)

create view timeSeriesByMinute as
select
dayBucket
,minuteBucket
,payloadSum
,numRows
,payloadAvg = payloadSum / numRows
from dbo.timeSeriesByMinute_IV with (noexpand)

declare @timeLookup datetime, @dayBucket int, @minuteBucket int
select
 @timeLookup = '2009-06-16 12:00:00'
,@dayBucket = datediff(day, 0, @timeLookup)
,@minuteBucket = datediff(minute, 0, (@timeLookup - datediff(day, 0, @timeLookup)))

select * from timeSeriesByMinute where dayBucket = @dayBucket and minuteBucket = @minuteBucket
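The bucket arithmetic is easy to sanity-check outside the database. Here is a quick sketch in Python (just to illustrate the math, not the T-SQL): dayBucket is days since SQL Server's epoch of 1900-01-01, and minuteBucket is minutes since midnight.

```python
from datetime import date, datetime

SQL_EPOCH = date(1900, 1, 1)  # datediff(day, 0, ...) counts days from 1900-01-01


def buckets(ts: datetime) -> tuple[int, int]:
    day_bucket = (ts.date() - SQL_EPOCH).days  # datediff(day, 0, @t)
    minute_bucket = ts.hour * 60 + ts.minute   # minutes since midnight
    return day_bucket, minute_bucket


# Rows at 12:00:00 and 12:00:59 land in the same minute bucket
print(buckets(datetime(2009, 6, 16, 12, 0, 0)))
print(buckets(datetime(2009, 6, 16, 12, 0, 59)))
```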

If you have the Enterprise sku of SQL Server, the NOEXPAND hint is not required to force an indexed view to make use of its index. Unfortunately the Standard sku will not use the index by default, so the NOEXPAND hint is required. To help other developers never miss this required hint, I always implement indexed views with the above base-view / wrapper-view approach. The wrapper view is the only one that folks reference, so it guarantees the NOEXPAND hint is applied.

Jump points to file system location in windows shell

January 28, 2009

I spend a ton of my time in the windows shell going back and forth between source trees, tons of notes, todo lists, etc. I got tired of typing the same paths to CD to all the time, so I created a few simple scripts to manage this for me.
A usage example goes like this:
I CD to a directory I frequently want to use, such as d:\src\db_prj\branch7\tbl\constraint\pk. While in that directory, I run my script with something like "sd.cmd pk7". This associates my arbitrary token "pk7" with this location by creating a batch file called pk7.cmd.
When I later want to jump to that directory I run "gd.cmd pk7", which invokes pk7.cmd and puts me in that location. I don't always remember my defined tokens, so I run "ld.cmd" to get a listing of all tokens I have defined. Here are the contents of the scripts:
_begin sd.cmd_
@echo off
rem token is %1
rem path is %~dp1

@if not exist "%~dp0cd_store" md "%~dp0cd_store"

echo @cd /d "%~dp1" > "%~dp0cd_store\%1.cmd"

echo Stored "%~dp1" as token %1, CD to path with: gd %1

_end sd.cmd_

_begin gd.cmd_

@if exist "%~dp0\cd_store\%1.cmd" (@call %~dp0\cd_store\%1.cmd) else (@echo No location defined for token %1)

_end gd.cmd_

_begin ld.cmd_

@echo off

dir "%~dp0cd_store\*%1*cmd" /b

_end ld.cmd_

To use the scripts, just create a location like c:\cmd and place the files there. Then add c:\cmd to your path.

I think there is room for improvement in the token listing at some point. I was thinking of stuffing some metadata in each token.cmd file that could then be parsed out to generate the listing. The current solution is highly effective so I haven’t been motivated to tweak it further. 🙂
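The metadata idea could look something like this (a hypothetical sketch, not part of the scripts above): sd.cmd would write an extra `rem DESC: ...` line into each token .cmd file, and the listing would parse it back out. In Python form, the parsing step might be:

```python
# Hypothetical sketch: pull a "rem DESC: ..." metadata line out of a stored
# token .cmd file to build a richer listing than a bare filename dump.
def describe(token_file_text: str) -> str:
    for line in token_file_text.splitlines():
        if line.lower().startswith("rem desc:"):
            return line[len("rem desc:"):].strip()
    return "(no description)"


print(describe('@cd /d "d:\\src"\nrem DESC: primary-key scripts'))
```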

Parsing date / time in windows shell for log file timestamp

October 6, 2008

Quick post to share a simple solution to a problem I encountered this morning. A script I inherited needed to be tweaked so that in an error condition the log file is copied to an auditing location. Currently the filename of the log file is static, so that location would be overwritten each time the error condition occurs. Below is a simple way to build a timestamped filename for these kinds of situations.

It only has resolution to the minute, which is adequate for my needs. To get resolution to seconds or milliseconds you could pipe a CRLF to the 'time' command and redirect its output to a text file, then parse the text file instead of the 'time /t' output directly.
FOR /f "usebackq tokens=2-4 delims=/ " %A in (`date /t`) do (
SET _NOW_MONTH=%A
SET _NOW_DAY=%B
SET _NOW_YEAR=%C)

REM Note: converting hours to military time
FOR /f "usebackq tokens=1-3 delims=: " %A in (`time /t`) do (
SET _NOW_HOUR=%A
SET _NOW_MIN=%B
if "%C" equ "PM" (SET /A _NOW_HOUR = %_NOW_HOUR% + 12))

SET _NOW_DATETIME=%_NOW_YEAR%%_NOW_MONTH%%_NOW_DAY%_%_NOW_HOUR%%_NOW_MIN%

REM Reference the timestamp as follows:
copy MyLogFile.txt \\server\share\MyLogFile_%_NOW_DATETIME%.TXT

EDIT: ok, I have already come to a better solution 🙂

for /F "tokens=1,2,3,4,5,6,7* delims=.:^/^ " %i in ("%date%.%time%") do (
SET _Timestamp=%l%j%k%m%n%o
REM pad hour if single digit
if "%m" neq "" if %m LSS 10 SET _Timestamp=%l%j%k0%m%n%o)
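For comparison, producing the same minute-resolution stamp is a one-liner in most scripting languages, with no locale-dependent parsing. A sketch in Python (illustrative, showing the target yyyyMMddHHmm format):

```python
from datetime import datetime


def log_stamp(now: datetime) -> str:
    # yyyyMMddHHmm, zero-padded, locale-independent
    return now.strftime("%Y%m%d%H%M")


print(log_stamp(datetime(2008, 10, 6, 9, 5)))  # -> 200810060905
```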

Using SYNONYMS in SQL Server for cross database dependencies

September 10, 2008

Whenever possible, I prefer to keep the dependencies for production code minimized and well organized.
I am very fond of using synonyms in SQL Server 2005 to manage all cross-database references; this makes it easy both to enumerate and to change all pointers to a given database.
Here is an example of creating two databases and defining some synonyms from one to the other.

create database orders
create database products
go

use products
go
create schema Catalog
go
create table Catalog.Offerings (
ProductId int identity primary key
,ProductTitle nvarchar(128) not null
)
create table Catalog.NewOfferings (
ProductId int references Catalog.Offerings(ProductId)
,AvailableDate datetime not null
)
create table Catalog.BestSellers (
ProductId int references Catalog.Offerings(ProductId)
,PopularityRank smallint not null constraint bestSellers_uq_popularityRank unique
)

insert Catalog.offerings (productTitle) values ('product 1')
insert Catalog.offerings (productTitle) values ('product 2')
insert Catalog.offerings (productTitle) values ('new product 3')
insert Catalog.newofferings (productId, availableDate) values (3, dateadd(day, 1, getutcdate()))
insert Catalog.bestsellers (productId, popularityRank) values (2,1)
insert Catalog.bestsellers (productId, popularityRank) values (1,2)

use orders
go
create schema ProductsCatalog
go
create synonym ProductsCatalog.NewOfferings for Products.Catalog.NewOfferings
create synonym ProductsCatalog.Offerings for Products.Catalog.Offerings
select ofr.productId, ofr.productTitle, newofr.availableDate
from ProductsCatalog.NewOfferings as newofr
join ProductsCatalog.Offerings as ofr
on newofr.productId = ofr.productId

Now we will assume that a business requirement has appeared that necessitates updating all existing references to Products to use the new Products09 database instead. We can script out all synonyms in the Orders database with SQL Server Management Studio:
Tasks => Generate Scripts => choose 'Orders' db => choose synonym object types => select all

Then we can update these synonym definitions to point to the new Products09 database in one transaction:
DROP SYNONYM [ProductsCatalog].[NewOfferings];
DROP SYNONYM [ProductsCatalog].[Offerings];

CREATE SYNONYM [ProductsCatalog].[NewOfferings] FOR [Products09].[Catalog].[NewOfferings];
CREATE SYNONYM [ProductsCatalog].[Offerings] FOR [Products09].[Catalog].[Offerings];

You can simulate moving to the new Products09 database by renaming the existing Products DB to Products09. Running the above select query will show that the synonyms are happily referring to the new database and returning the same results as before.

Manage foam when brewing with Aeropress

May 2, 2008

I love my Aeropress. LOVE. I used to drink my coffee black when brewing with a drip brewer, but now in comparison it tastes bad and requires milk and splenda. But I love the Aeropress coffee straight in its concentrated form (pseudo espresso). My drink of choice used to be straight espresso, but I find now I prefer the smoother (yet still as strong as a rhino) brew of the Aeropress.

It was driving me crazy that there was always a huge head of foam when brewing coffee with this method. The vessel has a pretty low volume to start with so I find it very limiting to not be able to fill the thing with coffee and water.

After adding the coffee grounds, add just enough hot water to wet the grounds. Stir for a moment (maybe 10 seconds). During this time the grounds all become wet and degas; since they are not submerged in water, the gas can freely escape. Next, fill with water as normal and enjoy your non-frothy brew!

Sql deferred aggregation table pattern

May 2, 2008

I encountered a business requirement to track a team's points in a trivia contest on a social networking site. A team is defined as everyone who accepts your friend request, so an individual can have hundreds or thousands of friends (no explicit limit). Each person has an implied team that encompasses all of their friends, so the tally of my team's points is the sum of all points earned by myself and all of my friends.

We love stats that update for each page view rather than every N minutes. We want to show “my team’s ranking” to every person on each page view, but this query is rather intensive when you have a lot of people covering a lot of teams and they are each answering a trivia question every 2 or 3 seconds.

To make it more difficult, everyone wants all the high point earners on their team, which creates I/O hotspots in the database. If I am the top point earner and 1000 people have me marked as their friend, I potentially have 1000+ people trying to read my point sum at a very high frequency.

One option is to just allow the readers to join the friends table and the points table, but this is a big operation and encounters a lot of locking as the points are continually updating.

Another option is to have an indexed view (or an aggregation table maintained by triggers, etc.) that maintains the point sum on a per-user basis. Each query then reads far fewer rows, but I still have a lot of contention from locking.

To read even fewer pages I can make a TeamPoints indexed view. This makes it a simple lookup to get my team's points and compare them to everyone else's, but it has tons of contention as well. The root issue with this approach is that when *I* earn one point, the query updates N rows in the TeamPoints table/view, where N is the number of users that have me listed as a friend.

After thinking on the issue a bit it occurred to me that I could perhaps maintain an aggregate roll up table of each team’s points. To alleviate the locking contention I tested some scenarios of index / base table locking and came to similar conclusions as the blog entry Using Indexes to Bypass Locks.

I as a user want to see my own team's points increment with each answer I submit (each page view). The solution is that the aggregation table maintains the point sum for all of my friends but not my own points. I get my non-realtime friends' points from a covering index, and then sum in my own realtime points. So I see my team's points increase with every page view, but my friends' points are only summed in every N-minute interval by a background process.

I posted the following example code in a forum response to the above blog entry.

I think the blog article fails to bring up the important point that the update *will* take a lock on the index if the update changes any of the columns that the index covers. Here is an example that demonstrates this:

create table t1 (id int, lastAgg int, pending int, currentAgg as lastAgg+pending)
insert t1 values (1, 20, 2)
insert t1 values (2, 30, 3)
insert t1 values (3, 40, 4)
--create clustered index t1_cidx_id_agg on t1(id)
create unique nonclustered index t1_idx_id_lastAgg on t1(id) include (lastAgg)
/*in session 1 update a column not in the covering index*/
begin tran
update t1 set pending = 1 where id = 2;

/*in session 2 run the following, using index hint since table and data is compact so query plan may otherwise use the clustered index and invalidate the test*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: not blocked by session 1*/

/*rollback or commit the previous session before starting the next test*/
/*in session 1 update the column that is covered by the index*/
begin tran
    update t1 set lastAgg = 21 where id = 2;
/*in session 2 run the following*/
select lastAgg from t1 with (index = t1_idx_id_lastAgg) where id = 2
/*result: session 2 is blocked by session 1*/

This table/index pattern can be used to implement a deferred-update aggregation table. All of the readers that can afford a time lag query the lastAgg column. All readers that require up-to-date info query the currentAgg column and accept that they will be blocked by concurrent writers. All of the updates write to the pending column. A scheduled task or other background process occasionally goes through the table and does: set lastAgg = lastAgg + pending, pending = 0.
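The moving parts of the pattern, minus the locking, can be sketched in Python (an illustrative model of one row, not the T-SQL implementation):

```python
# Minimal model of one row in the deferred-update aggregation table.
class DeferredAgg:
    def __init__(self, last_agg=0, pending=0):
        self.last_agg = last_agg  # flushed total; covered by the nonclustered index
        self.pending = pending    # hot column that writers touch

    def write(self, points):
        # Every answer submitted lands in pending only.
        self.pending += points

    def read_lagged(self):
        # Readers that tolerate a lag read the index-covered column.
        return self.last_agg

    def read_realtime(self):
        # Readers that need currentAgg = lastAgg + pending.
        return self.last_agg + self.pending

    def flush(self):
        # Background job: set lastAgg = lastAgg + pending, pending = 0.
        self.last_agg += self.pending
        self.pending = 0


row = DeferredAgg(last_agg=30, pending=3)
row.write(1)
print(row.read_lagged(), row.read_realtime())  # 30 34
row.flush()
print(row.read_lagged(), row.read_realtime())  # 34 34
```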

Side note 1: It does not matter whether the base table has a clustered index or is a heap.

Side note 2: I don’t know if this always holds true, but I observe that if your query does an update/set on a column that is covered by the index but does *not* actually change the value (i.e. set to value 3 and current value is already 3), then the index is not locked.

Appropriate Use of Indexes FTW! (for the win)

Invoke executable from powershell at low (idle) priority

March 11, 2008

Howdy, this blog is primarily to store tech goodies that I wish I had easily found on someone else’s blog, but had to invent or piece together myself.

Today’s nugget – launching an arbitrary .exe at low thread priority from powershell. I’m used to doing this with ‘start’ from a cmd shell, so I was frustrated at first that I didn’t see a way to do this with the “&” method of invoking an exe. Googled around and found others wanting something similar to start for PS. Then I remembered – *light bulb* – I’m in the CLR, enjoy the freedom and power 🙂

So here is my simple function that uses System.Diagnostics.Process to launch 7z.exe to compress a file at low priority.

function CopyTo7zArchive([System.IO.FileInfo] $inFile){
$archiveName = $inFile.FullName.Remove($inFile.FullName.Length - $inFile.Extension.Length) + ".7z";

#If source and target both exist, then skip
if ((Test-Path $archiveName) -and (Test-Path $inFile.FullName)) {Write-Verbose("7z archive already present, skipping file"); return $true}
$success = $false;

$proc = New-Object System.Diagnostics.Process;
$proc.StartInfo.FileName = "c:\program files\7-zip\7z.exe";
$proc.StartInfo.Arguments = ("a ""$archiveName"" """ + $inFile.FullName + """");

$proc.StartInfo.UseShellExecute = $false;
$proc.StartInfo.RedirectStandardOutput = $true;
$proc.Start() | Out-Null;
$proc.PriorityClass = [System.Diagnostics.ProcessPriorityClass]::Idle;
$procOutput = $proc.StandardOutput.ReadToEnd();
$proc.WaitForExit();
if ($procOutput -match "Everything is Ok" ) {$success = $true};

if ($success -eq $true) {
Write-Verbose ("Compressed source file " + $inFile.FullName + " to target of " + $archiveName);
return $true;
}
else {
Write-Verbose ("ERROR: 7Z did not report success while compressing " + $inFile.FullName + " to target of " + $archiveName + ". Deleting suspect archive file and dumping 7z output:");
if ((Test-Path $archiveName)) {del $archiveName;}
Write-Verbose $procOutput;
return $false;
}
}


Note that if you don’t want to consume the output you can still easily wait for the .exe to exit by calling WaitForExit() on the Process object.