By now you should be familiar with how to work with Sitecore data via the various APIs, be it Sitecore query, fast query, or via Lucene. However, there are times that it helps to have some knowledge of what is actually going on in the underlying Database structure. Specifically, working via the API can be extremely slow, especially in dev environments, where every time you want to make a change to a query, it means restarting your Sitecore app, which can take some time.
In contrast, a DB query can be quickly changed and re-run. When figuring out how to retrieve data, or even more so when debugging an issue, working directly with the DB can be dramatically faster.
The Sitecore Database structure isn’t particularly complicated, so building your own queries isn’t too difficult.
Commonly used Sitecore DB tables
In master/web/core, you can find the following tables (among others) :
Items : Stores all basic Item data. Name, ID, TemplateID, parentID, created/updated timestamps.
History : Shows create/delete/saved history. Comparing this from master to web can help understand when items were published.
VersionedFields/UnversionedFields/SharedFields : Stores basic field data. ItemID, FieldID (which links to Items), Value. Note that the field name exists in the linked Item representing the field, while the value of the field is in these tables.
There’s also a view named “Fields” which is merely a union of these 3 tables.
So, there’s a few uses for this information, and most of the time it comes down to doing things faster directly with the DB, than they would happen via the Sitecore API.
Direct query vs Fast query
For example, you could find items of a specific template, with a specific value contained within the contents of a specific field (very helpful for finding an id in a pipe-delimited list of item ids in a Multilist field, for example) :
select VF.Value from Items I
inner join Items T on I.TemplateID = T.ID
inner join Fields F on F.ItemId = I.ID
inner join Items F on F.FieldId = F.ID
where I.templateID = @templateID
and F.Name = @fieldName
and F.Value like '%' + @fieldValue + '%'
I’ve found that if data isn’t cached, which it usually isn’t in a dev environment, querying the database directly like this is dramatically faster.
In a live environment, I’ve found that a direct query is dramatically faster than a “fast” query.
For example, if I wanted to do a fast query for the case detailed above, for all items under the home item, with template [[templatename]], with [[fieldvalue]] anywhere in a field called [[fieldname]], this is the query that Sitecore runs on your database :
exec sp_executesql N'SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] LEFT OUTER JOIN (SELECT [Fields].* from [Fields] INNER JOIN [Items] ON [Fields].[FieldID] = [Items].[ID] AND lower([Items].[Name]) = ''[[fieldname]]'') [Fields2] ON [i].[ID] = [Fields2].[ItemId] INNER JOIN [Descendants] ON [i].[ID] = [Descendants].[Descendant] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] WHERE LOWER([i].[Name]) = ''sitecore'' AND [i].[ParentID] = @value1) [a] ON [i].[ParentID] = [a].[ID] WHERE LOWER([i].[Name]) = ''content'') [a] ON [i].[ParentID] = [a].[ID] WHERE LOWER([i].[Name]) = ''home'') [a] ON [Descendants].[Ancestor] = [a].[ID], (SELECT [ID] FROM [Items] WHERE [TemplateID] = @value3 AND LOWER([Name]) LIKE @value2) [Templates1] WHERE (([i].[TemplateID] LIKE [Templates1].[ID]) and (coalesce([Fields2].[Value], '''') LIKE @value4))',N'@value1 uniqueidentifier,@value2 nvarchar(11),@value3 uniqueidentifier,@value4 nvarchar(38)',@value1='00000000-0000-0000-0000-000000000000',@value2=N'[[templatename’]]',@value3='templateid',@value4=N'%[[fieldvalue]]%'
Just at first glance, you can tell that’s a bad idea. Well, what if I told you the “fast” query takes 18 seconds to run, but the direct query takes just 500ms?
But what about Lucene?
Also, bear in mind that wildcard searches in Lucene can be extremely expensive. So I’ve used this example deliberately. This sort of search, via Lucene, will cripple a content delivery server’s CPU.
Hopefully this post gives you all something to think about. Don’t blindly trust the Data Access Layer of any framework, even Sitecore. Always be sure of what your code is actually doing on a database level, and if necessary, override the behaviour so the results and performance are more in line with what you expect.
I’ve found a combination of Lucene, and direct DB query, to be the best way to balance out the load so that no single server gets overloaded and crashes.