Static Data with Multiple Database Tenants

The loan origination system my team and I are responsible for have 20 tables marked as static data in Redgate's SQL Source Control. One example of static data is a tile type. The loan origination system has a dashboard made up of multiple tiles. There are six different types of tiles a user can have on their dashboard. Marking that as static data really helps our automated deployments because we know that table will always be populated, and populated correctly, as the database change moves through the system.

At the start of the project, this worked great. But as with anything in software development, nothing stays static. We are making a big push towards making our applications multi-tenant, meaning other associations will be able to use our loan origination system for themselves. We elected to do this as a PAAS rather than sell them the source code and say "good luck."

Multi-Tenant Database Approaches

When it comes to the database, there are three typical ways to approach a multi-tenant system.

  1. Same database, with each tenant getting their own schema. Each schema has the same table structure.
  2. Same database, same schema, and affix a "TenantId" onto each table.
  3. Separate database for each tenant. The schema structure is the exact same, only the data is different.

Each approach has their own pros and cons. The first two approaches are a little easier on the database side but make it much harder on the code side. The code has to have an intelligent way to switch schemas or it has to write dynamic SQL for the first approach, while the second approach requires a TenantId be added to every single insert, update, delete and select statement. The final approach, which is the one we ended up adopting, is only difficult if database deployments are not automated. The code only has to have a mechanism to switch connection strings, which can be easily centralized.

Multi-Tenant Static Data Problem

Our database process was fairly mature when my team started working on the second tenant of the loan origination system. The first tenant was Farm Credit Services of America. The second tenant is another association. At first, there were no problems with static data because we were focused on the bigger issues, for example, getting services from each association to talk to one another. Then the first story came in where the other association wanted a block of text to say something different than Farm Credit Services of America. And that text happened to be stored in a table in the database. And that table happened to be marked as static data.

The first solution to this problem was to remove that static data from source control. From then on out any changes would require us to write a migration script or some sort of manual script and push it out through the environments.

The solution of removing it from static data made us take a giant step back in automated database deployments.

A lot of developers did not follow the process of creating a migration script and instead wrote an update statement, ran it on all the environments needed and called it good. A fun side effect was developers couldn't run that update statement in all the needed environments, sometimes they needed a DBA. One day my team submitted so many scripts to a DBA to run that by the end of the day the DBA yelled across the room at me "Bob?!?!?! How many more of these things are you going to make me run?!?!"

The control of the data was slowly slipping away from us. This was also decreasing our confidence of a successful production release. This solution was not acceptable.

The Solution

In a nutshell, we augmented the static data process to handle multi-tenant.

When we first got started we had to think through exactly what static data does. It uses the SQL Data Compare engine to force all the data to be exactly the same. In a nutshell that engine does the following:

  1. Set Identity Insert On
  2. Insert any missing rows found in the source but not in the destination
  3. Set Identity Insert Off
  4. Update any rows on the destination not matching the source
  5. Delete any rows found in the destination, not in the source

But that same thing could be done in a stored procedure using identity insert, a merge statement, and a delete statement.

For this example let's use the tile type table.

  • dbo.TileType
    • TileTypeId int primary key identity(1,1)
    • Name varchar(100) not null

We needed some way to get data into that table during the automated database deployment process. One of the first solutions was to add a TenantId to that table. So it would look like this:

  • dbo.TileType
    • TileTypeId int primary key identity(1,1)
    • Name varchar(100) not null
    • TenantId int not null

The issue was any developer who wrote code or a view or a stored proc which selected data from that table would have to remember to include TenantId in the where clause. But what if we could make use of that idea, but only have to have remember to include the TenantId in the where clause once?

What we landed on was to create a schema called "DefaultData" and create tables that would be marked as static data there. Those tables would be the ones with the TenantId, while the tables in the dbo schema would remain unchanged.

  • DefaultData.TileType

    • TileTypeId int primary key identity(1,1)
    • Name varchar(100) not null
    • TenantId int not null
  • dbo.TileType

    • TileTypeId int primary key identity(1,1)
    • Name varchar(100) not null

The tenant name is in the database name. And all the tenants are stored in a table. We then wrote an SQL function to determine the TenantId based on the database name.

CREATE FUNCTION [dbo].[GetTenantIdFromDatabaseName]()
RETURNS int
AS
BEGIN
DECLARE @TenantId INT

SELECT TOP 1 @TenantId = TenantId FROM dbo.Tenant WHERE CHARINDEX(TenantName, CAST(DB_NAME() AS VARCHAR(50)), 0) > 0

IF @TenantId IS NULL
BEGIN
	SET @TenantId = 1
END

RETURN @TenantId
END

Next up was writing a stored procedure to do the following:

  1. Turn Identity Insert On
  2. Run a Merge statement
  3. Turn Identity Insert Off
  4. Run a delete statement to remove any rows not found in the source
CREATE PROCEDURE [DefaultData].[MergeTileType]

AS
BEGIN
	SET IDENTITY_INSERT dbo.TileType ON

	MERGE dbo.TileType AS Target
		USING (SELECT * FROM DefaultData.TileType WHERE TenantId = [dbo].[GetTenantIdFromDatabaseName] ()) AS Source
		ON Target.TileTypeId = Source.TileTypeId
	WHEN MATCHED THEN
		UPDATE SET Target.Name = Source.Name
	WHEN NOT MATCHED Then
		INSERT (TileTypeId, Name)
		VALUES (Source.TileTypeId, Source.Name);

	SET IDENTITY_INSERT dbo.TileType OFF

	DELETE FROM dbo.TileType
	WHERE TileTypeId NOT IN (SELECT TileTypeId FROM DefaultData.TileType WHERE TenantId = [dbo].[GetTenantIdFromDatabaseName] ())
END
GO

A control stored procedure was needed to ensure these sprocs were run in the right order.

CREATE PROCEDURE [DefaultData].[InsertDefaultData]
AS
BEGIN	
	EXEC DefaultData.MergeTileType;
	EXEC DefaultData.MergeSomethingElse;
	EXEC DefaultData.MergeMoreData;
END
GO

All of that is checked into source control. All of our deployments are handled via Octopus Deploy. Octopus Deploy allows us to run PowerShell scripts. We then created a small PowerShell script to execute that stored procedure using the Invoke-SQLCmd cmdlet. That PowerShell script was then added to the deployment process in Octopus Deploy.

Invoke-Sqlcmd -Query "Exec DefaultData.InsertDefaultData;" -ServerInstance "MyComputer\MyInstance" 

Configuration

There are few caveats to this process. The service account being used by Octopus Deploy must have permissions to run that stored procedure. But chances it is the same service account doing the actual deployments, and they have fairly extensible rights to begin with (DDL_Admin, DataWriter, DataReader, etc.)

After piloting this approach we also ended up making this a step template in Octopus Deploy to make it easier to share between projects and teams.

Finally, chances are there are foreign key constraints pointing to the dbo version of the table. The last thing we wanted to do is break those constraints. So what we did was make the TileTypeId and the TenantId the primary key. That way we could have duplicate TileTypeIds on the initial load. Then we set the TileTypeId as identity field and set the seed value to 100,000. Here is the process we used to create a table in the DefaultData schema.

  1. Create the table, no identity or primary keys. Make the TenantId nullable.
  2. Insert data from Tenant A using SQL Data Compare. Because the source data does not have a TenantId, that will be inserted as null.
  3. Run an update statement setting any null TenantId to the id of Tenant A.
  4. Insert data from Tenant B using SQL Data Compare.
  5. Run an update statement setting any null TenantId to the id of Tenant B.
  6. Create the primary key of TileTypeId and TenantId. This will make TenantId nonnullable.
  7. Set TileTypeId as a identity column, set the seed value to 100,000.

Conclusion

There are roughly 20 tables in our database that now use this process. It took two developers a day or so to get all the tables moved over. It took longer to get the data into the tables than it was to setup the process. But it was all worth it, one day after we moved this over we had to change the data again for the second tenant in four of those tables. I changed the data locally and let the automated database deployment process handle the rest. It was a lot easier and I didn't have to involve the DBAs which made them much happier.

It is surprising to me that even after two years the Automated Database Deployment process is never static, it has to be constantly evolving. I suppose this is true with any software development process. As the driving force at Farm Credit Services of America, I have really had an in-depth look at how a process evolves as we roll it out to various teams and new questions come up.