We have a client that uses multi-tenant database where each database holds data for each of their customers. Whenever a new customer is added, a service dynamically creates a new database. In order to seed this new database we were tasked to implement a feature to copy data from existing “demo” database.

The “demo” database is actually a live client where sales team does demo. This ensures that the data that is copied is fresh and not stale.

We implemented a solution where we simply listed all the tables in namespace and used activerecord-import to copy the table data. We used activerecord-import gem to keep code agnostic of underlying database as we used different databases in development from production. Production is “SQL Server” and development database is “PostgreSQL”. Why this project ended up having different database in development and in production is worthy of a separate blog.

When we started using the above mentioned strategy then we quickly ran into a problem. Inserts for some tables were failing.

insert or update on table "dependent_table" violates foreign key constraint "fk_rails"
Detail: Key (column)=(1) is not present in table "main_table".

The issue was we had foreign key constraints on some tables and “dependent” table was being processed before the “main” table.

So initially we thought of simply hard coding the sequence in which to process the tables. It means if any new table is added then we will have to update the service to include the newly added table. So we needed a way to identify the foreign key dependencies and determine the sequence to copy the tables at runtime. To resolve this issue, we thought of using Topological Sorting.

Topological Sorting

To get started we need the list of dependencies of “main” and “dependent” tables. In Postgresql, this sql query fetches the table dependencies.

SELECT
    tc.table_name AS dependent_table,
    ccu.table_name AS main_table
FROM
    information_schema.table_constraints AS tc
    JOIN information_schema.key_column_usage AS kcu
      ON tc.constraint_name = kcu.constraint_name
      AND tc.table_schema = kcu.table_schema
    JOIN information_schema.constraint_column_usage AS ccu
      ON ccu.constraint_name = tc.constraint_name
      AND ccu.table_schema = tc.table_schema
WHERE constraint_type = 'FOREIGN KEY'
and (tc.table_name like 'namespace_%' or ccu.table_name like 'namespace_%');

=> dependent_table  | main_table
-----------------------------------
   dependent_table1 | main_table1
   dependent_table2 | main_table2

The above query fetches all the dependencies for only the tables have namespace or the tables we are interested in. The output of above query was [[dependent_table1, main_table1], [dependent_table2, main_table2]].

Ruby has a TSort module that for implementing topological sorts. So we needed to run the topological sort on the dependencies. So we inserted the dependencies into a hash and included the TSort functionality into the hash. Following is the way to include the TSort module into hash by subclassing the Hash.

require "tsort"

class TsortableHash < Hash
  include TSort

  alias tsort_each_node each_key

  def tsort_each_child(node, &block)
    fetch(node).each(&block)
  end
end
# Borrowed from https://www.viget.com/articles/dependency-sorting-in-ruby-with-tsort/

Then we simply added all the tables to dependency hash, as below

tables_to_sort = ["dependent_table1", "dependent_table2", "main_table1"]
dependency_graph = tables_to_sort.inject(TsortableHash.new) {|hash, table| hash[table] = []; hash }

table_dependency_map = fetch_table_dependencies_from_database
=> [["dependent_table1", "main_table1"], ["dependent_table2", "main_table2"]]

# Add missing tables to dependency graph
table_dependency_map.flatten.each {|table| dependency_graph[table] ||= [] }

table_dependency_map.each {|constraint| dependency_graph[constraint[0]] << constraint[1] }

dependency_graph.tsort
=> ["main_table1", "dependent_table1", "main_table2", "dependent_table2"]

The output above, is the dependency resolved sequence of tables.

Topological sorting is pretty useful in situations where we need to resolve dependencies and Ruby provides a really helpful tool TSort to implement it without going into implementation details. Although I did spend time in understanding the underlying algorithm for fun.