Schema management tips and FAQs

Unison Cloud's native storage layer lets you save arbitrary data in typed tables without SQL adapters or serialization formats; however, the freedom to save arbitrary types comes with some pitfalls. It's important to consider how you will manage your tables as your types evolve, since the data that you have already written to a table on the Cloud isn't linked to the same update process as the data in your UCM codebase. Here are some tips and answers to common questions about schema evolution.

How should I iterate on datatypes that will ultimately get saved to Cloud storage as a schema?

It’s pretty common to start saving a data type to a table and then realize it’s missing a field. The safest thing you can do is to use the Cloud.run.local or Cloud.main.local.interactive interpreters as you're developing your app. They provide a nice, ephemeral sandbox when iterating on your schemas.

As a quick and easy workaround, if your previously deployed data is not important, include a version number in the table name when you create it; then you can increment that version as you make type changes. Unfortunately, the compiler will not stop you from updating data types after you’ve already written earlier versions into Cloud storage. Fortunately, making new tables is cheap. 😉
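A minimal sketch of the versioned table name idea, assuming the version is tracked as a Nat. The names usersTableVersion and usersTableName are hypothetical and not part of the Cloud library; the resulting Text is what you would pass as the table name when creating the table.

```unison
-- Hypothetical helper: bump this number whenever the stored type changes.
usersTableVersion : Nat
usersTableVersion = 2

-- Builds the table name from the current version, here "users-v2".
usersTableName : Text
usersTableName = "users-v" ++ Nat.toText usersTableVersion
```

Keeping the version in one place means a type change only requires bumping a single number to start writing to a fresh table.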

How do I run a basic offline schema migration from one type to another type?

If you have a BTree and you want to do an offline migration, the process is simple:

  • Take down (undeploy) all the services that depend upon the database you intend to migrate.
  • Create a new BTree, making sure to change or increment the table name so it does not conflict with the old BTree.
  • Stream the existing data out of the old table with BTree.toStream, convert each value to the new type, and write it into the new table with BTree.write.
  • Bring the dependent services back up, this time using your new table name.
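The copy step above can be sketched roughly as follows. This is a sketch under stated assumptions, not the Cloud library's API: it assumes the rows have already been read out of the old table and collected into a list of key-value pairs, and it takes the write operation as a parameter (writeNew, a hypothetical stand-in for writing to the new table) instead of guessing at the signature of BTree.write. The name migrateRows is also hypothetical.

```unison
-- `convert` upgrades one value from the old type to the new type.
-- `writeNew` is a hypothetical stand-in for "write this key and value
-- into the new table"; `g` is whatever abilities that write requires.
migrateRows : (a -> b) -> (k -> b ->{g} ()) -> [(k, a)] ->{g} ()
migrateRows convert writeNew = cases
  [] -> ()
  (key, value) +: rest ->
    writeNew key (convert value)
    migrateRows convert writeNew rest
```

Passing the write operation in as a function keeps the conversion logic separate from the storage calls, so the same traversal can be exercised against an in-memory stand-in before it is pointed at real tables.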

What should I do if I need a zero-downtime migration?

We have a provisional design for online migrations of services, but zero-downtime migration is not something we support right now.

How do I run a schema migration from an older version of a type to a newer version of the type?

This is an unfortunate consequence of working with natively typed schemas: you really can’t upgrade from an old version of a type to its new version, since the old version of the type is no longer available. If you know that your domain models will change over time, consider writing them as unique types with a version indicator either in the namespace path or the type name itself, like type db.v1.User = User Text. When it comes time to add a new "version" of the type, rather than updating in place, you should create a new type with a different version number, such as type db.v2.User = User Text Text.
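Because both versions exist side by side, you can write an ordinary function from one to the other. The sketch below uses the two types from the example above; the meaning of the fields (a name, then an email) and the function name db.migrations.userV1ToV2 are assumptions made for illustration.

```unison
type db.v1.User = User Text
type db.v2.User = User Text Text

-- Hypothetical conversion: assumes the v1 field is a name and the new
-- v2 field is an email, which is left empty for migrated records.
db.migrations.userV1ToV2 : db.v1.User -> db.v2.User
db.migrations.userV1ToV2 = cases
  db.v1.User.User name -> db.v2.User.User name ""
```

A function like this is what an offline migration would apply to each record as it moves from the old table to the new one.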

What happens if heterogeneous data types get written to a typed table by accident?

Alas, once you’ve written two versions of the same data type to a table, you can’t easily recover. Logic that tries to read a record from cloud data storage as type A_V2, fails gracefully to deserialize it, then tries to read the record as type A_V1 is not possible with our current reflection primitives. We’re aware that this is an annoying limitation and plan to change this in an upcoming release. For now, you should follow the tips in this doc to avoid this situation.

How should I clean up previous schema versions or data in old tables that can just be deleted?

If your data is in a BTree and you want to delete entities in your old tables, you can scan the table, get its keys, and delete items from there. We don’t currently have a BTree.drop function that would drop the entirety of a BTree’s data, so getting the keys with BTree.toStream.keys and deleting the old records with BTree.delete is your best option.

We do have a Database.delete function, akin to “dropping a database.” You can use that for a complete reset of all the tables associated with the database.

Other Cloud storage tips and caveats

Do not store credentials or other secret information in Cloud Tables.

Values like secret keys or other credentials should be stored in the Cloud’s secure Config ability.

How do I model relational data since BTree is a key-value store?

We wrote a separate tutorial and nice exercise for that. Check it out!

Prefer BTree over Table in your applications.

Though its name is deceptively simple, Table is more of a building block for making other Cloud storage types than a user-facing concept. This is because there’s no way to get the data from a table after you’ve written to it unless you’ve kept track of the keys in a separate data structure somewhere else. Compare this with the BTree cloud storage type, which provides functions like BTree.toStream.keys, for easy retrieval of table data. Unless you are building other Cloud data structures, you probably want BTree for your business logic.

Transactions should be small!

Transactions offer us the ability to keep data in sync across a database, but there are limitations to the amount of data that can be transactionally updated at once. This is an instance where we must bend to our infrastructure overlords, alas! The precise limit is somewhat challenging to calculate based on the existing set of Cloud storage functions, as it’s based on three interacting limits:

  • The number of records that can be updated in a single transaction (100 records in total)
  • The maximum size of each record (400 KB)
  • And the aggregate amount of data being transactionally updated (4 MB)

Why is this hard to determine? While you may think that you’re moving fewer than 100 records at a time, your Cloud storage data structure might be modifying some information about the structure of your data behind the scenes as part of the update. Additionally, the size of your records is challenging to calculate because it includes the size of the serialized dependencies of the data. Neither of these is easily calculable before the transaction is run, so it is important to keep transactions small.
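One way to keep transactions small is to split a bulk update into batches and run each batch in its own transaction. The sketch below shows only the splitting step. The names batches and maxRecordsPerTransaction are hypothetical, and the batch size of 25 is an arbitrary assumption chosen to leave headroom under the 100-record limit; it does not account for record sizes or for the hidden structural updates described above.

```unison
-- Splits `items` into consecutive batches of at most `size` elements.
-- A `size` of 0 yields no batches, which avoids looping forever.
batches : Nat -> [a] -> [[a]]
batches size items =
  if (size == 0) || (List.isEmpty items) then []
  else (List.take size items) +: batches size (List.drop size items)

-- Hypothetical batch size, chosen well below the 100-record limit.
maxRecordsPerTransaction : Nat
maxRecordsPerTransaction = 25

-- Watch expression: evaluates to [[1, 2], [3, 4], [5]]
> batches 2 [1, 2, 3, 4, 5]
```

Each resulting batch would then be written in a separate transaction. If a batch still fails against the size limits, lowering the batch size is the simplest adjustment.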