Schema modeling with OrderedTable
In this tutorial, we'll walk through some best practices for OrderedTable
schema modeling while writing a simple blogging app. We know that the blogging app will need to support querying logic by both an author and the most recent posts, but the database schema design is left to us.
Here's what we'll cover:
OrderedTable
schema design anti-patterns- A better, query-driven approach to schema design
- Using transactionality for data consistency
Cloud storage deep dive video
Prerequisites
- A Unison Cloud account
- Familiarity with creating Unison types
- Familiarity with basic Cloud interactions
Why use OrderedTable
The OrderedTable
cloud storage type behaves like a Map
; it stores typed key value pairs with CRUD operations. But OrderedTable
is also sorted and data can be retrieved with range operations. This makes it an especially good choice for data that needs to be chunked or returned in a specific order, as in our blog post example.
What not to do...
If you’re used to modeling data in a relational context, you might not know where to start when creating your OrderedTable
schema, or worse, you might define a schema which is incapable of supporting your application’s use-cases.
It might be tempting to hop right into defining the data models, i.e. an Author
writes many Post
entities with a unique OffsetDateTime
. Maybe we only need one simple, wide table—something like OrderedTable Author [Post]
:
type Author = {
authorId : Text,
name : Text
}
type Post = {
title: Text,
body : Text,
timeStamp : OffsetDateTime,
}
antiPattern : '{Exception, Cloud} OrderedTable Author [Post]
antiPattern = do OrderedTable.named !Database.default "badSchema" Universal.ordering
After all, the Author
could be the key to a List
of Posts
, and maybe we can add the equivalent of a SQL where
or limit
clause with ordering for some filtering logic based on time.
Here’s why that’s not ideal:
Storing data in an unbounded list in OrderedTable
is generally an anti-pattern
Say you want to edit a single blog post inside the List
of Post
as a transaction. Do you want the entirety of the author’s blogposts to also be locked up while the update for a single post is taking place? Generally, no.
OrderedTable
also has some hard-stop data size limits when it comes to reading and writing data, so especially prolific bloggers might eventually run into issues creating new posts!
As a rule, if your OrderedTable
encloses a list as its value, it’s a sign that you may want to key the data differently, so that your List
of Posts
becomes something shaped like a OrderedTable SomeUniqueKey Post
.
Query-driven schema design
Because we’re not handing off our data to something with a query plan optimizer, we need to start by defining precisely how our application will be accessing the data.
💡 Our data modeling concerns here are twofold, for database writes, the keys we use need to be unique enough to avoid inconsistencies, and for our reads, we want our business queries to work with the minimum amount of in-memory filtering.
Ideally, we also want the lowest possible factor of data duplication, but as we’ll see, the queries over our data will ultimately dictate that.
For the purposes of this example, these are the queries we want to support:
- Get all the posts for a given author
- Get the most recent N posts by a given author
- Get the most recent N posts on the entire blogging website, including the author name
The only query that our earlier OrderedTable Author [Post]
would support without in-memory scans is the first one.
Compound keys for better queries
Here’s a better schema for our app: we define two OrderedTables
with compound (tuple) keys–one for queries first scoped by an Author
in some way, and a second for queries that are filtering by time information first.
For convenience, an AppStorage
type can contain both OrderedTable
tables and the Database
they’re stored in:
type AppStorage = {
database : Database,
postsByTimestamp : OrderedTable (OffsetDateTime, Author) Post,
postsByAuthor : OrderedTable (Author, OffsetDateTime) Post
}
The compound keys ensure that we’re writing unique entities since it’s unlikely the same author is composing two posts with sub-millisecond timing. The ordering of the elements in the tuple is also important, since the OrderedTable
will be ordered by the first element in the tuple and then order by the second only if the first elements are equal.
OrderedTable (Author, OffsetDateTime)
covers the first two desired queries. Getting all the posts by a given author and the latest N posts by a given author are both supported by variations of a call to the rangeClosed.prefix function. It queries the OrderedTable
for all keys within the given range, using only the first element of the compound key.
getAllPostsByAuthor : AppStorage -> Author -> {Remote} [Post]
getAllPostsByAuthor appStorage author =
postsByAuthor = Storage.postsByAuthor appStorage
streamEntities = rangeClosed.prefix postsByAuthor prefixOrdering author author
Stream.map (at2) streamEntities |> Stream.toList!
getLatestPostsByAuthor : AppStorage -> Author -> Nat -> {Remote} [Post]
getLatestPostsByAuthor appStorage author n =
postsByAuthor = Storage.postsByAuthor appStorage
streamEntities = rangeClosed.prefix postsByAuthor prefixOrdering author author
Stream.take n streamEntities |> Stream.map (at2) |> Stream.toList!
Likewise, to support the final query across all authors over time, we’ll use the OrderedTable
keyed by (OffsetDateTime, Author)
and the same prefixQuery function.
getLatestPosts : AppStorage -> OffsetDateTime -> Nat -> {Remote} [Post]
getLatestPosts appStorage time n =
postsByTimestamp = Storage.postsByTimestamp appStorage
streamEntities = rangeClosed.prefix postsByTimestamp prefixOrdering time time
Stream.take n streamEntities |> Stream.map (at2) |> Stream.toList!
Use transactionality to sync your data across OrderedTables
Keeping two OrderedTables around with the same content but different keys sounds like a recipe for inconsistency, but the Cloud’s durable storage layer has another mechanism for keeping data in sync and that is transactionality.
Represented by the Transaction
ability in functions like OrderedTable.write.tx
, we can use this ability to describe interactions with the cloud’s storage layer that need to complete together.
In our case, when we receive a new blogpost, we need to write to both OrderedTables
in the same transactional block. Below, the transactional block is opened by the do
keyword, and run as a Storage
interaction with the transact.random
function.
writePost : AppStorage -> Author -> OffsetDateTime -> Post -> {Remote, Storage, Exception, Random} ()
writePost appStorage author time newPost =
postByAuthorOrderedTable = Storage.postsByAuthor appStorage
postByTimestampOrderedTable = Storage.postsByTimestamp appStorage
db = Storage.database appStorage
Storage.transact.random db do
OrderedTable.write.tx postByAuthorOrderedTable (author, time) newPost
OrderedTable.write.tx postByTimestampOrderedTable (time, author) newPost
Run a test deployment
You can exercise each of these database interactions in local tests by writing posts with the writePost
function and then querying them with your various read functions. Here's an example of a test run that writes two posts and then queries for all posts by a given author:
createStorage : '{Exception, Cloud} Storage
createStorage = do
database = Database.named "posts"
Database.assign database (!Environment.default)
postsByTimestamp = OrderedTable.named database "postsByTimestamp" Universal.ordering
postsByAuthor = OrderedTable.named database "postsByAuthor" Universal.ordering
Storage database postsByTimestamp postsByAuthor
testGetAuthor : '{IO, Exception} [Post]
testGetAuthor = Cloud.main.local do
postTime = !OffsetDateTime.current
secondTime = addDuration postTime (Duration.seconds +10)
appStorage = !createStorage
Cloud.submit (!Environment.default) do
writePost appStorage (Author "Alice") postTime (Post "Hello" "World" postTime)
writePost appStorage (Author "Bob") secondTime (Post "Goodbye" "Computer" secondTime)
getAllPostsByAuthor appStorage (Author "Alice")
Note that the local updates and reads to the database must take place in the same Cloud.main.local
block. Entering run testGetAuthor
in the UCM will return a list of posts by the author "Alice", not "Bob".
Summary
Working with OrderedTable
requires slightly different thinking about your data modeling. If you’re ever stuck, remember:
- Your
OrderedTable
should very seldom look likeOrderedTable Key [ListOfValues]
- Start by describing the queries your application will need to perform. It will help dictate what your data should be keyed by.
- It’s not uncommon to coordinate multiple
OrderedTables
withTransactions
in order to serve different access patterns for your data.