Data Modeling at Tincre
By Jason R. Stevens, CFA (@thinkjrs)
At Tincre we use a variety of practical software development strategies to make the magic happen for our several thousand customers and tiny but mighty team.
One of those is how we model, manipulate, store, and use data at the application and database level. You see, our applications primarily consist of three separate layers:
- Frontend: things that Tincre clients actually use/click on, such as the ads button.
- Backend: where the actual logic is implemented; for example, running ads via Meta's APIs.
- Database: where and how client data, application data, and ad data are stored.
In this first of two parts, we'll dig into all this and build a toy, but robust, app. From scratch. So grab a coffee and buckle up.
Our strategy's motivation
Before we get going, let's dive a little deeper into the why behind all of this, then highlight our heuristic solutions.
Why heuristic? Specific solutions are far too detailed/complex for this communication medium. One of these days we'll have time to publish a technical paper or two on our method.
Flatten data objects
To begin, we like to flatten objects as much as possible, as nested traversal can be costly and developer antagonistic for our use cases at Tincre.
The web works with JSON objects, and nearly everything we build operates on, within, and throughout the internet.
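As a hedged illustration (the helper below is not Tincre code, just a sketch of the idea), flattening a nested JSON-like object into a single level in Python might look like:

```python
import json

def flatten(obj, parent_key="", sep="."):
    """Recursively flatten a nested dict into a single-level dict
    with dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

nested = {"payment": {"id": 1, "meta": {"source": "ads"}}}
flat = flatten(nested)
print(json.dumps(flat))  # {"payment.id": 1, "payment.meta.source": "ads"}
```

A flat object like this maps directly onto database columns and JSON payloads, with no nested traversal required.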
Architectural design tradeoffs
Flattening makes architectural design exceedingly difficult as we also require rapid access to lots of data for time series calculations and reporting. We clearly see this as a cost, though obviously not enough to outweigh the benefit flattening provides.
Though your author is a 'math person' who much prefers generalized abstraction when programming, generalization is often less useful on a per-feature cost-benefit basis for application development, the kind of development we do at Tincre.
In a perfect world we would layer our data model designs via intelligent inheritance, much like GraphQL APIs allow for. But those designs scale extremely poorly with the type of time series work we do and given the developer resources to which we have access.
As you may know, we operate Tincre as a lean and hyper-efficient machine.
Over time we've developed a simple strategy to optimize for our needs. Those needs consist of modeling database tables flexibly, manipulating them with object-relational mapping tools (ORMs) in TypeScript/JavaScript and Python, and ensuring validation of table parameters from backends.
Modeling database tables
We primarily use PostgreSQL to structure and house our data. We interact with our Postgres databases via a combination of ORMs in JavaScript/TypeScript and Python.
Prisma - JS/TS
We absolutely love Prisma for modeling our tables and their interaction with one another. Database modeling itself can be complex and Prisma takes much of the pain away.
We maintain Prisma models as the single source of truth from which our other language ORMs inherit. We'll spare the details here.
SQLAlchemy - Python
In Python, we use an awesome tool called sqlacodegen to generate SQLAlchemy classes for object modeling within our Python backends.
This lets us model our data from a user-first standpoint and ship robust features faster.
None to null to NULL
Due to how PostgreSQL is designed, it stores optional keys as NULL values. JavaScript's JSON takes null values for shipping empty keys. And Python uses None, as it was designed after the former languages/tools and rightly corrected for nullish bugs.
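The round trip between Python's None and JSON's null is handled by the standard library; a quick sketch:

```python
import json

# Python's None serializes to JSON null, and null parses back to None,
# matching the NULL an optional PostgreSQL column would hold.
payload = {"myParam": None}
encoded = json.dumps(payload)
print(encoded)  # {"myParam": null}

decoded = json.loads(encoded)
print(decoded["myParam"] is None)  # True
```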
Not that Python is perfect; it has its own issues! In this author's opinion, language design is equivalent to attempting a unified theory of mathematics/physics.
Okay, so how do we reconcile all of this?
Enter KISS modeling, merging, and usage strategy
We like to have flattened models that are flexible and require nearly no developer time when updates are needed. In PostgreSQL and most languages this translates to optional parameters.
So that's exactly what we do. Let's look at an example Payment table modeled via Prisma and SQLAlchemy:
```prisma
// Prisma
model Payment {
  id            Int      @id @default(autoincrement())
  paymentSentAt DateTime @default(now()) @db.Timestamptz(6)
  myParam       String?
}
```
```python
# SQLAlchemy
from sqlalchemy import Column, Integer, Text, text
from sqlalchemy.dialects.postgresql import TIMESTAMP


class Payment(Base):
    __tablename__ = "Payment"

    id = Column(
        Integer,
        primary_key=True,
        server_default=text("nextval('\"Payment_id_seq\"'::regclass)"),
    )
    paymentSentAt = Column(
        TIMESTAMP(True, 6),
        nullable=False,
        server_default=text("CURRENT_TIMESTAMP"),
    )
    myParam = Column(Text, nullable=True)  # "nullable" can be omitted; it defaults to True
```
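To make the pattern concrete, here is a self-contained, runnable sketch. It swaps the Postgres-specific defaults for SQLite-friendly ones (so it is a simplified stand-in for the model above, not our production code), then inserts a row while omitting the optional column:

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Payment(Base):
    """Simplified, SQLite-friendly variant of the Payment model:
    autoincrement id, guaranteed timestamp, optional myParam."""

    __tablename__ = "Payment"

    id = Column(Integer, primary_key=True)
    paymentSentAt = Column(
        DateTime,
        nullable=False,
        default=lambda: datetime.now(timezone.utc),
    )
    myParam = Column(Text, nullable=True)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Payment())  # myParam omitted, stored as NULL
    session.commit()
    row = session.query(Payment).one()
    print(row.id, row.myParam)  # 1 None
```

The optional column simply comes back as None in Python, which serializes to null in JSON, closing the loop described above.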
The amazing thing about PostgreSQL is that it allows for rigidity and/or flexibility, depending on your requirements. On top of that, the tool ecosystem is as robust, stable, and vibrant as it gets.
We love the flexibility aspect while selectively leveraging its enforced rigidity, where necessary. In the above Payment table example, we guarantee the timestamp while allowing "parameters" or "features" or "variables" (all synonyms here) to be NULL.
Python-side models: Pydantic and SQLAlchemy
In Python, we further model our classes using the remarkable Pydantic library to detail object parameters while working with them in application code.
You can even interface with your ORM here, using Pydantic's orm_mode configuration. We often leave this out as our models can get too detailed/complex.
```python
# pydantic
from typing import Union

from pydantic import BaseModel


class PaymentFeatures(BaseModel):
    """A class to model data features (columns) for
    the Payment table stored in our PostgreSQL database."""

    myParam: Union[str, None] = None
```
To convert objects to Python dictionaries and JSON we use the .dict() and .json() methods we get by inheriting from Pydantic's BaseModel class, respectively.
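For instance (re-declaring the class so the sketch stands on its own):

```python
from typing import Union

from pydantic import BaseModel


class PaymentFeatures(BaseModel):
    """Features (columns) for the Payment table."""

    myParam: Union[str, None] = None


features = PaymentFeatures(myParam="promo-code")
print(features.dict())  # {'myParam': 'promo-code'}
print(features.json())  # a JSON string with the same key/value

# Omitted optional fields default to None, matching a NULL column.
print(PaymentFeatures().dict())  # {'myParam': None}
```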
📚️ Read the Pydantic documentation for details covering this topic.
Part II - build the app from scratch
In the next part of this series we'll actually build the app from start to finish, code included. Everything you need to get going.
Stay tuned. To boot, a fantastic way to do that is via our newsletter, for which you can easily sign up below.
🧐 Stay classy.