Creating a Project Template? Go With Database Storage

Tomas Kocman, Go Backend Engineer
Engineering | Nov 28, 2023

I expect you’ve already read the first part of this “STRV Go Template” series, where we mentioned the database layer and noticed a directory called postgres in each of the domains. In this blog post, you will find out what’s inside — and how we work with a database, what design pattern we adopted and, most importantly, how it fits in with domain-driven design.

Let's talk about databases in a bit more depth. The database is our source of truth, isn't it? A lot of effort should go into keeping data consistent to prevent the application from misbehaving, and the database layer must also be impervious to data races.

Maintainability and extensibility are just as important. You definitely shouldn't be stressed out when modifying or implementing a new feature; that's the goal. Everyone wants to avoid situations like data inconsistency caused by racing transactions, or internal server errors caused by mishandled database errors.

Let’s see how we work with a database and how it matches the DDD principles.

Which Database to Choose?

From my experience, a relational database is a good fit for most projects, at least in the first phase. From time to time, we also have use cases for document databases, and it's possible that in later phases, we might run into performance issues or just want to improve application latency. That is the right time to think about a caching system.

I mentioned three database types, but what specific databases do we really use? 

In the case of relational databases, it’s PostgreSQL (Aurora Serverless V2); as a document database, it’s MongoDB (Amazon DocumentDB); and as a caching system, we have experience with Redis (Amazon ElastiCache) and Amazon DynamoDB. But I think the template should contain only the bare minimum of functionality, which will most probably be used within every project. That means we’ve chosen PostgreSQL as a database in the template — and if there’s a requirement for another database during project development, we add the support as we go. 

Considering all that, the rest of this blog post focuses solely on PostgreSQL implementation and how we work with relational databases.

Not Everything Is Solved by an ORM

When starting to implement a database layer, many people’s first question is, “What ORM should I use?” There is actually another option, which I really like: using plain SQL queries. 

There are both advantages and disadvantages to using an Object-Relational Mapper. ORMs are great at abstracting database access, which is perfect when you’re starting a new project and don’t want to spend time fiddling around with SQL database-specifics early on. Maybe you’ve noticed that a lot of people in the Go community do things a little differently, and database access is a good example. As Rob Pike said, “Clear is better than clever.” 

Using an ORM is not typical among Go developers. I realize an ORM has benefits, like significantly improving development speed or helping with migrations. But on the other hand, it's less performant, too invasive and too "magical." ORMs hide much of the SQL internals, which can get very tricky when it comes to complicated queries.

We use plain SQL; it's the Go community's standard and good to know in general. It has its drawbacks, though: managing a lot of queries is a challenge and increases maintenance effort, and writing simple SQL statements for basic CRUD operations takes more time than with an ORM.

An ORM in Go also comes with the danger of not supporting everything you need (like complex joins), so at some point down the road, you might end up writing plain SQL anyway. It's not uncommon to solve a complex problem with a single optimized query in Postgres.

I also see another problem, one less about ORMs and more about design. It's so easy to think of models as regular objects that using them everywhere goes without saying. Sometimes I hear from other departments: "You guys copy-paste models so much, why don't you just use a single model for business logic and the database, ideally for everything?" Well, I don't know much about mobile development, but in the backend, before you know it, the entire application starts revolving around that same fat model, and other parts of the code suddenly depend on parts of the model they were never supposed to.

Regardless of whether you use an ORM or not, keep your database models succinct and tied to the database layer. Keep in mind that the domain is the core of your application and the database adapter is supposed to satisfy the domain's output port. That's one of the reasons we keep database models unexported. I'll explain how the conversion to domain models is done a little later, in the repository pattern section.

SQL Driver

Another decision is in order: choosing the right underlying SQL driver is the next critical task in a project's early stages. In the case of the template, the pgx driver with raw SQL queries is used. Of course, there are alternatives, such as sqlx or sqlc; we've chosen pgx because it's built specifically for Postgres, which we use on every project.

Since it's written purely for a single database, it aims to be low-level, fast and performant, enabling Postgres-specific features other packages don't allow for. You can even define how your custom types are serialized and deserialized to and from Postgres types. All SQL files the app needs are embedded directly into the binary using the embed package.
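
For illustration, embedding a query file into the binary can look like this (the package name and file path are just examples, not the template's actual layout):

package user

import (
    _ "embed" // required for the go:embed directive
)

// readUserQuery holds the contents of the SQL file at compile time,
// so the binary has no runtime dependency on the file system.
//
//go:embed queries/read_user.sql
var readUserQuery string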

Querying & Scanning

Now that we’ve selected the SQL driver, the question is how much it offers in terms of querying and scanning. 

Until recently, there was only one option for how to query and scan using pgx. If you use the default Go database/sql package for operating with databases, you’re probably familiar with positional arguments. Let's take a simple SQL query as an example:

SELECT
    u.id,
    u.name,
    u.email
FROM
    "user" AS u
WHERE
    u.email = $1

Each argument passed into the query function needs to match the dollar signs in the query definition. Similarly, when scanning the result into the user object, the three scan targets need to be passed in exactly the same order as the selected columns.
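
As a rough sketch (assuming a pgxpool.Pool called pool and the query above stored in a readUserQuery constant; the user structure is shown a bit further below), the positional style forces you to keep everything in sync by hand:

// Positional style: the argument order must match $1, $2, ... and the
// Scan targets must match the order of the selected columns.
row := pool.QueryRow(ctx, readUserQuery, email)

var u user
if err := row.Scan(&u.ID, &u.Name, &u.Email); err != nil {
    return nil, err
}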

Things substantially changed with pgx v5. Arguments in our simple query could be slightly rewritten. Instead of the dollar sign, we can now use named arguments.

u.email = @email

This approach significantly reduces the possibility of mistakes, and it's much more readable. You explicitly provide the mapping between argument names and their values, so there shouldn't be any mistakes in argument ordering.

pgx.NamedArgs{"email": user.Email}

There is an analogous approach to scanning. If you've ever worked with sqlx or some alternative, you're probably already familiar with annotating structures with db tags. We use the scany package to enable this feature. We just provide tags for the structure fields, and scany ensures the columns are mapped to the right fields when reading from Postgres.

type user struct {
    ID    uuid.UUID `db:"id"`
    Name  string    `db:"name"`
    Email string    `db:"email"`
}
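
Putting the named arguments and the tagged structure together, a query might then look roughly like this (a sketch assuming a pgxpool.Pool and scany's pgxscan sub-package):

// scany maps columns to struct fields via the db tags, and pgx.NamedArgs
// resolves @email by name instead of by position.
var u user
err := pgxscan.Get(ctx, pool, &u, readUserQuery, pgx.NamedArgs{
    "email": email,
})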

After the changes, both the code and SQL queries are much more readable and maintainable.

Migrations

Migrations are an integral part of working with a database. Every time a new feature comes in, migration files sit nearby. 

Migration means converting a database schema to a new version required by new functionality. This is the field where an ORM probably wins, thanks to its out-of-the-box auto-migration functionality. But don't worry, there's a great tool for database migrations: migrate.

We use migrate as a CLI binary for local development and for testing migration files, but it can also be used as a Go package; since this tooling is written purely in Go, you can easily implement auto-migration in your application, too. We did so in the template, and I highly recommend you do the same. The application simply tries to apply all migrations on each start, so we don't have to do it manually before deployment. migrate supports a wide range of databases, including MongoDB, Neo4j, Cassandra and CockroachDB.
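
A minimal auto-migrate on startup could look roughly like the sketch below. It assumes the migration files are embedded under a migrations directory; this is not the template's exact code, just the general approach with the migrate Go package and its iofs source:

package database

import (
    "embed"
    "errors"
    "fmt"

    "github.com/golang-migrate/migrate/v4"
    _ "github.com/golang-migrate/migrate/v4/database/postgres" // registers the postgres driver
    "github.com/golang-migrate/migrate/v4/source/iofs"
)

//go:embed migrations/*.sql
var migrationsFS embed.FS

// MigrateUp applies all pending migrations; called on application start.
func MigrateUp(databaseURL string) error {
    source, err := iofs.New(migrationsFS, "migrations")
    if err != nil {
        return fmt.Errorf("creating migration source: %w", err)
    }
    m, err := migrate.NewWithSourceInstance("iofs", source, databaseURL)
    if err != nil {
        return fmt.Errorf("creating migrate instance: %w", err)
    }
    // ErrNoChange just means the schema is already up to date.
    if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
        return fmt.Errorf("applying migrations: %w", err)
    }
    return nil
}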

The most important part here is the migration files. Each migration consists of two SQL files (up and down script). My main recommendation is to name the files with number prefixes so migrate processes them in the correct order. In our case, it could be:

0001_create_tables.up.sql
0001_create_tables.down.sql

The number at the beginning of the file name is the sequence number the migration was created with. Migration files can be generated directly with the migrate tool:

migrate create -ext sql -digits 4 -seq -dir ./migrations create_tables

Alternatively, you can prefix migration files with a timestamp, but we switched to simple integers. In a team where many developers open pull requests, a migration with an earlier timestamp could be merged after another migration with a later one, in which case the migrate tool wouldn't apply it. The worst-case scenario with sequential names is that two migrations end up with the same sequence number, in which case an error is returned so a developer can notice and fix it.

The content of the files describes how the database schema should be built (in most cases, CREATE/DROP TABLE statements). migrate can do wonders with your database based on these files; just try it out yourself.

Repository Pattern

With all the groundwork in place, we can finally delve into the relationship between the domain and the database, starting with the most interesting part for anyone interested in DDD.

The first part of this blog series depicts the domain code structure. Inside each domain is a repository.go file. The thing is, the repository is a whole pattern; it can't fit into a single file of code. Let's dive into why this pattern is one of my favorites.

In my opinion, one of the main reasons an application becomes very complex and hard to read is that people tend to be too reluctant to properly decouple the database logic from the application (business) logic. Mixing your application logic with your database logic makes the application much more complex and harder to both test and maintain. So if you encounter words like connection, begin, commit or rollback in your service layer, be aware that it might become a serious problem in the future.

The main purpose of the repository pattern is to separate these two kinds of logic. It makes your code simpler and easier to extend with new functionality. As a bonus, you can defer the important decisions of choosing a database solution and schema. Another nice side effect of this approach is out-of-the-box immunity to database vendor lock-in.

The idea of the repository pattern is to communicate with the database via an interface. You should define the interface to support any database implementation, which means it should be free of the implementation details of any particular database. The interface in repository.go might look like this:

type Repository interface {
    Create(ctx context.Context, user *User) error
    Read(ctx context.Context, userID uuid.UUID) (*User, error)
}

Hidden within the Create and Read functions of the real database implementation is all the database boilerplate, like acquiring and releasing a connection, calling pgx's Exec or Select functions and so on. This answers what's inside the postgres directory (a question posed in the previous blog post). As you can see, a service using this interface never needs to call any low-level database operation, like working with connections and (most importantly) transactions. As a result, another SOLID principle, dependency inversion, is ticked off.
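
To make the inversion concrete, here is a tiny, purely illustrative sketch of a service that depends only on the interface (the Service type and its method are hypothetical, not part of the template):

// Service holds business logic and depends only on the Repository
// abstraction, never on a concrete database implementation.
type Service struct {
    users Repository
}

func NewService(users Repository) *Service {
    return &Service{users: users}
}

func (s *Service) RegisterUser(ctx context.Context, u *User) error {
    // Business rules live here; connections and transactions stay
    // hidden behind the repository implementation.
    return s.users.Create(ctx, u)
}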

You might be thinking that this is an artificial example and things are much more complex in the real world. And you're right, because how do we handle transactions?

Personally, my biggest challenge was how to manage transactions in a clean way that does not affect the rest of the application too much, is not dependent on the implementation and is both explicit and fast. Let’s take a look at the Update function I added to the repository interface.

type UpdateFunc func(*User) (*User, error)

type Repository interface {
    Create(ctx context.Context, user *User) error
    Read(ctx context.Context, userID uuid.UUID) (*User, error)
    Update(ctx context.Context, userID uuid.UUID, fn UpdateFunc) error
}

Besides the Update function, you can also see UpdateFunc. This is the key to working with transactions in the repository pattern; it's an approach based on a closure passed to the update function. The basic idea is that when we call Update, we need to provide an UpdateFunc that can update the provided user. In practice, the transaction flow in the database function is as follows:

  • Open the transaction.
  • Read the user from the database.
  • Call the update method on the user object.
  • Save the updated user back to the database.
  • Execute commit/rollback.

It's also perfectly fine to have multiple update-like functions that take extra data to save. Under the hood, the implementation should reuse the same code without much duplication.
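
To make the flow above concrete, here is a rough sketch of what the Update implementation in the postgres package could look like. To keep the transaction flow visible, it talks to pgx and scany directly instead of the template's database adapter (shown later); the pool field, the query constants, the domain user's field names and the userFactory are assumptions for illustration:

func (r *Repository) Update(ctx context.Context, userID uuid.UUID, fn domuser.UpdateFunc) (err error) {
    // 1. Open the transaction.
    tx, err := r.pool.Begin(ctx)
    if err != nil {
        return fmt.Errorf("beginning transaction: %w", err)
    }
    // Rollback is safe to call after a successful Commit; it then returns
    // pgx.ErrTxClosed, which we deliberately ignore.
    defer func() { _ = tx.Rollback(ctx) }()

    // 2. Read the user inside the transaction (SELECT ... FOR UPDATE).
    var dbUser user
    if err = pgxscan.Get(ctx, tx, &dbUser, readUserForUpdateQuery, pgx.NamedArgs{"id": userID}); err != nil {
        return err
    }

    // 3. Call the update closure on the domain object.
    updated, err := fn(dbUser.ToUser(r.userFactory))
    if err != nil {
        return err
    }

    // 4. Save the updated user back to the database.
    if _, err = tx.Exec(ctx, updateUserQuery, pgx.NamedArgs{
        "id":    updated.ID,
        "name":  updated.Name,
        "email": updated.Email,
    }); err != nil {
        return err
    }

    // 5. Commit; the deferred Rollback then has nothing to undo.
    return tx.Commit(ctx)
}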

Pro tip: Don't forget to use SELECT … FOR UPDATE in your queries to prevent data races where they could occur. Still, you may ask how to do certain operations, like creating a user, a session and some default settings in our database, all within one transaction. (I'll elaborate on this in a minute.)

Eventual Consistency

One of the most important aspects of software development in general is keeping application data consistent, no matter what. That means having a guarantee that changes made within a transaction respect the database constraints. Consistency models are a topic for a blog post of their own, but in general, in DDD you should strive for eventual consistency between your bounded contexts; as we already know from the repository pattern, a domain cannot touch the data of another domain. Bounded contexts have to be completely separated, which, of course, also applies to database operations.

Eventual consistency is a design approach for improving scalability and performance. Although transactional consistency seems more straightforward and easier to use, eventual consistency has the advantage in a lot of use cases. In DDD, it often leads to a better design than working with traditional transactions. I wouldn't even be afraid to say that in real-world use cases of really big systems, transactionally consistent workflows are rare. A business process often spans a series of use cases, each persisting a group of changes represented by an aggregate, and the domain use cases are responsible for keeping those aggregates consistent.

Getting back to the problem with the user, session and settings, I'd recommend rethinking the aggregates and making an aggregate from the user and their settings. The session could be a separate aggregate. To support this statement, my previous blog post mentioned that DDD is a great fit if you ever decide to split a huge application into microservices. You can now see one of the reasons.

If you’d like to have the session in one transaction with the user and all other things that relate to the user, how would you then refactor this enormous transaction and the whole flow to make an authentication microservice? It would be very difficult and ultimately — because of the nature of microservices — you’d end up with eventual consistency anyway. So think about it from the very beginning and try to design your entities and aggregates in the correct way.

Even if the approach with repositories and eventual consistency between bounded contexts adds a bit more code, the investment is totally worth it. It will pay off very soon, as the project grows.

Database Adapter in Detail

We already know the relationship between the service and database layers, but how is the database layer actually implemented internally to provide all the functionality the repositories inside the domains need?

Because all domains in our template use PostgreSQL, there's no need to duplicate code in each domain. Instead, we've separated the database implementation into a sub-package of the project root.

This sub-package also holds all the migration files, because it's very convenient to have every migration in one place. In the case of our template and most of our projects (where we use PostgreSQL exclusively), there's no reason not to keep all migration files in the database adapter, so domain repositories can expect the database schema to be in place.

Setting Up the Correct Interface

To do their job, repositories require only very simple database operations: acquiring and releasing a connection, plus transaction-related operations like begin, commit and rollback. Since each repository embeds the database adapter, we can start with an interface describing how the database functions will be used and what an adapter needs to implement.

type DataContext interface {
    Ctx() context.Context
}

type DataSource interface {
    AcquireConnCtx(ctx context.Context) (DataContext, error)
    ReleaseConnCtx(dctx DataContext) error
    Begin(ctx context.Context) (DataContext, error)
    Commit(dctx DataContext) error
    Rollback(dctx DataContext) error
}

DataContext holds internal information about a connection or transaction. The second interface, DataSource, is used for obtaining and working with a DataContext and is directly embedded in repositories.

That's what the contract between the database layer and the rest of the code base could look like. No one knows anything about the internal implementation of the database; they just use the interface.

As you can see from the function declarations in DataSource, you can easily swap the SQL database, since the interface is fairly generic for relational databases. How you write the concrete implementation of this interface depends on the database and driver you use, but the differences are small.
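
For a rough idea of what a pgx-based implementation could look like, here is a sketch. The type and field names are my own and not necessarily what the template uses, and handling of misuse (such as committing a plain connection) is omitted:

import (
    "context"
    "fmt"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
)

// dataContext carries the request context together with either an acquired
// connection or an open transaction.
type dataContext struct {
    ctx  context.Context
    conn *pgxpool.Conn
    tx   pgx.Tx
}

func (d *dataContext) Ctx() context.Context { return d.ctx }

// dataSource implements DataSource on top of a pgx connection pool.
type dataSource struct {
    pool *pgxpool.Pool
}

func (s *dataSource) AcquireConnCtx(ctx context.Context) (DataContext, error) {
    conn, err := s.pool.Acquire(ctx)
    if err != nil {
        return nil, fmt.Errorf("acquiring connection: %w", err)
    }
    return &dataContext{ctx: ctx, conn: conn}, nil
}

func (s *dataSource) ReleaseConnCtx(dctx DataContext) error {
    dctx.(*dataContext).conn.Release()
    return nil
}

func (s *dataSource) Begin(ctx context.Context) (DataContext, error) {
    tx, err := s.pool.Begin(ctx)
    if err != nil {
        return nil, fmt.Errorf("beginning transaction: %w", err)
    }
    return &dataContext{ctx: ctx, tx: tx}, nil
}

func (s *dataSource) Commit(dctx DataContext) error {
    d := dctx.(*dataContext)
    return d.tx.Commit(d.ctx)
}

func (s *dataSource) Rollback(dctx DataContext) error {
    d := dctx.(*dataContext)
    return d.tx.Rollback(d.ctx)
}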

Useful Helpers

It’s boring to constantly copy-paste AcquireConnCtx/ReleaseConnCtx or similar functions in each repository function. Let’s try to simplify it and make generic helpers, which will make repositories more concise.

func WithConnectionResult[T any](ctx context.Context, s DataSource, f func(DataContext) (T, error)) (result T, err error) {
   dctx, err := s.AcquireConnCtx(ctx)
   if err != nil {
      return result, fmt.Errorf("acquiring connection: %w", err)
   }
   defer func() {
      if releaseErr := s.ReleaseConnCtx(dctx); releaseErr != nil {
         err = errors.Join(fmt.Errorf("releasing connection: %w", releaseErr), err)
      }
   }()
   return f(dctx)
}

This is one of our helper functions for working with the database in a repository. WithConnectionResult takes a function as a parameter that contains the actual database logic; the helper simply wraps it with connection acquisition and release.

Similarly, we have helpers for working with transactions. An example is refreshing a session: the helper first begins a transaction, then calls the provided function with the expected logic (reading the session, checking that it isn't expired, generating and storing a new one), and finally commits, or rolls back if needed.
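
One such transaction helper could be sketched like this, mirroring WithConnectionResult above (not the template's exact code):

func WithTransactionResult[T any](ctx context.Context, s DataSource, f func(DataContext) (T, error)) (result T, err error) {
   dctx, err := s.Begin(ctx)
   if err != nil {
      return result, fmt.Errorf("beginning transaction: %w", err)
   }
   defer func() {
      // Roll back when the closure or the commit failed; otherwise the
      // transaction has already been committed and there is nothing to undo.
      if err != nil {
         if rollbackErr := s.Rollback(dctx); rollbackErr != nil {
            err = errors.Join(fmt.Errorf("rolling back transaction: %w", rollbackErr), err)
         }
      }
   }()
   if result, err = f(dctx); err != nil {
      return result, err
   }
   if err = s.Commit(dctx); err != nil {
      return result, fmt.Errorf("committing transaction: %w", err)
   }
   return result, nil
}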

But back to our simpler use case of just a connection. In practice, using this function might look like this:

func (r *Repository) Read(ctx context.Context, userID uuid.UUID) (*domuser.User, error) {
    return sql.WithConnectionResult(ctx, r.dataSource, func(dctx sql.DataContext) (*domuser.User, error) {
        user, err := sql.ReadValue[user](dctx, readUserQuery, pgx.NamedArgs{
            "id": userID,
        })
        if err != nil {
            if sql.IsNotFound(err) {
                return nil, domuser.NewUserNotFoundError().Wrap(err)
            }
            return nil, err
        }
        return user.ToUser(r.userFactory), nil
    })
}

In this example, you can see the repository function for reading a user, with everything put together. dataSource is a real database implementation of the interface mentioned above. The query and error handling live inside the closure, and you can expect a valid database connection inside the dctx object.

One interesting thing to mention in the error handling is NewUserNotFoundError(), a helper function that generates a custom domain error. For an idea of what the user structure and readUserQuery look like, see the examples in the Querying & Scanning section.

But what does sql.ReadValue actually do? There are generics involved, too; why generics for such a simple operation? It's our second useful helper for the database layer, and it makes repository functions even more convenient.

func ReadValue[T any](dctx DataContext, query string, args ...any) (T, error) {
    var result T
    if err := pgxscan.Get(dctx.Ctx(), QuerierFromDataContext(dctx), &result, query, args...); err != nil {
       return result, err
    }
    return result, nil
}

ReadValue leverages the scany package to select exactly one item from the database and parse it into the provided type with the help of generics. QuerierFromDataContext returns an object capable of running Exec, Query and other low-level database operations because, as mentioned, the real database connection is stored in dctx.
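
Staying with the dataContext sketch from earlier, QuerierFromDataContext could be as simple as this (again an illustration; the template's real return type may be a broader interface that also covers Exec):

// QuerierFromDataContext returns whatever can run queries for the given data
// context: the open transaction if there is one, otherwise the connection.
func QuerierFromDataContext(dctx DataContext) pgxscan.Querier {
    d := dctx.(*dataContext)
    if d.tx != nil {
        return d.tx
    }
    return d.conn
}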

From the previous example, you could deduce that user is a database model and domuser.User is a domain object. To convert a database object into a domain object, the repository uses helper functions together with a domain factory.
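
The conversion helper itself can be very small. Here is a sketch with an assumed factory API; the real one in the template may differ:

// ToUser converts the unexported database model into a domain user through
// the domain factory, so the domain keeps control over its invariants.
func (u user) ToUser(factory *domuser.Factory) *domuser.User {
    // NewUserFromFields is a hypothetical factory constructor.
    return factory.NewUserFromFields(u.ID, u.Name, u.Email)
}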

Testing

Unit testing of the database layer is unfortunately not very straightforward. We use the third-party package pgxmock, which lets us mock the most commonly used PostgreSQL functions.

In my opinion, testing code that works with a database is somewhat problematic. Internal functionality like WithConnectionResult or ReadValue is good to have tested, but testing the implementation of repositories is debatable. In our experience, whenever the implementation changes, we need to fix the unit tests. And if you make a mistake in the SQL script, or in a parameter passed down to it, the tests can't detect it.

After gaining this experience, we decided not to write unit tests for repositories, since they don't bring the expected value compared with the time spent writing them. Unit tests should be fast and efficient, and this requirement wasn't fulfilled. But I do encourage you to write unit tests for the internal database implementation and all the helpers used in your repositories.

In my opinion, the best invested time lies in integration testing, involving a real database and not just a mock.

Conclusion

This post describes the fundamentals and basic concepts of the database layer in the Go template we use for each project. 

We started with the reasoning behind choosing raw SQL instead of an ORM, a description of the pgx driver, how we store SQL files and how querying and scanning work with scany. Then we went through migrations with the migrate tool and a basic description of migration files. Finally, we set up the important interfaces that encapsulate the concrete database implementation and its internals, which the rest of the code base doesn't need to know about. I hope you find our generic helpers useful and that you try them out on your own.

But the most important part of this post is the repository pattern. I highly recommend separating concerns and not coupling database code with the service layer, which should contain business logic only. I'm almost sure that if you try it, you won't regret it.

If you’d like to know more about our database layer in general or the helpers making our code base more concise — or if you have questions regarding the repository pattern — I’m more than happy to share my experience.
