Oct 27, 2023

Master the repository pattern in TypeScript

Introduction

During my time at Spendesk, a company that later became a unicorn, I faced a common problem: SQL queries and ORM models were scattered throughout the codebase, leading to a disorganized and difficult-to-maintain system.

It was then that I discovered the repository pattern, which seemed like an ideal solution to our problem.

I started implementing the pattern at Spendesk, working closely with my team to iterate and refine our usage of the pattern over several months.

In the end, we developed a solid implementation that effectively improved our system’s maintainability.

Since then, I’ve continued to apply the repository pattern in other projects and found it to be a valuable tool for keeping codebases clean, organized, and scalable.

This post is the result of my experience playing around with the pattern over the years with other experienced engineers.

The examples we’ll use in this post will be based on an e-commerce application, making it easier to relate the concepts to real-world scenarios.

Core Concept

Domain

A domain refers to a specific area of activity or subject matter within a business organization. It encompasses the knowledge, processes, and rules that are specific to that area or subject matter. For instance, if we are developing an application to sell goods online, the domain is e-commerce.

The e-commerce domain has sub-domains. Here are a few of them:

Product management: product descriptions, prices, images, and inventory levels.
Payment processing: accepting credit card payments, PayPal, or other payment methods
Shipping and delivery: calculating shipping costs, tracking delivery status, and communicating with shipping carriers

Entities

An entity is an object that represents a real-world concept or object in the domain.

In an e-commerce application, an entity might be a Product, a Customer, or an Order.

Aggregates

An aggregate is a collection of related entities that are treated as a single unit. This can be visualized as a tree, where the root is the single entry point.

For example, in an e-commerce application, orders and line items are both entities that are linked together. An order may have one or many line items, but these line items only make sense in the context of that particular order.

Therefore, the aggregate in this case is the order.

Relation

Consistency boundaries

An aggregate is the consistency boundary, also known as the transactional boundary. This means that any change made to the aggregate must be consistent and atomic, so that the aggregate remains in a valid state at all times.

In our e-commerce application, an order consists of multiple line items, each with its own amount and quantity.

If a change is made to one of these line items, it’s important to ensure that the order’s total amount is updated accordingly.
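As a minimal sketch of this rule, assuming a simplified order shape, a per-unit price, and a hypothetical helper named updateLineItemQuantity, the quantity change and the total recalculation can live in a single function, so callers never see an order whose total disagrees with its line items:

```typescript
type LineItem = { id: string; quantity: number; price: number };
type Order = { id: string; totalAmount: number; lineItems: LineItem[] };

// Sum of price * quantity over all line items (price is assumed to be per unit).
const computeTotalAmount = (lineItems: LineItem[]): number =>
  lineItems.reduce((sum, item) => sum + item.price * item.quantity, 0);

// Returns a new order in which the line item quantity and the total amount
// are changed together; the input order is left untouched.
const updateLineItemQuantity = (
  order: Order,
  lineItemId: string,
  quantity: number
): Order => {
  const lineItems = order.lineItems.map((item) =>
    item.id === lineItemId ? { ...item, quantity } : item
  );
  return { ...order, lineItems, totalAmount: computeTotalAmount(lineItems) };
};

const order: Order = {
  id: "order1",
  totalAmount: 100,
  lineItems: [{ id: "lineItem1", quantity: 2, price: 50 }],
};
const updated = updateLineItemQuantity(order, "lineItem1", 3);
```

Because the function is the only way the quantity changes in this sketch, the total cannot drift from the line items.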

The repository pattern

The importance of aggregates

When data is stored and retrieved from the storage medium without proper structure, it results in mixed and disorganized code that is difficult to name and communicate about.

Identifying the aggregates of an application is critical:

to avoid data inconsistencies (consistency boundaries)
because aggregates represent real-life things that are easy to talk about and manipulate, which keeps the code aligned with the domain

An order is an order. A customer is a customer.

But an object with an id, a customer email and an order date represents nothing in the domain.

Keep domain and storage logic apart

Combining domain and storage logic results in a number of issues.

The code is hard to test

Testing becomes difficult due to the need to mock query building libraries or ORMs. At Spendesk, we were using Sequelize, a JavaScript ORM. It was challenging to mock because the output values were not simple plain objects.

The storage representation of the data

The storage representation of data can differ from its domain representation, particularly in the case of SQL, where an aggregate may be split across several tables.

For example, an e-commerce application may have an orders table and a line_items table to store each item of an order (as it’s a one-to-many relationship). This constraint should not appear in the domain logic.

Polluted domain logic

Instead of having clear logic that simply reflects the domain rules, the domain logic is polluted with the storage logic.

Moreover, changes made to the storage can then potentially affect the domain logic.

Introducing the repository pattern

The previous points underline how important it is to:

define entities and aggregates that reflect real-world entities, and enforce their use
separate the domain logic from the storage logic

The repository pattern is the solution.

A repository is a component that encapsulates the storage logic of an aggregate.

The repository acts as a mediator between the domain logic of the application and the data access layer. It provides a set of methods for performing data operations, such as get, upsert, and delete, as well as other specific queries that the application may require.

By using the repository pattern, the application code is decoupled from the data access layer, which makes it easier to switch to a different data storage system or to make changes to the data access layer without affecting the rest of the application code.

It also makes it easier to unit test the application code, since the repository can be mocked easily.

Consistency boundaries

The repository implementation must ensure that all operations performed on the aggregate are atomic. In a SQL implementation, this can be achieved using SQL transactions. By wrapping all SQL queries in the same transaction, it’s possible to ensure that if any query fails, all the other queries are rolled back, preventing inconsistencies in the data.

Software architecture and repositories

Software architecture layers

You may be familiar with clean architecture or hexagonal architecture.

Regardless of how your project is architected, you should have at least a separation between the domain and the infrastructure details, as explained before.

The storage medium should not affect the domain logic, because the domain logic doesn’t know about, and is not tied to, any particular storage medium. Orders can be stored in a file, a relational database, or in RAM: it does not matter, the domain logic remains the same.

The infrastructure layer is responsible for providing access to the underlying technology, such as databases, message queues, and web services. It acts as a bridge between the domain layer and the underlying technology stack, enabling the domain layer to interact with the underlying resources without being tightly coupled to them.

To decouple the domain from the infrastructure, the domain defines and exposes an interface that describes the persistence needs for a particular aggregate.

An implementation of this interface can be built for a particular storage medium (a PostgreSQL database, for instance) in the infrastructure layer. This implementation can then be injected into the domain. The domain doesn’t know which implementation it receives (that PostgreSQL is behind it), and it does not matter.
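The wiring described here can be sketched as follows. This is only an illustration under stated assumptions: the Products interface, the getProductPrice method, and the Map-backed implementation are hypothetical names chosen for this sketch, and a PostgreSQL implementation would simply expose the same methods.

```typescript
// Domain layer: the domain owns the interface describing its persistence needs.
type Product = { id: string; name: string; price: number };

interface Products {
  get: (id: string) => Promise<Product | undefined>;
  upsert: (product: Product) => Promise<void>;
}

// Domain layer: the service depends on the interface only, never on a database.
const buildProductService = (dependencies: { products: Products }) => {
  const { products } = dependencies;

  const getProductPrice = async (id: string): Promise<number | undefined> => {
    const product = await products.get(id);
    return product?.price;
  };

  return { getProductPrice };
};

// Infrastructure layer: one possible implementation, backed by a Map.
const buildInMemoryProducts = (): Products => {
  const productsById = new Map<string, Product>();
  return {
    get: async (id) => productsById.get(id),
    upsert: async (product) => {
      productsById.set(product.id, product);
    },
  };
};

// Composition root: the only place that knows which implementation is used.
const products = buildInMemoryProducts();
const productService = buildProductService({ products });
```

Swapping the storage medium then means changing a single line in the composition root, while the service code stays untouched.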

Implementing aggregates

Entity services

In TypeScript, two options are available for implementing aggregates:

object-oriented programming (OOP) using classes
plain objects

In either approach, an entity service will be required to apply business logic and call the repository.

The difference is that with OOP, the domain logic is encapsulated in the entity class, whereas with plain objects, it lives in the entity service.

In OOP, the domain logic must not be in the entity service, as this goes against OOP principles.

On the contrary, for plain objects, the business logic must be in the entity service, as plain objects are not capable of encapsulating domain logic: they are just structs.

Let’s see two examples, one using OOP and another using plain objects.

Both examples will use a simplified product entity, with only a name and a price. They focus on updating the product price. The business rule is that the product price must be greater than 0.

Using plain objects

interface Product {
  id: string;
  name: string;
  price: number;
}

const buildProductService = (dependencies: Dependencies) => {
  const { productRepository } = dependencies;

  const updateProductPrice = async (
    id: string,
    price: number
  ): Promise<UpdatePriceResult> => {
    // first, the product is fetched from the repository
    const product = await productRepository.get(id);

    // if the product can't be found, then an explicit business error is returned
    if (!product) {
      return { outcome: "notUpdated", reason: "productNotFound" };
    }

    // make sure the price is greater than zero, or return an explicit business error
    if (price <= 0) {
      return { outcome: "notUpdated", reason: "priceLowerOrEqualZero" };
    }
    // update the price of the entity
    product.price = price;

    // upsert the entity in the repository to persist the new price
    await productRepository.upsert(product);

    return { outcome: "updated" };
  };

  return { updateProductPrice };
};
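The UpdatePriceResult type is not shown in this post. One possible definition, offered as an assumption rather than the original code, is a discriminated union on outcome. The describeResult helper is hypothetical and only illustrates how callers can handle every case:

```typescript
type UpdatePriceResult =
  | { outcome: "updated" }
  | {
      outcome: "notUpdated";
      reason: "productNotFound" | "priceLowerOrEqualZero";
    };

// Turns a result into a message. Inside each case, TypeScript narrows the
// type, so result.reason is only accessible for the "notUpdated" outcome.
// With strict null checks enabled, an unhandled outcome is reported by the
// compiler, because the function would no longer return a string on every path.
const describeResult = (result: UpdatePriceResult): string => {
  switch (result.outcome) {
    case "updated":
      return "Price updated";
    case "notUpdated":
      return `Price not updated: ${result.reason}`;
  }
};

const priceMessage = describeResult({
  outcome: "notUpdated",
  reason: "priceLowerOrEqualZero",
});
```

Returning such a value instead of throwing keeps expected business failures visible in the function signature.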

Using classes

The big difference is that the price verification and update have been moved inside the Product class.

class Product {
  private id: string;
  private name: string;
  private price: number;

  constructor(id: string, name: string, price: number) {
    this.id = id;
    this.name = name;
    this.price = price;
  }

  // the product entity has a method to set the price
  public setPrice(newPrice: number) {
    // make sure the price is greater than zero, or return an explicit business error
    if (newPrice <= 0) {
      return { outcome: "notUpdated", reason: "priceLowerOrEqualZero" };
    }
    // update the price of the entity
    this.price = newPrice;
    return { outcome: "updated" };
  }
}

const buildProductService = (dependencies: Dependencies) => {
  const { productRepository } = dependencies;

  const updateProductPrice = async (
    id: string,
    price: number
  ): Promise<SetPriceResult> => {
    // first, the product is fetched from the repository
    const product = await productRepository.get(id);

    // if the product can't be found, then an explicit business error is returned
    if (!product) {
      return { outcome: "notUpdated", reason: "productNotFound" };
    }

    // call the setter
    const updateResult = product.setPrice(price);
    if (updateResult.outcome === "notUpdated") {
      return updateResult;
    }

    // upsert the entity in the repository to persist the new price
    await productRepository.upsert(product);

    return { outcome: "updated" };
  };

  return { updateProductPrice };
};

Complete TypeScript example

Let’s take a look at a complete definition and implementation of an aggregate and its associated repository, using plain objects.

For the purpose of this example, we will once again use the order aggregate, this time including all of its attributes.

Domain layer

Starting from the domain is a natural approach, as it forms the foundation of the application.

It is crucial to build the domain around business constraints rather than technical ones.

First, define the business entity

Each order must have a unique identifier.
An order belongs to a customer.
The date when the order was placed should be stored.
An order can be cancelled, subject to certain conditions.
An order can contain multiple items:
- Each item corresponds to a product in the catalog.
- The price of the item should be stored even though the product has a price, because the product price may change over time.
- The same product can be ordered multiple times within the same order.
The total amount of the order should be calculated as the sum of all the items in the order.

Here is one way to define the aggregate

export type LineItem = {
  id: string;
  productId: string;
  quantity: number;
  price: number;
};

export type Order = {
  id: string;
  customerId: string;
  orderDate: Date;
  totalAmount: number;
  lineItems: LineItem[];
  cancellationDate?: Date;
};

Define persistence needs for this aggregate

Given the defined order aggregate, here are the persistence needs

The order’s unique identifier will be referenced in various places (such as shipping), so it must be possible to retrieve an order by its ID.
Retrieve all orders associated with a specific customer.
It’s possible for an order to be updated in some cases (such as when it’s cancelled).

Here is a possible interface for this repository.

interface Orders {
  getById: (id: string) => Promise<Order | undefined>;
  getByCustomerId: (customerId: string) => Promise<Order[]>;
  upsert: (order: Order) => Promise<void>;
}

Infrastructure layer

Now that everything is defined, let’s write a PostgreSQL implementation for this repository.

CREATE TABLE orders (
  id uuid PRIMARY KEY,
  customer_id int NOT NULL,
  order_date timestamp NOT NULL,
  cancellation_date timestamp, -- NULL until the order is cancelled
  total_amount numeric(10,2) NOT NULL
  -- FOREIGN KEY (customer_id) REFERENCES customers (id)
);

CREATE TABLE line_items (
  id uuid PRIMARY KEY,
  order_id uuid NOT NULL,
  product_id int NOT NULL,
  quantity int NOT NULL,
  price numeric(10,2) NOT NULL,
  FOREIGN KEY (order_id) REFERENCES orders (id)
  -- FOREIGN KEY (product_id) REFERENCES products (id)
);

import { Knex } from "knex";
import { Order, LineItem, Orders } from "../../../domain/order";

interface PostgresqlOrderRepositoryDependencies {
  db: Knex;
}

type OrderRow = {
  id: string;
  customer_id: string;
  order_date: Date;
  cancellation_date?: Date;
  total_amount: number;
};

type LineItemRow = {
  id: string;
  order_id: string;
  product_id: string;
  quantity: number;
  price: number;
};

const transformOrderRowToOrder = (
  orderRow: OrderRow,
  lineItemRows: LineItemRow[]
): Order => {
  return {
    id: orderRow.id,
    customerId: orderRow.customer_id,
    orderDate: orderRow.order_date,
    cancellationDate: orderRow.cancellation_date,
    totalAmount: orderRow.total_amount,
    lineItems: lineItemRows.map((lineItem) => ({
      id: lineItem.id,
      productId: lineItem.product_id,
      quantity: lineItem.quantity,
      price: lineItem.price,
    })),
  };
};

const transformOrderToOrderRow = (order: Order): OrderRow => {
  return {
    id: order.id,
    customer_id: order.customerId,
    order_date: order.orderDate,
    cancellation_date: order.cancellationDate,
    total_amount: order.totalAmount,
  };
};

const transformLineItemsToLineItemRows = (
  orderId: string,
  lineItems: LineItem[]
): LineItemRow[] => {
  return lineItems.map((lineItem) => {
    return {
      id: lineItem.id,
      order_id: orderId,
      product_id: lineItem.productId,
      quantity: lineItem.quantity,
      price: lineItem.price,
    };
  });
};

export const buildPostgresqlOrderRepository = (
  dependencies: PostgresqlOrderRepositoryDependencies
): Orders => {
  const { db } = dependencies;

  const getById = async (id: string): Promise<Order | undefined> => {
    const orderRows = await db
      .select("*")
      .from<OrderRow>("orders")
      .where("id", id);

    if (orderRows.length === 0) {
      return undefined;
    }

    const lineItemRows = await db
      .select("*")
      .from<LineItemRow>("line_items")
      .where("order_id", id);

    return transformOrderRowToOrder(orderRows[0], lineItemRows);
  };

  const getByCustomerId = async (customerId: string): Promise<Order[]> => {
    const orders = await db
      .select("*")
      .from<OrderRow>("orders")
      .where("customer_id", customerId);

    // whereIn matches the line items of any of the fetched orders
    const ordersItems = await db
      .select("*")
      .from<LineItemRow>("line_items")
      .whereIn(
        "order_id",
        orders.map((order) => order.id)
      );

    return orders.map((order) =>
      transformOrderRowToOrder(
        order,
        ordersItems.filter((item) => item.order_id === order.id)
      )
    );
  };

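  const upsert = async (order: Order): Promise<void> => {
    // both writes share one transaction, so they succeed or fail together
    await db.transaction(async (trx) => {
      // insert the row, or overwrite it when a row with the same id already exists
      await trx("orders")
        .insert(transformOrderToOrderRow(order))
        .onConflict("id")
        .merge();

      await trx("line_items")
        .insert(transformLineItemsToLineItemRows(order.id, order.lineItems))
        .onConflict("id")
        .merge();
    });
  };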

  return { getById, getByCustomerId, upsert };
};

Note that the interface is called Orders and not OrderRepository. The reason comes from domain-driven design (DDD): in the domain, “repository” means nothing. We would say “retrieve the order 1 from the orders”, not “retrieve the order 1 from the order repository”.

A few best practices to finish strong

Naming

The repository

The repository interface name should be the name of the aggregate, in plural. For instance, the repository interface for the Order aggregate is called Orders, because it represents all the orders.

The name of the repository implementation must contain the storage medium.

The function responsible for building a PostgreSQL implementation of the repository Orders can be named buildPostgresqlOrderRepository.

The methods

A repository must be seen as a collection of data, similar to a Set.

In such a collection, there is no create method, since the creation of an entity is handled by the business logic. The repository’s role is to receive the entity from the business logic and persist it.

Similarly, cancel is not a valid method either. The cancelOrder method could be part of the orderService, as it’s the business logic that actually modifies the order to carry out the cancellation. The repository’s responsibility is simply to receive the order and persist its updated version.

export const buildOrderService = (
  dependencies: OrderServiceDependencies
): OrderService => {
  const { orderRepository } = dependencies;

  // ...

  // cancel order is part of the order service
  const cancelOrder = async (id: string): Promise<CancelOrderResult> => {
    const order = await orderRepository.getById(id);

    if (!order) {
      return { outcome: "notCancelled", reason: "orderNotFound" };
    }

    if (order.cancellationDate) {
      return { outcome: "notCancelled", reason: "alreadyCancelled" };
    }
    // the cancellation is done by setting the cancellationDate to the current date
    order.cancellationDate = new Date();

    // then, the repository upsert method is called to persist the change
    await orderRepository.upsert(order);

    return { outcome: "cancelled" };
  };

  // ...

  return {
    cancelOrder,
    // ...
  };
};

Ids and Timestamps

In a SQL database, it’s possible to generate values for certain fields when defining the schema of a table.

For instance, id SERIAL PRIMARY KEY (or AUTO_INCREMENT) will generate an ID for newly inserted rows. In most ORMs, timestamps like created_at are automatically added to each row, using things like DEFAULT CURRENT_TIMESTAMP.

However, when working with the order aggregate, it’s important not to generate values for the id and order date fields in the storage medium, for several reasons:

These fields are business values. The id is used to uniquely identify the entity in the business logic, while the order date represents the date on which the order was created in the business logic, not the date at which the row was inserted in the storage medium.
The repository handles the persistence of the order aggregate, which includes the id and order date fields, so these fields must be set by the business logic.
In theory, there can be multiple implementations of a repository interface. These implementations are distinct, so there is no way to ensure that ids do not conflict between two repositories.

Imagine that two repository implementations both start the id value at 1 and increment the id counter after each insert. The same id will then match different entities stored in two distinct repositories.

Here is the rule:

if the value is a business value, it must be set in the domain layer
if the value is only used in the storage, it must be generated in the storage medium
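One way to apply this rule, sketched under assumptions, is a small factory in the domain layer that assigns the id and the order date before the repository is ever involved. The names buildOrderFactory, generateId, and now are hypothetical; injecting them also keeps tests deterministic.

```typescript
type Order = { id: string; customerId: string; orderDate: Date };

// Hypothetical dependencies: how ids and the current time are obtained.
type OrderFactoryDependencies = {
  generateId: () => string;
  now: () => Date;
};

const buildOrderFactory = (dependencies: OrderFactoryDependencies) => {
  const { generateId, now } = dependencies;

  // Business values are assigned here, in the domain layer,
  // not by a column default in the storage medium.
  const createOrder = (customerId: string): Order => ({
    id: generateId(),
    customerId,
    orderDate: now(),
  });

  return { createOrder };
};

// Example wiring with fixed values, as a test might do. In production,
// generateId could return a UUID and now could return new Date().
const orderFactory = buildOrderFactory({
  generateId: () => "order-1",
  now: () => new Date("2023-10-27T10:00:00.000Z"),
});
const newOrder = orderFactory.createOrder("customer-1");
```

Since the id is generated before persistence, the same order can be stored in any repository implementation without risk of two storages assigning it different identifiers.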

Partial read, partial write, and performance

To cancel an order, the only attribute to update is the cancellation date.

Similarly, there may be cases where only certain attributes of an order are needed, such as the order dates for a customer.

In these situations, should there be methods that return or update only the relevant attributes?

As already mentioned, a repository must be seen as a collection of objects, like a Set or a Map.

In such collections, the full object is fetched or set, not just a part of it. The same holds true for repositories.

Furthermore, an Order is an Order. A subpart of an Order is not an Order, and it can be difficult to name because it doesn’t represent anything meaningful in the domain.

Most of the time, people want partial updates or partial reads for performance reasons. A few extra attributes will not significantly impact the performance of an application, and in most cases, code readability is more important than performance.

If there is actually a performance problem with a repository, the first step is to verify that the aggregates and business logic are well defined.

Otherwise, the CQRS pattern can be beneficial in cases where there are performance issues, as it aims to separate the reading and writing responsibilities in a system.

By separating these concerns, a write model can handle commands and modify the state of the aggregate with well-defined business logic, while the read model can be optimized for queries and return denormalized data that is tailored to the client’s needs.

By doing so, this approach can result in significant performance improvements.
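As a rough illustration of a read model, assuming hypothetical names (CustomerOrderDate, toCustomerOrderDates) and an in-memory projection rather than a dedicated table or view, the query side can expose its own narrow type without weakening the Order aggregate on the write side:

```typescript
// Write side: the full aggregate, as handled by the repository (simplified here).
type Order = {
  id: string;
  customerId: string;
  orderDate: Date;
  totalAmount: number;
};

// Read side: a denormalized view tailored to one query.
type CustomerOrderDate = { orderId: string; orderDate: Date };

// Projects the orders of one customer into the read model, newest first.
const toCustomerOrderDates = (
  orders: Order[],
  customerId: string
): CustomerOrderDate[] =>
  orders
    .filter((order) => order.customerId === customerId)
    .map((order) => ({ orderId: order.id, orderDate: order.orderDate }))
    .sort((a, b) => b.orderDate.getTime() - a.orderDate.getTime());

const orderDates = toCustomerOrderDates(
  [
    { id: "o1", customerId: "c1", orderDate: new Date("2023-01-01"), totalAmount: 10 },
    { id: "o2", customerId: "c2", orderDate: new Date("2023-02-01"), totalAmount: 20 },
    { id: "o3", customerId: "c1", orderDate: new Date("2023-03-01"), totalAmount: 30 },
  ],
  "c1"
);
```

The read model has its own name and its own type, so it never has to pretend to be a partial Order. In a SQL implementation, the same view would typically be served by a dedicated query that selects only the needed columns.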

Concurrent update

Suppose that two upserts are made at the same time on the same order.

A fetches the order
B fetches the order
A changes the cancellation date, and calls the upsert method
B changes the total amount, and calls the upsert method.

Although no errors occur, B’s upsert overwrites A’s change with its stale copy of the order, and the cancellation date ends up being set back to null. How can this be prevented from happening?

One solution to this problem is to use version numbers, though it may not be the best approach for every situation.

Each entity in the system would be assigned a version number, which is incremented by one every time an update is made to the entity.

When the update method is called on the repository to persist the new version of the entity, the update should be performed based on both the entity’s ID and version number (rather than ID only). Since the version number has been incremented, the entity to be updated should be the one with the previous version (i.e. version number minus one). If no match is found, it means that at least one update has been made to the entity in the meantime, and the repository should return an error.

At this point, the business logic must determine how to handle the situation - automatic retry or returning an error are two possible options.
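The version check can be sketched with an in-memory store. This is a simplified illustration under assumptions: the VersionedOrder type and the result shape are hypothetical, and the methods are kept synchronous for brevity, whereas a real repository would return promises.

```typescript
type VersionedOrder = { id: string; version: number; totalAmount: number };

type UpsertResult =
  | { outcome: "saved" }
  | { outcome: "notSaved"; reason: "versionConflict" };

const buildVersionedInMemoryOrders = () => {
  const ordersById = new Map<string, VersionedOrder>();

  const getById = (id: string): VersionedOrder | undefined => {
    const stored = ordersById.get(id);
    // return a copy so callers cannot mutate the stored order directly
    return stored ? { ...stored } : undefined;
  };

  const upsert = (order: VersionedOrder): UpsertResult => {
    const stored = ordersById.get(order.id);
    // A new order is stored as is. An existing order is only replaced
    // when the incoming version is exactly the stored version plus one.
    if (stored && stored.version !== order.version - 1) {
      return { outcome: "notSaved", reason: "versionConflict" };
    }
    ordersById.set(order.id, { ...order });
    return { outcome: "saved" };
  };

  return { getById, upsert };
};

// A and B both read version 1, then both try to write version 2.
const versionedOrders = buildVersionedInMemoryOrders();
versionedOrders.upsert({ id: "order1", version: 1, totalAmount: 100 });

const readByA = versionedOrders.getById("order1")!;
const readByB = versionedOrders.getById("order1")!;

const resultA = versionedOrders.upsert({
  ...readByA,
  version: readByA.version + 1,
});
const resultB = versionedOrders.upsert({
  ...readByB,
  version: readByB.version + 1,
  totalAmount: 120,
});
```

Here the write from A is saved, while the write from B is rejected with a version conflict instead of silently overwriting A. In SQL, the equivalent check is usually an UPDATE whose WHERE clause includes both the id and the previous version, with zero affected rows signalling a conflict.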

Testing the business logic

In-memory repository implementation

Setting up a database for testing can be complicated, slow, and unnecessary when the purpose is to test the business logic. Instead of injecting an instance of a PostgreSQL implementation, it is possible to inject an instance of an in-memory implementation, which is much simpler and faster.

The in-memory implementation can be created using a Set or a Map, and does not require any additional setup.

import { Order, Orders } from "../../../domain/order";

// still an implementation of Orders, but using a Map instead of a PostgreSQL database as storage medium

export const buildInMemoryOrderRepository = (): Orders => {
  // use a Map to store the orders by id
  const ordersById = new Map<string, Order>();

  const getById = async (id: string): Promise<Order | undefined> => {
    const order = ordersById.get(id);
    return order;
  };

  const getByCustomerId = async (customerId: string): Promise<Order[]> => {
    const orders = Array.from(ordersById.values());
    return orders.filter((order) => order.customerId === customerId);
  };

  const upsert = async (order: Order): Promise<void> => {
    ordersById.set(order.id, order);
  };

  return { getById, getByCustomerId, upsert };
};

import { buildOrderService, Order } from "../index";
import { buildInMemoryOrderRepository } from "../../../infrastructure/repositories/order/inMemory";

// example of a unit test, testing the order service method cancelOrder, and using the in memory repository

describe("cancelOrder", () => {
  const orderRepository = buildInMemoryOrderRepository();
  const orderService = buildOrderService({ orderRepository });

  describe("given the id of an existing order", () => {
    describe("given the order is already cancelled", () => {
      const id = "order1";
      const order: Order = {
        id,
        customerId: "customerId",
        orderDate: new Date(),
        cancellationDate: new Date(),
        totalAmount: 100,
        lineItems: [
          {
            id: "lineItem1",
            productId: "product1",
            price: 50,
            quantity: 2,
          },
        ],
      };

      // describe callbacks must be synchronous, so the async setup goes in a hook
      beforeAll(async () => {
        await orderRepository.upsert(order);
      });

      it("returns an error because the order is already cancelled", async () => {
        const result = await orderService.cancelOrder(id);

        expect(result).toEqual({
          outcome: "notCancelled",
          reason: "alreadyCancelled",
        });
      });
    });
    // other tests ...
  });
  // other tests  ...
});

Isolate the business logic

The in-memory repository is unnecessary because it does not add value to the test - it is only present because a repository must be injected. The focus of the test should be on the business logic, not the repository. In addition, a poorly implemented in-memory repository may affect the test results, and writing tests to evaluate a repository implementation that is solely used for testing purposes can be excessive.

To test the system effectively, it is best to isolate the business logic and test it independently. If the aggregates are implemented as classes, testing the class methods may be sufficient, since the business logic is contained within the class. On the other hand, if the business logic is primarily located in the entity service, it may be useful to extract that logic into pure functions that are not part of the public interface of the service and are exported only so that tests can call them directly.
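For the cancellation rule, such an extraction might look like the sketch below. The decideCancellation function and its result type are hypothetical names, and the order type is reduced to the fields the rule needs.

```typescript
type Order = { id: string; cancellationDate?: Date };

type CancellationDecision =
  | { outcome: "cancelled"; order: Order }
  | { outcome: "notCancelled"; reason: "alreadyCancelled" };

// Pure business rule: no repository and no I/O; the current date is passed in.
const decideCancellation = (order: Order, now: Date): CancellationDecision => {
  if (order.cancellationDate) {
    return { outcome: "notCancelled", reason: "alreadyCancelled" };
  }
  return { outcome: "cancelled", order: { ...order, cancellationDate: now } };
};

const cancellationTime = new Date("2023-10-27T10:00:00.000Z");
const firstDecision = decideCancellation({ id: "order1" }, cancellationTime);
const secondDecision = decideCancellation(
  { id: "order1", cancellationDate: cancellationTime },
  cancellationTime
);
```

The service method then shrinks to fetching the order, calling the pure function, and upserting the returned order, which leaves very little logic that needs a repository to be tested.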

Conclusion

The repository pattern is a powerful tool for software engineers seeking to create maintainable, scalable, and organized codebases. By separating concerns and decoupling domain logic from the underlying data storage, it enforces aggregate usage, prevents the contamination of business logic with storage logic, and makes testing easier.

Throughout this post, we have explored essential concepts and best practices for implementing the repository pattern in TypeScript, using a real-world e-commerce application as our example. From defining the repository interface and aggregates to implementing concrete repositories, we have covered a wide range of topics that will aid in mastering the repository pattern.

Whether you are a seasoned developer or just beginning, I hope this post has given you a solid foundation for building better software using the repository pattern.