Luke Autry

tserial: Validate TypeScript interfaces at runtime

A new way to validate JavaScript objects across system boundaries

One of the most powerful and well-regarded aspects of TypeScript is its structural typing. Interfaces and type expressions allow developers to define the shape of their abstractions with no runtime overhead. This grants developers the ability to approach their programs with a data-first mentality. With more advanced features like string literals, union types, intersection types, and tagged unions, developers can use types to express powerful patterns and build up compile-time guarantees around their code.

If you're like me, you find yourself using classes less and interfaces more, especially around system boundaries. Interfaces are well suited to representing pure data, particularly the kind transferred over the wire, e.g. through a REST API, WebSockets, or from some connected data store.

When I was converting core components of a Node.js codebase from CoffeeScript to TypeScript, I stumbled upon a big challenge: how do we know whether the types we've created accurately reflect what's present at runtime? The entire point of using TypeScript is to avoid unexpected behavior at runtime; unvalidated inputs that come from the non-TypeScript world can set off a cascade of unexpected behavior. On the server, this creates bugs and, in the worst cases, critical vulnerabilities.
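To make the risk concrete, here is a minimal sketch. The IUser interface and the payload are made up for illustration. The type assertion compiles cleanly, yet the runtime value doesn't match the declared shape.

```typescript
// Hypothetical interface describing what we believe an upstream service sends.
interface IUser {
  id: number;
  email: string;
}

// The payload as it actually arrives: `id` is a string and `email` is missing.
const raw = '{"id": "42"}';

// JSON.parse returns `any`, so this assertion is accepted without any check.
const user = JSON.parse(raw) as IUser;

// The compiler believes these are `number` and `string`; at runtime they are not.
const idType = typeof user.id;       // "string"
const emailType = typeof user.email; // "undefined"
```

Nothing fails at the point where the bad data enters; the failure shows up later, wherever the code first relies on id being a number or email being present.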

Runtime validation and deserialization

Developers who have spent much time writing code in compiled languages like Java, C#, or Go will be familiar with the concept of deserializing common data formats, such as JSON, into their runtime equivalents. Typically, this is implemented using reflection; the data "becomes" a class or a struct.

In dynamic languages, like JavaScript, the approach is usually different. We use libraries like joi to validate data, or json-schema to define a source of truth for the data that we're passing from service to service. Type information is encoded within some runtime representation.

For projects primarily using TypeScript, we're somewhere in the middle. We have compile-time types, but the metadata necessary to reference those types isn't found in the compiled JavaScript. reflect-metadata does an admirable job of making some metadata available at runtime, but it does so with decorators, limiting its use to classes, methods, etc.

The manual solution

Before getting into the tserial solution, let's think through our ideal handwritten solution. Let's suppose we have an interface IPerson:

interface IPerson {
  firstName: string;
  lastName: string;
}

A basic assertion would make use of a type guard function:

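const isPerson = (person: unknown): person is IPerson => {
  if (!person) { return false; }
  if (typeof person !== 'object') { return false; }
  // `object` exposes no known properties, so read them through a record view.
  const candidate = person as Record<string, unknown>;
  if (typeof candidate.firstName !== 'string') { return false; }
  if (typeof candidate.lastName !== 'string') { return false; }
  return true;
};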

if (isPerson(untrustedInput)) {
  // untrustedInput is type IPerson now!
}

This is a pretty good start, but if we assume that IPerson is expected to define some POST body parameter, we'd like to surface a better error message than, "Your request body wasn't IPerson. It's up to you to figure out how." Let's create a function that offers a higher-resolution result.

export type Result<T> = ISuccessResult<T> | IErrorResult;

export interface ISuccessResult<T> {
  success: true;
  value: T;
}

export interface IErrorResult {
  success: false;
  message: string;
}

const assertPerson = (person: unknown): Result<IPerson> => {
  if (!person) {
    return { success: false, message: 'expected person to be defined' };
  }
  if (typeof person !== 'object') {
    return { success: false, message: 'expected person to be an object' };
  }
  // `object` exposes no known properties, so read them through a record view.
  const candidate = person as Record<string, unknown>;
  if (typeof candidate.firstName !== 'string') {
    return { success: false, message: 'expected person.firstName to be a string' };
  }
  if (typeof candidate.lastName !== 'string') {
    return { success: false, message: 'expected person.lastName to be a string' };
  }

  // The checks above justify this cast, but the compiler does not verify it.
  return { success: true, value: person as IPerson };
};

const result = assertPerson(untrustedInput);
if (result.success) {
  // result.value is IPerson
  console.log(result.value);
} else {
  console.error(result.message);
}

This approach, while certainly more verbose, offers several advantages.

  • Granular error reporting can give consumers actionable information
  • Callers of this assertion utility are forced to handle the deserialization failure
  • This function can, with some small changes, prove that our unknown value is an IPerson. This is a powerful way to create a "living" assertion. If we were to add a property city to IPerson, the compiler would raise an error on the final line of assertPerson. After all, up to that point, we've only proven something like { firstName: string; lastName: string }, which doesn't match { firstName: string; lastName: string; city: string }.
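The small changes mentioned in the last point could look something like the sketch below. It assumes we rebuild the value from the checked fields instead of casting. The types are repeated (in a condensed form) so that the block stands alone, and the name assertPersonStrict is made up for this example.

```typescript
interface IPerson {
  firstName: string;
  lastName: string;
}

type Result<T> =
  | { success: true; value: T }
  | { success: false; message: string };

// Hypothetical variant of assertPerson that contains no cast to IPerson.
const assertPersonStrict = (person: unknown): Result<IPerson> => {
  if (typeof person !== 'object' || person === null) {
    return { success: false, message: 'expected person to be an object' };
  }
  // Read the fields through a generic record view; each is still `unknown` here.
  const { firstName, lastName } = person as Record<string, unknown>;
  if (typeof firstName !== 'string') {
    return { success: false, message: 'expected person.firstName to be a string' };
  }
  if (typeof lastName !== 'string') {
    return { success: false, message: 'expected person.lastName to be a string' };
  }
  // The object literal is checked against IPerson. Adding `city: string` to
  // IPerson makes this line a compile error until a check for `city` is added.
  return { success: true, value: { firstName, lastName } };
};
```

One consequence of this design is that the returned value is a fresh object containing only the validated properties, so any extra properties on the input are dropped.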

The main disadvantage is that you have to write the code in assertPerson. Couldn't we automate this process?

The tserial solution

tserial uses the TypeScript compiler API to generate assertion statements similar to the example shown above. tserial usage, which we'll describe in more detail later, is simple, transparent, and low overhead. I designed it with a few desirable outcomes in mind. I wasn't sure how many of these goals I'd be able to meet.

  • I just want to write TypeScript. No DSLs or codecs, just plug-and-play with existing type expressions. This needs to work as a drop-in solution for a big, complex codebase.
  • Should be able to be applied incrementally, i.e. without sweeping conversions across many files
  • Support for type aliases, discriminated unions, literal types, intersection types, union types, and (resolved) generics is a must-have.
  • For use with serializable data only. Aggressively reject attempts to create assertions for non-serializable objects, e.g. classes and functions.
    • I recognize that this can be controversial; it's quite common to use classes as DTOs or models. My stance is that we should be using interfaces or pure type expressions to represent the kind of data transferred across system boundaries. Classes have runtime content (methods, getters/setters) that just doesn't make sense in this context.
  • Assertions must return granular, structured error data
  • "Living assertions": assertions should fail type checking when they get out of sync with their types.
  • No runtime dependencies, not even on tserial itself
    • This should allow support for the new experimental Deno runtime
  • Minimal build-time dependencies (typescript only)
    • Reduce transitive dependencies that, unfortunately, sometimes expose us to security vulnerabilities
  • Minimal configuration. No plugins or custom compilers - it should run against your TypeScript project via a CLI utility or Node.js module with minimal changes.
  • Should offer a high degree of transparency and ease of audit. Reviewing the generated deserializer should be straightforward if necessary. When tserial is updated with bug fixes or enhancements, the diff between the old output and the new output should help developers understand how updates have impacted them.

Installation

The library is distributed via npm. It should usually be installed as a dev dependency.

Usage

tserial exposes a CLI and a Node.js API; the command is run within the context of a TypeScript project. The output is a single file containing a deserialize function. Any file that needs to deserialize data will import this function.

Rather than attempt to analyze every interface and type alias in your project, tserial is "opt-in". Interfaces that are meant to be validated at runtime should be marked with a @serializable JSDoc tag. The name of the tag can be customized.

/**
 * @serializable
 */
export interface MyInterface {
  prop: string;
}

import { deserialize } from './output-file.ts';

const goodData = { prop: '1234' };
const badData = { prop: 1234 };

/**
 * { success: true, value: { prop: '1234' } } - `value` is type `MyInterface`
 */
console.log(deserialize('MyInterface', goodData));

/**
 * { success: false, ...errorInfo }
 */
console.log(deserialize('MyInterface', badData));
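To show how a name-keyed function can give value the right static type, here is a hand-written sketch. It is not tserial's generated output: the TypeMap registry, the assertions table, the simplified Result shape, and the name deserializeSketch are all assumptions made for this example.

```typescript
interface MyInterface {
  prop: string;
}

// Hypothetical registry mapping each serializable type's name to its type.
interface TypeMap {
  MyInterface: MyInterface;
}

type Result<T> =
  | { success: true; value: T }
  | { success: false; message: string };

// One hand-written assertion per registered name (an assumption of this sketch).
const assertions: { [K in keyof TypeMap]: (input: unknown) => Result<TypeMap[K]> } = {
  MyInterface: (input) => {
    if (typeof input !== 'object' || input === null) {
      return { success: false, message: 'expected an object' };
    }
    const { prop } = input as Record<string, unknown>;
    if (typeof prop !== 'string') {
      return { success: false, message: 'expected prop to be a string' };
    }
    return { success: true, value: { prop } };
  },
};

// The name argument selects the assertion and determines the type of `value`.
function deserializeSketch<K extends keyof TypeMap>(name: K, input: unknown): Result<TypeMap[K]> {
  return assertions[name](input);
}
```

Because the name is constrained to keyof TypeMap, a typo in the type name is a compile-time error, and the success branch of the result is typed according to the name that was passed in.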

Build integration

When to run tserial is largely up to the developer. A low-tech option is to add an npm script (or a simple Node script) that runs the command, and to make developers responsible for running it whenever source types are added, removed, or changed. In almost all situations, the TypeScript compiler will raise errors when source types have been modified and the generated code is stale. Even so, it may be useful to add a CI step that generates a fresh file, compares it to the checked-in one, and fails if they don't match exactly. The output file should be checked in.
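The comparison at the heart of such a CI step could be sketched as follows. The helper name and the result shape are hypothetical, and the sketch deliberately works on strings so that it makes no assumptions about how tserial is invoked or where its output lives.

```typescript
// Hypothetical result of comparing the checked-in file with a fresh one.
interface StaleCheck {
  stale: boolean;
  firstDifferentLine?: number; // 1-based; set only when the contents differ
}

// Compares the checked-in generated file with freshly generated output and
// reports the first line at which they diverge.
function checkGeneratedOutput(checkedIn: string, fresh: string): StaleCheck {
  if (checkedIn === fresh) {
    return { stale: false };
  }
  const oldLines = checkedIn.split('\n');
  const newLines = fresh.split('\n');
  const limit = Math.max(oldLines.length, newLines.length);
  for (let i = 0; i < limit; i++) {
    // A missing line (undefined) counts as a difference.
    if (oldLines[i] !== newLines[i]) {
      return { stale: true, firstDifferentLine: i + 1 };
    }
  }
  // Not expected to be reached: differing strings always differ on some line.
  return { stale: true };
}
```

In a real CI script, the two strings would come from reading the checked-in file and the freshly generated one, and a stale result would end the process with a non-zero exit code.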

Alternative solutions

io-ts

Probably the most prominent and widely used solution available today. io-ts uses codecs to represent types. Runtime objects can be deserialized by running them through the codec. The blog post introducing io-ts does a fantastic job of explaining the approach and showing basic usage.

io-ts has a lot of advantages. It requires no build step and no compiler transformations; it just works.

The only downside is that you kind of need to drink the io-ts kool-aid. TypeScript interfaces will not be your source of truth, the codec will. This is all perfectly type-safe and correct, but if your preference is to model your data with pure TypeScript (i.e. interfaces and other type expressions), you may not be excited about adopting this approach.
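To illustrate what "the codec is the source of truth" means, here is a deliberately tiny hand-rolled codec. It is not io-ts's API; Codec, TypeOf, Shape, str, and obj are names invented for this sketch. The point is only the direction of the dependency: the runtime value is written first and the static type is derived from it.

```typescript
// A codec checks an unknown value at runtime and carries the static type it
// validates as a type parameter. All names here are hypothetical.
interface Codec<T> {
  is(input: unknown): input is T;
}

// Extracts the static type from a codec.
type TypeOf<C> = C extends Codec<infer T> ? T : never;

// Maps a record of property codecs to the object type they describe.
type Shape<P> = { [K in keyof P]: TypeOf<P[K]> };

const str: Codec<string> = {
  is: (input): input is string => typeof input === 'string',
};

// Builds an object codec from a record of property codecs.
function obj<P extends Record<string, Codec<unknown>>>(props: P): Codec<Shape<P>> {
  return {
    is: (input): input is Shape<P> => {
      if (typeof input !== 'object' || input === null) { return false; }
      const record = input as Record<string, unknown>;
      for (const key in props) {
        if (!props[key].is(record[key])) { return false; }
      }
      return true;
    },
  };
}

// The codec is written first; the static type is derived from it.
const Person = obj({ firstName: str, lastName: str });
type Person = TypeOf<typeof Person>; // { firstName: string; lastName: string }
```

With this style there is no separate interface to keep in sync, which is exactly the trade-off described above: the type exists only as a by-product of the codec.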

Other notable projects

  • ts-runtime extends the TypeScript type system to modify compiler output and, essentially, implements runtime types. While laudable, this approach may give developers pause. The compilation result is fundamentally changed from "vanilla" TypeScript, and the change is made in a way that is fairly opaque from a developer's point of view.
  • typescript-is uses the TypeScript transformation API to create runtime validation logic. You'll need to configure the build to use a special TypeScript compiler since TypeScript doesn't support transform plugins out of the box. Under the hood, typescript-is has a similar approach to tserial. It offers no granular error reporting.
  • ts-auto-guard generates type guard functions for interfaces. At this time, it doesn't have support for type aliases (e.g. discriminated unions). It also only provides a boolean result rather than granular validation errors.
  • jointz is fairly new on the scene and could be described as a lighter-weight version of io-ts.
  • Likely others that I've missed