Designing OpenAPI Schemas

27 Apr 2024 in Tech

I’ve written a lot of OpenAPI schemas over the last couple of years, and have developed a pattern that helps with maintenance. Every entity in your system has at least two logical schemas, Foo and FooRequest. Foo is the union of FooRequest and any fields computed by the system.

Let’s look at a concrete example, a Pet in an adoption shelter. A pet has two user-set fields, name and type, and one computed field, created_at. You can’t set the created_at field when creating or updating a pet, which means we have two schemas:

  • PetRequest: name, type
  • Pet: name, type, created_at

The following example isn't the best way to accomplish the GET/POST split! It's shown as it's a common pattern used in many specifications, but keep reading for a better solution.

When you model this with JSON Schema, it looks like the following:

yaml
components:
  schemas:
    PetRequest:
      type: object
      properties:
        name:
          type: string
        type:
          type: string
    Pet:
      allOf:
        - $ref: "#/components/schemas/PetRequest"
        - type: object
          properties:
            created_at:
              type: string

Any updates to the PetRequest object will automatically be reflected in the Pet object. However, this pattern is so common that OpenAPI has a built-in keyword to help.

Simplify with readOnly: true

If you can split your schema into "user provided" and "computed" values cleanly, you only need a single schema thanks to the readOnly keyword.

yaml
components:
  schemas:
    Pet:
      type: object
      properties:
        name:
          type: string
        type:
          type: string
        created_at:
          type: string
          readOnly: true

The readOnly keyword causes the schema to be split into two virtual schemas: one for responses (GET) and one for request bodies (POST/PATCH/PUT). Any OpenAPI renderer that honours readOnly (I tested with Redoc) will remove created_at from the POST schema.
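To make the split concrete, here is a minimal sketch of one operation that references the single Pet schema for both its request body and its response. The POST /pets operation, the 201 status code and the descriptions are assumptions for illustration, not taken from a real specification.

```yaml
paths:
  /pets:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              # Request side: created_at is readOnly, so clients should not
              # send it and renderers hide it here.
              # Example body: {"name": "Rex", "type": "dog"}
              $ref: "#/components/schemas/Pet"
      responses:
        "201":
          description: The pet that was created (hypothetical response)
          content:
            application/json:
              schema:
                # Response side: the same schema, now including created_at.
                $ref: "#/components/schemas/Pet"
```

Both references point at the same component, so adding a field to Pet updates the request and response documentation at once.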

Complex APIs

In an ideal world, you wouldn’t need more than a single Pet schema. However, we live in the real world and sometimes there are additional requirements. Here are some examples:

  • Pets have an adopted_at time, which can only be set when updating a pet, not when creating
  • Pets have a total_steps field which is computed from an external source that is not cached and is too expensive to show when listing multiple pets

These requirements mean that we have to split Pet into two, Pet and CreatePetRequest. We also need a minimal representation for the GET /pets endpoint, so Pet keeps only the cheap fields and the expensive ones move to PetWithDetails. This results in three distinct schemas:

  • CreatePetRequest: name, type
  • Pet: name, type, adopted_at, created_at (readOnly)
  • PetWithDetails: name, type, adopted_at, total_steps (readOnly), created_at (readOnly)

These schemas can be composed to build the entities we need at runtime. CreatePetRequest is the base as it contains the minimum available information. Pet builds on this by adding adopted_at and created_at. Finally, PetWithDetails adds computed fields that are expensive to calculate such as total_steps.

CreatePetRequest -> Pet -> PetWithDetails.

Expressed using JSON Schema, it looks like this:

yaml
components:
  schemas:
    CreatePetRequest:
      type: object
      properties:
        name:
          type: string
        type:
          type: string
    Pet:
      allOf:
        - $ref: "#/components/schemas/CreatePetRequest"
        - type: object
          properties:
            adopted_at:
              type: string
            created_at:
              type: string
              readOnly: true
    PetWithDetails:
      allOf:
        - $ref: "#/components/schemas/Pet"
        - type: object
          properties:
            total_steps:
              type: string
              readOnly: true

This gets expanded to the following schemas (courtesy of Swagger UI):

[Image: Swagger UI rendering of schemas]

Here's a complete OpenAPI specification that uses these schemas.
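As a rough sketch of how the three schemas might be attached to operations (this is not the linked specification; the /pets/{id} path, the PATCH method, the string id parameter and the status codes are all assumptions for illustration):

```yaml
paths:
  /pets:
    get:
      responses:
        "200":
          description: List of pets, without expensive fields
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Pet"
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              # adopted_at cannot be set on create, so use the base schema
              $ref: "#/components/schemas/CreatePetRequest"
      responses:
        "201":
          description: The created pet
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Pet"
  /pets/{id}:
    parameters:
      - name: id          # hypothetical identifier, not part of the schemas above
        in: path
        required: true
        schema:
          type: string
    get:
      responses:
        "200":
          description: A single pet, including total_steps
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PetWithDetails"
    patch:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              # Pet allows adopted_at; created_at is readOnly and stays hidden
              $ref: "#/components/schemas/Pet"
      responses:
        "200":
          description: The updated pet
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PetWithDetails"
```

Each operation picks the smallest schema that covers what it accepts or returns, which is why the list endpoint uses Pet and the single-pet endpoints use PetWithDetails.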

An ideal world

Although the model works for APIs that have complex requirements, I consider needing separate models for create and update requests a design flaw. In this example API, you could make adopted_at a nullable field and use the same schema for both create and update requests.
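A minimal sketch of that simplification, assuming OpenAPI 3.0 (where nullable: true is the way to allow null; OpenAPI 3.1 would express the same thing as type: [string, "null"]):

```yaml
components:
  schemas:
    Pet:
      type: object
      properties:
        name:
          type: string
        type:
          type: string
        adopted_at:
          type: string
          nullable: true   # null until the pet is adopted; settable on create or update
        created_at:
          type: string
          readOnly: true   # still computed by the system
```

With this shape, a create request can send adopted_at as null (or omit it), and an update request sets it later using the same schema.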

I also consider needing a minimal representation of an entity a design flaw. There are cases where a real requirement calls for these schemas (e.g. if total_steps must always be accurate and can’t be cached), but these cases are rare.

If you find yourself using more than one schema, take a moment to reconsider your API design and see how you can simplify it.