We are Codebeez. We specialise in everything Python and help our clients build custom, scalable, robust and maintainable solutions.
The scenario
Imagine you have a database with a large number of schemas, and each of these schemas has a large number of tables. One day your boss comes along and asks you to create a CRUD API for the entire database. What do you do? You don't want to write all these routes by hand; there is a lot of overlap after all.
This was the exact premise of the latest (public) hackathon that we organised at our office: how can we generate an "as complete as possible" API given only a PostgreSQL database? This is a write-up of the different solutions the attendees came up with.
The setup
For the hackathon we've used the 2014 Microsoft AdventureWorks dataset which contains 68 tables and 20 views. We felt this was a good representation of a realistic database that you might want to expose via an API for consumption internally or externally. You can find a convenient visual representation of the tables here. Everything ran in Docker and we had access to all files used to create and populate the database.
We were also encouraged to look beyond what we were comfortable with and try methods such as AI code completion or generation. More on that later.
Different approaches
The different approaches can be roughly split into two main categories:
- Generate an API automatically
- Generate an API automagically
Let's look at them separately!
Automatically
When talking about automatically generating an API based on the PostgreSQL database in this context, I'm referring to using a tool or library which extracts (meta)data from the database and uses it to set up an API.
During the hackathon, three distinct tools were used by different people:
FastAPI Code Generator
With us mostly being a Python shop, this was one of the more obvious solutions. This library takes in one or more OpenAPI files and generates both Pydantic models and a set of matching routes. The library is still in an experimental phase but performs well as a basis for further development.
Let's take a look at an example from their docs to get a better understanding of how it works. First, we start with an OpenAPI input file that looks like this (shortened for brevity, check here for the full file):
openapi: "3.0.0"
info:
  version: 1.0.0
  title: Swagger Petstore
  license:
    name: MIT
servers:
  - url: http://petstore.swagger.io/v1
paths:
  /pets:
    get:
      summary: List all pets
      operationId: listPets
      tags:
        - pets
      parameters:
        - name: limit
          in: query
          description: How many items to return at one time (max 100)
          required: false
          schema:
            type: integer
            format: int32
      responses:
        '200':
          description: A paged array of pets
          headers:
            x-next:
              description: A link to the next page of responses
              schema:
                type: string
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Pets"
        # shortened for brevity
    post:
      summary: Create a pet
      operationId: createPets
      tags:
        - pets
      responses:
        '201':
          description: Null response
        # shortened for brevity
  /pets/{petId}:
    get:
      summary: Info for a specific pet
      operationId: showPetById
      tags:
        - pets
      parameters:
        - name: petId
          in: path
          required: true
          description: The id of the pet to retrieve
          schema:
            type: string
      responses:
        '200':
          description: Expected response to a valid request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Pets"
        # shortened for brevity
components:
  schemas:
    Pet:
      required:
        - id
        - name
      properties:
        id:
          type: integer
          format: int64
        name:
          type: string
        tag:
          type: string
    Pets:
      type: array
      description: list of pet
      items:
        $ref: "#/components/schemas/Pet"
    Error:
      required:
        - code
        - message
      properties:
        code:
          type: integer
          format: int32
        message:
          type: string
Now, when we feed this file as input to the library, it automatically generates the following files.
First, an app/main.py file containing the actual routes that can be called. As you can see, it only generates a valid skeleton; it's up to us to implement the business logic of fetching and storing data.
# generated by fastapi-codegen:
# filename: api.yaml
# timestamp: 2020-06-14T10:45:22+00:00
from __future__ import annotations

from typing import Optional

from fastapi import FastAPI, Query

from .models import Pets

app = FastAPI(version="1.0.0", title="Swagger Petstore", license="{'name': 'MIT'}",)


@app.get('/pets', response_model=Pets)
def list_pets(limit: Optional[int] = None) -> Pets:
    """
    List all pets
    """
    pass


@app.post('/pets', response_model=None)
def create_pets() -> None:
    """
    Create a pet
    """
    pass


@app.get('/pets/{pet_id}', response_model=Pets)
def show_pet_by_id(pet_id: str = Query(..., alias='petId')) -> Pets:
    """
    Info for a specific pet
    """
    pass
The corresponding Pydantic models are also generated in the app/models.py file:
# generated by datamodel-codegen:
# filename: api.yaml
# timestamp: 2020-06-14T10:45:22+00:00
from typing import List, Optional

from pydantic import BaseModel, Field


class Pet(BaseModel):
    id: int
    name: str
    tag: Optional[str] = None


class Pets(BaseModel):
    __root__: List[Pet] = Field(..., description='list of pet')


class Error(BaseModel):
    code: int
    message: str
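To give a feel for the work that remains, here is a minimal sketch of what filling in the generated list_pets skeleton could look like, assuming a hypothetical pets table and psycopg2 for data access. The connection details and table are illustrative; none of this is produced by the generator.

# Sketch of app/main.py after filling in one route by hand.
# `app` and `Optional` come from the generated file shown above;
# the `pets` table and the connection details below are purely illustrative.
import psycopg2

from .models import Pet, Pets

conn = psycopg2.connect(dbname="petstore", user="app_user", password="secret", host="db")


@app.get('/pets', response_model=Pets)
def list_pets(limit: Optional[int] = None) -> Pets:
    """
    List all pets
    """
    with conn.cursor() as cur:
        cur.execute("SELECT id, name, tag FROM pets LIMIT %s", (limit or 100,))
        rows = cur.fetchall()
    # Map the raw rows onto the generated Pydantic models.
    return Pets(__root__=[Pet(id=r[0], name=r[1], tag=r[2]) for r in rows])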
Looking at this solution, we get the following pros and cons:
Pros
- Automatic generation of all specified routes.
- Automatic generation of all corresponding Pydantic models.
- A great amount of flexibility in business logic.
- Based on OpenAPI files, which are easy to understand by developers and non-developers.
Cons
- You need to generate an OpenAPI file yourself.
- No business logic is generated, only boilerplate.
- Only works for FastAPI.
Now, the downside regarding the need for an OpenAPI file was cleverly solved during the hackathon by giving the provided create table SQL statements to ChatGPT 3.5 and asking it to translate them into valid OpenAPI specs. A similar approach was taken for the business logic, asking ChatGPT 3.5 to generate the CRUD logic given the FastAPI skeleton that was generated by the library.
A good combination of automatically and automagically!
Django Inspectdb
Another similar solution is Django's inspectdb utility, which basically does the same thing as the fastapi-code-generator but specifically for Django. Let's look at a quick example.
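Before the example itself: inspectdb is a standard Django management command, normally run from the shell as python manage.py inspectdb > models.py. A roughly equivalent programmatic sketch (assuming an already configured Django project; purely illustrative) would be:

# Hypothetical sketch: run inspectdb programmatically and write the generated
# model definitions to models.py (equivalent to `python manage.py inspectdb > models.py`).
from django.core.management import call_command

with open("models.py", "w") as fh:
    call_command("inspectdb", stdout=fh)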
Let's say we have a database table called pets that looks something like this:

| name | age | color | species |
|------|-----|-------|---------|
| Fido | 3   | gold  | dog     |
| Bud  | 6   | brown | dog     |
| Max  | 2   | red   | cat     |
and we give it as input to the inspectdb utility. The output would look like this:
# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#   * Rearrange models' order
#   * Make sure each model has one field with primary_key=True
#   * Make sure each ForeignKey and OneToOneField has `on_delete` set to the desired behavior
#   * Remove `managed = False` lines if you wish to allow Django to create, modify, and delete the table
# Feel free to rename the models, but don't rename db_table values or field names.
from django.db import models


class Pets(models.Model):
    index = models.BigIntegerField(blank=True, null=True)
    name = models.CharField(blank=True, null=True)
    age = models.BigIntegerField(blank=True, null=True)
    color = models.CharField(blank=True, null=True)
    species = models.CharField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'pets'
With this output, some additional steps were taken to come up with the full CRUD API:
- Two more custom management commands had to be written that build generic serialisers (DTO) and viewsets (CRUD) for each model (a rough sketch of the idea follows this list).
- The newly generated viewsets were connected to Django URLs.
- The built-in Django REST framework schema generator was enabled based on the viewsets/serialisers, and this schema was fed into a custom template view that loads a Swagger UI.
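To make the first of these steps more concrete, a generic serialiser/viewset factory could look roughly like the sketch below. This is a hypothetical helper, not the exact management command written during the hackathon.

# Hypothetical helper: build a generic serializer (DTO) and viewset (CRUD) for any model.
from rest_framework import serializers, viewsets


def build_crud_viewset(model_class):
    """Create a ModelSerializer and a ModelViewSet for the given Django model."""
    meta = type("Meta", (), {"model": model_class, "fields": "__all__"})
    serializer_class = type(
        f"{model_class.__name__}Serializer", (serializers.ModelSerializer,), {"Meta": meta}
    )
    return type(
        f"{model_class.__name__}ViewSet",
        (viewsets.ModelViewSet,),
        {"queryset": model_class._default_manager.all(), "serializer_class": serializer_class},
    )

Each generated viewset can then be registered on a DRF router, which is what connects it to the Django URLs in the next step.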
And with that, the API was done. Now, if we look at this solution, the pros and cons are similar to the previous one:
Pros
- Automatic generation of models from the database.
- A great amount of flexibility in business logic.
- Based on the database directly, no intermediate files needed.
Cons
- Multiple DB definitions needed in order to support multiple schemas.
- Custom commands needed to go from models to full API.
- Only works for Django.
Some of these can again be sped up or entirely mitigated by having an LLM generate more boilerplate or business logic.
PostgREST
In this case, ChatGPT 4 was used as a glorified search engine: the context of the situation was explained, and it was asked how to tackle this issue efficiently. It's the only tool used that did not generate a Python-based API: PostgREST.
This is a standalone web server that turns your PostgreSQL database directly into a RESTful API. The structural constraints and permissions in the database determine the API endpoints and operations.
So in other words, it looks at all the tables in the database, the table constraints and types and immediately spins up a server with the CRUD endpoints!
Setting it up was quite easy because we were already using Docker during the hackathon. Simply adding it to the docker-compose.yml file and pointing it to the PostgreSQL database was enough to get up and running:
server:
  image: postgrest/postgrest
  ports:
    - "3000:3000"
  links:
    - db:db
  environment:
    PGRST_DB_URI: postgres://app_user:secret@db:5432/postgres
    PGRST_DB_SCHEMA: public
    PGRST_DB_ANON_ROLE: app_user
    PGRST_SERVER_PROXY_URI: "http://127.0.0.1:3000"
  depends_on:
    - db
After this we create a new role that PostgREST can use to interact with the database; this is also where you set restrictions on your API:
CREATE ROLE web_anon NOLOGIN;
GRANT USAGE ON SCHEMA public TO web_anon;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO web_anon;
After this we can start using the API!
curl -X GET http://localhost:3000/pets
[
  {
    "id": 1,
    "name": "Fido"
    ...
  },
  {
    "id": 2,
    "name": "Max"
    ...
  }
]
Pros
- Easy setup.
- Completely based on the tables, constraints and types of the database.
- No additional work required.
Cons
- Not flexible in terms of business logic.
If you really only need a CRUD API this was definitely the fastest and easiest way to do it. That being said, it is also the least flexible. If you need to transform data before saving or after retrieving from the database, this is not the tool for you.
Automagically
When we talk about automagically, we mean tools that are sufficiently advanced that they appear magical. Think AIs or one-click solutions. Now, some of the attendees took the AI help a step further and tried to have an LLM generate the whole API. The process came down to feeding data describing the tables into an LLM and asking it to generate what we need.
If we look at the steps:
- Retrieve the metadata from the database using a SQL query.
- Add the logic needed to connect to the database, since only we know the connection details:
import psycopg2
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Define your PostgreSQL database connection parameters
db_params = {
    "dbname": "Adventureworks",
    "user": "postgres",
    "password": "postgres",
    "host": "db",  # e.g., 'localhost' or your database server's IP address
    "port": "5432",  # default is 5432
}

# Connect to the PostgreSQL database
try:
    conn = psycopg2.connect(**db_params)
except psycopg2.Error as e:
    print("Error connecting to the database:", e)
    exit(1)
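For the first step, the metadata query itself could look roughly like the sketch below, reading column information from information_schema.columns and printing it in a Copilot-friendly format similar to the one shown in the next step. The exact query used during the hackathon may have differed.

# Sketch of the metadata extraction: query information_schema and print each
# column in a format that can be pasted into the IDE as a prompt for Copilot.
metadata_query = """
    SELECT table_schema, table_name, column_name, data_type,
           character_maximum_length, numeric_precision, column_default, is_nullable
    FROM information_schema.columns
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY table_schema, table_name, ordinal_position
"""

with conn.cursor() as cur:
    cur.execute(metadata_query)
    for schema, table, column, dtype, max_len, precision, default, nullable in cur.fetchall():
        print(
            f"Table: {schema}.{table} | Column Name: {column}, Data Type: {dtype}, "
            f"Max Length: {max_len}, Numeric Precision: {precision}, "
            f"Default Value: {default}, Is Nullable: {nullable}"
        )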
- Paste the output of the SQL query into VSCode and ask Copilot to generate the necessary code (a rough reconstruction of the kind of output follows this list):
"""
Metadata for table 'humanresources.shift':
Column Name: shiftid, Data Type: integer, Max Length: None, Numeric Precision: 32, Default Value: nextval('humanresources.shift_shiftid_seq'::regclass), Is Nullable: NO
Constraint Name: PK_EmployeeDepartmentHistory_BusinessEntityID_StartDate_Departm
Column Name: shiftid, Data Type: integer, Max Length: None, Numeric Precision: 32, Default Value: nextval('humanresources.shift_shiftid_seq'::regclass), Is Nullable: NO
Constraint Name: PK_Shift_ShiftID
Column Name: shiftid, Data Type: integer, Max Length: None, Numeric Precision: 32, Default Value: nextval('humanresources.shift_shiftid_seq'::regclass), Is Nullable: NO
Constraint Name: FK_EmployeeDepartmentHistory_Shift_ShiftID
Column Name: name, Data Type: character varying, Max Length: 50, Numeric Precision: None, Default Value: None, Is Nullable: NO
Column Name: starttime, Data Type: time without time zone, Max Length: None, Numeric Precision: None, Default Value: None, Is Nullable: NO
Column Name: endtime, Data Type: time without time zone, Max Length: None, Numeric Precision: None, Default Value: None, Is Nullable: NO
Column Name: modifieddate, Data Type: timestamp without time zone, Max Length: None, Numeric Precision: None, Default Value: now(), Is Nullable: NO
"""
# please create a fastapi CRUD application with a pydantic model for this schema
- Repeat for all 64 tables
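To give a sense of what comes back from such a prompt, here is a hedged reconstruction of the kind of code Copilot tends to produce for the shift table. It is illustrative, not the literal hackathon output, and it reuses the conn object and FastAPI import from the connection snippet above.

# Hedged reconstruction of Copilot-style output for humanresources.shift.
from datetime import datetime, time
from typing import List, Optional

from pydantic import BaseModel

app = FastAPI()


class Shift(BaseModel):
    shiftid: Optional[int] = None  # filled in by the sequence default on insert
    name: str
    starttime: time
    endtime: time
    modifieddate: Optional[datetime] = None


@app.get("/shifts", response_model=List[Shift])
def list_shifts():
    # Read all rows and map them onto the Pydantic model.
    with conn.cursor() as cur:
        cur.execute("SELECT shiftid, name, starttime, endtime, modifieddate FROM humanresources.shift")
        return [
            Shift(shiftid=r[0], name=r[1], starttime=r[2], endtime=r[3], modifieddate=r[4])
            for r in cur.fetchall()
        ]


@app.post("/shifts", response_model=Shift, status_code=201)
def create_shift(shift: Shift):
    # Insert a new row; shiftid and modifieddate come from the column defaults.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO humanresources.shift (name, starttime, endtime) "
            "VALUES (%s, %s, %s) RETURNING shiftid, modifieddate",
            (shift.name, shift.starttime, shift.endtime),
        )
        shift.shiftid, shift.modifieddate = cur.fetchone()
    conn.commit()
    return shift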
Now, as you can imagine, this method had its pros and cons.
Pros
- No tools needed besides Copilot.
- No need to write the (boilerplate) code.
- Works completely within your IDE.
Cons
- Doesn't scale well to a large number of tables.
- Fine-tuning what you ask Copilot and how you ask it is required to get good output.
If you have a relatively small database and you don't mind going back and forth with Copilot a bit, this is a viable solution. However, going beyond that is difficult and requires quite a bit of tweaking.
Conclusion
So, what is the best way to do this? As with most things, it depends. There are plenty of tools available out there that will make the process of generating a CRUD API based on an existing database quite easy. And it looks like the rise of AI will continue to simplify the process even further.
If my boss actually asked me to do this, and the database size isn't trivial, I would opt for a combination of automatically and automagically, such as the fastapi-code-generator approach. Having an AI generate the predictable boilerplate and using a dedicated tool for the necessary code seems like the best solution with the most flexibility, at least at the time of writing.
Who knows what the future will bring!
Do you have any questions or are you interested in knowing how Codebeez can help you or your business? Don't hesitate to reach out!