Generate SQLite ORM code from DB metadata using Lisp macros

Posted by Shane Creighton-Young, Software Engineer on January 17, 2022 · 13 mins read

Hello. Today I share with you a cool macro. It’s neat because it generates code and performs static code checks based on database metadata. Most language’s compilers or type systems are not capable of this, but the Racket compiler is!

The 20ish line macro automatically generates functions for working with single-table SELECT * ... queries. It generates one row accessor function per column in the table. It can be used like this:

#lang racket/base
(require db)

;; Query some data.
(define c (sqlite3-connect #:database "db.sqlite"))
(define todo (query-row c "SELECT * FROM todo LIMIT 1"))

;; Automatically define todo-id, todo-description, etc.
(define-table-accessors "db.sqlite" "todo")

(displayln (todo-id todo))
;; Prints 1
(displayln (todo-description todo))
;; Prints "Take out the trash"
(displayln (todo-dnoe todo))
;; Compile error: Identifier todo-dnoe doesn't exist
(displayln (todo-done todo))
;; Prints 0 (false). Oops!

No external codegen program, no Rails/Django style in-code table schema, and we get static code checks for free. Woohoo!

Let’s learn how this thing works, shall we?

Although we don’t use Lisp at Perpetua today, our team always strives to use and create incredible tools for our customers and in our work. If you want to help build the best e-commerce ad management tools, apply today!

Table of contents

  1. The macro
    1. Problem statement
    2. Macro datatypes
    3. Macro APIs
    4. Line by line commentary
  2. Discussion (ORMs, Typescript, other feature ideas)

The macro

#lang racket/base
(require
 (for-syntax
  racket/base
  racket/syntax
  srfi/26
  db))

;; (define-table-accessors "<db-path>" "<table-name>")
(define-syntax (define-table-accessors stx)
  (syntax-case stx ()
    [(_ db-path table)
     (begin
       ;; Query for column names
       (define table-name (syntax->datum #'table))
       (define c
         (sqlite3-connect #:database (syntax->datum #'db-path)))
       (define query
         (format "select name from pragma_table_info(\"~a\");"
                 table-name))
       (define column-names (map (cut vector-ref <> 0)
                                 (query-rows c query)))
       (define num-columns (length column-names))

       ;; Generate 1 function per column
       (with-syntax
           ([(idx ...)
             (datum->syntax stx (build-list num-columns values))]
            [(read-column ...)
             (datum->syntax stx
                            (map (cut format-symbol "~a-~a"
                            	      table-name <>)
                                 column-names))])
         #'(begin
             (define (read-column row)
               (vector-ref row idx)) ...)))]))

Before explaining this macro line-by-line, let’s take a detour and look at the details of the challenge at hand and the Racket APIs used in the solution.

Problem statement

We are using the Racket db module. When a SELECT * query is made, the result is usually one or more vectors. Vector is the Racket/Scheme lingo for fixed size array. To access elements of the array, we use the vector-ref function.

;; JS: arr[i]
(vector-ref arr i)

The columns in a table have a default order. The vectors we get from SELECT * contain the column data for that row in the default order. For the todo table, a row vector could look like this:

#(1 "Take out the trash" 0)

This expression is a vector containing the id, description, and done flag for a todo row.

(vector-ref todo 0)
;; 1
(vector-ref todo 1)
;; "Take out the trash"
(vector-ref todo 2)
;; 0

We want a macro which would use the DB during compilation to generate functions to make it easier to pull the data out.

(define (todo-id todo)
  (vector-ref todo 0))
(define (todo-description todo)
  (vector-ref todo 1))
(define (todo-done todo)
  (vector-ref todo 2))

Macro datatypes

In Racket macro-land, there are informally 3 types of stuff.

First, there’s normal Racket values. Numbers, strings, functions, symbols, lists, vectors (arrays), hash tables, etc.

"string"
'symbol
123 ;; Number
(list 1 2 3)
(vector 1 2 3)

The second type is the syntax object. Syntax objects are like a serialized version of Racket code, along with optional metadata about the code, such as where it came from (file, line number, etc). The syntax->datum function extracts the serialized code from a syntax object.

;; This function call
(vector-ref todo 0)
;; Would have a syntax-object datum consisting
;; of a list:
(list 'vector-ref 'todo 0)

The last type of stuff is the pattern variable. Although not strictly necessary, pattern variables are useful for unpacking list-like syntax objects.

An example may be illuminating:

(define-syntax (mixup-list stx)
  (syntax-case stx ()
    [(mixup-list arg1 args ...)
     (syntax (list args ... arg1 arg1 args ...))]))

;; Macro call
(mixup-list 1 2 3)
;; Transforms to this:
(list 2 3 1 1 2 3)

Here, arg1 and args are the pattern variables. The ellipses after args lets it capture a list (in this case 2 and 3), while arg1 just grabs the first argument.

Technically, pattern variables and syntax objects are different things. A pattern variable usually has to be converted to a syntax object using the (syntax …) macro before making use of it.

Also, technically mixup-list within macro definition is a pattern variable too. This isn’t often used in code gen since it’s nearly always just the name of the macro. However, writing it there helps to show you the similarity between the list of pattern variables and the actual macro expression.

To avoid unused variable warnings, _ can be used for the first pattern variable instead of the macro’s name.

(define-syntax (mixup-list stx)
  (syntax-case stx ()
    [(_ arg1 args ...)
     (syntax (list args ... arg1 arg1 args ...))]))

Macro APIs

Many of the APIs used to create macros are actually macros themselves. Let’s take a look.

The (syntax XYZ) stuff is a macro. It’s the syntax macro. It recursively converts its arguments into a syntax object. The #' prefix can be used as a shorthand; (syntax XYZ) and #'XYZ mean the exact same thing.

The syntax macro is able to recognize pattern variables and substitute the corresponding syntax. It also recognizes the ellipses to expand patterns which are bound to lists. The ellipses doesn’t have to come right after a pattern variable (like (list args ... before). It can also come after something which contains a pattern variable—the whole thing will get repeated with the pattern spliced in.

(define-syntax (list-the-whole-thing stx)
  (syntax-case stx ()
    [(_ arg ...)
     #'(list (list (+ arg 5)) ...)]))

(list-the-whole-thing 1 2 3)
;; Expands to:
(list (list (+ 1 5)) (list (+ 2 5)) (list (+ 3 5)))

When define-syntax (which is also a macro) is used the way we have been using it, it defines a compiler-level function which takes the macro expression’s code as a syntax object. When you return new syntax from it, that’s what the original syntax gets replaced with.

(define-syntax (silly-macro stx)
  (displayln (syntax->datum stx))
  #'42)

This silly macro prints itself as it gets compiled, and then replaces itself with 42. What actually gets printed during compilation depends on what arguments it is given. The macro will accepts any number of arguments; it just throws them all away and becomes 42.

(silly-macro 1 2 3) 
;; Prints (silly-macro 1 2 3) during compilation.

(silly-macro) 
;; Prints (silly-macro) during compilation.

silly-macro 
;; Prints silly-macro during compilation.
;; It wasn't even called this time!

(silly-macro (silly-macro 3)) 
;; Prints (silly-macro (silly-macro 3)) during compilation.

;; (silly-macro 3) is NOT printed! This macro is so silly, the
;; arguments you pass it don't even get compiled.

The syntax-case macro accepts multiple pattern “clauses”. It finds the first pattern which matches the macro call, and returns the syntax object for that pattern. If none match, it’s a syntax error.

(define-syntax (one-two-three stx)
  (syntax-case stx ()
    [(_ one) #'(displayln "One arg")]
    [(_ one two) #'(displayln "Two args")]
    [(_ one two three) #'(displayln "Three args")]))

(one-two-three 1 2)
;; Transforms to (displayln "Two args") at runtime
(one-two-three 1 2 3 4)
;; Syntax error

(The extra () in syntax-case needs to be there. It has a purpose but we aren’t making use of it.)

Lastly, the with-syntax macro is very similar to syntax-case in how it sets up pattern variables. However, instead of picking the first pattern that matches, it binds all the patterns, and then evaluates the body expression using them all.

(define-syntax (pair-reverse stx)
  (syntax-case stx ()
    [(_ arg ...)
     (with-syntax ([(forward ...)
                    (syntax (arg ...))]
                   [(backward ...)
                    (datum->syntax stx
                      (reverse (syntax->datum (syntax (arg ...)))))])
       #'(list (list forward backward) ...))]))

(pair-reverse 1 2 3 4)
;; Transforms to (list (list 1 4) (list 2 3) (list 3 2) (list 4 1))

Line by line commentary

Now that we have most of the prerequisites in place, let’s take a closer look at this 20ish line DB querying compiler extension.

#lang racket/base
(require
 ;; Imports code at the compile phase
 (for-syntax
  racket/base
  racket/syntax
  ;; cut
  srfi/26
  ;; sqlite3-connect
  db))

;; New macro: define-table-accessors
(define-syntax (define-table-accessors stx)

  ;; Pattern match on the shape of the arguments
  (syntax-case stx ()

    ;; This pattern clause accepts 2 args
    ;; db-path and table pattern variables
    [(_ db-path table)

     ;; Begin starts a new block
     (begin

       ;; Extract the table name string from
       ;; the pattern variable.
       ;; Same as (syntax->datum (syntax table)).
       (define table-name (syntax->datum #'table))

       ;; Connect to SQLite using extracted db-path.
       (define c
         (sqlite3-connect #:database (syntax->datum #'db-path)))

       ;; Construct metadata query and extract list
       ;; of column names.
       (define query
         (format "select name from pragma_table_info(\"~a\");"
                 table-name))
       
       ;; Note: the (cut f x <> z) is equivalent to a
       ;; JS arrow function y => f(x,y,z).
       (define column-names (map (cut vector-ref <> 0)
                                 (query-rows c query)))
       (define num-columns (length column-names))
       
       (with-syntax
            
            ;; This pattern is the indices into the row
            ;; vector: (list 0 1 2 3 ...)
           ([(idx ...)
             (datum->syntax stx (build-list num-columns values))]
            
            ;; This pattern is the list of names for
            ;; the new functions!
            [(read-column ...)
             (datum->syntax stx
               (map (cut format-symbol "~a-~a"
               	         table-name <>)
                 column-names))])

         ;; Create a sequence of functions
         #'(begin
             (define (read-column row)
               (vector-ref row idx)) ...)))]))

DrRacket Macro Stepper - Start DrRacket Macro Stepper - End

Discussion

Hat tip to real libraries which generate code from the database: SQLDelight from Cash App; Racquel, a Racket ORM which includes a much more advanced OOP version of what we have seen here.

Code generated from DB metadata makes sense compared to Rails or Django. There are clear advantages when a database is shared between codebases. No need to rewrite ORM schemas in a new language, or create dependencies to enable programmatic querying; we just get more from the database dependency we already have. Not to mention playing around with CSVs of unknown provenance.

Considering statically typed Racket exists and type info is in SQLite metadata tables, it is possible to add type hints to the generated functions. Then, not only do we get “dynamic language” static tooling like go-to-definition and unbound-identifier-error, but we get actual mypy/Typescript style type checking too.

And since it’s Racket, why not generate runtime validation functions based on DB types as well? That would be nice. Wouldn’t it be great if the Typescript compiler let us extend it to generate validation code from interface declarations? yeah..oh well.

With SQLite 3.37.0 and above, it is possible to use the pragma_table_list() table to generate bindings for every table using just a reference to a database, not needing to specify any tables:

(define-table-accessors* "db.sqlite")
;; Defines functions for each table
;; using pragma_table_list
;;   and pragma_table_info

Of course, if we don’t like the <tablename>-<columnname> naming system, we could add a prefix argument:

(define-table-accessors "db.sqlite" "todo" #:prefix t:)
(displayln (t:id todo))

The real heavy hitter would be this: statically checked variable bindings for ad hoc SELECT queries. The macro above only works with SELECT * queries for one table. It does not work with not ad hoc SELECT queries. In nearly every other language, when we have to run an ad hoc query, we have to expect to pull a dynamically typed value out of a dictionary using a string, if we are lucky. If we aren’t lucky, we might have to use resort to using arr[0] or arr[1] to look at query results. The horror! 😉

In Racket, using only a thin layer on top of the sql package, we could support ad hoc queries. Heck, we could put variable declarations right in the automatically-executed query.

(select [t.id tid]
        [t.description tdesc]
        [a.name assignee]
	#:from (as todo t)
               (inner-join (as people p)
               		   (on (= p.id t.assignee_id)))
        #:where (and (= t.done 0)
                     (= t.project_id ,project-id)))

(displayln tid)
(displayln assigne) ;; syntax error, typo

That’s all for today. Happy hacking!

Check out our open positions