Lutra

General-purpose query language

For the past year I've been working on my own query language that I now call Lutra. It's a continuation of my work on PRQL, with more focus on language design and correctness.

In its current state, Lutra is not ready for production use, but it has reached a point where it can be tried out, criticized, and most importantly, improved.

For the most part, I've been building it because it is fun and because I want to see what it takes to construct a proper programming language. Beyond that, there is an actual need for something like this in the world of data engineering and databases.

Currently, the project works, but it is easy to find broken things. Here is a quick list of things it can do:

query PostgreSQL, selecting from and inserting into tables,
run on a local interpreter that can read and write Apache Parquet files,
work with deeply nested data,
express sum types as enums,
define arbitrary functions with generic type parameters,
infer types, including inference of parameters of lambda functions,
there is a code formatter (lutra fmt),
there is a language server (lutra language-server),
there are bindings for calling it from Python and Rust,
there are grammars for TreeSitter, TextMate and SublimeText.

Now, what does it do? Well, it can query databases. Currently PostgreSQL, but it could be adapted to query others.

# define the type of our table
type Movie: {id: int32, title: text}

# query the table
func main(): [Movie] -> std::sql::from("movies")

The above is a very verbose way of writing SELECT id, title FROM movies.

It starts to shine when queries get more complicated:

import std::(sql, map, filter)

type Movie: {id: int32, title: text, director_id: int32}
type Person: {id: int32, name: text}

func main() -> (
  sql::from("movies"): [Movie]
  | map(func (m) -> {
    title = m.title,
    director = (
      sql::from("director"): [Person]
      | filter(func (p) -> p.id == m.director_id)
    )
  })
)

Here we essentially perform a JOIN, adding a director to each of the movies. The result would be something like this:

[
  {
    title = "Parasite",
    director = {
      id = 5,
      name = "Bong Joon Ho",
    },
  },
  {
    title = "The Prestige",
    director = {
      id = 10,
      name = "Christopher Nolan",
    },
  },
]

Something to notice is that director is a nested sub-object. In the generated SQL, we would just have 3 columns, but that is an under-the-hood representation of data nested in this way.
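To make the flat-versus-nested point concrete, here is a minimal Python sketch of how three flat columns could be folded back into the nested shape shown above. It is not how Lutra does it internally: the dotted column names (director.id, director.name) and the nest_row helper are assumptions made up for illustration.

```python
def nest_row(row: dict) -> dict:
    """Fold dotted column names such as "director.id" back into nested dicts."""
    nested: dict = {}
    for column, value in row.items():
        # "director.id" -> parents ["director"], leaf "id"
        *parents, leaf = column.split(".")
        target = nested
        for name in parents:
            target = target.setdefault(name, {})
        target[leaf] = value
    return nested


# Hypothetical flat rows, shaped like a three-column SQL result.
flat_rows = [
    {"title": "Parasite", "director.id": 5, "director.name": "Bong Joon Ho"},
    {"title": "The Prestige", "director.id": 10, "director.name": "Christopher Nolan"},
]

movies = [nest_row(row) for row in flat_rows]
print(movies[0])
```

The same idea extends to deeper nesting, since each extra dot in a column name adds one more level of dict.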

Then there is a Python code generator, which is able to generate this:

class Person:
    id: int
    name: str

class MainResult:
    title: str
    director: Person

And we can call all that like this:

pg = await lutra_runner_postgres.Runner.connect(
    "postgres://postgres:pass@localhost:5416"
)

movies = await pg.run(l.main(), ())

print(movies)

I'm quite proud of a program I've recently managed to get working, which computes compound interest:

type State: {
  time: int32, debt: float64, last_payment: Payment,
}

# initial state: we have borrowed 50000 units of money.
const initial: State = {
  time = 0, debt = 50000.0, last_payment = std::default()
}

type Payment: {time: int32, amount: float64}

# and then we make the following repayments:
const payments: [Payment] = [
  {3, 10000.0}, # no fmt
  {5, 10000.0},
  {7, 10000.0},
  {9, 11200.0},
  {13, 10373.0},
]

# this is how time affects our standing
func apply_time(state: State, time: int32): State -> {
  time = time,
  debt = (
    state.debt
    * std::math::pow(1.0 + 0.05 / 12.0, time - state.time | std::to_float64)
  ),
  last_payment = state.last_payment,
}

# this is how a payment affects our standing
func apply_payment(state: State, payment: Payment): State -> {
  time = state.time, debt = state.debt - payment.amount, last_payment = payment,
}

# simulation from time 0 to at_time
func simulate_debt(at_time: int32) -> (
  payments
  | std::filter(func (p: Payment) -> p.time <= at_time)
  | std::fold(
    initial,
    func (state: State, payment: Payment) -> (
      state | apply_time(payment.time) | apply_payment(payment)
    ),
  )
  | apply_time(at_time)
)

# let's run it twice
func main() -> [10, 15] | std::map(simulate_debt)

This all compiles to SQL (that contains one gnarly WITH RECURSIVE query) and runs to return this:

[
  {
    time = 10,
    debt = 10245.230201028651,
    last_payment = {
      time = 9,
      amount = 11200.0,
    },
  },
  {
    time = 15,
    debt = 0.8368558536575957,
    last_payment = {
      time = 13,
      amount = 10373.0,
    },
  },
]

I think this is pretty cool, especially because it gives the exact same result on both PostgreSQL and the local interpreter.
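For readers who want to check the numbers outside of Lutra, the same fold can be sketched in plain Python. This is a hand-written translation, not output of the Lutra toolchain, and it assumes that std::default() for a Payment means time 0 and amount 0.0.

```python
import math
from dataclasses import dataclass
from functools import reduce

MONTHLY_RATE = 0.05 / 12.0  # 5% yearly interest, compounded monthly


@dataclass(frozen=True)
class Payment:
    time: int
    amount: float


@dataclass(frozen=True)
class State:
    time: int
    debt: float
    last_payment: Payment


# Assumption: std::default() yields a zeroed Payment.
INITIAL = State(time=0, debt=50000.0, last_payment=Payment(0, 0.0))

PAYMENTS = [
    Payment(3, 10000.0),
    Payment(5, 10000.0),
    Payment(7, 10000.0),
    Payment(9, 11200.0),
    Payment(13, 10373.0),
]


def apply_time(state: State, time: int) -> State:
    """Grow the debt by monthly interest from state.time up to time."""
    growth = math.pow(1.0 + MONTHLY_RATE, float(time - state.time))
    return State(time, state.debt * growth, state.last_payment)


def apply_payment(state: State, payment: Payment) -> State:
    """Subtract a payment from the debt and remember it."""
    return State(state.time, state.debt - payment.amount, payment)


def simulate_debt(at_time: int) -> State:
    """Fold all payments made up to at_time, then accrue interest to at_time."""
    due = [p for p in PAYMENTS if p.time <= at_time]
    state = reduce(
        lambda s, p: apply_payment(apply_time(s, p.time), p), due, INITIAL
    )
    return apply_time(state, at_time)


results = [simulate_debt(t) for t in (10, 15)]
for result in results:
    print(result)
```

The debts printed for times 10 and 15 should land on the figures above, up to floating-point rounding in the last digits.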

Aljaž Mur Eržen
