
An Introduction to LMQL: The Bridge Between SQL and Large Language Models

Discover everything you need to know about LMQL, short for Language Models Query Language, an innovative programming language for LLMs.
Feb 2024  · 12 min read

Structured Query Language, or SQL (often pronounced sequel) for short, is a declarative programming language used to store, retrieve, manage, and manipulate data within a database management system.

It was developed by IBM researchers Raymond Boyce and Donald Chamberlin in the early 1970s but only became commercially available in 1979, when Relational Software, Inc. (since rebranded as Oracle) introduced its implementation.

These days, SQL is widely accepted as the standard language for relational database management systems (RDBMS). Its simplicity and its ability to efficiently manage and analyze large data sets make it indispensable in our increasingly data-driven world.

But the data-driven world is evolving.

Artificial intelligence is gaining popularity fast, and large language models have emerged as extremely powerful tools for various tasks. The only problem is that interacting with these language models occasionally feels like conversing in an alien language. This is where LMQL comes into the fray.

LMQL was developed by the SRI Lab at ETH Zürich and acts as a personal translator between developers and their language models. Namely, LMQL brings the power of SQL to the realm of language models, thus making interactions with them smoother, more efficient, and more fun.

For the remainder of this tutorial, we will discuss:

  • What is LMQL?
  • Why LMQL?
  • How to set up LMQL
  • Practical applications of LMQL
  • LMQL limitations
  • Best practices

What is LMQL?

LMQL, short for Language Models Query Language, is an innovative programming language for Large Language Models (LLMs). Namely, it blends declarative SQL-like elements with an imperative scripting syntax to provide a more structured and intuitive way to extract information or generate responses from LLMs.

Moreover, LMQL is a superset of Python, which means it acts as an extension that introduces new features and expands Python’s capabilities. This enables developers to create natural language prompts containing text and code, enhancing the flexibility and expressiveness of queries.

According to the documentation, “LMQL offers a novel way of interweaving traditional programming with the ability to call LLMs in your code. It goes beyond traditional templating languages by integrating LLM interaction natively at the level of your program code.”

The programming language was presented by its creators in a research paper titled Prompting Is Programming: A Query Language for Large Language Models as a solution to enable a paradigm they called “LMP,” which stands for Language Model Programming.

For context, large language models have demonstrated exceptional performance across several tasks, such as question answering and code generation. Fundamentally, LLMs are proficient at automatically generating logical sequences based on given inputs using statistical likelihoods.

Utilizing this capability, users can prompt LLMs with language instructions or examples to trigger the execution of various downstream tasks. Advanced prompting techniques even enable interactions between users, the language model, and external tools like calculators.

The challenge is that attaining state-of-the-art performance, or tailoring LLMs for specific tasks, typically calls for implementing complex, task-specific programs that may still depend on ad hoc interactions.

Language model programming is an emerging discipline that’s been gaining traction to tackle these problems. LMQL adheres to the principles of LMP, including providing LLMs with an intuitive combination of text prompting and scripting and enabling users to specify constraints over the language model output.

Why LMQL?
The more recent generation of language models can be easily prompted conceptually with examples or instructions. However, utilizing them to their full potential and staying updated as new models are released necessitates a thorough understanding of their internal workings and vendor-specific libraries and implementations.

For instance, limiting the decoding process to a list of legal words or phrases can be challenging because language models work with tokens, not words. And whether you utilize LLMs locally or via an API, they’re quite expensive to run because they’re massive networks.
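To see why word-level constraints are awkward at the token level, here is a toy sketch. The three-character splitting rule is invented purely for illustration; real tokenizers (BPE, SentencePiece) use learned subword vocabularies, but the key property is the same: one word can span several tokens.

```python
# Toy subword tokenizer: splits each word into chunks of at most 3 characters.
# Because a single word spans several tokens, "allow only these words" cannot
# be enforced by filtering one token at a time -- which is the problem LMQL's
# constraint machinery addresses.

def toy_tokenize(text):
    tokens = []
    for word in text.split():
        for i in range(0, len(word), 3):
            tokens.append(word[i:i + 3])
    return tokens

print(toy_tokenize("negative"))  # one word, several tokens: ['neg', 'ati', 've']
```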

LMQL can reduce the number of language model (LM) invocations by taking advantage of predictable output behavior and the search space pruning introduced by constraints.

Another reason for LMQL is that many prompting techniques can require back-and-forth communication between the language model and the user (like we see with chatbots like ChatGPT) or highly specialized interfaces, such as those used to perform arithmetic calculations with external control logic.

Implementing these prompts requires plenty of manual work and interaction with a model's decoding procedures, which restricts the generality of the resulting implementations. Lastly, since an LM can only generate a single (sub-word) token at a time, completing a response can require multiple calls.

Existing LMs don’t provide the functionality to constrain output, which is vital if LMs are employed in production. For example, imagine you’re building a sentiment analysis application to mark negative reviews. The program would expect the LLM to respond with something such as “positive,” “negative,” or “neutral.”

However, quite often, the LLM may say something like, “The sentiment for the provided customer review is positive,” which is difficult for your API to process. Thus, constraints are extremely beneficial.

With LMQL, you can control output with terms that are comprehensible to humans rather than the tokens that the LMs use.
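The parsing problem that constraints solve can be sketched in plain Python. The label names come from the sentiment example above; the free-form reply string is invented for illustration.

```python
LABELS = {"positive", "negative", "neutral"}

def parse_free_form(reply):
    """Fragile: scan a free-form LLM reply, hoping exactly one label appears."""
    found = [label for label in LABELS if label in reply.lower()]
    return found[0] if len(found) == 1 else None

def parse_constrained(reply):
    """Trivial when the model was constrained to emit exactly one label."""
    return reply if reply in LABELS else None

# Unconstrained output needs guesswork...
print(parse_free_form("The sentiment for the provided customer review is positive"))
# ...while constrained output is already machine-readable.
print(parse_constrained("positive"))
```

With LMQL's constraints, the second, trivial parser is all your application ever needs.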

Setting Up LMQL

LMQL can be installed locally or used online via the web-based Playground IDE. Note that if you would like to use self-hosted models via Transformers or llama.cpp, you must install LMQL locally. Here’s how:

Installation and Environment Setup

Installing LMQL locally is pretty straightforward.

All you have to do is run the following command in a Python >= 3.10 environment:

pip install lmql

If you intend to run models on a local GPU, you must install LMQL in an environment with a GPU-enabled installation of PyTorch >= 1.11.

Here’s the command to run if you want to install LMQL with GPU dependencies via pip:

pip install lmql[hf]

Note: installing dependencies in a virtual environment is good practice.

After installation, you’ve got three options to run LMQL programs:

1. Playground
You can launch a local instance of the Playground IDE:

lmql playground

This command will launch a browser-based Playground IDE, but if it does not launch automatically, go to http://localhost:3000.

Note this method requires an installation of Node.js.

2. Command-line interface
An alternative to the Playground is the command-line tool, which can be used to execute local .lmql files. To use it, simply run the following command:

lmql run

3. Python integration
Since LMQL is a superset of Python, it can be run directly from within a Python program. All you must do is import the lmql package. Query code can then be run with lmql.run or via the @lmql.query decorator.

When using the local Transformer models in the Playground IDE or the command-line tool, you must first launch an instance of the LMQL Inference API for the corresponding model by executing the lmql serve-model command.

Understanding LMQL Syntax

An LMQL program consists of five fundamental parts, each playing a vital role in determining the behavior of a query. These components include:

  • Query
  • Decoder
  • Model
  • Constraints
  • Distribution

Let’s delve deeper into each.

Query
The primary means of communication between the language model and the user is the query block.

Here’s a basic LMQL query:

"Say 'this is a test':[RESPONSE]" where len(TOKENS(RESPONSE)) < 25

----- Model Output -----
Say 'this is a test': This is a test

The query block treats every top-level string as a direct query to the language model. Similar to Python f-strings, these query strings support two special escaped subfields.

Notice the phrase the language model will generate is represented using [varname]. This is also known as a hole; we’ll get to the where clause in a moment.

Retrieval of a variable value from the current scope can also be done using {varname}.

For example:

# review to be analyzed
review = """We had a great stay. Hiking in the mountains was fabulous and the food is really good."""

# use prompt statements to pass information to the model
"Review: {review}"
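The {varname} interpolation behaves like Python f-string substitution; here is a plain-Python analogy using the same review text as the snippet above.

```python
review = ("We had a great stay. Hiking in the mountains was fabulous "
          "and the food is really good.")

# {varname} in an LMQL query string works like f-string interpolation:
# the value from the surrounding scope is substituted into the prompt text.
prompt = f"Review: {review}"
print(prompt)
```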

Decoder
LMQL supports various decoding algorithms, which are used to generate text from a language model’s token distribution. The decoding algorithm can be specified at the beginning of a query.

There are two ways to specify the decoding algorithm to use:

1. As part of the query - this is where you specify the algorithm and its parameters as part of the query. According to the LMQL documentation, “This can be particularly useful if your choice of decoder is relevant to the concrete program you are writing.” Here’s how it looks in code:

# use beam search with beam width 2 for
# the entire program
beam(n=2)

# uses beam search to generate RESPONSE
"This is a query with a specified decoder: [RESPONSE]"

2. Externally - this is where the decoding algorithm and parameters are specified externally, i.e., separately from the actual program. Note this is only possible when LMQL is used from a Python context:

import lmql

@lmql.query(model="openai/text-davinci-003", decoder="sample", temperature=1.8)
def tell_a_joke():
    """A list of good dad jokes. A indicates the punchline:
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and STOPS_AT(PUNCHLINE, "\n")

tell_a_joke() # uses the decoder specified in @lmql.query(...)
tell_a_joke(decoder="beam", n=2) # uses a beam search decoder with n=2
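What the decoder choice actually means can be sketched on a toy next-token distribution. The vocabulary and scores below are invented; real decoders operate over full token sequences, but the greedy-vs-sampling distinction is the same.

```python
import math
import random

vocab = ["good", "great", "bad"]
logits = [2.0, 1.5, 0.1]  # invented model scores for the next token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def argmax_decode():
    """'argmax' (greedy): always pick the highest-probability token."""
    probs = softmax(logits)
    return vocab[probs.index(max(probs))]

def sample_decode(temperature=1.0, rng=random.Random(0)):
    """'sample': draw from the distribution; higher temperature flattens it."""
    probs = softmax([l / temperature for l in logits])
    return rng.choices(vocab, weights=probs, k=1)[0]

print(argmax_decode())  # deterministic: "good"
print(sample_decode(temperature=1.8))  # varies run to run (rng seeded here)
```

This is why the joke query above uses `decoder="sample"` with a high temperature: sampling with a flattened distribution trades determinism for variety.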

Model
LMQL is described by its developers as a “high-level, front-end language for text generation.” This means it’s not tied to a specific text generation model; instead, various backends are supported, such as OpenAI models, llama.cpp, and Hugging Face Transformers.

Loading models is quite straightforward: you can use the lmql.model(...) function, which produces an lmql.LLM object.

Here’s how it looks:

lmql.model("openai/gpt-3.5-turbo-instruct") # OpenAI API model
lmql.model("random", seed=123) # randomly sampling model
lmql.model("llama.cpp:<YOUR_WEIGHTS>.gguf") # llama.cpp model

lmql.model("local:gpt2") # load a `transformers` model in-process
lmql.model("local:gpt2", cuda=True, load_in_4bit=True) # load a `transformers` model in process with additional arguments
lmql.model("gpt2") # access a `transformers` model hosted via `lmql serve-model`

Once the lmql.LLM object is created, you can pass the model to the query program using one of two methods:

1. Specifying the model externally

import lmql

# uses 'chatgpt' by default
@lmql.query(model="chatgpt")
def tell_a_joke():
    """A list of good dad jokes. A indicates the punchline:
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and \
                           STOPS_AT(PUNCHLINE, "\n")

tell_a_joke() # uses chatgpt
tell_a_joke(model=lmql.model("openai/text-davinci-003")) # uses text-davinci-003

2. Using a query with the from clause

argmax
    "This is a query with a specified 'from'-clause: [RESPONSE]"
from
    "openai/gpt-3.5-turbo-instruct"

Constraints
One of the main appeals of LMQL is the constraints component. Notably, LMQL enables users to specify constraints on the language model output. This helps with scripted prompting by guaranteeing that the model output ends at the intended point and also gives users control over the model during decoding.

The supported constraints include:

  • Stopping Phrases
  • Number Type Constraints
  • Choice From Set
  • Character Length
  • Token Length
  • Regex Constraints Preview
  • Combining Constraints
  • Custom

Check out the Constraints page in the LMQL documentation to learn more about them.
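A minimal sketch of how a choice-from-set constraint can prune decoding, character by character. This is a deliberate simplification of LMQL's token-level masking (real masking operates on tokenizer vocabularies, not characters), using the label set from the sentiment example.

```python
ALLOWED = [" positive", " neutral", " negative"]

def valid_next_chars(prefix):
    """Characters that keep at least one allowed value reachable.

    During decoding, a constraint solver can mask out every continuation
    that could never complete to a member of ALLOWED.
    """
    return sorted({v[len(prefix)] for v in ALLOWED
                   if v.startswith(prefix) and len(v) > len(prefix)})

# At the start, only a leading space can begin any allowed value:
print(valid_next_chars(""))     # [' ']
# After " ne", both " neutral" and " negative" remain reachable:
print(valid_next_chars(" ne"))  # ['g', 'u']
```

Because invalid continuations are masked out up front, the model never wastes generation steps on outputs the constraint would reject.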

Distribution
The distribution instruction is a key component in LMQL. It provides control over the format and structure of the output by defining how the generated results are distributed and presented.

Here’s an example of it in action:

    # review to be analyzed
    review = """We had a great stay. Hiking in the mountains was fabulous and the food is really good."""

    # use prompt statements to pass information to the model
    "Review: {review}"
    "Q: What is the underlying sentiment of this review and why?"
    # template variables like [ANALYSIS] are used to generate text
    "A:[ANALYSIS]" where not "\n" in ANALYSIS

    # use constrained variable to produce a classification
    "Based on this, the overall sentiment of the message can be considered to be[CLS]" \
        distribution CLS in [" positive", " neutral", " negative"]

----- Model Output -----
Review: We had a great stay. Hiking in the mountains was fabulous and the food is really good.
Q: What is the underlying sentiment of this review and why?
A: The underlying sentiment of this review is positive because the reviewer had a great stay, enjoyed the hiking and found the food to be good.
Based on this, the overall sentiment of the message can be considered to be positive
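Conceptually, a distribution clause scores each allowed continuation and normalizes those scores into probabilities. A toy sketch of that normalization step, with invented log-scores standing in for the model's actual sequence scores:

```python
import math

def distribution_over(values, scores):
    """Normalize per-value model scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return dict(zip(values, [e / total for e in exps]))

# Invented log-scores for the three candidate continuations:
dist = distribution_over([" positive", " neutral", " negative"],
                         [4.2, 0.3, -1.1])
for value, p in dist.items():
    print(f"{value}: {p:.3f}")
```

The result is not just a single label but a full distribution over the allowed set, which is useful when downstream code needs a confidence estimate rather than a hard classification.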

LMQL Limitations and Community Support

As with all technologies, there are a few limitations with LMQL. For example:

  • The LMQL library has not been around for a long time and is not super popular. Consequently, the community is quite small, and there are only a few external resources available to help you when you get stuck.
  • The documentation for the library isn’t as detailed as it could be.
  • Limitations with the OpenAI API mean it’s not possible to fully utilize LMQL with ChatGPT, since the most popular and best-performing models are inaccessible.

While these may be blockers for users considering LMQL, it’s important to note that the library is quite new and still a work in progress – these limitations may be resolved in later versions.

Final Thoughts
LMQL is a SQL-like programming language that is also a superset of Python. It simplifies the process of extracting information or generating responses from LLMs by blending its declarative elements with an imperative scripting syntax. Developers can leverage LMQL to effectively control the generation of text, thus making it an extremely valuable tool for various applications in the tech industry.

Kurtis Pykes
