An Introduction to LMQL: The Bridge Between SQL and Large Language Models

Discover everything you need to know about LMQL, short for Language Model Query Language, an innovative programming language for LLMs.

Updated Feb 2024 · 12 min read

Structured Query Language, or SQL for short (often pronounced “sequel”), is a declarative programming language used to store, retrieve, manage, and manipulate data within a database management system.

It was developed by IBM researchers Raymond Boyce and Donald Chamberlin in the early 1970s but only became commercially available in 1979, when Relational Software, Inc. (now Oracle) introduced its implementation.

These days, SQL is widely accepted as the standard language for relational database management systems (RDBMSs). Its simplicity and its ability to efficiently manage and analyze large data sets make it indispensable in our increasingly data-driven world.

But the data-driven world is evolving.

Artificial intelligence is gaining popularity fast, and large language models have emerged as extremely powerful tools for various tasks. The only problem is that interacting with these language models occasionally feels like conversing in an alien language. This is where LMQL comes into the fray.

LMQL was developed by the SRI Lab at ETH Zürich and acts as a personal translator between developers and their language models. Namely, LMQL brings the power of SQL to the realm of language models, thus making interactions with them smoother, more efficient, and more fun.

For the remainder of this tutorial, we will discuss:

What is LMQL?
Why LMQL?
How to set up LMQL
Practical applications of LMQL
LMQL limitations
Best practices

What is LMQL?

LMQL, short for Language Model Query Language, is an innovative programming language for Large Language Models (LLMs). It blends declarative SQL-like elements with an imperative scripting syntax to provide a more structured and intuitive way to extract information or generate responses from LLMs.

In addition, LMQL is a superset of Python, which means it acts as an extension that introduces new features and expands Python’s capabilities. This enables developers to create natural language prompts containing both text and code, making queries more flexible and expressive.

According to the documentation, “LMQL offers a novel way of interweaving traditional programming with the ability to call LLMs in your code. It goes beyond traditional templating languages by integrating LLM interaction natively at the level of your program code.”

The programming language was presented by its creators in a research paper titled Prompting Is Programming: A Query Language for Large Language Models as a way to enable a paradigm they called “LMP,” which stands for Language Model Programming.

For context, large language models have demonstrated exceptional performance across several tasks, such as question answering and code generation. Fundamentally, LLMs are proficient at automatically generating logical sequences based on given inputs using statistical likelihoods.

Utilizing this capability, users can prompt LLMs with language instructions or examples to trigger the execution of various downstream tasks. Advanced prompting techniques even enable interactions between users, the language model, and external tools like calculators.

The challenge is attaining state-of-the-art performance or tailoring LLMs for specific tasks, which typically calls for implementing complex, task-specific programs that may still depend on impromptu interactions.

Language model programming is an emerging discipline that’s been gaining traction to tackle these problems. LMQL adheres to the principles of LMP: it provides an intuitive combination of text prompting and scripting, and it lets users specify constraints over the language model output.

Why LMQL?

The more recent generation of language models can be easily prompted conceptually with examples or instructions. However, utilizing them to their full potential and staying updated as new models are released necessitates a thorough understanding of their internal workings and vendor-specific libraries and implementations.

For instance, limiting the decoding process to a list of legal words or phrases can be challenging because language models work with tokens. Whether you utilize LLMs locally or via an API, they’re quite expensive because they’re massive networks.

LMQL can reduce the number of language model (LM) calls by taking advantage of predefined behavior and of the way constraints narrow the search space during decoding.

Another reason for LMQL is that many prompting techniques can require back-and-forth communication between the language model and the user (like we see with chatbots like ChatGPT) or highly specialized interfaces, such as those used to perform arithmetic calculations with external control logic.

Implementing these prompts requires plenty of manual work and interaction with a model's decoding procedures, which restricts the generality of the resulting implementations. Lastly, since an LM can only generate a single (sub-word) token at a time, completing a full sequence can require multiple calls.

Existing LMs don’t provide the functionality to constrain output, which is vital if LMs are employed in production. For example, imagine you’re building a sentiment analysis application to mark negative reviews. The program would expect the LLM to respond with something such as “positive,” “negative,” or “neutral.”

However, quite often, the LLM may say something like, “The sentiment for the provided customer review is positive,” which is difficult for your API to process. Thus, constraints are extremely beneficial.

With LMQL, you can control output with terms that are comprehensible to humans rather than the tokens that the LMs use.
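To make the gap between words and tokens concrete, here is a minimal sketch of what restricting a model to a set of labels involves at the token level. The vocabulary and the `allowed_next_tokens` helper are hypothetical and written only for illustration; this is not LMQL's implementation, and a real tokenizer has tens of thousands of entries.

```python
# Toy illustration only: a hypothetical sub-word vocabulary, not a real tokenizer.
VOCAB = ["pos", "itive", "neg", "ative", "neu", "tral", "The", " sentiment", " is"]
ALLOWED = ["positive", "negative", "neutral"]


def allowed_next_tokens(prefix: str, vocab=VOCAB, allowed=ALLOWED) -> list:
    """Return the tokens that keep `prefix` on track toward an allowed word."""
    permitted = []
    for token in vocab:
        candidate = prefix + token
        # A token is legal if the extended text is still a prefix of some allowed word.
        if any(word.startswith(candidate) for word in allowed):
            permitted.append(token)
    return permitted


print(allowed_next_tokens(""))     # ['pos', 'neg', 'neu']
print(allowed_next_tokens("neg"))  # ['ative']
```

At every decoding step, all tokens outside the permitted list would have to be masked out. A constraint language lets you state the word-level rule once instead of hand-writing this bookkeeping for each model and tokenizer.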

Setting Up LMQL

LMQL can be installed locally or used online via the web-based Playground IDE. Note that if you would like to use self-hosted models via Transformers or llama.cpp, you must install LMQL locally. Here’s how:

Installation and Environment Setup

Installing LMQL locally is pretty straightforward.

All you have to do is run the following command in a Python >= 3.10 environment:

pip install lmql

If you intend to run models on a local GPU, you must install LMQL in an environment with a GPU-enabled installation of PyTorch >= 1.11.

Here’s the command to run if you want to install LMQL with GPU dependencies via pip:

pip install lmql[hf]

Note: installing dependencies in a virtual environment is good practice.

After installation, you’ve got three options to run LMQL programs:

1. Playground
You can launch a local instance of the Playground IDE with the following command:

lmql playground

This command will launch a browser-based Playground IDE, but if it does not launch automatically, go to http://localhost:3000.

Note this method requires an installation of Node.js.

2. Command-line interface
An alternative to the playground is the command-line tool, which executes local .lmql files. To use it, run the following command followed by the path to your .lmql file:

lmql run

3. Python integration
Since LMQL is a superset of Python, it can be run directly from within a Python program. All you must do is import the lmql package. All query code must be run via lmql.run or with the @lmql.query decorator.

When using the local Transformer models in the Playground IDE or the command-line tool, you must first launch an instance of the LMQL Inference API for the corresponding model by executing the lmql serve-model command.

Understanding LMQL Syntax

An LMQL program consists of five fundamental parts, each playing a vital role in determining the behavior of a query. These components include:

Query
Decoder
Model
Constraints
Distribution

Let’s delve deeper into each.

Query

The primary means of communication between the language model and the user is the query block.

Here’s a basic LMQL query:

# Source: https://lmql.ai/docs/
"Say 'this is a test':[RESPONSE]" where len(TOKENS(RESPONSE)) < 25

----- Model Output -----
"""
Say 'this is a test': RESPONSE This is a test
"""

The query block treats every top-level string as a direct query to the language model. Similar to Python f-strings, these query strings support two special escaped subfields.

Notice the phrase the language model will generate is represented using [varname]. This is also known as a hole - we’ll get to the where clause in a moment.

A variable's value can also be retrieved from the current scope using {varname}.

For example:

# Source: https://lmql.ai/docs/language/overview.html
# review to be analyzed
review = """We had a great stay. Hiking in the mountains was fabulous and the food is really good."""

# use prompt statements to pass information to the model
"Review: {review}"
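The two kinds of fields can be told apart mechanically. The sketch below is plain Python, not LMQL's actual parser: `describe_prompt` and both regular expressions are hypothetical names introduced here to show that `{varname}` is resolved from the current scope before the model is called, while `[VARNAME]` marks text the model still has to produce.

```python
import re

HOLE = re.compile(r"\[([A-Za-z_]\w*)\]")          # [VARNAME]: filled in by the model
SUBSTITUTION = re.compile(r"\{([A-Za-z_]\w*)\}")  # {varname}: read from the current scope


def describe_prompt(template: str, scope: dict):
    """Fill {varname} fields from `scope` and list the [VARNAME] holes left for the model."""
    filled = SUBSTITUTION.sub(lambda m: str(scope[m.group(1)]), template)
    # Holes are collected from the template so that brackets inside substituted
    # values are not mistaken for holes.
    holes = HOLE.findall(template)
    return filled, holes


review = "We had a great stay."
text, holes = describe_prompt("Review: {review}\nA:[ANALYSIS]", {"review": review})
print(text)
print(holes)  # ['ANALYSIS']
```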

Decoder

LMQL supports various decoding algorithms, which are used to generate text from a language model’s token distribution. The decoding algorithm is specified at the beginning of a query.

There are two ways to specify the decoding algorithm to use:

1. As part of the query - this is where you specify the algorithm and its parameters as part of the query. According to the LMQL documentation, “This can be particularly useful if your choice of decoder is relevant to the concrete program you are writing.” Here’s how it looks in code:

# Source: https://lmql.ai/docs/language/decoding.html
# use beam search with beam width 2 for
# the entire program
beam(n=2)

# uses beam search to generate RESPONSE 
"This is a query with a specified decoder: [RESPONSE]"

2. Externally - this is where the decoding algorithm and parameters are specified externally, i.e., separately from the actual program. Note this is only possible when LMQL is being used from a Python context.

# Source: https://lmql.ai/docs/language/decoding.html
import lmql

@lmql.query(model="openai/text-davinci-003", decoder="sample", temperature=1.8)
def tell_a_joke():
    '''lmql
    """A list of good dad jokes. A indicates the punchline:
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and  STOPS_AT(PUNCHLINE, "\n")
    '''

tell_a_joke() # uses the decoder specified in @lmql.query(...)
tell_a_joke(decoder="beam", n=2) # uses a beam search decoder with n=2
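To see why the choice of decoder matters, here is a small sketch comparing greedy (argmax) decoding with beam search on a made-up model. `TOY_MODEL`, `argmax_decode`, and `beam_decode` are hypothetical and only illustrate the two general techniques; they say nothing about how LMQL implements its decoders.

```python
import math

# Hypothetical toy model: next-token probabilities conditioned on the previous token only.
TOY_MODEL = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}


def argmax_decode(steps: int = 2):
    """Greedy decoding: always take the single most likely next token."""
    seq, logp = ["<s>"], 0.0
    for _ in range(steps):
        dist = TOY_MODEL[seq[-1]]
        token = max(dist, key=dist.get)
        logp += math.log(dist[token])
        seq.append(token)
    return seq[1:], math.exp(logp)


def beam_decode(n: int = 2, steps: int = 2):
    """Beam search: keep the n most likely partial sequences at every step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + [tok], logp + math.log(p))
            for seq, logp in beams
            for tok, p in TOY_MODEL[seq[-1]].items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
    best_seq, best_logp = beams[0]
    return best_seq[1:], math.exp(best_logp)


print(argmax_decode())    # greedy commits to "A" first and ends with probability of about 0.30
print(beam_decode(n=2))   # beam search keeps "B" alive and finds "B z" at about 0.36
```

Greedy decoding locks in the locally best first token, while a beam of width 2 keeps the runner-up long enough to discover that it leads to a more likely overall sequence. The toy model only covers two steps, so `steps` should stay at 2.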

Model

The developers define LMQL as a “high-level, front-end language for text generation.” This means it’s not tied to a specific text generation model. Instead, various text generation models are supported on the backend, such as OpenAI models, llama.cpp, and HuggingFace Transformers.

Loading models is quite straightforward. You can use the lmql.model(...) function, which produces an lmql.LLM object.

Here’s how it looks:

# Source: https://lmql.ai/docs/models/
lmql.model("openai/gpt-3.5-turbo-instruct") # OpenAI API model
lmql.model("random", seed=123) # randomly sampling model
lmql.model("llama.cpp:<YOUR_WEIGHTS>.gguf") # llama.cpp model

lmql.model("local:gpt2") # load a `transformers` model in-process
lmql.model("local:gpt2", cuda=True, load_in_4bit=True) # load a `transformers` model in process with additional arguments
lmql.model("gpt2") # access a `transformers` model hosted via `lmql serve-model`

Once the lmql.LLM object is created, you can pass the model to the query program using one of two methods:

1. Specifying the model externally

# Source: https://lmql.ai/docs/models/
import lmql

# uses 'chatgpt' by default
@lmql.query(model="chatgpt")
def tell_a_joke():
    '''lmql
    """A great good dad joke. A indicates the punchline
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and \
                           STOPS_AT(PUNCHLINE, "\n")
    '''

tell_a_joke() # uses chatgpt
tell_a_joke(model=lmql.model("openai/text-davinci-003")) # uses text-davinci-003

2. Using a query with the from clause

# Source: https://lmql.ai/docs/models/
argmax
    "This is a query with a specified 'from'-clause: [RESPONSE]"
from
    "openai/text-ada-001"

Constraints

One of the main appeals of LMQL is the constraints component. Notably, LMQL enables users to specify constraints on the language model output. This helps with scripted prompting by guaranteeing that the model output ends at the intended point and also gives users control over the model during decoding.

The supported constraints include:

Stopping Phrases
Number Type Constraints
Choice From Set
Character Length
Token Length
Regex Constraints (preview)
Combining Constraints
Custom Constraints

Check out the Constraints page in the LMQL documentation to learn more about them.
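As a rough mental model of two of these constraints, the sketch below applies a stopping phrase and a token budget to a stream of tokens. `constrained_generate` is a hypothetical helper written for this article, and the token list is made up. Keeping the stopping phrase in the output is an assumption of this sketch, and real constraint enforcement happens during decoding rather than on a finished stream.

```python
from typing import Iterable, Optional


def constrained_generate(tokens: Iterable[str],
                         stops_at: Optional[str] = None,
                         max_tokens: Optional[int] = None) -> str:
    """Consume a token stream, ending at a stopping phrase or a token budget."""
    output, count = "", 0
    for token in tokens:
        # Token-length constraint: never emit more than `max_tokens` tokens.
        if max_tokens is not None and count >= max_tokens:
            break
        output += token
        count += 1
        # Stopping phrase: end as soon as the text ends with the phrase (phrase is kept).
        if stops_at is not None and output.endswith(stops_at):
            break
    return output


# Made-up token stream standing in for model output.
stream = ["Why", " did", " the", " chicken", " cross", "?", " Because", " it", " could"]
print(constrained_generate(stream, stops_at="?"))   # Why did the chicken cross?
print(constrained_generate(stream, max_tokens=3))   # Why did the
```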

Distribution

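The distribution clause is an optional component of an LMQL query. Rather than generating free text for the final variable, it restricts that variable to a fixed set of candidate values and reports how probable the model considers each one, which makes it well suited to classification tasks.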

Here’s an example of it in action:

# Source: https://lmql.ai/docs/language/overview.html
argmax
    # review to be analyzed
    review = """We had a great stay. Hiking in the mountains was fabulous and the food is really good."""

    # use prompt statements to pass information to the model
    "Review: {review}"
    "Q: What is the underlying sentiment of this review and why?"
    # template variables like [ANALYSIS] are used to generate text
    "A:[ANALYSIS]" where not "\n" in ANALYSIS

    # use constrained variable to produce a classification
    "Based on this, the overall sentiment of the message can be considered to be[CLS]"
distribution
   CLS in [" positive", " neutral", " negative"]

----- Model Output -----
Review: We had a great stay. Hiking in the mountains was fabulous and the food is really good.
Q: What is the underlying sentiment of this review and why?
A: ANALYSIS The underlying sentiment of this review is positive because the reviewer had a great stay, enjoyed the hiking and found the food to be good.
Based on this, the overall sentiment of the message can be considered to be CLS
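Conceptually, a distribution over a fixed set of candidates can be obtained by scoring each candidate and normalizing the scores. The sketch below shows that normalization step with a softmax. The log-scores are invented for illustration, and `candidate_distribution` is a hypothetical helper, not part of LMQL.

```python
import math


def candidate_distribution(log_scores: dict) -> dict:
    """Normalize per-candidate log-scores into probabilities that sum to 1 (softmax)."""
    peak = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(score - peak) for label, score in log_scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Made-up log-scores for illustration; real values would come from the model.
scores = {" positive": -0.1, " neutral": -3.0, " negative": -6.0}
probs = candidate_distribution(scores)
best = max(probs, key=probs.get)
print(best)
for label, p in probs.items():
    print(f"{label!r}: {p:.3f}")
```

Having the full distribution rather than only the top label lets an application apply its own threshold, for example routing low-confidence reviews to a human.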

LMQL Limitations and Community Support

As with all technologies, there are a few limitations with LMQL. For example:

The LMQL library is relatively new and not yet widely adopted. Consequently, the community is quite small, and there are only a few external resources available to help you when you get stuck.
The documentation for the library isn’t as detailed as it could be.
Limitations of the OpenAI API mean it’s not possible to fully utilize LMQL with ChatGPT, since the most popular and best-performing models are inaccessible.

While these may be blockers for users considering using LMQL, it’s important to note the library is still quite new and is still a work in progress – these limitations may be resolved in later versions.

Conclusion

LMQL is a SQL-like programming language that is also a superset of Python. It simplifies the process of extracting information or generating responses from LLMs by blending its declarative elements with an imperative scripting syntax. Developers can leverage LMQL to effectively control the generation of text, thus making it an extremely valuable tool for various applications in the tech industry.

Author

Kurtis Pykes
