
What is YAML? Understanding the Basics, Syntax, and Use Cases

YAML is a simple yet powerful format for configurations, automation, and data serialization. Learn how it works with real-world examples!
Feb 16, 2025  · 14 min read

Over the years, I’ve worked with countless configuration files, and YAML has stood out for its simplicity and readability. Whether setting up workflows in Kubernetes, defining services in Docker, or structuring API requests, YAML makes complex configurations more manageable. Its clean, indentation-based structure eliminates the clutter of formats like XML while remaining flexible.

In this guide, I’ll walk you through YAML’s syntax, structure, advanced features, and best practices so you can work with it confidently. 

What is YAML?

YAML originally stood for “Yet Another Markup Language” and was later redefined as the recursive acronym “YAML Ain’t Markup Language.” It is a data serialization format that prioritizes readability and ease of use. Where XML relies on nested, HTML-like tags and JSON uses curly brackets and quotes (much like Python dictionaries), YAML is more concise and uses indentation to define structure, which makes it more human-friendly.

YAML supports various data types, including scalars (strings, numbers, booleans), sequences (lists), and mappings (key-value pairs). It is widely used in configuration files, infrastructure automation, and data exchange, especially in tools like Kubernetes, Docker, and Ansible.

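Additionally, YAML 1.2 is designed as a superset of JSON, meaning valid JSON files can generally be parsed as YAML, although parsers that implement older versions of the specification may differ on edge cases. YAML files typically have a .yaml or .yml extension.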
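To make the JSON relationship concrete, here is a minimal sketch that feeds the same JSON text to Python's json module and to a YAML parser. It assumes the third-party PyYAML package is installed (pip install pyyaml); the sample data is made up for illustration.

```python
import json

import yaml  # PyYAML, a third-party package

# A small JSON document (illustrative data only)
json_text = '{"name": "John Doe", "age": 30, "skills": ["Python", "YAML"]}'

from_json = json.loads(json_text)      # parsed as JSON
from_yaml = yaml.safe_load(json_text)  # the same text, parsed as YAML

print(from_json == from_yaml)  # True
```

Both parsers produce the same dictionary here. PyYAML implements YAML 1.1, so unusual JSON inputs may not behave identically.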

You can learn more about YAML on its website as well!

YAML Syntax and Structure

This section will explore the fundamental principles of YAML syntax, including key-value pairs, lists, nested data, and comments.

Basic syntax rules

There are a few basic syntactic rules for YAML: 

  • Indentation with spaces denotes structure; tabs are not allowed for indentation. 
  • Key-value pairs follow a key: value structure, with a space after the colon. 
  • A hyphen followed by a space at the start of a line denotes a list item. 
  • A # starts a comment that runs to the end of the line.

# Here is an example of YAML
name: John Doe
age: 30
skills:
  - Python
  - YAML
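
If you want to see how a parser reads these rules, the sketch below loads the example with PyYAML (assumed to be installed) and prints the resulting Python objects. The variable names are mine, not part of any API.

```python
import yaml  # PyYAML, a third-party package

document = """\
# Here is an example of YAML
name: John Doe
age: 30
skills:
  - Python
  - YAML
"""

data = yaml.safe_load(document)

print(data)
# {'name': 'John Doe', 'age': 30, 'skills': ['Python', 'YAML']}
print(type(data["age"]).__name__)  # int
```

The comment disappears, the mapping becomes a dictionary, the hyphenated items become a list, and 30 is read as an integer rather than a string.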

Key-value pairs

YAML represents data as key-value pairs, similar to dictionaries in Python. In configuration files, each pair typically maps a setting name to its value. Most keys and plain string values don’t need quotes; simply write the key, a colon and a space, and then the value:

location: New York
country: USA
security-level: user

Lists in YAML

Lists are represented using hyphens (-), with one item per line. This lets you group multiple values under a single key. Some editors and viewers display these items as bullets.

fruits:
  - Apple
  - Banana
  - Cherry

Nested data

Nested structures allow hierarchical data representation using indentation. Think of these like nested dictionaries: a key indented beneath another key belongs to that parent.

person:
  name: Alice
  details:
    age: 25
    city: London

Comments

Comments begin with # and are ignored by YAML parsers. These comments are single-line comments.

# This is a comment
username: admin
password: secret

Advanced YAML Features

YAML includes powerful features like multi-line strings, data types, and anchors that make documents more efficient and structured. In this section, we’ll explore these capabilities with practical examples.

Multi-line strings

YAML supports multi-line strings using | (literal block) or > (folded block). 

  • A | literal block preserves each line break as a newline (\n). 
  • A > folded block joins consecutive lines with a space; only a blank line produces a newline.

literal: |
  This is a
  multi-line string.

folded: >
  This is another
  multi-line string.

The above is better understood by showing the output.

  • For the | (literal block):
This is a
multi-line string.
  • For the > (folded block):
This is another multi-line string.
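
To check the exact strings, including the trailing newline that both styles keep by default, here is a short sketch using PyYAML (assumed to be installed). Printing with repr() makes the newline characters visible.

```python
import yaml  # PyYAML, a third-party package

document = """\
literal: |
  This is a
  multi-line string.

folded: >
  This is another
  multi-line string.
"""

data = yaml.safe_load(document)

print(repr(data["literal"]))  # 'This is a\nmulti-line string.\n'
print(repr(data["folded"]))   # 'This is another multi-line string.\n'
```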

Data types in YAML

YAML supports various data types, including strings, numbers, booleans, and null values. It automatically detects types based on formatting but also allows explicit type definitions.

The following example shows the usage of basic data types in YAML:

string_implicit: Hello, YAML!  # No quotes needed unless necessary
string_double_quoted: "Supports escape sequences like \n and \t"
string_single_quoted: 'Raw text, no escape sequences'

integer: 42  # Whole numbers
float: 3.14  # Numbers with decimals

boolean_true: true
boolean_false: false

null_value: null  # Null value
null_tilde: ~  # Another way to represent null

YAML allows explicit type declarations using !!type when needed:

explicit_string: !!str 123  # Forces 123 to be a string
explicit_integer: !!int "42"  # Forces "42" to be an integer
explicit_float: !!float "3.14"  # Forces "3.14" to be a float
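
The following sketch shows how these tags change the Python types that PyYAML produces. It assumes PyYAML is installed; the implicit_integer key is added only for comparison.

```python
import yaml  # PyYAML, a third-party package

document = """\
implicit_integer: 123
explicit_string: !!str 123
explicit_integer: !!int "42"
explicit_float: !!float "3.14"
null_tilde: ~
"""

data = yaml.safe_load(document)

# Print each key with the Python type and value it was loaded as
for key, value in data.items():
    print(key, type(value).__name__, repr(value))
```

Without a tag, 123 loads as an integer; with !!str it stays a string, and the quoted values are converted to numbers.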

Since YAML is often used for structured data, it supports:

  • Lists (sequences):
fruits:
  - Apple
  - Banana
  - Cherry
  • Dictionaries (mappings):
person:
  name: Alice
  age: 30
  is_student: false

Anchors and aliases

YAML allows you to define reusable values using anchors (&) and reference them later using aliases (*). This helps reduce redundancy in configuration files, making them cleaner and easier to maintain.

defaults:
  retries: &default_retries 3  # Anchor the scalar value 3
  timeout: 30

server1:
  host: example.com
  retries: *default_retries  # Reuses the anchored retries value (3)
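
An alias always stands for the whole node its anchor was attached to, whether that is a single value or an entire mapping. Here is a minimal sketch with PyYAML (assumed to be installed) that anchors both; the settings key and the anchor names are hypothetical.

```python
import yaml  # PyYAML, a third-party package

document = """\
defaults: &default_settings
  retries: &default_retries 3
  timeout: 30

server1:
  host: example.com
  retries: *default_retries      # alias to a scalar
  settings: *default_settings    # alias to a whole mapping
"""

data = yaml.safe_load(document)

print(data["server1"]["retries"])   # 3
print(data["server1"]["settings"])  # {'retries': 3, 'timeout': 30}
```

Aliasing a mapping where a single number is expected is an easy mistake to make, so it is worth checking which node an anchor is attached to.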

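The <<: merge key inserts all key-value pairs from an anchored mapping into another mapping. If a key exists in both, the value written explicitly in the new mapping overrides the merged one. The merge key comes from YAML 1.1 and is supported by many parsers, including PyYAML, but it is not part of the core YAML 1.2 specification.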

defaults: &default_settings
  retries: 3
  timeout: 30

server1:
  <<: *default_settings  # Merges all key-value pairs from default_settings
  host: example.com  # This key is added to the merged data

This is the final resolved structure:

server1:
  retries: 3
  timeout: 30
  host: example.com
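
To see the override rule in action, this sketch adds a hypothetical server2 that merges the same defaults but sets its own timeout. It assumes PyYAML, which supports the merge key.

```python
import yaml  # PyYAML, a third-party package

document = """\
defaults: &default_settings
  retries: 3
  timeout: 30

server1:
  <<: *default_settings
  host: example.com

server2:
  <<: *default_settings
  host: example.org   # hypothetical second server
  timeout: 60         # overrides the merged timeout
"""

data = yaml.safe_load(document)

print(data["server1"])
print(data["server2"]["timeout"])  # 60
```

The explicit timeout in server2 wins over the merged value, while retries still comes from the defaults.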

Anchors and aliases are especially useful in large configuration files where repeating values manually would be inefficient. They help keep YAML files DRY (Don't Repeat Yourself) and make updates easier.

Common Use Cases for YAML

YAML is widely used in software development, infrastructure automation, and API management. Its human-readable syntax makes it a preferred format for configuration files, data serialization, and Infrastructure as Code (IaC). Let’s explore its most common applications.

Configuration files

YAML is widely used for configuration in tools like Docker Compose, Kubernetes, and CI/CD pipelines. Because the format is easy to read, someone new to a project can often open a Docker Compose file and understand what it sets up.

version: '3'
services:
  web:
    image: nginx
    ports:
      - "80:80"
    environment:
      - NGINX_HOST=localhost
      - NGINX_PORT=80

YAML's readability and support for anchors and aliases help reduce repetition, which can make large configurations easier to maintain than their JSON or XML equivalents.

Learn more about YAML and its usage in Docker in this intermediate Docker course.

Data serialization and transfer

YAML is used to serialize data for APIs and configuration management tools, converting complex data structures into a format that is both human-readable and easily parsed by machines. 

For example, an API request body formatted in YAML:

user:
  id: 123
  name: "John Doe"
  email: "johndoe@example.com"
  active: true

YAML’s indentation-based structure eliminates unnecessary syntax, making it lightweight, readable, and easy to modify compared to JSON.

Infrastructure as Code (IaC)

Tools like Ansible and Kubernetes leverage YAML to define system states, automate processes, and ensure consistency across environments.

  • In Ansible, YAML is used to write playbooks that define system states, tasks, and dependencies, ensuring that infrastructure components are configured consistently. 
  • Kubernetes utilizes YAML manifests to define resources such as pods, services, and deployments, enabling automated orchestration of containerized applications.

Here’s an example of a Kubernetes Pod configuration:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app-container
      image: my-app:latest
      ports:
        - containerPort: 8080

Learn more about how YAML is used in Kubernetes in this Introduction to Kubernetes course.

API documentation

API specifications like OpenAPI and Swagger use YAML to define endpoints and data structures in an easy-to-read way. YAML is used to outline API methods, request parameters, response formats, and authentication methods.

Here’s an example of an OpenAPI specification in YAML:

openapi: 3.0.0
info:
  title: User API
  version: "1.0"
paths:
  /users:
    get:
      summary: Retrieve a list of users
      responses:
        "200":
          description: Successful response

OpenAPI specifications written this way document RESTful APIs and provide a clear blueprint for generating client SDKs, interactive API documentation, and automated tests. This structured format helps keep API implementations consistent.
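
Once parsed, a specification like this is ordinary nested data that a tool can walk. The sketch below lists each operation using PyYAML (assumed to be installed). It is only an illustration of reading the structure, not how any particular OpenAPI tool works.

```python
import yaml  # PyYAML, a third-party package

document = """\
openapi: 3.0.0
info:
  title: User API
  version: "1.0"
paths:
  /users:
    get:
      summary: Retrieve a list of users
      responses:
        "200":
          description: Successful response
"""

spec = yaml.safe_load(document)

# Walk every path and HTTP method defined in the specification
for path, operations in spec["paths"].items():
    for method, operation in operations.items():
        print(method.upper(), path, "-", operation["summary"])
# GET /users - Retrieve a list of users
```

Note the quotes around "1.0" and "200" in the YAML: they keep the version and status code as strings instead of a float and an integer.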

Working with YAML Files

YAML is widely used for configuration files, automation, and data serialization, but since it relies on indentation, proper formatting is crucial. Here’s how you can read, write, validate, and edit YAML effectively.

Reading and writing YAML in Python

Python’s PyYAML library can parse and generate YAML.

Imagine you have the following configuration YAML file:

database:
  host: localhost
  port: 5432
  user: admin
  password: secret

Here’s how you can work with your configuration file in Python:

import yaml

# Load YAML data
with open("config.yaml", "r") as file:
    data = yaml.safe_load(file)  # safe_load prevents arbitrary code execution

# Modify data (optional)
data["database"]["user"] = "new_user"

# Write YAML data
with open("output.yaml", "w") as file:
    yaml.dump(data, file, default_flow_style=False)

If you’re interested in working with JSON data in Python, check out the comprehensive Python JSON tutorial.

Validating YAML files

To ensure correct structure, you can use validation tools to check for tabs used in place of spaces, stray or repeated characters, syntax errors, and trailing spaces.

Online validators and command-line linters can run these checks automatically.
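
If you only need a basic syntax check inside a Python script, a minimal sketch is to try parsing the text and report any error. This assumes PyYAML is installed; check_yaml is a hypothetical helper name, and a dedicated linter will catch more than this does.

```python
import yaml  # PyYAML, a third-party package


def check_yaml(text):
    """Return None if the text parses as YAML, otherwise the parser's error message."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as error:
        return str(error)
    return None


good = "name: John Doe\nage: 30\n"
bad = "name: John Doe\n  age: 30\n"  # the second key is indented by mistake

print(check_yaml(good))  # None
print(check_yaml(bad))   # a message describing where parsing failed
```

A check like this catches syntax errors only. It does not flag style issues such as trailing spaces.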

Editing YAML

You can write and edit YAML in any text editor, but linting tools and syntax highlighting improve readability.

Some of my favorite editors:

  • VS Code (with YAML plugins)
  • PyCharm (built-in support)
  • Sublime Text (with YAML syntax highlighting)

Common Mistakes to Avoid in YAML

Despite YAML’s simplicity, it is still easy to introduce typos and structural problems. This section covers the most common mistakes and how to avoid them. They are also why I recommend using a linter or a YAML-aware text editor!

Mixing tabs and spaces

YAML relies on spaces for indentation, and tabs are not allowed there, so never mix the two. A tab used for indentation will simply cause a parsing error. This is a deliberate design decision: different systems render tabs at different widths, so requiring spaces keeps the structure unambiguous.
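
A quick way to find offending lines before a parser complains is to scan the leading whitespace yourself. The helper below is a hypothetical sketch using only the standard library. It is a simple heuristic, so it will also flag tabs that start a line inside a block string.

```python
def lines_with_tab_indentation(text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-space, non-tab character
        indentation = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indentation:
            flagged.append(number)
    return flagged


sample = "person:\n  name: Alice\n\tage: 25\n"  # line 3 is indented with a tab

print(lines_with_tab_indentation(sample))  # [3]
```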

Incorrect indentation

Ensure consistent indentation to avoid parsing errors. Since indentation is YAML’s main way of denoting hierarchy in block style, a misplaced indent can either fail to parse or silently change the structure of your data. You can easily tuck away a key: value pair in the wrong place, so just keep an eye out for those indentations!
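
The sketch below shows the silent kind of mistake: both documents parse without errors, but moving city two spaces to the left attaches it to a different parent. It assumes PyYAML is installed.

```python
import yaml  # PyYAML, a third-party package

intended = """\
person:
  name: Alice
  details:
    age: 25
    city: London
"""

misplaced = """\
person:
  name: Alice
  details:
    age: 25
  city: London
"""

a = yaml.safe_load(intended)
b = yaml.safe_load(misplaced)

print(a["person"]["details"])  # {'age': 25, 'city': 'London'}
print(b["person"]["details"])  # {'age': 25}
print(b["person"]["city"])     # London
```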

Forgetting quotes for special characters

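Use quotes for strings containing characters that have special meaning in YAML. Examples include a colon followed by a space, a # preceded by a space, and a leading *, &, !, [, or {. Backslash escapes such as \n are only interpreted inside double quotes. Quoting is harmless when it isn’t strictly required, as in the examples below, so quote whenever you’re unsure.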

path: "/home/user/documents"
message: "Hello, World!"
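
Quotes also matter for values that look like another data type. The sketch below uses PyYAML (assumed to be installed), which follows YAML 1.1 rules. Parsers that follow YAML 1.2 may treat yes as a plain string, and the key names here are made up for illustration.

```python
import yaml  # PyYAML, a third-party package

document = """\
unquoted_version: 1.10
quoted_version: "1.10"
unquoted_answer: yes
quoted_answer: "yes"
unquoted_sentence: Hello, World!
"""

data = yaml.safe_load(document)

# Print each key with the Python type and value it was loaded as
for key, value in data.items():
    print(key, type(value).__name__, repr(value))
```

With PyYAML, the unquoted version loads as the float 1.1 and the unquoted yes loads as the boolean True, while the quoted forms stay strings. The unquoted sentence loads as a string, since commas and a trailing exclamation mark are not special in this position.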

By using proper validation, structured editing, and Python’s PyYAML, you can work efficiently with YAML files while avoiding common pitfalls.

Conclusion

YAML is a powerful yet simple format widely used in configuration, data serialization, and infrastructure automation. You can efficiently work with YAML in various applications by understanding its syntax, structure, and best practices.

FAQs

Is YAML universal?

As long as the data source or target can read YAML, it is a viable and useful method of serializing and transporting data. Make sure that you are sending data to a target that can process YAML.

Is YAML secure? Can YAML files introduce security risks?

YAML itself is just a data format, but security risks arise when parsing untrusted YAML files. In Python’s PyYAML, calling yaml.load() with an unsafe loader (such as yaml.UnsafeLoader) can construct arbitrary Python objects, which can lead to the execution of code embedded in the YAML. Instead, always use yaml.safe_load() for untrusted input. Similarly, when using YAML in applications, validate the parsed data against a strict schema to avoid security vulnerabilities.
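
As a small illustration, the sketch below passes a document carrying a Python-specific tag to yaml.safe_load(), which refuses to construct it. It assumes PyYAML is installed. The payload is a harmless echo command and is never executed here.

```python
import yaml  # PyYAML, a third-party package

# A document that asks the loader to call os.system (not executed by safe_load)
untrusted = "!!python/object/apply:os.system ['echo unsafe']"

rejected = False
try:
    yaml.safe_load(untrusted)
except yaml.YAMLError as error:
    rejected = True
    print("Rejected:", type(error).__name__)

print(rejected)  # True
```

The safe loader only knows standard YAML tags, so it raises an error instead of running anything.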

Can YAML support environment variables?

Yes! While YAML itself doesn’t directly process environment variables, many tools (like Docker Compose and Kubernetes) allow referencing environment variables within YAML files.
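
If you load YAML yourself in Python, one simple approach is to substitute variables in the raw text before parsing it. This is a sketch of that idea under my own assumptions, not how Docker Compose or Kubernetes implement it. DB_HOST and its value are hypothetical, and PyYAML is assumed to be installed.

```python
import os
from string import Template

import yaml  # PyYAML, a third-party package

# Hypothetical variable, set here only so the example is self-contained
os.environ["DB_HOST"] = "db.internal"

raw = """\
database:
  host: ${DB_HOST}
  port: 5432
"""

# Replace ${NAME} placeholders with values from the environment, then parse
expanded = Template(raw).substitute(os.environ)
config = yaml.safe_load(expanded)

print(config["database"]["host"])  # db.internal
```

Template.substitute() raises a KeyError when a variable is missing. Because the substitution happens on plain text, a value containing YAML syntax could change the structure, so this suits trusted environments only.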

How do you handle comments in YAML?

YAML supports single-line comments using the # symbol, but it does not support multi-line comments. If you need multi-line comments, a common workaround is to use a dummy key like _comment. However, this is just a convention and won’t be ignored by YAML parsers unless your application specifically filters it out.
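
Here is a minimal sketch of that difference using PyYAML (assumed to be installed): the # comment never reaches your program, while the _comment key arrives as regular data and has to be filtered out by hand.

```python
import yaml  # PyYAML, a third-party package

document = """\
# This comment is discarded by the parser
_comment: This note is ordinary data under a key named _comment
username: admin
"""

data = yaml.safe_load(document)
print(sorted(data))  # ['_comment', 'username']

# Drop the convention-based comment key before using the settings
settings = {key: value for key, value in data.items() if key != "_comment"}
print(settings)  # {'username': 'admin'}
```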


Author
Tim Lu

I am a data scientist with experience in spatial analysis, machine learning, and data pipelines. I have worked with GCP, Hadoop, Hive, Snowflake, Airflow, and other data science/engineering processes.
