Track
Over the years, I’ve worked with countless configuration files, and YAML has stood out for its simplicity and readability. Whether setting up workflows in Kubernetes, defining services in Docker, or structuring API requests, YAML makes complex configurations more manageable. Its clean, indentation-based structure eliminates the clutter of formats like XML while remaining flexible.
In this guide, I’ll walk you through YAML’s syntax, structure, advanced features, and best practices so you can work with it confidently.
What is YAML?
YAML (Yet Another Markup Language / YAML Ain’t Markup Language) is a data serialization format that prioritizes readability and ease of use. While XML uses a mixture of HTML-like nested tags and JSON uses curly brackets and quotes, much like Python dictionaries, YAML is more concise and uses indentation to define structure, making it more human-friendly.
YAML supports various data types, including scalars (strings, numbers, booleans), sequences (lists), and mappings (key-value pairs). It is widely used in configuration files, infrastructure automation, and data exchange, especially in tools like Kubernetes, Docker, and Ansible.
Additionally, YAML is a superset of JSON, meaning valid JSON files can be parsed as YAML. YAML files typically have a .yaml
or .yml
extension.
You can learn more about YAML on its website as well!
YAML Syntax and Structure
This section will explore the fundamental principles of YAML syntax, including key-value pairs, lists, nested data, and comments.
Basic syntax rules
There are a few basic syntactic rules for YAML:
- Space indentation denotes structure, so avoid those tabs!
- Key-value pairs follow a
key: value
structure, similar to other languages. - Using hyphens at the start of lines will denote a list.
- Using
#
will create comment lines.
# Here is an example of YAML
name: John Doe
age: 30
skills:
- Python
- YAML
Key-value pairs
YAML represents data as key-value pairs, similar to dictionaries in Python. This will often denote information given to different configuration files and settings. There is no need to denote strings or keys with quotes; simply write the key and values needed:
location: New York
country: USA
security-level: user
Lists in YAML
Lists are represented using hyphens (-
). This will allow you to list multiple objects under a single key. This is often represented visually with bullets when read by Markup editors.
fruits:
- Apple
- Banana
- Cherry
Nested data
Nested structures allow hierarchical data representation using indentation. Think of these like nested dictionaries. By using indentation, you denote what keys are subsets of others.
person:
name: Alice
details:
age: 25
city: London
Comments
Comments begin with #
and are ignored by YAML parsers. These comments are single-line comments.
# This is a comment
username: admin
password: secret
Advanced YAML Features
YAML includes powerful features like multi-line strings, data types, and anchors that make documents more efficient and structured. In this section, we’ll explore these capabilities with practical examples.
Multi-line strings
YAML supports multi-line strings using |
(literal block) or >
(folded block).
|
literal block will create a new line\n
for each line break.>
folded block will only make a new line for consecutive line breaks.
literal: |
This is a
multi-line string.
folded: >
This is another
multi-line string.
The above is better understood by showing the output.
- For the
|
(literal block):
This is a
multi-line string.
- For the
>
(folded block):
This is another multi-line string.
Data types in YAML
YAML supports various data types, including strings, numbers, booleans, and null values. It automatically detects types based on formatting but also allows explicit type definitions.
The following example shows the usage of basic data types in YAML:
string_implicit: Hello, YAML! # No quotes needed unless necessary
string_double_quoted: "Supports escape sequences like \n and \t"
string_single_quoted: 'Raw text, no escape sequences'
integer: 42 # Whole numbers
float: 3.14 # Numbers with decimals
boolean_true: true
boolean_false: false
null_value: null # Null value
null_tilde: ~ # Another way to represent null
YAML allows explicit type declarations using !!type
when needed:
explicit_string: !!str 123 # Forces 123 to be a string
explicit_integer: !!int "42" # Forces "42" to be an integer
explicit_float: !!float "3.14" # Forces "3.14" to be a float
Since YAML is often used for structured data, it supports:
- Lists (sequences):
fruits:
- Apple
- Banana
- Cherry
- Dictionaries (mappings):
person:
name: Alice
age: 30
is_student: false
Anchors and aliases
YAML allows you to define reusable values using anchors (&
) and reference them later using aliases (*
). This helps reduce redundancy in configuration files, making them cleaner and easier to maintain.
defaults: &default_settings
retries: 3
timeout: 30
server1:
host: example.com
retries: *default_settings # Reuses the retries value from defaults
The <<:
syntax allows merging key-value pairs from an anchor into another mapping. If a key exists in both, the new value overrides the original.
defaults: &default_settings
retries: 3
timeout: 30
server1:
<<: *default_settings # Merges all key-value pairs from default_settings
host: example.com # This key is added to the merged data
This is the final resolved structure:
server1:
retries: 3
timeout: 30
host: example.com
Anchors and aliases are especially useful in large configuration files where repeating values manually would be inefficient. They help keep YAML files DRY (Don't Repeat Yourself) and make updates easier.
Common Use Cases for YAML
YAML is widely used in software development, infrastructure automation, and API management. Its human-readable syntax makes it a preferred format for configuration files, data serialization, and Infrastructure as Code (IaC). Let’s explore its most common applications.
Configuration files
YAML is widely used for configuration in applications like Docker Compose, Kubernetes and CI/CD pipelines. Its ease of understanding makes it straightforward for anyone to pick up Docker YAML set-up files and understand what is happening.
version: '3'
services:
web:
image: nginx
ports:
- "80:80"
environment:
- NGINX_HOST=localhost
- NGINX_PORT=80
YAML's readability and support for anchors and aliases help reduce repetition, making it more maintainable than JSON or XML.
Learn more about YAML and its usage in Docker in this intermediate Docker course.
Data serialization and transfer
YAML is used to serialize data for APIs and configuration management tools by converting complex data structures into a human-readable format and easily parsed by machines.
For example, an API request body formatted in YAML:
user:
id: 123
name: "John Doe"
email: "johndoe@example.com"
active: true
YAML’s indentation-based structure eliminates unnecessary syntax, making it lightweight, readable, and easy to modify compared to JSON.
Infrastructure as Code (IaC)
Configuration management tools like Ansible and Kubernetes leverage YAML to define system states, automate processes, and ensure consistency across environments.
- In Ansible, YAML is used to write playbooks that define system states, tasks, and dependencies, ensuring that infrastructure components are configured consistently.
- Kubernetes utilizes YAML manifests to define resources such as pods, services, and deployments, enabling automated orchestration of containerized applications.
Here’s an example of a Kubernetes Pod configuration:
apiVersion: v1
kind: Pod
metadata:
name: my-app
spec:
containers:
- name: app-container
image: my-app:latest
ports:
- containerPort: 8080
Learn more about how YAML is used in Kubernetes in this Introduction to Kubernetes course.
API documentation
API specifications like OpenAPI and Swagger use YAML to define endpoints and data structures in an easy-to-read way. YAML is used to outline API methods, request parameters, response formats, and authentication methods.
Here’s an example of an OpenAPI specification in YAML:
openapi: 3.0.0
info:
title: User API
version: "1.0"
paths:
/users:
get:
summary: Retrieve a list of users
responses:
"200":
description: Successful response
OpenAPI specifications, for example, use YAML to document RESTful APIs. This allows them to provide a clear blueprint for generating client SDKs, interactive API documentation, and automated testing. This structured format ensures consistency across API implementations.
Working with YAML Files
YAML is widely used for configuration files, automation, and data serialization, but since it relies on indentation, proper formatting is crucial. Here’s how you can read, write, validate, and edit YAML effectively.
Reading and writing YAML in Python
Python’s PyYAML
library can parse and generate YAML.
Imagine you have the following configuration YAML file:
database:
host: localhost
port: 5432
user: admin
password: secret
Here’s how you can work with your configuration file in Python:
import yaml
# Load YAML data
with open("config.yaml", "r") as file:
data = yaml.safe_load(file) # safe_load prevents arbitrary code execution
# Modify data (optional)
data["database"]["user"] = "new_user"
# Write YAML data
with open("output.yaml", "w") as file:
yaml.dump(data, file, default_flow_style=False)
If you’re interested in working with JSON data in Python, check out the comprehensive Python JSON tutorial.
Validating YAML files
To ensure correct structure, you can use tools to check for tabs in place of spaces or weird issues such as repeated characters, syntax problems, and trailing spaces.
These are some popular YAML validators:
- CLI tool: yamllint (Python-based linter)
- Online validators: YAML Lint, JSON Formatter’s YAML Validator
Editing YAML
You can write and edit YAML in any text editor, but linting tools and syntax highlighting improve readability.
Some of my favorite editors:
- VS Code (with YAML plugins)
- PyCharm (built-in support)
- Sublime Text (with YAML syntax highlighting)
Common Mistakes to Avoid in YAML
Despite its simplicity, you can still easily have problems and typos when working with YAML. This section discusses these mistakes and provides best practices for writing clean and correct files. It is also why I recommend using a linter or text editor!
Mixing tabs and spaces
YAML relies on spaces for indentation—never mix spaces and tabs. Tabs will simply break your YAML script. This is actually a conscientious decision, as different systems read tabs differently, and thus, to minimize impact, spaces are the preferred usage.
Incorrect indentation
Ensure consistent indentation to avoid parsing errors. Since indentations are YAML’s only method of denoting hierarchy, improper parsing can cause issues with your code. You can easily tuck away a key: value
pair in the wrong place, so just keep an eye out for those indentations!
Forgetting quotes for special characters
Use quotes for strings containing special characters or spaces. Things like backslashes, commas, exclamation marks, and so on need quotes to be read as strings.
path: "/home/user/documents"
message: "Hello, World!"
By using proper validation, structured editing, and Python’s PyYAML, you can work efficiently with YAML files while avoiding common pitfalls.
Conclusion
YAML is a powerful yet simple format widely used in configuration, data serialization, and infrastructure automation. You can efficiently work with YAML in various applications by understanding its syntax, structure, and best practices.
If you're interested in applying YAML in real-world scenarios:
- Learn how YAML is used in CI/CD workflows with this CI/CD for Machine Learning course.
- Explore how APIs use YAML in their specifications in this Introduction to APIs in Python course.
- Dive deeper into containerization and infrastructure automation in this Containerization and Virtualization track.
Become a Data Engineer
FAQs
Is YAML universal?
As long as the data source or target can read YAML, it is a viable and useful method of serializing and transporting data. Make sure that you are sending data to a target that can process YAML.
Is YAML secure? Can YAML files introduce security risks?
YAML itself is just a data format, but security risks arise when parsing untrusted YAML files. The default yaml.load()
method in Python’s PyYAML
can execute arbitrary code embedded in YAML, making it risky. Instead, always use yaml.safe_load()
to prevent unintended execution of malicious code. Similarly, when using YAML in applications, ensure strict schema validation to avoid security vulnerabilities.
Can YAML support environment variables?
Yes! While YAML itself doesn’t directly process environment variables, many tools (like Docker Compose and Kubernetes) allow referencing environment variables within YAML files.
How do you handle comments in YAML?
YAML supports single-line comments using the #
symbol, but it does not support multi-line comments. If you need multi-line comments, a common workaround is to use a dummy key like _comment
. However, this is just a convention and won’t be ignored by YAML parsers unless your application specifically filters it out.
I am a data scientist with experience in spatial analysis, machine learning, and data pipelines. I have worked with GCP, Hadoop, Hive, Snowflake, Airflow, and other data science/engineering processes.