It’s common to reach for .find() or .index() to check if text exists in a string, and then run into odd behavior like a ValueError or a missed match at the start of the string. Case sensitivity, off-by-one slice boundaries, and the fact that there’s no .contains() method on str add to the confusion. The fix is straightforward: use the right tool for the job. We rely on the in operator for existence checks, .find()/.index() for positions, .count() for tallies, and .replace() for substitutions, plus a few Unicode-safe techniques when case-insensitivity matters.
How Python Handles Substring Search
Python’s text strings are Unicode. Substring operations are case-sensitive by default and, in CPython, implemented in C for speed. The most direct way to test if text appears inside a string is the in operator, which returns a Boolean using the string’s __contains__() implementation. When you need positions or counts, use the dedicated str methods. Optional start and end parameters follow slice semantics: start is inclusive; end is exclusive.
Containment With in
Use "needle" in haystack when you only care whether the substring exists. It’s readable and avoids index handling.
text = "Banana bread"
print("ban" in text) # False (case-sensitive)
print("Ban" in text) # True
For case-insensitive checks, normalize both sides. .casefold() is more Unicode-aware than .lower() and is my default for caseless matching.
pattern = "ban"
print(pattern.casefold() in text.casefold()) # True
- When we use it: quick existence checks, input validation, filtering lists.
- Edge case: "" in s is always True (the empty string is in every string).
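As a minimal sketch of the list-filtering use case, the snippet below keeps only the entries that contain a search term, ignoring case. The names titles, term, and matches are made up for illustration.

```python
# Hypothetical data for illustration.
titles = ["Banana Bread", "Sourdough Loaf", "banana muffins", "Rye"]
term = "BANANA"

# Normalize the term once, then test each title with `in`.
needle = term.casefold()
matches = [t for t in titles if needle in t.casefold()]

print(matches)  # ['Banana Bread', 'banana muffins']
```

Folding the term once outside the comprehension avoids repeating the same work for every element.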
str.find() Method
Syntax
str.find() returns the lowest index of the substring, or -1 if not found. Optional start and end restrict the search.
s.find(substring, start=0, end=len(s))
Examples and use cases
Use .find() when you need the position and can handle a “not found” value without exceptions.
tagline = "Ship fast. Fix faster."
print(tagline.find("fast")) # 5
print(tagline.find("fast", 6)) # 15
print(tagline.find("slow")) # -1
Do not write if s.find(sub): to check existence. If the match is at index 0, the condition is falsy; if there is no match, the result is -1, which is truthy, so the branch runs anyway.
name = "Alice"
if name.find("A"): # Wrong: 0 is falsy
    print("Found A") # Will not run
if name.find("A") != -1: # Correct
    print("Found A")
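The start parameter also makes it easy to walk through every match. The helper below is a sketch, not a standard-library function; find_all is a hypothetical name, and it assumes non-overlapping matches are what you want.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping match of sub in text."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the current match.
        start = text.find(sub, start + len(sub))
    return positions

tagline = "Ship fast. Fix faster."
print(find_all(tagline, "fast"))  # [5, 15]
print(find_all(tagline, "slow"))  # []
```

The loop compares against -1 explicitly, which is the same rule as in the existence check: never rely on the truthiness of a .find() result.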
str.index() Method
Syntax
.index() mirrors .find() but raises ValueError when the substring is missing.
s.index(substring, start=0, end=len(s))
Examples and when to choose it
Use .index() when the substring must exist and an exception is the right failure mode.
product = "USB-C Cable"
print(product.index("Cable")) # 6
try:
    product.index("HDMI")
except ValueError:
    print("Required label missing")
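One place where raising is arguably the right failure mode is parsing input that must contain a delimiter. The sketch below assumes a simple key=value format; split_setting is a hypothetical helper written for this example.

```python
def split_setting(line):
    """Split 'key=value' at the first '='; raises ValueError if '=' is missing."""
    sep = line.index("=")  # ValueError propagates when "=" is absent
    return line[:sep], line[sep + 1:]

print(split_setting("timeout=30"))  # ('timeout', '30')

try:
    split_setting("timeout 30")
except ValueError:
    print("Malformed setting")
```

With .find() the same function would need an explicit -1 check; forgetting it would silently slice at the wrong position.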
str.count() Method
Syntax
.count() returns the number of non-overlapping occurrences. It also accepts start and end bounds.
s.count(substring, start=0, end=len(s))
Examples and edge cases
Use .count() for quick tallies without regex.
log_line = "ERROR: timeout. ERROR: retry. ERROR: gave up."
print(log_line.count("ERROR")) # 3
print(log_line.count("WARN")) # 0
Special case: the empty string.
print("abc".count("")) # 4 (len(s) + 1)
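"Non-overlapping" matters when a pattern can overlap itself. The sketch below contrasts .count() with a small loop that advances one character at a time; count_overlapping is a hypothetical helper, not a str method.

```python
s = "aaaa"
print(s.count("aa"))  # 2: matches at 0 and 2; the one at index 1 overlaps

def count_overlapping(text, sub):
    """Count matches of sub in text, allowing overlaps."""
    if not sub:
        raise ValueError("sub must be non-empty")
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Advance by one character so overlapping matches are seen.
        start = text.find(sub, start + 1)
    return total

print(count_overlapping(s, "aa"))  # 3
```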
str.replace() Method
Syntax
.replace() returns a new string where occurrences of old are replaced by new. Use count to limit replacements.
s.replace(old, new, count=-1)
Replacing substrings
The method is not in-place; it returns a copy.
message = "The red house is between the blue house and the old house"
print(message.replace("house", "car"))
# The red car is between the blue car and the old car
print(message.replace("house", "car", 2))
# The red car is between the blue car and the old house
Case-insensitive replacement
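To replace case-insensitively while leaving the casing of the rest of the string untouched, use re.sub() with re.IGNORECASE, wrapping literal search text in re.escape(). If you do not need to preserve the original casing, normalize with .casefold() first and then call .replace().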
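A minimal sketch of the regex route, using only the standard re module. The sample message and the replacement word are made up for illustration; note that the replacement is inserted verbatim and does not copy the casing of each match.

```python
import re

message = "Red house, RED house, red house"

# re.escape() treats the search text literally; IGNORECASE matches any casing.
pattern = re.compile(re.escape("red"), flags=re.IGNORECASE)

print(pattern.sub("blue", message))
# blue house, blue house, blue house

# The optional count argument limits replacements, like str.replace().
print(pattern.sub("blue", message, count=1))
# blue house, RED house, red house
```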
Case-Insensitive Matching With .casefold()
.casefold() provides aggressive, Unicode-aware case normalization and is preferred over .lower() for caseless checks.
query = "straße" # German 'straße'
text = "STRASSE und mehr" # Uppercase variant
print(query.casefold() in text.casefold()) # True
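To see why .lower() is not enough here, the short comparison below runs the same check both ways on the same two strings.

```python
query = "straße"
text = "STRASSE und mehr"

# .lower() keeps 'ß' as-is, so the lowercase forms do not line up.
print(query.lower() in text.lower())        # False

# .casefold() maps 'ß' to 'ss', so both sides normalize to the same text.
print(query.casefold())                     # strasse
print(query.casefold() in text.casefold())  # True
```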
Related Tools for Prefixes and Suffixes
For known starts/ends, reach for dedicated methods, which are clearer than manual slicing and lighter than regex:
- s.startswith(prefix) and s.endswith(suffix) (accept tuples for multiple options)
- s.removeprefix(prefix) and s.removesuffix(suffix) (Python 3.9+)
filename = "draft_report_v2.pdf"
print(filename.removeprefix("draft_")) # report_v2.pdf
print(filename.endswith((".pdf", ".docx"))) # True
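On interpreters older than 3.9, the same behavior can be approximated with startswith()/endswith() plus slicing. The two helpers below are hypothetical fallbacks written for this example, not part of the standard library.

```python
def remove_prefix(s, prefix):
    """Fallback for str.removeprefix() on Python < 3.9."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s

def remove_suffix(s, suffix):
    """Fallback for str.removesuffix() on Python < 3.9."""
    # Guard against an empty suffix: s[:-0] would return "".
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

filename = "draft_report_v2.pdf"
print(remove_prefix(filename, "draft_"))  # report_v2.pdf
print(remove_suffix(filename, ".pdf"))    # draft_report_v2
print(remove_suffix(filename, ".docx"))   # draft_report_v2.pdf (unchanged)
```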
Working With Pandas Columns
When searching many strings in a DataFrame column, use vectorized string methods. Set regex=False to match literal text.
import pandas as pd
cities = pd.Series(["New York", "Yorktown", None, "Toronto"], dtype="string")
mask = cities.str.contains("York", regex=False, na=False)
print(mask.tolist()) # [True, True, False, False]
Notes:
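- By default, regex=True, so special characters like . match any character unless you disable regex.
- Handle nulls via na=.... If you omit it, missing values can come through as missing entries in the mask rather than False, depending on the dtype and pandas version, so pass na=False when the mask will be used for filtering.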
Common Pitfalls and Edge Cases
- No .contains() on str: use "sub" in s. "abc".contains("a") raises AttributeError.
- .find() truthiness: index 0 is falsy; compare with != -1 or prefer in for existence.
- .index() raises: handle ValueError if the substring might be absent.
- Empty substring: "" in s is True; s.find("") returns 0; s.count("") is len(s)+1.
- Case sensitivity: all operations are case-sensitive unless you normalize or use regex flags.
- Bytes vs. text: do not mix bytes and str in searches; decode or encode to match types.
- Substring semantics: "ab" in "cat,bat" is False even though "a" and "b" appear separately; matches must be contiguous.
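The bytes-versus-text pitfall is easy to reproduce. The sketch below shows the TypeError in both directions and the two ways to line the types up; payload is an illustrative value, and UTF-8 is assumed as the encoding.

```python
# Mixing str and bytes raises TypeError in either direction.
try:
    "a" in b"abc"
except TypeError as exc:
    print("str in bytes:", type(exc).__name__)

try:
    b"a" in "abc"
except TypeError as exc:
    print("bytes in str:", type(exc).__name__)

# Matching types works: decode the bytes, or encode the text.
payload = b"status=ok"
print("ok" in payload.decode("utf-8"))  # True
print("ok".encode("utf-8") in payload)  # True

# str has no .contains() method.
print(hasattr("abc", "contains"))       # False
```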
Hands-On Examples You Can Run
These short snippets show correct patterns and highlight common gotchas.
Existence checks
headline = "Where's Waldo?"
print("Waldo" in headline) # True
print("Wenda" in headline) # False
print(headline.find("Waldo")) # 8
print(headline.find("Wenda")) # -1
Bounded search with start and end
End is exclusive, so search positions mirror slice semantics.
headline = "Where's Waldo?"
print(headline.find("Waldo", 0, 6)) # -1 (searching "Where'")
print(headline.find("Waldo", 0, 13)) # 8 (searching up to the 'o' in Waldo)
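One detail worth checking by hand: a bounded .find() still reports an index into the full string, whereas searching a slice reports an index into the slice. The start and end values below are arbitrary choices for illustration.

```python
headline = "Where's Waldo?"
start, end = 2, 13

# Bounded find reports an index into the full string...
bounded = headline.find("Waldo", start, end)

# ...while searching a slice reports an index into the slice.
sliced = headline[start:end].find("Waldo")

print(bounded)         # 8
print(sliced)          # 6
print(sliced + start)  # 8, after shifting back by the slice offset
```

The offset shift only makes sense when the slice search succeeds; a -1 result should be left as -1.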
Counting and replacing
inventory = "apple, apple, pear, apple"
print(inventory.count("apple")) # 3
print(inventory.replace("apple", "orange", 2)) # orange, orange, pear, apple
A Small Practice Exercise
This example processes a list of movie blurbs. It demonstrates existence checks, bounded searches, and safe replacements without regex.
blurbs = [
    "Critics say the lead actor delivers a career-best performance.",
    "Fans argue the actor actor scene was intentionally repetitive.",
    "A clever cameo steals the show.",
]
for blurb in blurbs:
    # Check whether "actor" appears between indexes 20 and 40 (end exclusive).
    window_has_actor = "actor".casefold() in blurb[20:40].casefold()
    if not window_has_actor:
        print("Window check: actor not found")
        continue
    # Check the longer run first: any string containing "actor actor actor"
    # also contains "actor actor", so the reverse order would never reach it.
    if "actor actor actor" in blurb:
        print(blurb.replace("actor actor actor", "actor"))
    # If "actor actor" appears, collapse to a single "actor".
    elif "actor actor" in blurb:
        print(blurb.replace("actor actor", "actor"))
    else:
        print(blurb)
Output:
Critics say the lead actor delivers a career-best performance.
Fans argue the actor scene was intentionally repetitive.
Window check: actor not found
Conclusion
Use the in operator for simple containment, .find() or .index() when you need positions, .count() for non-overlapping tallies, and .replace() for substitutions. Normalize with .casefold() for case-insensitive checks, and prefer dedicated prefix/suffix helpers when appropriate. We avoid converting .find() results to Booleans and treat empty-substring behavior as a special case to keep string code correct and clear.