When you need to compare strings in Python, a lot of us reach for .strip().lower() by default to pre-normalize them. But it turns out that .lower() falls short once you start thinking about non-ASCII characters. For example, the German letter “ß” (Eszett) should probably be equivalent to “ss” in a case-insensitive comparison, but .lower() leaves it unchanged, so “Straße” and “strasse” never compare equal.
So how can we do better? Python 3.3 and later have a str method, .casefold(), that is specifically designed for caseless matching. It is like .lower() but more aggressive: it applies Unicode case folding, which removes case distinctions that lowercasing leaves in place, such as mapping “ß” to “ss” and the Greek final sigma “ς” to “σ”:
words = [
    "HELLO",
    "Straße",  # German ß
    "İstanbul",  # Turkish dotted capital I
    "ΜΆΪΟΣ",  # Greek with accents
]
for word in words:
    print(f"Original: {word}")
    print(f"lower(): {word.lower()}")
    print(f"casefold(): {word.casefold()}")
    print("-" * 30)
# Equality comparison example
a = "Straße"
b = "strasse"
print("Comparison example:")
print("Using lower():", a.lower() == b.lower())
print("Using casefold():", a.casefold() == b.casefold())
This prints:
Original: HELLO
lower(): hello
casefold(): hello
------------------------------
Original: Straße
lower(): straße
casefold(): strasse
------------------------------
Original: İstanbul
lower(): i̇stanbul
casefold(): i̇stanbul
------------------------------
Original: ΜΆΪΟΣ
lower(): μάϊος
casefold(): μάϊοσ
------------------------------
Comparison example:
Using lower(): False
Using casefold(): True
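If you want to drop this into the .strip().lower() habit from the top of the post, a minimal sketch might look like the following. The names normalize_key and caseless_equal are made up for illustration. The NFD normalization step is an extra assumption on my part, following the caseless-comparison approach described in Python's Unicode HOWTO; casefold() on its own does not normalize composed versus decomposed characters.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Hypothetical helper: strip whitespace, then build a caseless key."""
    # NFD first so composed and decomposed forms of a character agree
    # (an added assumption; casefold() alone does not normalize).
    decomposed = unicodedata.normalize("NFD", text.strip())
    # casefold() can produce non-normalized output, so normalize once more.
    return unicodedata.normalize("NFD", decomposed.casefold())


def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings via their caseless keys."""
    return normalize_key(a) == normalize_key(b)


print(caseless_equal("  Straße ", "STRASSE"))  # True
print(caseless_equal("e\u0301", "\u00c9"))     # True: decomposed é vs. composed É
print(caseless_equal("İstanbul", "istanbul"))  # False: the combining dot survives

# The same key also works for case-insensitive sorting.
print(sorted(["straße", "Zebra", "apple"], key=normalize_key))
```

Note that this still treats “İstanbul” and “istanbul” as different, because the combining dot left over from “İ” is kept. Handling that would need language-specific rules that casefold() does not apply.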
So now you know :)