Issue
This content is from Stack Overflow. Question asked by Hanzcerb.
I have this string:
Book Release Date: 2 June, 2010 [Edition#5]
Book Release Date: 24 October, 1996
I want to use a regex to extract only the dates, like this:
2 June, 2010
24 October, 1996
I have tried these patterns, which are close to what I want:
# Result of this pattern:
# 2 June, 2010 [Edition#5]
# 24 October, 1996
date = re.findall(r"(?<=(Book Release Date: ))(.*)(?=(\[|\n))", text)
# Result of this pattern:
# 2 June, 2010
# None
date = re.findall(r"(?<=(Book Release Date: ))(.*)(?=\[)", text)
Solution
You don’t need any lookaround assertions. A single capture group is enough, because when a pattern contains exactly one capture group, re.findall returns just the text of that group:
\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b
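To illustrate why one capture group matters, here is a small sketch that runs the first pattern from the question next to the pattern above. It assumes the input text ends with a newline (so the question's "\n" lookahead can match on the last line). The question's pattern has three capturing groups (the lookbehind, the date text, and the lookahead), so re.findall returns a 3-tuple per match; the answer's pattern has exactly one group, so re.findall returns plain strings.

```python
import re

# Assumption: the text ends with "\n", so both lines can satisfy the lookahead.
text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996\n")

# Three capturing groups: each match comes back as a tuple of all three groups.
with_groups = re.findall(r"(?<=(Book Release Date: ))(.*)(?=(\[|\n))", text)

# Exactly one capturing group: each match comes back as that group's text only.
one_group = re.findall(r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b", text)

print(with_groups)
print(one_group)  # ['2 June, 2010', '24 October, 1996']
```

Note that the middle element of each tuple in with_groups still contains " [Edition#5]" for the first line, which is the problem the question describes.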
Explanation
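- "\bBook Release Date: " matches the label literally, starting at a word boundary
- "(" opens capture group 1
  - "\d+ [A-Z][a-z]+" matches 1+ digits, a space, an uppercase char A-Z, and 1+ lowercase chars (the day and the month)
  - ", \d{4}" matches a comma, a space, and 4 digits (the year)
- ")" closes group 1
- "\b" is a word boundary that prevents a partial word match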
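The breakdown above maps one-to-one onto the pattern, so one option is to keep it next to the code using re.VERBOSE. This is a sketch of the same regex rewritten that way, not a change in behavior. In verbose mode unescaped whitespace is ignored and "#" starts a comment, so the literal spaces are written as "\ ". The last check shows the trailing \b at work: a fifth digit after the year prevents a match.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each piece has a comment.
# Literal spaces must be escaped ("\ ") because verbose mode ignores plain whitespace.
pattern = re.compile(r"""
    \bBook\ Release\ Date:\   # the label, matched literally from a word boundary
    (                         # open capture group 1
        \d+\ [A-Z][a-z]+      # 1+ digits, a space, a capitalized word (day and month)
        ,\ \d{4}              # a comma, a space, and 4 digits (the year)
    )                         # close group 1
    \b                        # word boundary: no partial-word match after the year
""", re.VERBOSE)

text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996")

print(pattern.findall(text))  # ['2 June, 2010', '24 October, 1996']

# The trailing \b rejects a year followed by another digit.
print(pattern.findall("Book Release Date: 2 June, 20101"))  # []
```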
Example
import re

# One capture group, so re.findall returns only the captured date text
pattern = r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b"
s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
     "Book Release Date: 24 October, 1996")
print(re.findall(pattern, s))
Output
['2 June, 2010', '24 October, 1996']
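The question's expected result lists "None" for a line without a match, which suggests the lines may be processed one at a time. Assuming that, a minimal sketch is to compile the same pattern once and use re.search per line, taking group 1 when there is a match. The helper name release_date and the third sample line are hypothetical, added only for illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b")


def release_date(line):
    """Return the date text found in one line, or None if there is none.

    release_date is a hypothetical helper name used for illustration.
    """
    match = PATTERN.search(line)
    return match.group(1) if match else None


lines = [
    "Book Release Date: 2 June, 2010 [Edition#5]",
    "Book Release Date: 24 October, 1996",
    "Book Release Date: unknown",  # hypothetical line with no date
]

for line in lines:
    print(release_date(line))
```

This prints "2 June, 2010", "24 October, 1996", and "None", one per line.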
This question was asked on Stack Overflow by Hanzcerb and answered by The fourth bird. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.