[SOLVED] Regex end with a character or end of line with lookahead

Issue

This content is from Stack Overflow. Question asked by Hanzcerb.

I have this string:

Book Release Date: 2 June, 2010 [Edition#5]

Book Release Date: 24 October, 1996

I want to use a regex to extract only the dates, as follows:

2 June, 2010

24 October, 1996

I have tried these patterns, which come close to what I want:

# this pattern result
# 2 June, 2010 [Edition#5]
# 24 October, 1996
date = re.findall(r"(?<=(Book Release Date: ))(.*)(?=(\[|\n))", text)

# this pattern result
# 2 June, 2010
# None
date = re.findall(r"(?<=(Book Release Date: ))(.*)(?=\[)", text)



Solution

You don’t need any lookaround assertions. A single capture group is enough, because re.findall returns the text of the group when the pattern contains exactly one:

\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b
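To see why the number of groups matters, here is a small sketch of how re.findall shapes its result. The sample string and the variable names (whole, one, many) are only illustrative and not part of the original answer.

```python
import re

s = "Book Release Date: 24 October, 1996"

# No groups: each result is the whole match.
whole = re.findall(r"\d+ [A-Z][a-z]+, \d{4}", s)

# Exactly one group: each result is the text of that group.
one = re.findall(r"Date: (\d+ [A-Z][a-z]+, \d{4})", s)

# Several groups: each result is a tuple with one entry per group.
many = re.findall(r"(\d+) ([A-Z][a-z]+), (\d{4})", s)

print(whole)  # ['24 October, 1996']
print(one)    # ['24 October, 1996']
print(many)   # [('24', 'October', '1996')]
```

This is also why the patterns in the question, which wrap the lookbehind and lookahead contents in extra parentheses, would return tuples rather than plain strings.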

Explanation

  • \bBook Release Date: Match the literal label and the space after it, starting at a word boundary
  • ( Capture group 1
    • \d+ [A-Z][a-z]+ Match 1+ digits, space, uppercase char A-Z, 1+ lowercase chars
    • , \d{4} Match , and 4 digits
  • ) Close group 1
  • \b A word boundary to prevent a partial word match
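The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores unescaped whitespace and allows comments. This is only a sketch of the same pattern: the literal spaces are written as [ ] because verbose mode would otherwise drop them, and the name verbose_pattern is illustrative.

```python
import re

verbose_pattern = re.compile(r"""
    \bBook[ ]Release[ ]Date:[ ]   # literal label, starting at a word boundary
    (                             # capture group 1: the date
        \d+[ ][A-Z][a-z]+         # 1+ digits, space, capitalized month name
        ,[ ]\d{4}                 # comma, space, 4 digits
    )                             # close group 1
    \b                            # word boundary after the year
""", re.VERBOSE)

match = verbose_pattern.search("Book Release Date: 2 June, 2010 [Edition#5]")
first_date = match.group(1)
print(first_date)  # 2 June, 2010
```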


Example

import re
 
pattern = r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b"
 
s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
    "Book Release Date: 24 October, 1996")
 
print(re.findall(pattern, s))

Output

['2 June, 2010', '24 October, 1996']
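If you would rather keep the lookaround approach from the question, one possible fix is a lazy quantifier with a lookahead that accepts either an opening bracket or the end of the line. This is a sketch and not part of the original answer. It assumes that re.MULTILINE is acceptable, so that $ matches at the end of every line, and that any whitespace before the bracket should be left out of the result.

```python
import re

text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996")

# The lazy .*? stops at the first position that is followed by optional
# whitespace and then either "[" or the end of a line ($ with re.MULTILINE).
lookaround_pattern = r"(?<=Book Release Date: ).*?(?=\s*(?:\[|$))"

dates = re.findall(lookaround_pattern, text, flags=re.MULTILINE)
print(dates)  # ['2 June, 2010', '24 October, 1996']
```

Unlike the capture-group pattern, this version accepts any text after the label, not only something shaped like a date, so the capture-group pattern is the stricter of the two.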


This question was asked on Stack Overflow by Hanzcerb and answered by The fourth bird. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
