[SOLVED] Get German title and year of movie with regex

Issue

This content is from Stack Overflow. Question asked by Daniel Schmitt.

I would like to remove everything from a string except the title and year of a movie.

I would like to keep everything that is not in brackets:

**Dirty.Work.Wie.deweit.wuerdest.Du.gehen.2018**[.German.AC3.WEBRip]
**Zwei.baerenstarke.Typen.1983**[.DE.EN.DTSHD.MasteDEr.5.1.2160p.HDR10.x265-kellerratte]
**The.Hills.Have.Eyes.1977**[.COMPLETE.UHD.BLURAY-UNTOUCHED]
**Wonder.Woman.1984.2020.**[GERMAN.DUBBED.DL.2160p.HDR.WEB.x265]
**Wonder.Woman.1984**[.GERMAN.]**2020**[.DUBBED.DL.2160p.HDR.WEB.x265]
**2012**[.German.]**2006**[.DL.2160p.UHD.BluRay.HDR.HEVC.Remux]

**Sherlock.Holmes.2009**[.German.DL.]**2022**[.ock.Holmes.UHD.BluRay.2160p.UHD.BluRay.HDR.HEVC.Remux]

This is what I tried:

((?<=\b\d{4}\b)|\b(German|DE)\b.*)

https://regex101.com/r/Z4cRMn/1

Does anyone have a clue how to do it? Some examples would help.



Solution

You can get close to what you need using

\[[^\]\[]*]|\W*\b(?:\d{4}|German|DE)\b(?!.*\[[^\]\[]*]).*

See the regex demo. Details:

  • \[ – a [ char
  • [^\]\[]* – zero or more chars other than [ and ]
  • ] – a ] char
  • | – or
  • \W* – zero or more non-word chars
  • \b – a word boundary
  • (?:\d{4}|German|DE) – four digits or German or DE
  • \b – a word boundary
  • (?!.*\[[^\]\[]*]) – a negative lookahead: immediately to the right of the current location, there should not be
    • .* – any zero or more chars other than line break chars, as many as possible
    • \[[^\]\[]*] – a [, zero or more chars other than [ and ], and then a ]
  • .* – the rest of the line.
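
The breakdown above can be tried out with a minimal Python sketch. It assumes the square brackets are literal characters in the input (as the first alternative of the pattern suggests) and that the bold markers from the question are not part of the strings. The helper name strip_release_tags is hypothetical and only used for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"\[[^\]\[]*]|\W*\b(?:\d{4}|German|DE)\b(?!.*\[[^\]\[]*]).*"
)


def strip_release_tags(name: str) -> str:
    """Delete every match of PATTERN from a release name (hypothetical helper)."""
    return PATTERN.sub("", name)


# Sample inputs from the question, with the brackets kept as literal text.
samples = [
    "Dirty.Work.Wie.deweit.wuerdest.Du.gehen.2018[.German.AC3.WEBRip]",
    "The.Hills.Have.Eyes.1977[.COMPLETE.UHD.BLURAY-UNTOUCHED]",
    "Wonder.Woman.1984[.GERMAN.]2020[.DUBBED.DL.2160p.HDR.WEB.x265]",
]

for sample in samples:
    print(strip_release_tags(sample))
# Dirty.Work.Wie.deweit.wuerdest.Du.gehen.2018
# The.Hills.Have.Eyes.1977
# Wonder.Woman.19842020
```

In the third sample the two kept segments are joined without a separator, which is one reason the result is only close to the desired output. Replacing each match with a separator and trimming afterwards would be one possible follow-up step, depending on the format you need.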


This question was asked on Stack Overflow by Daniel Schmitt and answered by Wiktor Stribiżew. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
