Issue
This content is from Stack Overflow. Question asked by Daniel Schmitt.
I would like to remove everything from a string except the title and year of a movie.
I would like to keep everything that is not in brackets:
**Dirty.Work.Wie.deweit.wuerdest.Du.gehen.2018**[.German.AC3.WEBRip]
**Zwei.baerenstarke.Typen.1983**[.DE.EN.DTSHD.MasteDEr.5.1.2160p.HDR10.x265-kellerratte]
**The.Hills.Have.Eyes.1977**[.COMPLETE.UHD.BLURAY-UNTOUCHED]
**Wonder.Woman.1984.2020.**[GERMAN.DUBBED.DL.2160p.HDR.WEB.x265]
**Wonder.Woman.1984**[.GERMAN.]**2020**[.DUBBED.DL.2160p.HDR.WEB.x265]
**2012**[.German.]**2006**[.DL.2160p.UHD.BluRay.HDR.HEVC.Remux]
**Sherlock.Holmes.2009**[.German.DL.]**2022**[.ock.Holmes.UHD.BluRay.2160p.UHD.BluRay.HDR.HEVC.Remux]
This is what I tried:
((?<=\b\d{4}\b)|\b(German|DE)\b.*)
https://regex101.com/r/Z4cRMn/1
Does anyone have a clue how to do this? Some examples would help.
Solution
You can get close to what you need using
\[[^\]\[]*]|\W*\b(?:\d{4}|German|DE)\b(?!.*\[[^\]\[]*]).*
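As a minimal sketch of how this pattern might be applied, the Python snippet below deletes every match with `re.sub`. It assumes the square brackets in the question's samples are literal characters in the input and that the `**` markers are only emphasis; the function name `strip_release_tags` is made up for illustration.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"\[[^\]\[]*]|\W*\b(?:\d{4}|German|DE)\b(?!.*\[[^\]\[]*]).*")


def strip_release_tags(name: str) -> str:
    """Delete every match of PATTERN from a single-line release name."""
    return PATTERN.sub("", name)


# Samples from the question, with the ** emphasis markers removed
# (assumption: the brackets are part of the real input).
samples = [
    "Dirty.Work.Wie.deweit.wuerdest.Du.gehen.2018[.German.AC3.WEBRip]",
    "The.Hills.Have.Eyes.1977[.COMPLETE.UHD.BLURAY-UNTOUCHED]",
    "Wonder.Woman.1984[.GERMAN.]2020[.DUBBED.DL.2160p.HDR.WEB.x265]",
]

for sample in samples:
    print(strip_release_tags(sample))
```

With these inputs the first two samples keep the title and year intact, while the third comes out as `Wonder.Woman.19842020`, with title and year run together. That is one reason the result is only close to the target.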
See the regex demo. Details:
- `\[` – a `[` char
- `[^\]\[]*` – zero or more chars other than `[` and `]`
- `]` – a `]` char
- `|` – or
- `\W*` – zero or more non-word chars
- `\b` – a word boundary
- `(?:\d{4}|German|DE)` – four digits or `German` or `DE`
- `\b` – a word boundary
- `(?!.*\[[^\]\[]*])` – immediately to the right of the current location, there should not be
  - `.*` – any zero or more chars other than line break chars, as many as possible
  - `\[[^\]\[]*]` – `[`, zero or more chars other than `[` and `]`, and then a `]`
- `.*` – the rest of the line.
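The token-by-token breakdown can also be kept next to the pattern itself. The sketch below rewrites the same regex with Python's `re.VERBOSE` flag so that each part carries a comment; the name `CLEANUP` and both input strings are hypothetical and chosen only to exercise the second alternative.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so each
# token can carry a comment. Whitespace outside character classes is ignored.
CLEANUP = re.compile(
    r"""
    \[ [^\]\[]* ]                # '[', any run of non-bracket chars, ']'
    |                            # or
    \W*                          # leading non-word chars, e.g. a dot
    \b (?:\d{4}|German|DE) \b    # whole word: four digits, German, or DE
    (?! .* \[ [^\]\[]* ] )       # no bracketed chunk later on the line
    .*                           # the rest of the line
    """,
    re.VERBOSE,
)

# Hypothetical inputs, not taken from the question.
with_brackets = "Movie.Title.2018[.x264].German.AC3"
without_brackets = "Movie.Title.2018.German.AC3"

print(CLEANUP.sub("", with_brackets))     # Movie.Title.2018
print(CLEANUP.sub("", without_brackets))  # Movie.Title
```

The second input shows a behavior worth checking against real data: when no bracketed chunk follows, the four-digit alternative matches the year itself, so the year is removed together with everything after it.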
This question was asked on Stack Overflow by Daniel Schmitt and answered by Wiktor Stribiżew. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.