The “basic” regex matching just the year field is \d{4}
but on its own it is not enough.
The match must be rejected when the year is preceded by:
- 3 letters (month) and a space,
- then 2 digits (day) and a space.
This can be achieved with a negative lookbehind:
(?<!\w{3} \d{2} )
But note that:
- after the day field there can be an optional comma (,),
- but in many regex flavors (Python's re module, for example) a negative lookbehind must be fixed-width, so it cannot contain a quantifier that makes its length variable,
so we cannot put ,* after the day field.
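As a quick check of this limitation, here is a minimal sketch that assumes Python's re module; other flavors may behave differently. The helper name compiles is made up for this illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python reports a variable-width lookbehind as a pattern error.
        return False

# The optional comma (,*) makes the lookbehind variable-width: rejected.
variable_width_ok = compiles(r"(?<!\w{3} \d{2},* )\d{4}")

# Without the quantifier the lookbehind is always 7 characters long: accepted.
fixed_width_ok = compiles(r"(?<!\w{3} \d{2} )\d{4}")
```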
Fortunately, this case is simple enough that we can work around the
limitation with two negative lookbehinds, one with a comma after
the day field and one without.
So the whole regex can be as follows:
(?<!\w{3} \d{2}, )(?<!\w{3} \d{2} )\d{4}
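To see the two-lookbehind regex in action, here is a small sketch, again assuming Python's re module. The sample strings and the helper name find_years are invented for this illustration.

```python
import re

# Two fixed-width negative lookbehinds: one for "Mon DD, " and one for "Mon DD ".
YEAR = re.compile(r"(?<!\w{3} \d{2}, )(?<!\w{3} \d{2} )\d{4}")

def find_years(text):
    """Return all 4-digit runs not preceded by '<3 word chars> <2 digits>[,] '."""
    return YEAR.findall(text)

standalone = find_years("Released in 2015")   # year on its own
no_comma = find_years("Jan 05 2015")          # year is part of a date
with_comma = find_years("Jan 05, 2015")       # same, with a comma after the day
```

Only the first string should yield a match. The other two are blocked by one of the lookbehinds.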
If you want to disallow only actual month names at the beginning
(not any 3-letter word), replace both instances of \w{3}
with:
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
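A sketch of that substitution, assuming Python's re module. The lookbehinds stay fixed-width because every alternative is exactly 3 letters long. The names MONTH and YEAR_NOT_IN_DATE are chosen for this example only.

```python
import re

# Every alternative has the same length, so the lookbehind is still fixed-width.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

YEAR_NOT_IN_DATE = re.compile(
    r"(?<!" + MONTH + r" \d{2}, )"   # no "Mon DD, " before the year
    r"(?<!" + MONTH + r" \d{2} )"    # no "Mon DD " before the year
    r"\d{4}"
)

after_month = YEAR_NOT_IN_DATE.findall("Mar 12 2015")   # real month: blocked
after_word = YEAR_NOT_IN_DATE.findall("Foo 12 2015")    # not a month: allowed
```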
Edit
Another solution, also “aware” of full month names, is to match two parts:
- an optional “envelope” (non-capturing) group, composed of:
  - a word (capturing group No 1, possibly a month name),
  - a space, 2 digits (day), an optional comma and a space,
- then 4 digits (capturing group No 2).
To sum up:
(?:(\w+) \d{2},* )?(\d{4})
Then, in the post-processing phase, you must:
- check whether the 1st captured group contains the full or abbreviated
  name of any month,
- reject the match if that check succeeded.
If the check failed (or group No 1 did not take part in the match),
capturing group No 2 contains the year field you are looking for.
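The capture-then-filter approach could look roughly like this in Python. This is a sketch, not a definitive implementation: the function name standalone_years, the English month list and the rule that an abbreviation is the first 3 letters of the name are all assumptions made for this example.

```python
import re

# Optional "envelope" (word, day, optional comma), then the 4-digit year.
PATTERN = re.compile(r"(?:(\w+) \d{2},* )?(\d{4})")

FULL_NAMES = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]
# Assumption: abbreviations are simply the first 3 letters of each name.
MONTHS = set(FULL_NAMES) | {name[:3] for name in FULL_NAMES}

def standalone_years(text):
    """Return years that are not preceded by '<month> <day>[,] '."""
    years = []
    for match in PATTERN.finditer(text):
        word, year = match.group(1), match.group(2)
        # Group 1 is None when the optional envelope did not match at all.
        if word is not None and word.lower() in MONTHS:
            continue  # the year belongs to a full date: reject it
        years.append(year)
    return years

result = standalone_years("December 25, 2015 and 2016")
```

Here the first year is consumed together with its month and day and then rejected, so only the second one is returned.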