[Solved] Regular expression for date shown as YYYY


The “basic” regex matching just the year field is \d{4},
but that alone is not enough.
We must also make sure that the year is not preceded by:

  • 3 letters (month) and a space,
  • then 2 digits (day) and a space.

This can be achieved with a negative lookbehind:

(?<!\w{3} \d{2} )

But note that:

  • after the day field there can be an optional comma (,),
  • but in many regex engines (Python's re, for example) a lookbehind
    must have a fixed width, so quantifiers that make its length
    variable are not allowed,

so we cannot put ,* after the day field.
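
As an illustration, Python's re module is one engine with this restriction. The sketch below tries to compile a lookbehind with and without the quantified comma; the helper name compiles is made up for this example.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# ",*" lets the lookbehind be 7, 8, 9, ... characters wide: variable width.
variable_width = r"(?<!\w{3} \d{2},* )\d{4}"
# Without the quantifier the lookbehind is always exactly 7 characters wide.
fixed_width = r"(?<!\w{3} \d{2} )\d{4}"

print(compiles(variable_width))
print(compiles(fixed_width))
```

In Python the first pattern is rejected with a "look-behind requires fixed-width pattern" error, while the second compiles. Other engines may behave differently.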

Fortunately, the case is not very complicated and we can circumvent this
limitation by using two negative lookbehinds: one with a comma after
the day field and another without.

So the whole regex can be as follows:

(?<!\w{3} \d{2}, )(?<!\w{3} \d{2} )\d{4}
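
A minimal sketch of how this regex could be used, assuming Python's re module (the question does not say which engine is in use). The names YEAR_RE and find_years are invented for the example.

```python
import re

# Two fixed-width negative lookbehinds: one for "Mon DD, " and one for "Mon DD ".
YEAR_RE = re.compile(r"(?<!\w{3} \d{2}, )(?<!\w{3} \d{2} )\d{4}")

def find_years(text):
    """Return every 4-digit run that is not preceded by 'xxx DD ' or 'xxx DD, '."""
    return YEAR_RE.findall(text)

sample = "Built 2019, updated Jan 05, 2021 and Feb 10 2022; see 2023"
print(find_years(sample))
```

With this sample, only the two years that do not follow a month and day are reported; the years in "Jan 05, 2021" and "Feb 10 2022" are skipped.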

If you want to reject only actual month abbreviations before the day
(not any 3 word characters, which \w{3} matches), replace both instances of \w{3} with:

(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
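
The substitution can be sketched as follows, again assuming Python's re module. Every alternative in the month group is 3 characters long, so the lookbehinds stay fixed-width. MONTH, STRICT_YEAR_RE and find_years_strict are illustrative names only.

```python
import re

# All alternatives are 3 characters long, so the lookbehind width is still fixed.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

# Same two lookbehinds as before, with \w{3} replaced by the month group.
STRICT_YEAR_RE = re.compile(
    rf"(?<!{MONTH} \d{{2}}, )(?<!{MONTH} \d{{2}} )\d{{4}}"
)

def find_years_strict(text):
    """Return 4-digit runs not preceded by '<month abbreviation> DD ' or 'DD, '."""
    return STRICT_YEAR_RE.findall(text)

print(find_years_strict("foo 12 2019"))   # "foo" is not a month abbreviation
print(find_years_strict("Jan 12 2019"))   # "Jan" is, so the year is skipped
```

The difference from the \w{3} version shows up in the first call: "foo 12 2019" now yields the year, because "foo" is not a month abbreviation.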

Edit

Another solution, which is also “aware” of full month names, is to
capture two groups first:

  • an optional “envelope” (non-capturing) group, composed of:
    • a word (capturing group No 1, possibly a month name),
    • a space, 2 digits (day), an optional comma and a space,
  • then 4 digits (capturing group No 2).

To sum up:

(?:(\w+) \d{2},* )?(\d{4})

Then, in the post-processing phase, you must:

  • check whether the 1st captured group contains either the full or the
    abbreviated name of a month,
  • reject the match if it does.

If that check fails (or group No 1 did not take part in the match at all),
capturing group No 2 contains the year field you are looking for.
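
The post-processing step might look like the sketch below, assuming Python and English month names. MONTHS, is_month and find_years_postprocessed are hypothetical helpers, and only the full names and their 3-letter abbreviations are recognised.

```python
import re

# The capture-first pattern from above: optional "word DD[,] " prefix, then the year.
DATE_OR_YEAR = re.compile(r"(?:(\w+) \d{2},* )?(\d{4})")

# Assumption: English month names; abbreviations are the first three letters.
MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)

def is_month(word):
    """True if word is a full month name or its 3-letter abbreviation."""
    w = word.lower()
    return any(w == m or w == m[:3] for m in MONTHS)

def find_years_postprocessed(text):
    """Collect group 2 of every match whose group 1 is absent or not a month."""
    years = []
    for match in DATE_OR_YEAR.finditer(text):
        word, year = match.group(1), match.group(2)
        if word is not None and is_month(word):
            continue  # a month and day precede the year: reject this match
        years.append(year)
    return years

sample = "Built 2019, updated January 05, 2021 and Feb 10 2022; see 2023"
print(find_years_postprocessed(sample))
```

Unlike the lookbehind version, this one rejects "January 05, 2021" as well, since the word in group No 1 may have any length.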
