In too deep
--#--
Constructing a regex that accepts any valid date,
while rejecting invalid ones like 2/29/1999, is very tough
indeed. It's so tough that when I needed to do it, I nearly
threw up my hands. Then I went looking for a good date
regex.
The closest I came was a partial formula in Friedl's book, where
he teases you with possible solutions to matching the day values
in a 31-day month.
The key is reducing the problem to alternations among
every possible way to express a date. There aren't that many, really!
At any rate, after about 10 hours of noodling, I came up with the
following regex for validating a date.
$date =~ m%^(0?[1-9]|1[0-2])/([12][0-9]|3[01]|0?[1-9])/(19|20)(\d\d)$%;
The only problem is that it accepts impossible dates like 2/29/1999, because it allows up to 31 days in every month.
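To see that behavior concretely, here is a minimal sketch that wraps the pattern above in a subroutine and runs it over a few sample dates. The name looks_like_date and the sample dates are made up for illustration; only the regex comes from the text.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above; returns 1 on a match, 0 otherwise.
sub looks_like_date {
    my ($date) = @_;
    return $date =~ m%^(0?[1-9]|1[0-2])/([12][0-9]|3[01]|0?[1-9])/(19|20)(\d\d)$% ? 1 : 0;
}

# Sample inputs chosen for illustration: one ordinary date, one bad month,
# and two dates that do not exist on the calendar.
for my $date ('7/4/1976', '13/1/2000', '2/29/1999', '4/31/2001') {
    printf "%-10s %s\n", $date, looks_like_date($date) ? 'accepted' : 'rejected';
}
```

Only 13/1/2000 is rejected here. The two impossible dates pass, since the pattern checks the month and the day independently of each other.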
I actually implemented this in a Java servlet using a library
called Perl Tools. But when someone wanted to require that the
date be absolutely valid, I supplemented it by instantiating
a Java Calendar and checking whether an exception was thrown.
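The servlet code isn't shown here, but the same "let a calendar library complain" idea can be sketched in Perl. This is an assumption-laden stand-in, not the Java code: it uses the core Time::Local module, whose timegm croaks when the day is out of range for the month, and the helper name is_real_date is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: returns 1 if the m/d/yyyy string names a real calendar date.
sub is_real_date {
    my ($date) = @_;

    # Pull out the numeric fields; anything that is not m/d/yyyy fails early.
    my ($m, $d, $y) = $date =~ m%^(\d\d?)/(\d\d?)/(\d{4})$% or return 0;

    # timegm() range-checks month and day (including leap years) and dies on
    # bad input, so eval turns that exception into a false return value.
    return eval { timegm(0, 0, 0, $d, $m - 1, $y); 1 } ? 1 : 0;
}

print is_real_date('2/29/2000') ? "OK!\n" : "not OK\n";
print is_real_date('2/29/1999') ? "OK!\n" : "not OK\n";
```

The trade-off is the one raised in the text: once the library does the validation, the regex is only needed for checking the format.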
I know, I know. Why not just skip the regex? That's a perfectly
reasonable approach. But by this point I was in too deep. Another 10
hours later, I'd noodled the solution. (The regex itself should all be one
line, but it wouldn't fit in this format. I've marked the line continuations
by //.)
# Last two digits of the year, e.g. "99" for 1999.
$year = substr($date, length($date) -2, 2);
if (($year % 4) != 0) {
    # Not divisible by 4: February runs to the 28th.
    if ($date =~ m%(^(0?[13578]|1[02])/([12][0-9]|3[01]|0?[1-9])/(19|20)(\d\d)$) //
    |(^(0?[469]|11)/([12][0-9]|30|0?[1-9])/(19|20)(\d\d)$) //
    |(^(0?2)/(1[0-9]|2[0-8]|0?[1-9])/(19|20)(\d\d)$)%) {
        print "OK!\n";
    }
    else {
        print "not OK\n";
    }
}
else {
    # Divisible by 4: February runs to the 29th.
    if ($date =~ m%(^(0?[13578]|1[02])/([12][0-9]|3[01]|0?[1-9])/(19|20)(\d\d)$) //
    |(^(0?[469]|11)/([12][0-9]|30|0?[1-9])/(19|20)(\d\d)$) //
    |(^(0?2)/(1[0-9]|2[0-9]|0?[1-9])/(19|20)(\d\d)$)%) {
        print "OK!\n";
    }
    else {
        print "not OK\n";
    }
}
The artificial line breaks also show the basic logic here: 31-day months (1, 3, 5, 7, 8,
10, or 12); 30-day months (4, 6, 9, or 11); and February.
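The same three-way split can also be written so that it runs as-is, without joining lines by hand. The sketch below is a restructuring, not the original code: it uses Perl's /x modifier, which ignores whitespace and allows comments inside the pattern. The subroutine name date_ok and the interpolated $feb_days variable are hypothetical, and the common year suffix is factored out of the three alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the date passes the month/day/year checks.
sub date_ok {
    my ($date) = @_;

    # Two-digit year, as in the code above; no trailing digits means no match.
    my ($yy) = $date =~ /(\d\d)$/ or return 0;

    # Same mod-4 test as above, used to pick the day range for February.
    my $feb_days = ($yy % 4) ? '1[0-9]|2[0-8]|0?[1-9]' : '1[0-9]|2[0-9]|0?[1-9]';

    return $date =~ m%
        ^ (?:
              (?: 0?[13578] | 1[02] ) / (?: [12][0-9] | 3[01] | 0?[1-9] )   # 31-day months
            | (?: 0?[469]   | 11    ) / (?: [12][0-9] | 30    | 0?[1-9] )   # 30-day months
            | 0?2                     / (?: $feb_days )                     # February
          )
          / (?: 19 | 20 ) \d\d $
    %x ? 1 : 0;
}

for my $date ('2/29/1999', '2/29/2000', '4/31/2001', '12/31/2099') {
    print "$date: ", (date_ok($date) ? "OK!" : "not OK"), "\n";
}
```

Because it keeps the same mod-4 arithmetic, this version accepts and rejects the same dates as the code above.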
Sure, it's not purely a regex. It uses arithmetic to check for leap year. So
what? Now, that leap year check could be implemented as a regex. Why
don't you show me? Oh, and by the way, it is inaccurate on one date
between 1/1/1900 and 12/31/2099. Can you guess which one?
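For anyone taking up the leap-year challenge, here is one possible sketch. It assumes the goal is only to reproduce the (year % 4) == 0 test on the last two digits, nothing more. A two-digit number is divisible by 4 when an even tens digit is followed by 0, 4, or 8, or an odd tens digit is followed by 2 or 6. The names $leap_yy and feb29_ok are hypothetical.

```perl
use strict;
use warnings;

# Matches when the last two digits form a number divisible by 4,
# mirroring the arithmetic test (year % 4) == 0 used above.
my $leap_yy = qr/(?:[02468][048]|[13579][26])$/;

# Hypothetical helper: accepts February 29 only when the year passes $leap_yy.
sub feb29_ok {
    my ($date) = @_;
    return $date =~ m%^0?2/29/(?:19|20)\d\d$% && $date =~ $leap_yy ? 1 : 0;
}

print feb29_ok('2/29/1996') ? "OK!\n" : "not OK\n";
print feb29_ok('2/29/1999') ? "OK!\n" : "not OK\n";
```

Since it encodes the same rule as the arithmetic, it is no more accurate than the original check; it just moves the test into the pattern.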