Day 28 - Python - Strings 3

Extract and Validate Dates from Text

You are given a block of text that may contain dates in various formats. Your task is to extract and validate these dates using the following criteria:

  1. The date must be in one of the following formats:

    • DD-MM-YYYY
    • MM/DD/YYYY
    • YYYY.MM.DD
    • Month Day, Year (e.g., January 10, 2025)
  2. The month should be a valid month name or number (e.g., January, February, 03, 04, etc.).

  3. The day should be a valid day number for the given month.

  4. The year should be a valid 4-digit number.

If the date is valid, extract and print the date in YYYY-MM-DD format. If invalid, print Invalid Date.
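Criteria 2 to 4 can also be checked with the standard library instead of a hand-written table. The following is a minimal sketch, assuming a hypothetical helper named format_if_valid (it is not part of the solution further down); it relies on calendar.monthrange so that February gets 29 days in leap years.

```python
import calendar

def format_if_valid(year, month, day):
    """Return 'YYYY-MM-DD' for a real calendar date, else 'Invalid Date'.

    Hypothetical helper for illustration; the name is an assumption.
    """
    if not (1 <= year <= 9999 and 1 <= month <= 12):
        return "Invalid Date"
    # monthrange returns (weekday of the 1st, number of days in the month).
    days = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days:
        return "Invalid Date"
    return f"{year:04d}-{month:02d}-{day:02d}"

print(format_if_valid(2025, 1, 10))   # 2025-01-10
print(format_if_valid(2025, 2, 29))   # Invalid Date (2025 is not a leap year)
print(format_if_valid(2024, 2, 29))   # 2024-02-29
print(format_if_valid(2025, 13, 32))  # Invalid Date
```

The helper takes integers that have already been pulled out of the text, so it can be reused for every supported format once the fields are identified.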

Input Format:

  • The first line contains an integer n, the number of lines of text.
  • The next n lines each contain a string of text that may contain one or more dates.

Output Format:

For each line of text:

  • If a valid date is found, print the date in the format YYYY-MM-DD.
  • If no valid date is found or the date is invalid, print: Invalid Date

Sample Input 1:

3
Today is 10-01-2025, and tomorrow will be 01/11/2025.
The event will be held on July 07, 2025.
Invalid date: 2025.13.32

Sample Output 1:

2025-01-10
2025-01-11
2025-07-07
Invalid Date

Sample Input 2:

5
Dhoni's birthday is on 07-07-2025, mark your calendars!
Diwali will fall on 11/12/2025 this year.
The new year begins on January 1, 2026.
The aliens will arrive on 30/02/2025, be prepared!
The invasion starts on 2025.13.45, brace yourselves.

Sample Output 2:

2025-07-07
2025-11-12
2026-01-01
Invalid Date
Invalid Date

Explanation:

  1. Regex Patterns:

    • (\d{2})-(\d{2})-(\d{4}) matches DD-MM-YYYY format.
    • (\d{2})/(\d{2})/(\d{4}) matches MM/DD/YYYY format.
    • (\d{4})\.(\d{2})\.(\d{2}) matches YYYY.MM.DD format.
    • ([A-Za-z]+) (\d{1,2}), (\d{4}) matches Month Day, Year format.
  2. Validation:

    • For DD-MM-YYYY, MM/DD/YYYY, and YYYY.MM.DD, the code checks that the month is between 1 and 12 and that the day is valid for the given month.
    • For the Month Day, Year format, the month name is matched against a predefined list of valid months, and the day is validated against the month.
  3. Output:

    • The valid date is printed in YYYY-MM-DD format.
    • If no valid date is found or the date is invalid, Invalid Date is printed.
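Because each pattern captures its fields in a different order, unpacking the groups by position is easy to get wrong. One way to make the order explicit is to name the groups. This is a sketch of a variation on the patterns above, not the original code, and find_numeric_dates is a hypothetical name.

```python
import re

# The same three numeric formats, with each group named for the field it holds.
NAMED_PATTERNS = [
    r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})",    # DD-MM-YYYY
    r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",    # MM/DD/YYYY
    r"(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})",  # YYYY.MM.DD
]

def find_numeric_dates(line):
    """Return (year, month, day) integer tuples for every numeric date in line."""
    found = []
    for pattern in NAMED_PATTERNS:
        for match in re.finditer(pattern, line):
            found.append((int(match["year"]), int(match["month"]), int(match["day"])))
    return found

print(find_numeric_dates("Today is 10-01-2025, and tomorrow will be 01/11/2025."))
# [(2025, 1, 10), (2025, 1, 11)]
print(find_numeric_dates("Invalid date: 2025.13.32"))
# [(2025, 13, 32)]
```

This step only extracts the fields; the second result still has to fail validation, since month 13 does not exist.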

Python Code:

import re

month_names = {
    "January": 1, "February": 2, "March": 3, "April": 4, "May": 5,
    "June": 6, "July": 7, "August": 8, "September": 9, "October": 10,
    "November": 11, "December": 12
}

days_in_month = {
    1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
    7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31
}

patterns = [
    r"(\d{2})-(\d{2})-(\d{4})",        # DD-MM-YYYY
    r"(\d{2})/(\d{2})/(\d{4})",        # MM/DD/YYYY
    r"(\d{4})\.(\d{2})\.(\d{2})",      # YYYY.MM.DD
    r"([A-Za-z]+) (\d{1,2}), (\d{4})"  # Month Day, Year
]

def is_valid(day, month, year):
    # The month must be 1-12 and the day must exist in that month.
    if not 1 <= month <= 12:
        return False
    limit = days_in_month[month]
    # February has 29 days in a leap year.
    if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        limit = 29
    return 1 <= day <= limit

def validate_and_extract_dates(n, text_lines):
    for line in text_lines:
        found = False
        for index, pattern in enumerate(patterns):
            # Find all matching date patterns in the line
            for match in re.findall(pattern, line):
                found = True
                # Unpack the groups in the order each format captures them
                if index == 0:    # DD-MM-YYYY
                    day, month, year = map(int, match)
                elif index == 1:  # MM/DD/YYYY
                    month, day, year = map(int, match)
                elif index == 2:  # YYYY.MM.DD
                    year, month, day = map(int, match)
                else:             # Month Day, Year (e.g., July 07, 2025)
                    month = month_names.get(match[0], -1)
                    day, year = int(match[1]), int(match[2])
                if is_valid(day, month, year):
                    print(f"{year:04d}-{month:02d}-{day:02d}")
                else:
                    print("Invalid Date")
        # A line with no date-like text at all is also reported as invalid
        if not found:
            print("Invalid Date")

n = int(input())
text_lines = [input().strip() for _ in range(n)]
validate_and_extract_dates(n, text_lines)

Insights:

  • Date Format Flexibility: The code supports multiple date formats, such as DD-MM-YYYY, MM/DD/YYYY, YYYY.MM.DD, and Month Day, Year, providing flexibility for various input scenarios.
  • Regex Matching: The use of regular expressions allows the extraction of date patterns from strings, making it easy to handle different date formats in a concise manner.

  • Pattern-Based Validation: Each date format has a specific validation rule associated with it, ensuring that both numeric and textual representations of dates are processed correctly.

  • Month Validation: The code ensures that the month is within the valid range (1-12), preventing dates with non-existent months from being accepted.

  • Day Validation: The day is validated against the number of days in the specific month, so values such as day 32, or day 31 in a 30-day month, are rejected.

  • Leap Year Consideration: February has 28 days in a common year such as 2025 and 29 in a leap year, so the upper limit for February depends on the year being validated.

  • Multiple Dates Per Line: The code can handle multiple dates in a single line by using re.findall(), which extracts all matching date patterns, allowing for efficient processing of complex input.

  • Error Handling: Invalid dates (such as those with out-of-range months or days) are identified and marked as "Invalid Date," ensuring clear feedback to users.

  • Scalability: This approach can be extended to include additional date formats by adding new patterns, making it scalable for future enhancements or changes in date formatting requirements.

  • Readability: The code uses clear and structured patterns with well-defined validation rules, making it easy to understand and maintain, even for complex date input scenarios.
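One caveat on multiple dates per line: looping over the patterns one at a time reports dates grouped by format rather than in the order they appear in the text. If reading order matters, a possible approach (an assumption that goes beyond the original solution) is to record where each match starts with re.finditer and sort on that offset. The name spans_in_order is hypothetical.

```python
import re

PATTERNS = [
    r"(\d{2})-(\d{2})-(\d{4})",         # DD-MM-YYYY
    r"(\d{2})/(\d{2})/(\d{4})",         # MM/DD/YYYY
    r"(\d{4})\.(\d{2})\.(\d{2})",       # YYYY.MM.DD
    r"([A-Za-z]+) (\d{1,2}), (\d{4})",  # Month Day, Year
]

def spans_in_order(line):
    """Return the matched date strings sorted by where they start in line."""
    hits = []
    for pattern in PATTERNS:
        for match in re.finditer(pattern, line):
            hits.append((match.start(), match.group(0)))
    hits.sort()
    return [text for _, text in hits]

print(spans_in_order("Due 01/11/2025, moved from 10-01-2025."))
# ['01/11/2025', '10-01-2025']
```

Without the sort, the DD-MM-YYYY match would come first here, because its pattern is tried first.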

Just like validating dates, life requires us to adapt to different formats and face challenges, but with the right checks and perspective, we can ensure we stay on track. Stay flexible in approach, but always validate your choices for a well-structured future. 🗓️🧘
