'udate' date format (underscore date)

In short: udate is ISO 8601 with a udate_ prefix and underscores (_) in place of dashes (-) and colons (:). It has been extended to support timezones: a double underscore (__), followed by p for positive or n for negative UTC offsets.

Purpose:
The udate format is a custom date format that uses underscores (_) as delimiters instead of dashes and colons, and prefixes every value with udate_. This is particularly useful for tools like jq, where a JSON key that contains dashes (-) or starts with a digit cannot be used with the plain .key path syntax and has to be quoted instead (for example .["2024-01-01"]).
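
To make the key problem concrete, here is a minimal sketch of the rule involved. The jq manual describes the bare .key syntax as working only for identifier-like keys: letters, digits, and underscores, not starting with a digit. The helper name is_identifier_like is hypothetical and only mirrors that rule; it does not call jq.

```shell
# Hypothetical helper: succeeds if a key is "identifier-like", i.e. made
# only of letters, digits, and underscores and not starting with a digit.
is_identifier_like() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'
}

for key in '2024-01-01' 'udate_2024_01_01' 'udate_2020_11_02T20_00__n0800'; do
  if is_identifier_like "$key"; then
    echo "$key: usable as .$key"
  else
    echo "$key: needs quoting, e.g. .[\"$key\"]"
  fi
done
```

The plain ISO date fails the check, while both udate keys pass, which is the whole point of the prefix and the underscore delimiters.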


Key Idea:

The udate format follows a structure similar to ISO 8601 but replaces dashes and colons with underscores and always begins with the prefix udate_, making it friendlier to parsers and tools that might otherwise interpret dashes or leading digits in unexpected ways. To convert back to ISO 8601, remove the udate_ prefix and replace the underscores with the proper ISO delimiters (dashes in the date part and colons in the time part, or no delimiters at all for the basic format). Timezones are added at the end with a double underscore (__) followed by p (for positive offsets) or n (for negative offsets) and the four-digit UTC offset (e.g., 0700 for UTC+07:00).

Note: n (negative) is used rather than m (minus) to avoid confusion with months or minutes.

Examples:

  • Date (Year-Month-Day):
    udate_2024_01_01
    This corresponds to January 1st, 2024.

  • Date with Time and Positive Timezone (Year-Month-DayTHour:Minute, then __pTimezone):
    udate_2020_11_02T20_00__p0700
    This corresponds to November 2nd, 2020, at 20:00, with a timezone offset of UTC+07:00.

  • Date with Time and Negative Timezone (Year-Month-DayTHour:Minute, then __nTimezone):
    udate_2020_11_02T20_00__n0800
    This corresponds to November 2nd, 2020, at 20:00, with a timezone offset of UTC-08:00 (Pacific Standard Time, PST).
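
Producing a udate from an ISO 8601 string can be sketched as below. This is an illustration under stated assumptions, not a reference implementation: the function name iso_to_udate is hypothetical, the input is assumed to be extended-format ISO 8601 with an optional numeric offset, and the T separator is kept as in the format list further down. A trailing Z is not handled, since the format does not define an encoding for it.

```shell
# Hypothetical encoder: extended ISO 8601 -> udate.
iso_to_udate() {
  printf '%s\n' "$1" | sed -E \
    -e 's/[+]([0-9]{2}):?([0-9]{2})$/__p\1\2/' \
    -e 's/(T[0-9:]*)-([0-9]{2}):?([0-9]{2})$/\1__n\2\3/' \
    -e 's/[-:]/_/g' \
    -e 's/^/udate_/'
}

iso_to_udate '2024-01-01'              # udate_2024_01_01
iso_to_udate '2020-11-02T20:00+07:00'  # udate_2020_11_02T20_00__p0700
iso_to_udate '2020-11-02T20:00-08:00'  # udate_2020_11_02T20_00__n0800
```

The offset is rewritten first, and a negative offset is only recognized after a time part, so the trailing -01 of a plain date is never mistaken for a timezone.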


Motivation:

The motivation for the udate format comes from limitations in tools such as jq, which struggle with keys that contain dashes or start with a digit. Using underscores as delimiters and prefixing with udate_ has three benefits:

  1. Tool Compatibility:
    Many JSON-based tools, including jq, handle underscores better than dashes and have trouble with keys that start with a digit. With the udate_ prefix and underscore delimiters, dates can be used as JSON keys without running into these pitfalls.

  2. Preprocessing Flexibility:
    The format can be easily preprocessed to convert the underscores back into ISO 8601 delimiters (dashes in the date, colons in the time), allowing for full compatibility with existing standards and systems that expect ISO 8601 dates.

  3. Clarity:
    Using a clear prefix like udate_ ensures that it is recognizable as a special format. This helps avoid confusion or misinterpretation when working in multi-step data workflows that include intermediate non-ISO date encodings.


Preprocessing Strategy:

To convert a udate back into ISO 8601 format, the following steps need to be followed:

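  1. Remove the udate_ prefix.
  2. For formats that include a timezone (indicated by __p for positive or __n for negative), convert the suffix to the ISO 8601 ±hh:mm form. Do this before touching single underscores, while the double underscore is still recognizable.
  3. Replace the remaining underscores: with colons in the time part (after the T) and with dashes in the date part (or remove them all for ISO 8601's basic format).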

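Example in bash (the timezone suffix is converted first, then underscores after the T become colons, and the remaining underscores become dashes):

processed_date=$(echo "$udate" | sed -E -e 's/^udate_//' -e 's/__p([0-9]{2})([0-9]{2})$/+\1:\2/' -e 's/__n([0-9]{2})([0-9]{2})$/-\1:\2/' -e ':a' -e 's/(T[0-9:]*)_/\1:/' -e 'ta' -e 's/_/-/g')

This will convert udate_2020_11_02T20_00__n0800 to 2020-11-02T20:00-08:00.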
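
The steps above mention ISO 8601's basic format as an alternative target. A minimal sketch of that variant follows; the function name udate_to_iso_basic is hypothetical, and the offset is written as ±hhmm without a colon, as the basic format does.

```shell
# Hypothetical decoder: udate -> ISO 8601 basic format (no delimiters).
udate_to_iso_basic() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^udate_//' \
    -e 's/__p([0-9]{4})$/+\1/' \
    -e 's/__n([0-9]{4})$/-\1/' \
    -e 's/_//g'
}

udate_to_iso_basic 'udate_2020_11_02T20_00__n0800'  # 20201102T2000-0800
udate_to_iso_basic 'udate_2024_01_01'               # 20240101
```

Because no delimiters are restored, there is no need to tell the date part from the time part; only the timezone suffix needs special handling before the underscores are dropped.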


Formalizing the udate Format

Prefix:
The udate format always begins with the prefix udate_ to clearly indicate that the following string represents a date or time format. This prefix distinguishes it from other types of data and makes it clear that the format is non-ISO compliant but convertible.

Delimiter:
In the udate format, underscores (_) replace the dashes (-) and colons (:) used in the extended ISO 8601 format. This substitution ensures compatibility with tools that might misinterpret or mishandle dashes or strings starting with a digit, such as when using the format in JSON keys or certain command-line tools.

Timezone Delimiter:
A double underscore (__) is used to separate the date or time from the timezone. The timezone is expressed as a four-digit offset from UTC, prefixed by p for positive or n for negative, to avoid confusion with months or minutes.


udate Format Structure:

  1. Date (Year-Month-Day):
    Format: udate_YYYY_MM_DD

    • Example: udate_2024_01_01 for January 1st, 2024.
  2. Date with Time (Year-Month-DayTHour:Minute:Second):
    Format: udate_YYYY_MM_DDTHH_MM_SS

    • Example: udate_2024_01_01T12_30_45 for January 1st, 2024, at 12:30:45 PM.
  3. Date with Time and Timezone:
    Format: udate_YYYY_MM_DDTHH_MM__pTimezone or udate_YYYY_MM_DDTHH_MM__nTimezone

    • Example: udate_2020_11_02T20_00__p0700 for November 2nd, 2020, 20:00, with UTC+07:00.
    • Example: udate_2020_11_02T20_00__n0800 for November 2nd, 2020, 20:00, with UTC-08:00.
  4. Week (Year-Week-Day):
    Format: udate_YYYY_Www_D

    • Example: udate_2024_W01_1 for Monday of the first ISO week of 2024 (January 1st, 2024).
  5. Day of the Year (Year-DayOfYear):
    Format: udate_YYYY_DDD

    • Example: udate_2024_032 for February 1st, 2024 (the 32nd day of the year).
  6. Time (Hour:Minute:Second):
    Format: udate_THH_MM_SS

    • Example: udate_T12_30_45 for 12:30:45 PM.
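
The six forms above can be checked with a single pattern. The sketch below is one possible validator, not part of the format itself: the name is_udate and the re_* variables are hypothetical, the T separator from the Format lines is assumed, seconds are treated as optional, and a timezone suffix is accepted after any time part. It checks shape only, so an out-of-range value such as month 13 still passes.

```shell
# Hypothetical validator: succeeds if the argument has the shape of a udate.
is_udate() {
  re_date='[0-9]{4}_[0-9]{2}_[0-9]{2}'        # YYYY_MM_DD
  re_week='[0-9]{4}_W[0-9]{2}_[1-7]'          # YYYY_Www_D
  re_ordinal='[0-9]{4}_[0-9]{3}'              # YYYY_DDD
  re_time='T[0-9]{2}_[0-9]{2}(_[0-9]{2})?'    # THH_MM or THH_MM_SS
  re_zone='(__[pn][0-9]{4})?'                 # optional __phhmm / __nhhmm
  printf '%s\n' "$1" | grep -Eq \
    "^udate_(${re_date}(${re_time}${re_zone})?|${re_week}|${re_ordinal}|${re_time}${re_zone})\$"
}

for candidate in udate_2024_01_01 udate_2024_W01_1 udate_T12_30_45 2024-01-01; do
  if is_udate "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: not a udate"
  fi
done
```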

Use Cases:

  • When working in environments that require compatibility with JSON-based tools like jq.
  • When storing dates in JSON files where dashes might cause issues with key lookups or parsing.
  • When needing to ensure date strings are jq-friendly while preserving flexibility to be ISO-compliant.
  • When handling timezone-sensitive data, ensuring the correct interpretation of time offsets, while avoiding confusion with other time-related fields.

Summary of Formats:

  • udate_YYYY_MM_DD → Standard date.
  • udate_YYYY_Www_D → Week-based date.
  • udate_THH_MM_SS → Time only.
  • udate_YYYY_DDD → Day of the year.
  • udate_YYYY_MM_DDTHH_MM_SS → Date with time.
  • udate_YYYY_MM_DDTHH_MM__pTimezone → Full date-time with positive timezone.
  • udate_YYYY_MM_DDTHH_MM__nTimezone → Full date-time with negative timezone.
