Timezones¶
Working with timezones can be a real headache. Here are some common scenarios and how to handle them.
Note
In general, a lot of problems and ambiguity can be removed by picking a timezone and sticking to it - and converting all incoming timeseries into this timezone.
Preparations¶
We assume our data is in fr, which is a pandas.Series or pandas.DataFrame.
The frequency of the index must be set or inferrable. This means that gaps are filled (e.g. with fr.resample().asfreq()) and, if necessary, timestamps are localized (with fr.tz_localize()). In addition, each timestamp must fall at a period boundary, i.e., when dealing with daily values, the timestamps are all at midnight, and not at e.g. noon. This is all checked by the functions below.
Finally, it is assumed that the timestamps are left-bound.
Convert to timezone-aware¶
The most common case is where we have a timezone tied to a geographic location. In that case, we can do our conversions with the portfolyo.force_tzaware() function. In this example we will use the “Europe/Berlin” timezone. Here is how we can convert any timeseries (or dataframe) fr into this timezone, depending on the situation:
The index already has the correct timezone set. This can be checked with
fr.index.tz. In this case, nothing needs to be done (and if we pass it toportfolyo.force_tzaware(), it is returned as-as).The index has a timezone set (i.e., is “tz-aware”) but to a different timezone,let’s say “Japan”. In that case, we need to decide if we want to convert the series while keeping the local time or while keeping the universal time.
Keeping the local time (“floating conversion”) means that the value belonging to
2020-01-01 15:00in Japan is converted to2020-01-01 15:00in Europe/Berlin.Keeping the universal time (“non-floating conversion”) means that the
15:00value is converted to07:00in Europe/Berlin. For daily (and longer) values, this is not possible, as any given calendar day in Japan falls on two calendar days in Europe/Berlin.
For these conversions, we use the
portfolyo.force_tzaware()function and specify a value for thefloatingparameter.The index is tz-agnostic and has exactly 24h for each day. In this case, a conversion to e.g. the “Europe/Berlin” timezone is lossy if we are dealing with hourly values (and shorter) - after all,the target timezone has a missing/repeated hour at the start/end of DST. This is also done by the
portfolyo.force_tzaware()function.The index is not tz-aware, but its timestamps still imply a certain timezone and show local time (also called “wall time”) in that zone. (Again using “Europe/Berlin”, this means that it contains the timestamps
2020-03-29 00:00,01:00,03:00,..., as well as2020-10-25 00:00,01:00,02:00,02:00,03:00,.... As mentioned above, this data cannot by converted directly byportfolyo.force_tzaware(): it must first be localized with thefr.tz_localize()method thatpandasprovides.
Convert to timezone-agnostic¶
Alternatively, we might not want to deal with timezones at all, and convert any timeseries (or dataframe) fr to be “tz-agnostic”. The data is then no longer tied to a specific geographic location, and therefore do not have DST and every day has exactly 24h. In that case, we can do our conversions with the force_tzagnostic() function. Here’s how we can use this function:
The index is already “standardized”. No conversion is needed (and passing it to
force_tzagnostic()returns it as-is).The index has a timezone set. The conversion is done with
force_tzagnostic(). As above: if the source data has hourly (or shorter) values in a timezone with a DST-transition (like “Europe/Berlin”), the conversion is lossy.The index is not tz-aware, but its timestamps are the “wall time” values for a specific timezone. As above: this data cannot be converted directly but must first be localized with
fr.tz_localize().
Assumptions¶
The conversion functions assume that we are dealing with “time-averable” data, such as values in [MW] or [Eur/MWh].
Two examples:
Daily values: 10 [MW] on 2020-10-25 (which has 24h in a standardized context) is converted into 10 [MW] on 2020-10-25 in the Europe/Berlin timezone (which has 25h). If the values have a unit of [MWh] or [Eur], this is probably not what we want.
Hourly values: a value of 4 [MW] on 2020-10-25 02:00-03:00 (standardized) is duplicated to TWO values of 4 [MW] each in the Europe/Berlin timezone. This is probably what we want, regardless of the unit.
When converting “time-summable” data, such as values in [MWh] or [Eur], do the unit conversion into [MW] or [Eur/MWh] before doing the timezone conversion.