Data Structure

Your data must be structured in a specific way to be used in the package.

Phenology Observation Data

Observation data consists of the following

  • doy: These are the julian date (1-365) of when a specific phenological event happened.
  • site_id: A site identifier for each doy observation
  • year: A year identifier for each doy observation

These should be structured in columns in a pandas data.frame, where every row is a single observation. For example the built in vaccinium dataset looks like this:

from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')

obserations.head()

                species  site_id  year  doy  phenophase
0  vaccinium corymbosum        1  1991  100         371
1  vaccinium corymbosum        1  1991  100         371
2  vaccinium corymbosum        1  1991  104         371
3  vaccinium corymbosum        1  1998  106         371
4  vaccinium corymbosum        1  1998  106         371

There are extra columns here for the species and phenophase, those will be ignored inside the pyPhenology package.

Phenology Environmental Data

The majority of the models use only daily mean temperature as a driver. This is required for for every day of the winter and spring leading up to the phenophase event. The predictors data.frame should have the following structure.

  • site_id: A site identifier for each location.
  • year: The year of the temperature timeseries
  • temperature: The observed daily mean temperature in degrees Celcius.
  • doy: The julian date of the mean temperature

These should columns be in a data.frame like the observations. The example vaccinium dataset has temperature observations:

predictors.head()
    site_id  temperature  year  doy  latitude  longitude  daylength
0        1        -3.86  1989    0   42.5429   -72.2011       8.94
1        1        -4.71  1989    1   42.5429   -72.2011       8.95
2        1        -1.56  1989    2   42.5429   -72.2011       8.97
3        1        -7.88  1989    3   42.5429   -72.2011       8.98
4        1       -15.24  1989    4   42.5429   -72.2011       9.00

Note than any other columns in the predictors data.frame besides the ones used will be ignored.

Currently two other models use other predictors besides daily mean temerature. The M1 uses daylength as a predictor as well as daily mean temperature. The predictors data.frame should thus have a daylength column in addition to the temperature as shown above.

The Naive model uses only latitude in it’s calculation and thus requires a predictors data.frame with the latitude for every site. For example:

predictors.head()

   site_id   latitude
0      258  39.184269
1      414  44.277962
2      475  47.027077
3      637  44.340950
4      681  41.296783

On the Julian Date

The julian date (usually referenced as DOY for “day of year”) is used throughout the package. This can be negative if referencing something from the prior season. For example consider the following data from the aspen dataset:

predictors.head()

   site_id  temperature  year  doy   latitude   longitude  daylength
0      258         6.28  2009  -67  39.184269 -106.854614      10.52
1      414         8.12  2009  -67  44.277962  -70.315315      10.22
2      475         5.30  2009  -67  47.027077 -114.049248      10.04
3      637         8.30  2009  -67  44.340950  -72.461220      10.22
4      681         9.85  2009  -67  41.296783 -105.574600      10.40

The doy -67 here refers to Oct. 26 for the growing year 2009. Formating dates in this fashion allows for a continuous range of numbers across years, and is common in phenology studies.

January 1 will always be DOY 1.

Notes

  • If you have only a single site, make a “dummy” site_id column set to 1 for both temperature and observation dataframes.
  • If you have only a single year then it still must be represented in the year column of both data.frames.