Data Structure¶

Your data must be structured in a specific way to be used in the package.

Phenology Observation Data¶

Observation data consists of the following

doy: These are the julian date (1-365) of when a specific phenological event happened.
site_id: A site identifier for each doy observation
year: A year identifier for each doy observation

These should be structured in columns in a pandas data.frame, where every row is a single observation. For example the built in vaccinium dataset looks like this:

from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')

obserations.head()

                species  site_id  year  doy  phenophase
0  vaccinium corymbosum        1  1991  100         371
1  vaccinium corymbosum        1  1991  100         371
2  vaccinium corymbosum        1  1991  104         371
3  vaccinium corymbosum        1  1998  106         371
4  vaccinium corymbosum        1  1998  106         371

There are extra columns here for the species and phenophase, those will be ignored inside the pyPhenology package.

Phenology Environmental Data¶

The majority of the models use only daily mean temperature as a driver. This is required for for every day of the winter and spring leading up to the phenophase event. The predictors data.frame should have the following structure.

site_id: A site identifier for each location.
year: The year of the temperature timeseries
temperature: The observed daily mean temperature in degrees Celcius.
doy: The julian date of the mean temperature

These should columns be in a data.frame like the observations. The example vaccinium dataset has temperature observations:

predictors.head()
    site_id  temperature  year  doy  latitude  longitude  daylength
      1        -3.86  1989    0   42.5429   -72.2011       8.94
      1        -4.71  1989    1   42.5429   -72.2011       8.95
      1        -1.56  1989    2   42.5429   -72.2011       8.97
      1        -7.88  1989    3   42.5429   -72.2011       8.98
      1       -15.24  1989    4   42.5429   -72.2011       9.00

Note than any other columns in the predictors data.frame besides the ones used will be ignored.

Currently two other models use other predictors besides daily mean temerature. The M1 uses daylength as a predictor as well as daily mean temperature. The predictors data.frame should thus have a daylength column in addition to the temperature as shown above.

The Naive model uses only latitude in it’s calculation and thus requires a predictors data.frame with the latitude for every site. For example:

predictors.head()

   site_id   latitude
    258  39.184269
    414  44.277962
    475  47.027077
    637  44.340950
    681  41.296783

On the Julian Date¶

The julian date (usually referenced as DOY for “day of year”) is used throughout the package. This can be negative if referencing something from the prior season. For example consider the following data from the aspen dataset:

predictors.head()

   site_id  temperature  year  doy   latitude   longitude  daylength
    258         6.28  2009  -67  39.184269 -106.854614      10.52
    414         8.12  2009  -67  44.277962  -70.315315      10.22
    475         5.30  2009  -67  47.027077 -114.049248      10.04
    637         8.30  2009  -67  44.340950  -72.461220      10.22
    681         9.85  2009  -67  41.296783 -105.574600      10.40

The doy -67 here refers to Oct. 26 for the growing year 2009. Formating dates in this fashion allows for a continuous range of numbers across years, and is common in phenology studies.

January 1 will always be DOY 1.

Notes¶

If you have only a single site, make a “dummy” site_id column set to 1 for both temperature and observation dataframes.
If you have only a single year then it still must be represented in the year column of both data.frames.