pyPhenology.models.WeightedEnsemble

class pyPhenology.models.WeightedEnsemble(core_models)[source]

Fit an ensemble of many models with associated weights

This model combines multiple models into an ensemble whose predictions are the weighted average of the predictions from each core model. The weights are derived via "stacking" as described in Dormann et al. 2018. The steps are as follows:

  1. Subset the data into random training/testing sets.
  2. Fit each core model on the training set.
  3. Make predictions on the testing set.
  4. Find the weights which minimize RMSE of the testing set.
  5. Repeat steps 1-4 for a number of iterations (the iterations argument to fit()).
  6. Take the average weight for each model across all iterations as the final weight used in the ensemble. These weights sum to 1.
  7. Fit the core models a final time on the full dataset given to the fit() method. The parameters derived from this final fit are used to make predictions.
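The weight-finding step (step 4) can be sketched outside the library. The following is a minimal illustration, not pyPhenology's internal implementation: given each core model's predictions on the held-out set, it searches for non-negative weights summing to 1 that minimize the RMSE of the weighted-average prediction.

```python
import numpy as np
from scipy.optimize import minimize

def find_stacking_weights(test_predictions, observed):
    """Find ensemble weights minimizing RMSE on a held-out set.

    test_predictions : array of shape (n_models, n_obs), one row of
        held-out-set predictions per core model.
    observed : array of shape (n_obs,), the held-out observations.
    """
    n_models = test_predictions.shape[0]

    def rmse(w):
        # Weighted average of the per-model predictions
        ensemble_pred = w @ test_predictions
        return np.sqrt(np.mean((ensemble_pred - observed) ** 2))

    # Weights constrained to [0, 1] and summing to 1
    constraints = {'type': 'eq', 'fun': lambda w: w.sum() - 1.0}
    bounds = [(0.0, 1.0)] * n_models
    w0 = np.full(n_models, 1.0 / n_models)  # start from equal weights
    result = minimize(rmse, w0, bounds=bounds, constraints=constraints)
    return result.x
```

Averaging these weights over many random train/test splits (steps 5-6) then gives the final ensemble weights.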

Note that the core models must be passed in already initialized. They will be fit within the WeightedEnsemble model:

from pyPhenology import models, utils
observations, predictors = utils.load_test_data(name='vaccinium')

m1 = models.ThermalTime(parameters={'T':0})
m2 = models.ThermalTime(parameters={'T':5})
m3 = models.ThermalTime(parameters={'T':-5})
m4 = models.ThermalTime(parameters={'T':10})
m5 = models.Uniforc(parameters={'t1':1})
m6 = models.Uniforc(parameters={'t1':30})
m7 = models.Uniforc(parameters={'t1':60})

ensemble = models.WeightedEnsemble(core_models=[m1,m2,m3,m4,m5,m6,m7])
ensemble.fit(observations, predictors)
Notes:
Dormann, Carsten F., et al. 2018. Model averaging in ecology: a review of Bayesian, information-theoretic and tactical approaches for predictive inference. Ecological Monographs. https://doi.org/10.1002/ecm.1309
__init__(core_models)[source]

Weighted Ensemble model

core_models : list of pyPhenology models, or a saved model file

Methods

__init__(core_models) Weighted Ensemble model
ensemble_shape([shape]) Returns a tuple signifying the layers of submodels
fit(observations, predictors[, iterations, …]) Fit the underlying core models
get_params()
get_weights()
predict([to_predict, predictors, …]) Make predictions
save_params(filename[, overwrite]) Save model parameters
score([metric, doy_observed, to_predict, …]) Get the score on the dataset used for fitting (if fitting was done); otherwise set to_predict and predictors as used in model.predict().
fit(observations, predictors, iterations=10, held_out_percent=0.2, loss_function='rmse', method='DE', optimizer_params='practical', n_jobs=1, verbose=False, debug=False)[source]

Fit the underlying core models

Parameters:
observations : dataframe
pandas dataframe of phenology observations
predictors : dataframe
pandas dataframe of associated predictors
iterations : int
Number of stacking iterations to use.
held_out_percent : float
Percent of randomly held out data to use in each stacking iteration. Must be between 0 and 1.
n_jobs : int
number of parallel processes to use
kwargs :
Other arguments passed to core model fitting (e.g. optimizer methods)
predict(to_predict=None, predictors=None, aggregation='mean', n_jobs=1, **kwargs)[source]

Make predictions.

Predictions will be made using each core model, then combined into a final weighted-average prediction using the fitted weights.

Parameters:

See the core model description.

aggregation : str
Either 'weighted_mean' to get a single weighted prediction, or 'none' to get the predictions from all core models. With 'none' this returns a tuple of (weights, predictions).
n_jobs : int
number of parallel processes to use
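The relation between the two aggregation modes can be sketched with plain NumPy. The array shapes below are illustrative assumptions, not the library's exact return types: each row holds one core model's day-of-year predictions for two observations.

```python
import numpy as np

weights = np.array([0.6, 0.3, 0.1])       # fitted ensemble weights, sum to 1
predictions = np.array([[100., 105.],     # per-model DOY predictions,
                        [102., 107.],     # shape (n_models, n_obs)
                        [ 98., 103.]])

# aggregation='weighted_mean': one prediction per observation,
# the weight vector applied across the model axis
weighted_mean = weights @ predictions

# aggregation='none' conceptually returns the raw pieces instead:
weights_and_preds = (weights, predictions)
```

Keeping the raw (weights, predictions) tuple is useful for inspecting how much each core model contributes, or for characterizing the spread of the submodel predictions.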