| Title: | Bayesian Dynamic Systems Modeling |
|---|---|
| Description: | Implements methods for building and analyzing models based on panel data as described in the paper by Moral-Benito (2013, <doi:10.1080/07350015.2013.818003>). The package provides functions to estimate dynamic panel data models and analyze the results of the estimation. |
| Authors: | Mateusz Wyszynski [aut], Marcin Dubel [ctb, cre], Krzysztof Beck [ctb] |
| Maintainer: | Marcin Dubel <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.0 |
| Built: | 2026-05-28 08:27:38 UTC |
| Source: | https://github.com/cran/bdsm |
This function creates a ranking of best models according to one of the possible criterion (PMP under binomial model prior, PMP under binomial-beta model prior, R^2 under binomial model prior, R^2 under binomial-beta model prior). The function gives two types of tables in three different formats: inclusion table (where 1 indicates presence of the regressor in the model and 0 indicates that the variable is excluded from the model) and estimation results table (it displays the best models and estimation output for those models: point estimates, standard errors, significance level, and R^2).
best_models( bma_list, criterion = 1, best = 5, round = 3, estimate = TRUE, robust = TRUE )best_models( bma_list, criterion = 1, best = 5, round = 3, estimate = TRUE, robust = TRUE )
bma_list |
bma object (the result of the bma function) |
criterion |
The criterion that will be used for a basis of the model ranking: |
best |
The number of the best models to be considered |
round |
Parameter indicating the decimal place to which number in the tables should be rounded (default round = 3) |
estimate |
A parameter with values TRUE or FALSE indicating which table should be displayed when
TRUE - table with estimation to the results |
robust |
A parameter with values TRUE or FALSE indicating which type of stanrdard errors should be displayed
when the function finishes calculations. Works only if estimate = TRUE. Works well when best is small. |
A list with best_models objects:
matrix with inclusion of the regressors in the best models
matrix with estimation output in the best models with regular standard errors
matrix with estimation output in the best models with robust standard errors
knitr_kable table with inclusion of the regressors in the best models (the best for the display on the console - up to 11 models)
knitr_kable table with estimation output in the best models with regular standard errors (the best for the display on the console - up to 6 models)
knitr_kable table with estimation output in the best models with robust standard errors (the best for the display on the console - up to 6 models)
gTree table with inclusion of the regressors in the best models (displayed as a plot). Use grid::grid.draw() to display.
gTree table with estimation output in the best models with regular standard errors (displayed as a plot). Use grid::grid.draw() to display.
gTree table with estimation output in the best models with robust standard errors (displayed as a plot). Use grid::grid.draw() to display.
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) best_5_models <- best_models(bma_results, criterion = 1, best = 5, estimate = TRUE, robust = TRUE)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) best_5_models <- best_models(bma_results, criterion = 1, best = 5, estimate = TRUE, robust = TRUE)
This function calculates BMA statistics based on the provided model space. Other objects for furhter analysis are also returned.
bma(model_space, df, round = 4, EMS = NULL, dilution = 0, dil.Par = 0.5)bma(model_space, df, round = 4, EMS = NULL, dilution = 0, dil.Par = 0.5)
model_space |
List with params and stats from the model space |
df |
Data frame with data for the SEM analysis. |
round |
Parameter indicating the decimal place to which number in the BMA tables and prior and posterior model sizes should be rounded (default round = 4) |
EMS |
Expected model size for model binomial and binomial-beta model prior |
dilution |
Binary parameter: 0 - NO application of a dilution prior; 1 - application of a dilution prior (George 2010). |
dil.Par |
Parameter associated with dilution prior - the exponent of the determinant (George 2010). Used only if parameter dilution = 1. |
A list with 16 elements.
A table containing the results based on the binomialmodel prior.
A table containing the results based on the binomial-beta model prior.
A vector containing the names of the regressors, used by the functions.
The total number of regressors.
The number of models present in the model space.
A table containing model IDs and posterior model probabilities (PMPs) for the jointness function.
A table containing model IDs, PMPs, coefficients, standard deviations,and standardized regression coefficients (stdRs) for the best_models function.
The expected model size for the binomial and binomial-beta model priors, as specified by the user (default is EMS = R/2).
A table of uniform and random model priors distributed over model sizes for the model_sizes function.
A table containing the posterior model probabilities for use in the model_sizes function.
A table containing the model priors, used by the model_pmp function.
A parameter indicating whether the priors were diluted, used in the model_sizes function.
A vector of coefficients for the lagged dependent variable in the coef_hist function.
A vector of nonzero coefficients for the regressors in the coef_hist function.
A table containing the degrees of freedom for the estimated models in the best_models function.
A table containing the prior and posterior expected model sizes for the binomial and binomial-beta model priors.
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 )library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 )
This function draws graphs of the distribution (in the form of histogram or kernel density) of the coefficients for all the considered regressors over the part of the model space that includes this regressors (half of the model space).
bma_list |
bma object (the result of the bma function) |
BW |
Parameter indicating what method should be chosen to find bin widths for the histograms:
|
binW |
A vector with bin widths to be used to construct histograms for the regressors. The vector must be of the size equal to total number of regressors. The vector with bin widths is used only if parameter BW="vec". |
BN |
Parameter taking the values (default: BN = 0): |
num |
A vector with the numbers of bins used to be used to construct histograms for the regressors. The vector must be of the size equal to total number of regressors. The vector with bin widths is used only if parameter BN=1. |
kernel |
A parameter taking the values (default: kernel = 0): |
A list with the graphs of the distribution of coefficients for all the considered regressors.
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) coef_plots <- coef_hist(bma_results, kernel = 1)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) coef_plots <- coef_hist(bma_results, kernel = 1)
Approximate standard deviations are computed for the models in the given model space. Two versions are computed.
compute_model_space_stats( df, dep_var_col, timestamp_col, entity_col, params, exact_value = FALSE, model_prior = "uniform", cl = NULL )compute_model_space_stats( df, dep_var_col, timestamp_col, entity_col, params, exact_value = FALSE, model_prior = "uniform", cl = NULL )
df |
Data frame with data for the SEM analysis. |
dep_var_col |
Column with the dependent variable |
timestamp_col |
The name of the column with timestamps |
entity_col |
Column with entities (e.g. countries) |
params |
A matrix (with named rows) with each column corresponding to a model. Each column specifies model parameters. Compare with optim_model_space_params |
exact_value |
Whether the exact value of the likelihood should be
computed ( |
model_prior |
Which model prior to use. For now there are two options:
|
cl |
An optional cluster object. If supplied, the function will use this
cluster for parallel processing. If |
Matrix with columns describing likelihood and standard deviations for each model. The first row is the likelihood for the model (computed using the parameters in the provided model space). The second row is almost 1/2 * BIC_k as in Raftery's Bayesian Model Selection in Social Research eq. 19 (see TODO in the code below). The third row is model posterior probability. Then there are rows with standard deviations for each parameter. After that we have rows with robust standard deviation (not sure yet what exactly "robust" means).
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) compute_model_space_stats( df = data_prepared, dep_var_col = gdp, timestamp_col = year, entity_col = country, params = small_model_space$params )library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) compute_model_space_stats( df = data_prepared, dep_var_col = gdp, timestamp_col = year, entity_col = country, params = small_model_space$params )
Data used in Growth Empirics in Panel Data under Model Uncertainty and Weak Exogeneity (Moral-Benito, 2016, Journal of Applied Econometrics).
economic_growtheconomic_growth
economic_growthA data frame with 365 rows and 12 columns (73 countries and 4 periods + extra one for lagged dependent variable):
Year
Country ID
Logarithm of GDP per capita (2000 US dollars at PP)
Ratio of real domestic investment to GDP
Stock of years of secondary education in the total population
Average growth rate of population
Population in millions of people
Purchasing-power-parity numbers for investment goods
Exports plus imports as a share of GDP
Ratio of government consumption to GDP
Logarithm of the life expectancy at birth
Composite index given by the democracy score minus the autocracy score
http://qed.econ.queensu.ca/jae/datasets/moral-benito001/
Create matrix which contains exogenous variables used in the Simultaneous Equations Model (SEM) representation. Currently these are: dependent variable from the lowest time stamp and regressors from the second lowest time stamp. The matrix is then used to compute likelihood for SEM analysis.
exogenous_matrix(df, timestamp_col, entity_col, dep_var_col)exogenous_matrix(df, timestamp_col, entity_col, dep_var_col)
df |
Data frame with data for the SEM analysis. |
timestamp_col |
Column which determines time periods. For now only natural numbers can be used as timestamps |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
Matrix of size N x k+1 where N is the number of entities considered and k is the number of chosen regressors
set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) exogenous_matrix(df, times, entities, dep_var)set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) exogenous_matrix(df, times, entities, dep_var)
This function performs feature standardization (also known as z-score normalization) by centering the features around their mean and scaling by their standard deviation.
feature_standardization(df, excluded_cols, group_by_col, scale = TRUE)feature_standardization(df, excluded_cols, group_by_col, scale = TRUE)
df |
Data frame with the data. |
excluded_cols |
Unquoted column names to exclude from standardization. If missing, all columns are standardized. |
group_by_col |
Unquoted column names to group the data by before applying standardization. If missing, no grouping is performed. |
scale |
Logical. If |
A data frame with standardized features.
df <- data.frame( year = c(2000, 2001, 2002, 2003, 2004), country = c("A", "A", "B", "B", "C"), gdp = c(1, 2, 3, 4, 5), ish = c(2, 3, 4, 5, 6), sed = c(3, 4, 5, 6, 7) ) # Standardize every column df_with_only_numeric_values <- df[, setdiff(names(df), "country")] feature_standardization(df_with_only_numeric_values) # Standardize all columns except 'country' feature_standardization(df, excluded_cols = country) # Standardize across countries (grouped by 'country') feature_standardization(df, group_by_col = country) # Standardize, excluding 'country' and group-wise by 'year' feature_standardization(df, excluded_cols = country, group_by_col = year)df <- data.frame( year = c(2000, 2001, 2002, 2003, 2004), country = c("A", "A", "B", "B", "C"), gdp = c(1, 2, 3, 4, 5), ish = c(2, 3, 4, 5, 6), sed = c(3, 4, 5, 6, 7) ) # Standardize every column df_with_only_numeric_values <- df[, setdiff(names(df), "country")] feature_standardization(df_with_only_numeric_values) # Standardize all columns except 'country' feature_standardization(df, excluded_cols = country) # Standardize across countries (grouped by 'country') feature_standardization(df, group_by_col = country) # Standardize, excluding 'country' and group-wise by 'year' feature_standardization(df, excluded_cols = country, group_by_col = year)
A list with multiple elements summarising the BMA analysis
full_bma_resultsfull_bma_results
An object of class list of length 16.
A list with two elements: params and stats computed using the
optim_model_space function and the economic_growth dataset.
full_model_spacefull_model_space
full_model_spaceA list with two elements.
A double matrix with 40 rows and 2^9 = 512 columns with the
parameters for the model space.
Each column represents a different model.
A matrix representing the statistics computed with
compute_model_space_stats based on params.
The first row contains likelihoods for the models.
The second row are almost 1/2 * BIC_k
as in Raftery's Bayesian Model Selection in Social Research, eq. 19.
The rows 3-7 are standard deviations.
Finally, the rows 8-12 are robust standard deviations.
Creates the hessian matrix for a given likelihood function.
hessian(lik, theta, ...)hessian(lik, theta, ...)
lik |
function |
theta |
kx1 matrix |
... |
other parameters passed to |
Hessian kxk matrix where k is the number of parameters included in the theta matrix
lik <- function(theta) { return(theta[1]^2 + theta[2]^2) } hessian(lik, c(1, 1))lik <- function(theta) { return(theta[1]^2 + theta[2]^2) } hessian(lik, c(1, 1))
This function builds a representation of the model space, by creating a
dataframe where each column represents values of the parameters for a given
model. Real value means that the parameter is included in the model. A
parameter not present in the model is marked as NA.
init_model_space_params( df, timestamp_col, entity_col, dep_var_col, init_value = 1 )init_model_space_params( df, timestamp_col, entity_col, dep_var_col, init_value = 1 )
df |
Data frame with data for the SEM analysis. |
timestamp_col |
Column which determines time periods. For now only natural numbers can be used as timestamps |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
init_value |
Initial value for parameters present in the model. Default
is |
Currently the set of features is assumed to be all columns which remain after
excluding timestamp_col, entity_col and dep_var_col.
A power set of all possible exclusions of linear dependence on the given feature is created, i.e. if there are 4 features we end up with 2^4 possible models (for each model we independently decide whether to include or not a feature).
matrix of model parameters
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:5] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) init_model_space_params(data_prepared, year, country, gdp)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:5] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) init_model_space_params(data_prepared, year, country, gdp)
This function allows to turn data in the format with lagged values for a chosen column (i.e. there are two columns with the same quantity, but one column is lagged in time) into the format with just one column
join_lagged_col( df, col, col_lagged, timestamp_col, entity_col, timestep = NULL )join_lagged_col( df, col, col_lagged, timestamp_col, entity_col, timestep = NULL )
df |
Dataframe with data with a column with lagged values |
col |
Column with quantity not lagged |
col_lagged |
Column with the same quantity as |
timestamp_col |
Column with timestamps (e.g. years) |
entity_col |
Column with entities (e.g. countries) |
timestep |
Difference between timestamps (e.g. 10) |
A dataframe with two columns merged, i.e. just one column with the desired quantity is left.
df <- data.frame( year = c(2000, 2001, 2002, 2003, 2004), country = c("A", "A", "B", "B", "C"), gdp = c(1, 2, 3, 4, 5), gdp_lagged = c(NA, 1, 2, 3, 4) ) join_lagged_col(df, gdp, gdp_lagged, year, country, 1)df <- data.frame( year = c(2000, 2001, 2002, 2003, 2004), country = c("A", "A", "B", "B", "C"), gdp = c(1, 2, 3, 4, 5), gdp_lagged = c(NA, 1, 2, 3, 4) ) join_lagged_col(df, gdp, gdp_lagged, year, country, 1)
This function calculates four types of the jointness measures based on the posterior model probabilities calculated using binomial and binomial-beta model prior. The four measures are:
HCGHM - for Hofmarcher et al. (2018) measure;
LS - for Ley & Steel (2007) measure;
DW - for Doppelhofer & Weeks (2009) measure;
PPI - for posterior probability of including both variables.
The measures under binomial model prior will appear in a table above the diagonal, and the measure calculated under binomial-beta model prior below the diagonal.
REFERENCES
Doppelhofer G, Weeks M (2009) Jointness of growth determinants. Journal of Applied Econometrics., 24(2), 209-244. doi: 10.1002/jae.1046
Hofmarcher P, Crespo Cuaresma J, Grün B, Humer S, Moser M (2018) Bivariate jointness measures in Bayesian Model Averaging: Solving the conundrum. Journal of Macroeconomics, 57, 150-165. doi: 10.1016/j.jmacro.2018.05.005
Ley E, Steel M (2007) Jointness in Bayesian variable selection with applications to growth regression. Journal of Macroeconomics, 29(3), 476-493. doi: 10.1016/j.jmacro.2006.12.002
jointness(bma_list, measure = "HCGHM", rho = 0.5, round = 3)jointness(bma_list, measure = "HCGHM", rho = 0.5, round = 3)
bma_list |
bma object (the result of the bma function) |
measure |
Parameter for choosing the measure of jointness: |
rho |
The parameter "rho" ( |
round |
Parameter indicating the decimal place to which the jointness measures should be rounded (default round = 3). |
A table with jointness measures for all the pairs of regressors used in the analysis. The results obtained with the binomial model prior are above the diagonal, while the ones obtained with the binomial-beta prior are below.
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) jointness_table <- jointness(bma_results, measure = "HCGHM", rho = 0.5, round = 3)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) jointness_table <- jointness(bma_results, measure = "HCGHM", rho = 0.5, round = 3)
List of matrices for SEM model
matrices_from_df( df, timestamp_col, entity_col, dep_var_col, lin_related_regressors = NULL, which_matrices = c("Y1", "Y2", "Z", "cur_Y2", "cur_Z", "res_maker_matrix") )matrices_from_df( df, timestamp_col, entity_col, dep_var_col, lin_related_regressors = NULL, which_matrices = c("Y1", "Y2", "Z", "cur_Y2", "cur_Z", "res_maker_matrix") )
df |
Dataframe with data for the likelihood computations. |
timestamp_col |
Column which determines time stamps. For now only natural numbers can be used. |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
lin_related_regressors |
Vector of strings of column names. Which subset of regressors is in non trivial
linear relation with the dependent variable ( |
which_matrices |
character vector with names of matrices which should be
computed. Possible matrices are
|
Named list with matrices as its elements
matrices_from_df(economic_growth, year, country, gdp, c("pop", "sed"), c("Y1", "Y2"))matrices_from_df(economic_growth, year, country, gdp, c("pop", "sed"), c("Y1", "Y2"))
This function draws four graphs of prior and posterior model probabilities for the best individual models:
a) The results with binomial model prior (based on PMP - posterior model probability)
b) The results with binomial-beta model prior (based on PMP - posterior model probability)
Models on the graph are ordered according to their posterior model probability.
bma_list |
bma_list object (the result of the bma function) |
top |
The number of the best model to be placed on the graphs |
A list with three graphs with prior and posterior model probabilities for individual models:
The results with binomial model prior (based on PMP - posterior model probability)
The results with binomial-beta model prior (based on PMP - posterior model probability)
On graph combining the aforementioned graphs
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) model_graphs <- model_pmp(bma_results, top = 16)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) model_graphs <- model_pmp(bma_results, top = 16)
This function draws four graphs of prior and posterior model probabilities:
a) The results with binomial model prior (based on PMP - posterior model probability)
b) The results with binomial-beta model prior (based on PMP - posterior model probability)
bma_list |
bma_list object (the result of the bma function) |
A list with three graphs with prior and posterior model probabilities for model sizes:
The results with binomial model prior (based on PMP - posterior model probability)
The results with binomial-beta model prior (based on PMP - posterior model probability)
One graph combining all the aforementioned graphs
library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) size_graphs <- model_sizes(bma_results)library(magrittr) data_prepared <- bdsm::economic_growth[, 1:6] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) bma_results <- bma( model_space = bdsm::small_model_space, df = data_prepared, round = 3, dilution = 0 ) size_graphs <- model_sizes(bma_results)
This function calculates model space, values of the maximized likelihood function, BICs, and standard deviations of the parameters that will be used in Bayesian model averaging.
optim_model_space( df, timestamp_col, entity_col, dep_var_col, init_value, exact_value = FALSE, cl = NULL, control = list(trace = 0, maxit = 10000, fnscale = -1, REPORT = 100, scale = 0.05) )optim_model_space( df, timestamp_col, entity_col, dep_var_col, init_value, exact_value = FALSE, cl = NULL, control = list(trace = 0, maxit = 10000, fnscale = -1, REPORT = 100, scale = 0.05) )
df |
Data frame with data for the analysis. |
timestamp_col |
The name of the column with time stamps |
entity_col |
Column with entities (e.g. countries) |
dep_var_col |
Column with the dependent variable |
init_value |
The value with which the model space will be initialized. This will be the starting point for the numerical optimization. |
exact_value |
Whether the exact value of the likelihood should be
computed ( |
cl |
An optional cluster object. If supplied, the function will use this
cluster for parallel processing. If |
control |
a list of control parameters for the optimization which are
passed to optim. Default is
|
List with two objects:
params - table with parameters of all estimated models
stats - table with the value of maximized likelihood function, BIC, and standard errors for all estimated models
## Not run: library(magrittr) data_prepared <- bdsm::economic_growth[, 1:5] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) optim_model_space( df = data_prepared, dep_var_col = gdp, timestamp_col = year, entity_col = country, init_value = 0.5 ) ## End(Not run)## Not run: library(magrittr) data_prepared <- bdsm::economic_growth[, 1:5] %>% bdsm::feature_standardization( excluded_cols = c(country, year, gdp) ) %>% bdsm::feature_standardization( group_by_col = year, excluded_cols = country, scale = FALSE ) optim_model_space( df = data_prepared, dep_var_col = gdp, timestamp_col = year, entity_col = country, init_value = 0.5 ) ## End(Not run)
Given a dataset and an initial value for parameters, initializes a model space with parameters equal to the initial value for each model. Then for each model performs a numerical optimization and finds parameters which maximize the likelihood.
optim_model_space_params( df, timestamp_col, entity_col, dep_var_col, init_value, exact_value = FALSE, cl = NULL, control = list(trace = 0, maxit = 10000, fnscale = -1, REPORT = 100, scale = 0.05) )optim_model_space_params( df, timestamp_col, entity_col, dep_var_col, init_value, exact_value = FALSE, cl = NULL, control = list(trace = 0, maxit = 10000, fnscale = -1, REPORT = 100, scale = 0.05) )
df |
Data frame with data for the analysis. |
timestamp_col |
The name of the column with time stamps. |
entity_col |
Column with entities (e.g. countries). |
dep_var_col |
Column with the dependent variable. |
init_value |
The value with which the model space will be initialized. This will be the starting point for the numerical optimization. |
exact_value |
Whether the exact value of the likelihood should be
computed ( |
cl |
An optional cluster object. If supplied, the function will use this
cluster for parallel processing. If |
control |
a list of control parameters for the optimization which are
passed to optim. Default is
|
List (or matrix) of parameters describing analyzed models.
Data used in Growth Empirics in Panel Data under Model Uncertainty and Weak Exogeneity (Moral-Benito, 2016, Journal of Applied Econometrics).
original_economic_growthoriginal_economic_growth
original_economic_growthA data frame with 292 rows and 13 columns (73 countries and 4 periods + extra one for lagged dependent variable):
Year
Country ID
Logarithm of GDP per capita (2000 US dollars at PP)
Lagged logarithm of GDP per capita (2000 US dollars at PP)
Ratio of real domestic investment to GDP
Stock of years of secondary education in the total population
Average growth rate of population
Population in millions of people
Purchasing-power-parity numbers for investment goods
Exports plus imports as a share of GDP
Ratio of government consumption to GDP
Logarithm of the life expectancy at birth
Composite index given by the democracy score minus the autocracy score
http://qed.econ.queensu.ca/jae/datasets/moral-benito001/
For now it is assumed that we can only exclude linear relationships between regressors and the dependent variable.
regressor_names_from_params_vector(params)regressor_names_from_params_vector(params)
params |
a vector with parameters describing the model |
The vector needs to have named rows, i.e. it is assumed it comes from a model space (see init_model_space_params for details).
Names of regressors which are assumed to be linearly connected with dependent
variable within the model described by the params vector.
params <- c(alpha = 1, beta_gdp = 1, beta_gdp_lagged = 1, phi_0 = 1, err_var = 1) regressor_names_from_params_vector(params)params <- c(alpha = 1, beta_gdp = 1, beta_gdp_lagged = 1, phi_0 = 1, err_var = 1) regressor_names_from_params_vector(params)
Create residual maker matrix from a given matrix m. See article about
projection matrix on
the Wikipedia.
residual_maker_matrix(m)residual_maker_matrix(m)
m |
Matrix |
M x M matrix where M is the number of rows in the m matrix.
residual_maker_matrix(matrix(c(1,2,3,4), nrow = 2))residual_maker_matrix(matrix(c(1,2,3,4), nrow = 2))
Create coefficients matrix for Simultaneous Equations Model (SEM) representation.
sem_B_matrix(alpha, periods_n, beta = NULL)sem_B_matrix(alpha, periods_n, beta = NULL)
alpha |
numeric |
periods_n |
integer |
beta |
numeric vector. Default is c() for no regressors case. |
List with two matrices B11 and B12
sem_B_matrix(3, 4, 4:6)sem_B_matrix(3, 4, 4:6)
Create matrix for Simultaneous Equations Model (SEM) representation with coefficients placed next to initial values of regressors, dependent variable and country-specific time-invariant variables.
sem_C_matrix(alpha, phi_0, periods_n, beta = NULL, phi_1 = NULL)sem_C_matrix(alpha, phi_0, periods_n, beta = NULL, phi_1 = NULL)
alpha |
numeric |
phi_0 |
numeric |
periods_n |
numeric |
beta |
numeric vector. Default is c() for no regressors case. |
phi_1 |
numeric vector. Default is c() for no regressors case. |
matrix
alpha <- 9 phi_0 <- 19 beta <- 11:15 phi_1 <- 21:25 periods_n <- 4 sem_C_matrix(alpha, phi_0, periods_n, beta, phi_1)alpha <- 9 phi_0 <- 19 beta <- 11:15 phi_1 <- 21:25 periods_n <- 4 sem_C_matrix(alpha, phi_0, periods_n, beta, phi_1)
Create matrix which contains dependent variable data used in the Simultaneous Equations Model (SEM) representation on the left hand side of the equations. The matrix contains the data for time periods greater than or equal to the second lowest time stamp. The matrix is then used to compute likelihood for SEM analysis.
sem_dep_var_matrix(df, timestamp_col, entity_col, dep_var_col)sem_dep_var_matrix(df, timestamp_col, entity_col, dep_var_col)
df |
Data frame with data for the SEM analysis. |
timestamp_col |
Column which determines time periods. For now only natural numbers can be used as timestamps |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
Matrix of size N x T where N is the number of entities considered and T is the number of periods greater than or equal to the second lowest time stamp.
set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) sem_dep_var_matrix(df, times, entities, dep_var)set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) sem_dep_var_matrix(df, times, entities, dep_var)
Likelihood for the SEM model
sem_likelihood( params, data, timestamp_col, entity_col, dep_var_col, lin_related_regressors = NULL, per_entity = FALSE, exact_value = TRUE )sem_likelihood( params, data, timestamp_col, entity_col, dep_var_col, lin_related_regressors = NULL, per_entity = FALSE, exact_value = TRUE )
params |
Parameters describing the model. Can be either a vector or a list with named parameters. See 'Details' |
data |
Data for the likelihood computations. Can be either a list of matrices or a dataframe. If the dataframe, additional parameters are required to build the matrices within the function. |
timestamp_col |
Column which determines time stamps. For now only natural numbers can be used. |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
lin_related_regressors |
Which subset of columns should be used as
regressors for the current model. In other words |
per_entity |
Whether to compute overall likelihood or a vector of likelihoods with per entity value |
exact_value |
Whether the exact value of the likelihood should be
computed ( |
The params argument is a list that should contain the following
components:
alpha scalar value which determines linear dependence on lagged
dependent variable
phi_0 scalar value which determines linear dependence on the value
of dependent variable at the lowest time stamp
err_var scalar value which determines classical error component
(Sigma11 matrix, sigma_epsilon^2)
dep_vars double vector of length equal to the number of time stamps
(i.e. time stamps greater than or equal to the second lowest time stamp)
beta double vector which determines the linear dependence on
regressors different than the lagged dependent variable; The vector should
have length equal to the number of regressors.
phi_1 double vector which determines the linear dependence on
initial values of regressors different than the lagged dependent variable;
The vector should have length equal to the number of regressors.
phis double vector which together with psis determines upper
right and bottom left part of the covariance matrix; The vector should have
length equal to the number of regressors times number of time stamps minus 1,
i.e. regressors_n * (periods_n - 1)
psis double vector which together with psis determines upper
right and bottom left part of the covariance matrix; The vector should have
length equal to the number of regressors times number of time stamps minus 1
times number of time stamps divided by 2, i.e.
regressors_n * (periods_n - 1) * periods_n / 2
The value of the likelihood for SEM model (or a part of interest of the likelihood)
set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) df <- feature_standardization(df, excluded_cols = c(times, entities)) sem_likelihood(0.5, df, times, entities, dep_var)set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) df <- feature_standardization(df, excluded_cols = c(times, entities)) sem_likelihood(0.5, df, times, entities, dep_var)
Matrix with psi parameters for SEM representation
sem_psi_matrix(psis, timestamps_n, features_n)sem_psi_matrix(psis, timestamps_n, features_n)
psis |
double vector with psi parameter values |
timestamps_n |
number of time stamps (e.g. years) |
features_n |
number of features (e.g. population size, investment rate) |
A matrix with timestamps_n rows and
(timestamps_n - 1) * feature_n columns. Psis are filled in row by row
in a block manner, i.e. blocks of size feature_n are placed next to
each other
sem_psi_matrix(1:30, 4, 5)sem_psi_matrix(1:30, 4, 5)
Create matrix which contains regressors data used in the Simultaneous Equations Model (SEM) representation on the left hand side of the equations. The matrix contains regressors data for time periods greater than or equal to the second lowest time stamp. The matrix is then used to compute likelihood for SEM analysis.
sem_regressors_matrix(df, timestamp_col, entity_col, dep_var_col)sem_regressors_matrix(df, timestamp_col, entity_col, dep_var_col)
df |
Data frame with data for the SEM analysis. |
timestamp_col |
Column which determines time periods. For now only natural numbers can be used as timestamps |
entity_col |
Column which determines entities (e.g. countries, people) |
dep_var_col |
Column with dependent variable |
Matrix of size N x (T-1)*k where N is the number of entities considered, T is
the number of periods greater than or equal to the second lowest time stamp
and k is the number of chosen regressors. If there are no regressors returns
NULL.
set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) sem_regressors_matrix(df, times, entities, dep_var)set.seed(1) df <- data.frame( entities = rep(1:4, 5), times = rep(seq(1960, 2000, 10), each = 4), dep_var = stats::rnorm(20), a = stats::rnorm(20), b = stats::rnorm(20) ) sem_regressors_matrix(df, times, entities, dep_var)
Create covariance matrix for Simultaneous Equations Model (SEM) representation. Only the part necessary to compute concentrated likelihood function is computed (cf. Appendix in the Moral-Benito paper)
sem_sigma_matrix(err_var, dep_vars, phis = NULL, psis = NULL)sem_sigma_matrix(err_var, dep_vars, phis = NULL, psis = NULL)
err_var |
numeric |
dep_vars |
numeric vector |
phis |
numeric vector |
psis |
numeric vector |
List with two matrices Sigma11 and Sigma12
err_var <- 1 dep_vars <- c(2, 2, 2, 2) phis <- c(10, 10, 20, 20, 30, 30) psis <- c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112) sem_sigma_matrix(err_var, dep_vars, phis, psis)err_var <- 1 dep_vars <- c(2, 2, 2, 2) phis <- c(10, 10, 20, 20, 30, 30) psis <- c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112) sem_sigma_matrix(err_var, dep_vars, phis, psis)
A list with two elements: params and stats computed using the
optim_model_space function and the economic_growth dataset, but
using only three regressors: ish, sed and pgrw.
small_model_spacesmall_model_space
small_model_spaceA list with two elements.
A double matrix with 40 rows and 2^3 = 8 columns with the
parameters for the model space.
Each column represents a different model.
A matrix representing the statistics computed with
compute_model_space_stats based on params.
The first row contains likelihoods for the models.
The second row are almost 1/2 * BIC_k
as in Raftery's Bayesian Model Selection in Social Research, eq. 19.
The rows 3-7 are standard deviations.
Finally, the rows 8-12 are robust standard deviations.