Back to the main page.
Bug 3082 - ft_statfun_indepsamplesregrT does not correct df's for multiple independent variables
Status | CLOSED FIXED |
Reported | 2016-02-29 19:28:00 +0100 |
Modified | 2019-08-10 12:41:10 +0200 |
Product: | FieldTrip |
Component: | core |
Version: | unspecified |
Hardware: | PC |
Operating System: | Mac OS |
Importance: | P5 normal |
Assigned to: | Arjen Stolk |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: | |
Arjen Stolk - 2016-02-29 19:28:40 +0100
If one adds multiple regressors to the design matrix, e.g.

cfg.design(1,:) = [1 2 3];
cfg.design(2,:) = [2 2 2];
cfg.ivar = [1 2];

all regressors are used in the model, provided their row indices are specified in cfg.ivar. This is desired behavior, I'd say. However, the df's are not corrected accordingly. Namely, the df is defined as:

[nsmpl,nrepl] = size(dat);
df = nrepl - nblocks - 1;
if df<1
  error('Insufficient error degrees of freedom for this analysis.')
end

where nrepl is given by the data, and nblocks follows from cfg.cvar, if specified. In other words, nothing is done with the dimensions of cfg.design or cfg.ivar. I presume that this is how the design matrix should be specified, i.e. by adding other main and control variables to it? ft_statistics_montecarlo states:

% cfg.ivar = number or list with indices, independent variable(s)
% cfg.uvar = number or list with indices, unit variable(s)
% cfg.wvar = number or list with indices, within-cell variable(s)
% cfg.cvar = number or list with indices, control variable(s)

giving the impression that multiple independent variables can be entered into the design matrix. So, the question is 'to correct or not to correct'?
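To make the df bookkeeping concrete, here is a minimal Python sketch of the correction being proposed (the actual FieldTrip code is MATLAB; the function name `error_df`, its arguments, and the `nblocks=1` default are illustrative assumptions, not the shipped implementation):

```python
def error_df(nrepl, n_ivar, nblocks=1):
    """Error degrees of freedom for a regression with n_ivar predictors.

    The released ft_statfun_indepsamplesregrT computes
        df = nrepl - nblocks - 1
    which is only correct for a single regressor. The hypothetical
    correction discussed in this report counts every variable listed
    in cfg.ivar:
        df = nrepl - nblocks - n_ivar
    (nrepl = number of replications, i.e. columns of the data;
     nblocks = number of control blocks from cfg.cvar, here assumed
     to default to 1 when no cfg.cvar is given).
    """
    df = nrepl - nblocks - n_ivar
    if df < 1:
        raise ValueError('Insufficient error degrees of freedom '
                         'for this analysis.')
    return df

# With one regressor both formulas agree; with two, the corrected
# version loses one extra degree of freedom.
print(error_df(10, 1))  # 8, same as nrepl - nblocks - 1
print(error_df(10, 2))  # 7, one df fewer than the released code reports
```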
Jan-Mathijs Schoffelen - 2016-11-29 09:26:22 +0100
Arjen, do you already know the answer?
Arjen Stolk - 2016-11-29 18:05:24 +0100
(In reply to Jan-Mathijs Schoffelen from comment #1) Nope, and sorry, I had forgotten about this one. @Eric, do you know whether the df's should be adjusted according to the number of regressors?
Eric Maris - 2016-11-30 10:26:37 +0100
Dear colleagues,

This is an annoying question, not because it cannot be answered, but because it takes a long time to explain (-;

In essence:
1. Permutation inference in a regression context is only straightforward when there is a single predictor.
2. In the parametric framework, of course, one must adjust the degrees of freedom according to the number of predictors.

I'm a bit concerned that, by advertising statfun_indepsamplesRegrT as a general-purpose function for all types of regression designs, users will start thinking that cluster-based permutation inference is also valid for these designs (with the FT implementation of cluster-based permutation inference). On the other hand, the FT cluster-based permutation inference remains valid even with erroneous df's for the regression T-stats that are used for thresholding. There are valid permutation-based methods for regression problems with multiple regressors, but these are not documented on the FT wiki. They could be implemented using the current FT code, but this would require some additional outside-FT coding by the user.

best,
Eric
Arjen Stolk - 2016-11-30 18:40:46 +0100
(In reply to Eric Maris from comment #3) Hey Eric, Thanks for this elaboration. Two general observations: 1) there seems to be an increasing demand for more complex modeling of the data (at least, that is my impression of what people ask at conferences and on the email discussion list), and 2) cluster-based permutation testing equals fair testing in most people's minds (cf. that one article about unsatisfactory FWE correction). Given these observations, would it be an idea to give a warning, possibly with a link to documentation explaining why this is an unfair approach, when people try to do this? Or, even better, a full tutorial on adequate cluster-based permutation testing in the context of the GLM? I presume the latter, given the expanding audience for this technique, would also make a good publication. Just my .02, Arjen
Arjen Stolk - 2017-01-25 19:04:34 +0100
"Moreover, the FT cluster-based permutation inference remains valid even with erroneous df's for the regression T-stats that are used for thresholding." I'll take your word for it, Eric (and knowing that whatever "black box" applies to the data distribution also applies to the randomization distribution). Given that this possible inaccuracy does not influence the statistical outcome, and apparently is not easy to fix, I'm closing this thread. I do think questions regarding multiple regressors will reappear in the near future, and will require answers/advice beyond what we have previously documented at http://www.fieldtriptoolbox.org/faq/how_can_i_test_for_correlations_between_neuronal_data_and_quantitative_stimulus_and_behavioural_variables
Arjen Stolk - 2017-01-25 19:04:55 +0100
CLOSED
Robert Oostenveld - 2019-08-10 12:35:00 +0200
This closes a whole series of bugs that have been resolved (either FIXED/WONTFIX/INVALID) for quite some time. If you disagree, please file a new issue on https://github.com/fieldtrip/fieldtrip/issues.