Back to the main page.
Bug 2069 - freqstatistics does not respect channel order in data
Status | CLOSED FIXED |
Reported | 2013-03-19 11:01:00 +0100 |
Modified | 2014-01-29 13:28:38 +0100 |
Product: | FieldTrip |
Component: | core |
Version: | unspecified |
Hardware: | PC |
Operating System: | Windows |
Importance: | P3 critical |
Assigned to: | Eelke Spaak |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: |
Eelke Spaak - 2013-03-19 11:01:15 +0100
After about 1,5 days of digging around, I finally found quite a severe bug in the stats routines. See attached test script. The idea is that both freq- and timelockstatistics do not use the channel order in data.label correctly.
Eelke Spaak - 2013-03-19 11:01:45 +0100
Created attachment 440 test script to reproduce bug
Eelke Spaak - 2013-03-19 11:04:23 +0100
And PS: also shown in the test script is that with identical shuffling, the freq and timelock results are different. A 'geluk bij een ongeluk', because this is how I originally became aware of the bug.
Eelke Spaak - 2013-03-19 11:18:51 +0100
Created attachment 441 test script There was an error in my test script, now fixed in the new attachment. Conclusion is that only freqstatistics is affected by this bug, not timelockstatistics.
Jörn M. Horschig - 2013-03-19 11:24:58 +0100
today's youth shows no respect :) Eelke, I won't doubt this, but two remarks wrt your test script 1) when doing monte carlo statistics, you must not compare t-values and if unequal assume that two runs are different - it's a monte carlo technique. so doing all (a.stat == b.stat) is an invalid assertion. Better would be to refrain from monte carlo cluster based permutation here and verify this with an ordinary t-test 2) when testing a dataset to another dataset with shuffeled channel order, the resulting stat will have a different channel order as well, thus a.stat == b.stat is again invalid. match_str should be used to get the channel indices of one stat-structure wrt the other. Finally, see bug 1707 which becomes more and more important. Maybe I/we should spent some time during the next bug binge for this. I vote for 'we' because,a s stated in that bug, it lots of work. I can start thinking of a well-structured test script so that work load can be distributed
Jörn M. Horschig - 2013-03-19 11:25:57 +0100
(In reply to comment #3) and 2) has been solved ;) 1) remains though
Eelke Spaak - 2013-03-19 11:27:37 +0100
(In reply to comment #4) As mentioned, you were right with point 2 :) But your point 1 is invalid: the monte carlo p-value estimates are dependent on random permutations, but the t-stats should be identical, as they do not involve any randomness.
Eelke Spaak - 2013-03-19 11:29:47 +0100
I tracked this down to an error in ft_selectdata, which does not reorder the channel dimension when converting this to the rpt dimension.
Jörn M. Horschig - 2013-03-19 11:31:01 +0100
ah, you only permute the 'raw' t-values, then I'm sorry ;)
Eelke Spaak - 2013-03-19 12:13:35 +0100
Had a short meeting with JM just now. The plan is to eliminate ft_selectdata_old anyway, since ft_selectdata should only be used for selecting data, and not for appending. Therefore, the calls to ft_selectdata in ft_freqstatistics (lines 205-227) should be replaced with a call to ft_appendfreq followed by ft_selectdata. It turns out that ft_appendfreq also does not correctly reorder channels, so this needs to be fixed first. After fixing, an e-mail to the list should be sent, instructing people to redo their stats across subjects in case they did a channel repair...
Jan-Mathijs Schoffelen - 2013-03-19 12:14:39 +0100
My bad apparently. Building in code that bypasses statistics_wrapper relies on ft_selectdata(_old) which does not handle the channel order correctly. Discussed with Eelke: replace the bad piece of code with a call to ft_appendfreq. As of yet ft_appendfreq does not check for the consistency of the channel order, but a piece of code can be copied over from ft_appenddata I think. Once the fix is in place and tested we should send an e-mail to the discussion list, I guess.
Eelke Spaak - 2013-03-19 13:32:50 +0100
question: should ft_appendfreq also support non-monotonic freq or time dimensions?
Eelke Spaak - 2013-03-19 13:37:32 +0100
(In reply to comment #11) I guess not, as this is already properly checked (and an error is thrown in case of varargin{1}.time = [1 2 3 4 5] and varargin{2}.time = [5 1 2 3 4]). Could be implemented, but probably is rather useless. And in any case, it is properly error-handled already, so we can just leave it in the user's hands.
Jan-Mathijs Schoffelen - 2013-03-19 13:57:10 +0100
...as long as it isn't left in my hands, because we now all know what happens then :-(
Eelke Spaak - 2013-03-19 14:51:59 +0100
Should be fixed in 7695: Sending ft_appendfreq.m Sending ft_freqstatistics.m Adding private/reorderdim.m Adding test/test_bug2069.m Sending test/test_ft_appendfreq.m Transmitting file data ..... Committed revision 7695. JM, could you test my changes? I also made some changes to ft_appendfreq when the appenddim was freq, time or chan, not just to the rpt case. How should we handle ft_selectdata(_old)? If people use that to append some data sets that have the same channels but ordered differently, they will get wrong results. Throw an error?
Eelke Spaak - 2013-03-19 15:35:24 +0100
Also @JM: could you perhaps implement the error that should be thrown in ft_selectdata_old? I looked for some 10 minutes but could not find out where to implement this properly...
Jan-Mathijs Schoffelen - 2013-03-19 17:00:35 +0100
OK guys, the problem seems to have been introduced with revision 4101 (september 2011), where a selection of code has been commented out. Essentially, this selection of code did an explicit check on the equality of the 'dimtok' fields (freq, time, label) in order to automatically determine over which dimension to concatenate. When the label order is not the same for all input, the ft_selectdata_old (wrongfully) determined this to be the dimension of concatentation, that most likely would have lead to a non-specific error, rather than to incorrect output.
Jörn M. Horschig - 2013-03-20 09:47:43 +0100
I guess the problems then started to occur for most users with allowing for repairing missing channels and that we decided to concatenate these, see bug 644 (beginning of 2012).
Eelke Spaak - 2013-03-20 10:06:10 +0100
Just sent an e-mail to the list. For some reason I don't have it yet, so I thought I'd post it here, in case anyone else was thinking of writing an e-mail too.
Jörn M. Horschig - 2013-03-20 16:59:08 +0100
hey girls: I just found the below code in ft_math. Robert wrote it, assuming he knows what he is doing, it should be used throughout FieldTrip: % ensure that the data in all inputs has the same channels, time-axis, etc. tmpcfg = []; tmpcfg.parameter = cfg.parameter; [varargin{:}] = ft_selectdata(tmpcfg, varargin{:}); % restore the provenance information [cfg, varargin{:}] = rollback_provenance(cfg, varargin{:}); HOWEVER, see bug 2071 (meaning, the rolling back does not work, yet)
Robert Oostenveld - 2013-03-20 17:28:10 +0100
(In reply to comment #19) Let's discuss the rollback in the next ft meeting.
Eelke Spaak - 2013-05-08 15:17:41 +0200
This was fixed, right? Guess I forgot to mark as resolved.