Back to the main page.
Bug 2507 - unexpected behaviour of ft_appenddata when appending more than two datasets
Status | CLOSED FIXED |
Reported | 2014-03-20 16:06:00 +0100 |
Modified | 2019-08-10 12:29:32 +0200 |
Product: | FieldTrip |
Component: | core |
Version: | unspecified |
Hardware: | PC |
Operating System: | Windows |
Importance: | P5 major |
Assigned to: | Eelke Spaak |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: |
- 2014-03-20 16:06:40 +0100
Hey Trippoids, Okay, so I think I've encountered some strange behaviour when trying to concatenate >2 data subsets using ft_appenddata. Suppose I have three datasets that represent three experimental conditions.; dataA, dataB and dataC. If I do... dataAB=ft_appenddata([],dataA,dataB) ...and then I plot this data, the first trial of dataAB is the same as the first trial of dataA (I used ft_databrowser to compare them. They look the same) However if I do... dataABC=ft_appenddata([],dataA,dataB,dataC) ...and then use the databrowser, the first trial is NOT the first trial of dataA. I think it is the first trial of dataC. The plot thickens. If I look at the trialinfos, the first row of dataABC.trialinfo *is* the first row of dataA.trialinfo. That's super upsetting because I want to use the trialinfos to split my data back up again at a later stage. It seems that the mapping between trialinfo and data is lost. If it helps I can post a snippet of the data I used to discover this, but I guess anybody can try to replicate this with any dataset containing, say, six trials, simply by using ft_selectdata to create three subsets of two trials each. I'm assuming that the expected behaviour for the user is: appending dataA dataB and dataC results in all the A trials followed by B trials followed by C trials, and that is currently *not* happening. Plus the trialinfos and data are discrepant. In my opinion (and if this is a 'real' bug as opposed to stupid behaviour by me) this qualifies as relatively serious. Let me know if I can contribute to solving this somehow. Best, Tom
Eelke Spaak - 2014-03-20 16:15:46 +0100
(In reply to t.marshall from comment #0) Hi Tom, I can't seem to reproduce your bug, at least not with the very simple toy example below. Could you see (1) whether the bug also occurs for you in the below script; and (2) if it does not, please post a snippet of code and data? Cheers! %% create some data data1 = []; data1.label = {'a' 'b' 'c'}; data1.time = {1:1000 1:1000 1:1000}; data1.trial = {randn(3,1000) randn(3,1000) randn(3,1000)}; data1.trialinfo = [1;2;3]; data2 = data1; data2.trialinfo = [4;5;6]; data2.trial = {randn(3,1000) randn(3,1000) randn(3,1000)}; data3 = data1; data3.trialinfo = [7;8;9]; data3.trial = {randn(3,1000) randn(3,1000) randn(3,1000)}; %% append datcmb = ft_appenddata([], data1, data2, data3); %% check assert(isequal(datcmb.trialinfo, [data1.trialinfo; data2.trialinfo; data3.trialinfo])); assert(isequal(datcmb.trial, cat(2, data1.trial, data2.trial, data3.trial)));
Jan-Mathijs Schoffelen - 2014-03-20 16:17:02 +0100
Hi Tom, I cannot reproduce it on my end: data1.trial{1} = 1; data1.time{1} = 1; data1.label = {'chan01'}; data1.trialinfo = 1; data2 = data1; data3 = data1; data2.trial{1} = 2; data3.trial{1} = 3; data2.trialinfo = 2; data3.trialinfo = 3; data = ft_appenddata([],data1,data2,data3); data = trial: {[1] [2] [3]} time: {[1] [1] [1]} trialinfo: [3x1 double] label: {'chan01'} fsample: NaN cfg: [1x1 struct] sampleinfo: [3x2 double] The 1,2,3 end up in expected order in the trial-field. data.trialinfo ans = 1 2 3 This also looks OK. Could you try this out, and see whether you agree?
Jan-Mathijs Schoffelen - 2014-03-20 16:17:47 +0100
hehe, I only needed one trial, one channel, and one time point.
Eelke Spaak - 2014-03-20 16:19:11 +0100
(In reply to Jan-Mathijs Schoffelen from comment #3) Yes, well done :) I still prefer mine though :P
- 2014-03-20 16:34:50 +0100
Created attachment 600 three small data subsets that append weirdly
- 2014-03-20 16:35:53 +0100
Eelke, I agree that your example behaves as expected. It also works if the three subsets contain unequal trial numbers (just wanted to check)... However, my data still behave weirdly. See the attached heavily-trimmed-down version.
- 2014-03-20 16:38:00 +0100
I'm wrong. It's not the appending. It's the databrowser.
Robert Oostenveld - 2014-03-20 16:57:38 +0100
(In reply to Eelke Spaak from comment #4) bummer! I thought "let's quickly reproduce this with a small test script", but I have been beaten to it (twice). I had made this: dataA = []; dataA.trial = {[1], [2]}; dataA.time = {[0], [0]}; dataA.label = {'chan'}; dataB = []; dataB.trial = {[3], [4]}; dataB.time = {[0], [0]}; dataB.label = {'chan'}; dataC = []; dataC.trial = {[5], [6]}; dataC.time = {[0], [0]}; dataC.label = {'chan'}; dataABC = ft_appenddata([], dataA, dataB, dataC) dataAll = trial: {[1] [2] [3] [4] [5] [6]} time: {[0] [0] [0] [0] [0] [0]} label: {'chan'} fsample: NaN cfg: [1x1 struct] sampleinfo: [6x2 double] I have lost on speed and size, but do I still get some points on style?
Robert Oostenveld - 2014-03-20 16:58:29 +0100
(In reply to t.marshall from comment #7) If it is in ft_databrowser, it must be due to ft_fetch_data.
Eelke Spaak - 2014-03-21 11:04:15 +0100
I suspect the problem is caused by ft_appenddata insisting on the input data sets having a sampleinfo (line 72): varargin{i} = ft_checkdata(varargin{i}, 'datatype', 'raw', 'feedback', 'no', 'hassampleinfo', 'yes'); if the input data does not have a sample info, it is created from scratch, leading to the infamous warning pair: Warning: the data does not contain a trial definition > In utilities/private/warning_once at 158 In utilities/private/fixsampleinfo at 66 In ft_datatype_raw at 154 In ft_checkdata at 224 In ft_appenddata at 72 Warning: reconstructing sampleinfo by assuming that the trials are consecutive segments of a continuous recording > In utilities/private/warning_once at 158 In utilities/private/fixsampleinfo at 79 In ft_datatype_raw at 154 In ft_checkdata at 224 In ft_appenddata at 72 The databrowser indeed calls ft_fetch_data, which then warnings: Warning: samples present in multiple trials, using only the last occurence of each sample > In ft_fetch_data at 136 In ft_databrowser>redraw_cb at 1463 In ft_databrowser at 676 So that must be it. I tried fixing this by simply removing the 'hassampleinfo' requirement from ft_appenddata:72. This still did not do the trick, as in ft_datatype_raw the default value for 'hassampleinfo' is 'ifmakessense' and I believe the criteria for it making sense are now not correct (at least not in the context of appending data). The sampleinfo field will be reconstructed *unless* there already *is* a sampleinfo which is inconsistent with the actual data (as stored in .trial). I believe we should default to *not* reconstructing it when hassampleinfo=='ifmakessense'. Then this option would result in the sampleinfo being *kept* if it is already present in the data and consistent with the data, but not reconstructed if it is not there. Some input needed: shall I go ahead and make this change? Summarizing, 'hassampleinfo' will then have three values: yes ==> always reconstruct if not present ifmakessense ==> keep if already present and is consistent, don't reconstruct no ==> always remove
Robert Oostenveld - 2014-03-21 11:31:28 +0100
(In reply to Eelke Spaak from comment #10) I suspect ifmakessense makes only sense within a single dataset. In the examples that we now have, I would expect the appended data not to have any sample info, i.e., if the input to ft_appenddata has no sample info, why would the output have it?
Jan-Mathijs Schoffelen - 2014-03-21 11:33:36 +0100
(In reply to Robert Oostenveld from comment #11) "ifmakessense makes only sense" Nice :)
Eelke Spaak - 2014-04-24 09:27:01 +0200
bash-4.1$ svn commit utilities/ft_datatype_raw.m ft_appenddata.m Sending ft_appenddata.m Sending utilities/ft_datatype_raw.m Transmitting file data .. Committed revision 9452. bugfix(2507): the behaviour for the default option hassampleinfo=ifmakessense in ft_datatype_raw is now to *not* reconstruct sampleinfo if it was not present in the input data. Sampleinfo will be kept if it is present in the input data, and if it is consistent with the actual trial structure. Tom, I believe this fixes the bug; could you check (in 15 minutes or so) if indeed the problem is solved? If not, please reopen this bug.