Bug 1431 - ft_selectdata_old unexpected toilim behavior

Reported 2012-04-17 12:29:00 +0200
Modified 2015-02-18 17:13:47 +0100
Product: FieldTrip
Component: core
Version: unspecified
Hardware: PC
Operating System: Windows
Importance: P3 normal
Assigned to: Diego Lozano Soldevilla
Depends on:
See also:

Johanna - 2012-04-17 12:29:08 +0200

I entered a toilim of [-0.7 -0.3] for selecting time points of a 'freq' structure, which contained .time = [-0.6:.2:2.4]. The output included the points [-0.6 -0.4 -0.2], because on line 526 'nearest' is used to find the boundary time points, and in this case -0.2 must have been just a bit closer to -0.3 than -0.4 was. However, since -0.2 is clearly outside the range [-0.7 -0.3] (not within machine precision of the boundaries), I consider this unexpected behavior. My data has the 'rpt' dim, so ft_selectdata_new fails on line 84.
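For illustration, a small Python sketch of the rounding behaviour described above: picking the single closest sample to each boundary can pull in a sample that lies outside the requested window. This `nearest` is a simplified hypothetical stand-in (0-based), not FieldTrip's MATLAB nearest.m.

```python
# Hypothetical stand-in for the single-value behaviour of nearest():
# pick the index of the closest sample, regardless of which side of
# the boundary it falls on.
def nearest(array, val):
    return min(range(len(array)), key=lambda i: abs(array[i] - val))

time = [round(-0.6 + 0.2 * i, 10) for i in range(16)]  # -0.6:0.2:2.4

beg = nearest(time, -0.7)  # clamps to the first sample, -0.6
end = nearest(time, -0.3)  # -0.3 sits between -0.4 and -0.2; floating-point
                           # noise decides which one wins
print(time[beg:end + 1])   # can include -0.2, outside [-0.7 -0.3]
```

With ordinary double precision, -0.2 ends up marginally closer to -0.3 than -0.4 is, reproducing the unexpected selection reported here.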

Johanna - 2012-05-16 09:58:13 +0200

It seems the fix for this already exists as a commented-out line. In r4101 Robert 'reverted' the code from

seltoi = find(data.time>=seltoi(1) & data.time<=seltoi(2));

to

seltoi = nearest(data.time, seltoi(1)):nearest(data.time, seltoi(2));

Given this history of code changes, I would like Robert's comments on why it was changed back to 'nearest' before I change it back again to the first option. As a follow-up question: what is the desired behaviour if 'toilim' is outside the range of data.time? With the first option, seltoi will be empty and the output will be a data structure with size 0 in the time dimension (but no crash). Alternatively, with the second option, seltoi will be a single index nearest the edge of data.time, either the beginning or the end, which could in principle be many seconds away from the values stated in 'toilim'.

Robert Oostenveld - 2012-07-04 14:56:30 +0200

after discussion with Johanna and the FT team, this seems least bad:

if numel(val)==2
  % interpret this as a range specification like [minval maxval]
  % see also
  sel = find(array>val(1) & array<val(2));
  indx(1) = sel(1);
  indx(2) = sel(end);
  return
end

Johanna - 2012-07-04 14:59:22 +0200

note to self: contact Craig (bug 498) and add changes to test_nearest.m

Johanna - 2012-07-04 17:44:58 +0200

Hi Robert and Craig,

How about this slight addition to what Robert suggested (to be inserted into nearest.m)?

if numel(val)==2
  % interpret this as a range specification like [minval maxval]
  % see also
  intervaltol = eps;
  sel = find(array>=val(1) & array<=val(2));
  indx(1) = sel(1);
  indx(2) = sel(end);
  if indx(1)>1 && abs(array(indx(1)-1)-val(1))<intervaltol
    indx(1) = indx(1)-1;
  end
  if indx(2)<length(array) && abs(array(indx(2)+1)-val(2))<intervaltol
    indx(2) = indx(2)+1;
  end
  return
end

These two extra if-statements check whether the value in array just below minval, or the value just above maxval, is within machine precision of the desired min/max value, and if so use that instead. Without this addition, I get:

>> nearest(.1:.1:1.0,[.1 .3])
ans = 1 2

But with this addition I get:

>> nearest(.1:.1:1.0,[.1 .3])
ans = 1 3

which I think is the desired output. In future, intervaltol can be changed to something other than eps, or made user-specifiable via the cfg if needed. If you are ok with this version, then I'll commit it, and also make changes wherever nearest.m is called with a [minval maxval] pair.

Also, I will add these to test_nearest.m:

assert(all(nearest(.1:.1:1.0,[.1 .3])==[1 3]))
assert(all(nearest(.1:.1:1.0,[.11 .3])==[2 3]))
assert(all(nearest(.1:.1:1.0,[.1 .29])==[1 2]))
assert(all(nearest(.1:.1:1.0,[0 .3])==[1 3]))
assert(all(nearest(.1:.1:1.0,[.11 .29])==[2 2]))
assert(all(nearest(.1:.1:1.0,[-inf 1])==[1 10]))
assert(all(nearest(.1:.1:1.0,[.79 inf])==[8 10]))
assert(all(nearest(.1:.1:1.0,[.8 10])==[8 10]))
assert(all(nearest(.1:.1:1.0,[-2 8])==[1 10]))
assert(all(nearest(.1:.1:1.0,[.79 .99])==[8 9]))
assert(all(nearest(.001:.001:.01,[.002 .003])==[2 3]))
assert(all(nearest(.001:.001:.1,[0 1])==[1 100]))

However, one final question that I asked initially in this bug but forgot to discuss with Robert today: what should happen when [minval maxval] are both outside the range of array? E.g. nearest(.1:.1:1.0,[3 8])? Previously 'nearest' would have returned [10 10], but perhaps it should error in an informative way, telling the user they specified values outside the range of the data.
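For illustration, a Python sketch of the proposed logic. This is a hypothetical stand-in, not the actual nearest.m: `nearest_range` is an invented name, indices are 0-based where MATLAB is 1-based, and the error on an out-of-range request reflects the behaviour agreed below.

```python
import sys

def nearest_range(array, minval, maxval, intervaltol=sys.float_info.epsilon):
    # select all samples inside [minval, maxval]
    sel = [i for i, a in enumerate(array) if minval <= a <= maxval]
    if not sel:
        # both limits fall outside the range of array: fail loudly
        raise ValueError('the requested range lies outside the data')
    beg, end = sel[0], sel[-1]
    # widen an edge by one sample when the neighbour misses the
    # boundary only by machine precision
    if beg > 0 and abs(array[beg - 1] - minval) < intervaltol:
        beg -= 1
    if end < len(array) - 1 and abs(array[end + 1] - maxval) < intervaltol:
        end += 1
    return beg, end

time = [0.1 * i for i in range(1, 11)]  # 0.1:0.1:1.0, with float noise
print(nearest_range(time, 0.1, 0.3))    # the 0.3 sample is recovered
```

Here 0.1*3 exceeds 0.3 by a few ulps, so the strict comparison alone would drop it; the tolerance check pulls it back in. Python's 0-based (0, 2) corresponds to MATLAB's [1 3] in the examples above.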

Robert Oostenveld - 2012-07-04 18:56:40 +0200

(In reply to comment #4) If both are on the same side outside the array range, I would indeed find an error appropriate.

Craig Richter - 2012-07-05 01:20:40 +0200

I agree with spawning an error. Too much hand-holding can cause the user to end up in a place they don't intend. If the toi boundaries don't make sense then a clear message as to why is probably the most helpful message that the program can respond with. If we try to engineer things to do 'the most logical thing', then the user will likely end up with something totally unintended but yet seems to 'work'. This can make finding the 'bug' in their code down the road nearly impossible. I'm a puritan in this regard: if the software gets weird input then the best response is to generate an error message describing why rather than to make a best guess as to what on earth the user actually intended to do.

Johanna - 2012-07-05 11:42:32 +0200

svn commit 6238 for nearest.m and test_nearest.m complete. Still todo: make changes to functions that call nearest

Johanna - 2012-07-05 16:30:22 +0200

svn commit 6240, so now ft_selectdata_old uses this new nearest for toilim (not foilim). I thus consider this bug fixed. However, from grep -R nearest *, I think the following could also be updated, based on having a pair of [beg end] in a time array. But I think this would need to be assessed on a case by case basis. Perhaps a new bug for these?

contrib/spike/private/preproc.m: begsample = nearest(time, cfg.baselinewindow(1));
contrib/spike/private/preproc.m: endsample = nearest(time, cfg.baselinewindow(2));
contrib/spike/ft_spikedensity.m: begSample = nearest(timeAxis, cfg.latency(1));
contrib/spike/ft_spike_jpsth.m: indx = nearest(psth.time,cfg.latency(1)) : nearest(psth.time,cfg.latency(2));
fileio/private/ft_checkdata.m: ix(i) = 1-nearest(tmptime, begtime(i));
fileio/private/ft_checkdata.m: iy(i) = nearest(tmptime, endtime(i))-length(tmptime);
fileio/private/ft_checkdata.m: begsmp(i) = nearest(time, data.time{i}(1));
fileio/private/ft_checkdata.m: endsmp(i) = nearest(time, data.time{i}(end));
ft_combineplanar.m: tbeg = nearest(data.time, cfg.baselinewindow(1));
ft_combineplanar.m: tend = nearest(data.time, cfg.baselinewindow(2));
ft_databrowser.m: opt.trlop = nearest(begsamples, thissample);
ft_dipolefitting.m: tbeg = nearest(data.time, cfg.latency(1));
ft_dipolefitting.m: tend = nearest(data.time, cfg.latency(end));
ft_dipolefitting.m: tbeg = nearest(data.time, cfg.latency(1));
ft_dipolefitting.m: tend = nearest(data.time, cfg.latency(end));
ft_freqgrandaverage.m: timesel = nearest(varargin{i}.time, tbeg):nearest(varargin{i}.time, tend);
ft_movieplotTFR.m: xbeg = nearest(xvalues, cfg.xlim(1));
ft_movieplotTFR.m: xend = nearest(xvalues, cfg.xlim(2));
ft_multiplotCC.m: fbin = [nearest(xparam, cfg.xlim(1)) nearest(xparam, cfg.xlim(2))];
ft_multiplotER.m: xidmin(i,1) = nearest(varargin{i}.(xparam), xmin);
ft_multiplotER.m: xidmax(i,1) = nearest(varargin{i}.(xparam), xmax);
ft_multiplotTFR.m: xmin = nearest(data.(xparam), xmin);
ft_multiplotTFR.m: xmax = nearest(data.(xparam), xmax);
ft_multiplotTFR.m: ymin = nearest(data.(yparam), ymin);
ft_multiplotTFR.m: ymax = nearest(data.(yparam), ymax);
ft_mvaranalysis.m: begsmp = nearest(data.time{k}, zwindow(1));
ft_mvaranalysis.m: endsmp = nearest(data.time{k}, zwindow(2));
ft_mvaranalysis.m: sample = nearest(timeaxis, cfg.toi(j));
ft_redefinetrial.m: begsample(i) = nearest(data.time{i}, cfg.toilim(1));
ft_redefinetrial.m: endsample(i) = nearest(data.time{i}, cfg.toilim(2));
ft_singleplotER.m: xidmin(i,1) = nearest(varargin{i}.(xparam), xmin);
ft_singleplotER.m: xidmax(i,1) = nearest(varargin{i}.(xparam), xmax);
ft_singleplotTFR.m: xmin = nearest(data.(xparam), xmin);
ft_singleplotTFR.m: xmax = nearest(data.(xparam), xmax);
ft_singleplotTFR.m: ymin = nearest(data.(yparam), ymin);
ft_singleplotTFR.m: ymax = nearest(data.(yparam), ymax);
ft_sourcedescriptives.m: begsmp = nearest(source.time, cfg.baselinewindow(1));
ft_sourcedescriptives.m: endsmp = nearest(source.time, cfg.baselinewindow(2));
ft_sourcedescriptives.m: begsmp = nearest(source.time, cfg.baselinewindow(1));
ft_sourcedescriptives.m: endsmp = nearest(source.time, cfg.baselinewindow(2));
ft_sourcemovie.m: xbeg = nearest(xparam, xlim(1));
ft_sourcemovie.m: xend = nearest(xparam, xlim(2));
ft_timelockanalysis.m: begsampl = nearest(abstimvec, latency(1));
ft_timelockanalysis.m: endsampl = nearest(abstimvec, latency(2));
ft_timelockanalysis.m: begsampl = nearest(data.time{i}, latency(1));
ft_timelockanalysis.m: endsampl = nearest(data.time{i}, latency(2));
ft_timelockanalysis.m: begsampl = nearest(data.time{i}, cfg.covariancewindow(1));
ft_timelockanalysis.m: endsampl = nearest(data.time{i}, cfg.covariancewindow(2));
ft_timelockanalysis_new.m: begsampl = nearest(data.time{ii}, cfg.covlatency(1));
ft_timelockanalysis_new.m: endsampl = nearest(data.time{ii}, cfg.covlatency(2));
ft_timelockanalysis_new.m: begsampl = nearest(abstimvec, cfg.latency(1));
ft_timelockanalysis_new.m: endsampl = nearest(abstimvec, cfg.latency(2));
ft_timelockanalysis_new.m: begsampl = nearest(data.time{ii}, cfg.latency(1));
ft_timelockanalysis_new.m: endsampl = nearest(data.time{ii}, cfg.latency(2));
ft_timelockbaseline.m: tbeg = nearest(timelock.time, cfg.baseline(1));
ft_timelockbaseline.m: tend = nearest(timelock.time, cfg.baseline(2));
ft_timelockgrandaverage.m: idxs = nearest(varargin{1}.time, min(cfg.latency));
ft_timelockgrandaverage.m: idxe = nearest(varargin{1}.time, max(cfg.latency));
ft_topoplotTFR.m: xmin = nearest(data.(xparam), xmin);
ft_topoplotTFR.m: xmax = nearest(data.(xparam), xmax);
private/preproc.m: begsample = nearest(time, cfg.baselinewindow(1));
private/preproc.m: endsample = nearest(time, cfg.baselinewindow(2));
private/rejectvisual_summary.m: begsample = nearest(data.time{i}, cfg.latency(1));
private/rejectvisual_summary.m: endsample = nearest(data.time{i}, cfg.latency(2));
private/rejectvisual_channel.m: begsample = nearest(data.time{i}, cfg.latency(1));
private/rejectvisual_channel.m: endsample = nearest(data.time{i}, cfg.latency(2));
private/rejectvisual_trial.m: begsample = nearest(data.time{i}, cfg.latency(1));
private/rejectvisual_trial.m: endsample = nearest(data.time{i}, cfg.latency(2));
private/prepare_timefreq_data.m: timesel = nearest(remember{1}.time, cfg.latency(1)):nearest(remember{1}.time, cfg.latency(2));
private/prepare_timefreq_data.m: timesel = nearest(varargin{c}.time, cfg.latency(1)):nearest(varargin{c}.time, cfg.latency(2));
utilities/ft_checkdata.m: ix(i) = 1-nearest(tmptime, begtime(i));
utilities/ft_checkdata.m: iy(i) = nearest(tmptime, endtime(i))-length(tmptime);
utilities/ft_checkdata.m: time(nearest(time, max(endtime))+1:end) = [];
utilities/ft_checkdata.m: begsmp(i) = nearest(time, data.time{i}(1));
utilities/ft_checkdata.m: endsmp(i) = nearest(time, data.time{i}(end));
utilities/ft_selectdata_new.m: tbeg = nearest(data.time, cfg.latency(1), false, false);
utilities/ft_selectdata_new.m: tend = nearest(data.time, cfg.latency(2), false, false);
utilities/ft_selectdata_new.m: tbin(1) = nearest(data.time, cfg.toilim(1), false, false);
utilities/ft_selectdata_new.m: tbin(2) = nearest(data.time, cfg.toilim(2), false, false);
utilities/ft_selectdata_new.m: fbeg = nearest(data.freq, cfg.latency(1), false, false);
utilities/ft_selectdata_new.m: fend = nearest(data.freq, cfg.latency(2), false, false);

Diego Lozano Soldevilla - 2012-10-26 13:21:49 +0200

Created attachment 353 toy meg data 1 channel x 1 trial

Diego Lozano Soldevilla - 2012-10-26 13:22:33 +0200

(In reply to comment #8)

Hi,

The nearest function now does an accurate job when selecting data at a specific time of interest (ft_redefinetrial, ft_selectdata). However, if one selects data before time-frequency analysis, we obtain one sample more than we need, which messes up the frequency resolution one expects. Here is an example (load the attachment): meg contains 1 MEG channel x 1 trial, whose time length was defined as:

cfg.trialdef.prestim  = 2;   % in seconds
cfg.trialdef.poststim = 4.1; % in seconds
cfg = ft_definetrial(cfg);

In my cfg.trialfun the samples are defined as in the tutorial:

begsample = event(selectevents(i)).sample - cfg.trialdef.prestim*hdr.Fs;
endsample = event(selectevents(i)).sample + cfg.trialdef.poststim*hdr.Fs - 1;

The "- 1" sample at the end is used to avoid sample repetitions (I assume, but I might be wrong!). The "real" piece of data selected is therefore [-2 4.0983], not 4.1, because my sampling rate is 600 Hz.

If one selects a temporal piece of the data:

cfg = [];
cfg.toilim = [-1 3];
meg2 = ft_redefinetrial(cfg, meg);

>> meg2.time{1,1}(1,1)
ans = -1
>> meg2.time{1,1}(1,end)
ans = 3
>> size(meg2.time{1,1})  % number of samples
ans = 1 2401

And here is the problem: when you compute ft_freqanalysis, the frequency resolution is not what you would expect:

cfg = [];
cfg.output = 'pow';
cfg.method = 'mtmconvol';
cfg.taper = 'dpss';
cfg.foi = 40:2.5:100;
cfg.toi = [-1:0.05:3];
cfg.trials = 1;
cfg.t_ftimwin = ones(length(cfg.foi),1).*0.4;
cfg.tapsmofrq = 5;
cfg.keeptrials = 'yes';
freq1 = ft_freqanalysis(cfg, meg2);

trial 1, frequency 25 (99.96 Hz), 3 tapers

>> freq1.freq
ans =
  Columns 1 through 15
    39.9833   42.4823   44.9813   47.4802   49.9792   52.4781   54.9771   57.4761   59.9750   62.4740   64.9729   67.4719   69.9708   72.4698   74.9688
  Columns 16 through 25
    77.4677   79.9667   82.4656   84.9646   87.4636   89.9625   92.4615   94.9604   97.4594   99.9584

Without using the extra sample:

cfg = [];
cfg.latency = [-1 3-(1/600)];
cfg.trials = 1;
meg3 = ft_selectdata_new(cfg, meg);

cfg = [];
cfg.output = 'pow';
cfg.method = 'mtmconvol';
cfg.taper = 'dpss';
cfg.foi = 40:2.5:100;
cfg.toi = [-1:0.05:3];
cfg.trials = 1;
cfg.t_ftimwin = ones(length(cfg.foi),1).*0.4;
cfg.tapsmofrq = 5;
cfg.keeptrials = 'yes';
freq3 = ft_freqanalysis(cfg, meg3);

>> freq3.freq
ans =
  Columns 1 through 15
    40.0000   42.5000   45.0000   47.5000   50.0000   52.5000   55.0000   57.5000   60.0000   62.5000   65.0000   67.5000   70.0000   72.5000   75.0000
  Columns 16 through 25
    77.5000   80.0000   82.5000   85.0000   87.5000   90.0000   92.5000   95.0000   97.5000  100.0000

There should be a convention about whether or not to use this extra sample, because it can have undesirable consequences for later time-dependent analyses. The easy solution is to subtract one sample after nearest.m at the endsample, but I am not sure about the consequences/arbitrariness. In ft_redefinetrial, line 146:

endsample(i) = nearest(data.time{i}, cfg.toilim(2));
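A back-of-the-envelope check of the two frequency grids reported above, sketched in Python. This assumes, as in the comment, fs = 600 Hz and a requested window of [-1 3] s; the snapping of cfg.foi to multiples of fs/N (N = number of selected samples) is an inference from the reported numbers, not from the ft_freqanalysis source.

```python
fs = 600.0

# inclusive selection keeps both endpoints: the fencepost gives one
# extra sample compared to (interval length) * fs
n_incl = int(round((3.0 - (-1.0)) * fs)) + 1  # 2401 samples
n_excl = n_incl - 1                           # last sample dropped: 2400

df_incl = fs / n_incl  # ~0.24990 Hz
df_excl = fs / n_excl  # exactly 0.25 Hz

# the grid point closest to the requested 40 Hz in each case
print(160 * df_incl)   # ~39.9833, matching freq1.freq above
print(160 * df_excl)   # 40.0, matching freq3.freq above
```

So the 39.9833, 42.4823, ... values are exactly what a 2401-sample window produces, and dropping the one extra sample restores the round 0.25 Hz grid.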

Johanna - 2012-10-26 14:42:56 +0200

Hi Diego, Let's discuss in next FT meeting. My opinion is that the code behaves as expected (i.e. user calls for [-1 3] and gets this). The concept is the same as realizing that there are 2401 samples in the vector 0:1:2400. If you want 2400 samples, then using 1:2400 or 0:2399 are both valid, but lead to slightly different results.
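The fencepost count Johanna describes, as a one-line Python check: an inclusive range with both endpoints kept contains one more sample than (interval length) / (step).

```python
# 0:1:2400 in MATLAB notation, i.e. both endpoints included
samples = list(range(0, 2401))
print(len(samples))  # 2401, not 2400
```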

Johanna - 2012-10-31 15:04:06 +0100

Discussed in the FT meeting: the behavior of ft_redefinetrial is as expected. However, Diego raises a good point about some seeming inconsistencies or confusion regarding when the last sample is or is not automatically dropped at different stages of processing. For ft_redefinetrial, please see the options cfg.length and cfg.overlap for ways of redefining trials other than via toilim. There was also a discussion of whether trialfun_general needed to be changed due to an automatic subtraction of 1 sample, but we agreed no change was needed. Action point: create an FAQ on how the specification of limits works in the various functions; see existing documentation. Diego will take the lead on creating this FAQ.

Diego Lozano Soldevilla - 2012-10-31 17:45:59 +0100

(In reply to comment #12)

First FAQ draft created. I have not included the cfg.overlap option yet because I don't understand what it is doing. When I do:

cfg = [];
cfg.length = 1;
cfg.overlap = 0;
meg = ft_redefinetrial(cfg, dataFIC); % 300 samples are selected

cfg = [];
cfg.length = 1;
cfg.overlap = 1;
meg = ft_redefinetrial(cfg, dataFIC); % all samples that dataFIC contains are selected!!

Diego

Robert Oostenveld - 2012-10-31 21:50:43 +0100

(In reply to comment #13) I suggest naming it after the question, not after the answer. E.g. before today you were not aware of the relevance of inclusive/exclusive. So something like "Why do I get 1001 samples instead of 1000?" or "Why do I not get exact integer frequencies?"

Diego Lozano Soldevilla - 2012-11-01 09:37:32 +0100

(In reply to comment #14) Agreed. Indeed I like both, so I combined them. Diego

Robert Oostenveld - 2014-04-15 18:55:53 +0200

Hi Diego, could you have a look at this? thanks, R

Diego Lozano Soldevilla - 2014-04-17 11:38:54 +0200

(In reply to Robert Oostenveld from comment #16) Hi Robert, I created the following FAQ to clarify the inclusive/exclusive interval selection: Let me know if it makes sense to you

Robert Oostenveld - 2015-02-18 15:41:49 +0100

(In reply to Diego Lozano Soldevilla from comment #17) Hi Diego, I rewrote it a bit and renamed it. Thanks for documenting this. Reading it, I find the actual implementation logical, but I also realise that the consequences can be non-logical for people who are not aware of the presence or absence of this last sample.

Diego Lozano Soldevilla - 2015-02-18 17:13:47 +0100

(In reply to Robert Oostenveld from comment #18) thank you Robert! It's more logical now