Back to the main page.
Bug 1240 - ft_read_data call ft_filetype over and over again in call to ft_getopt
Status | CLOSED FIXED |
Reported | 2011-12-22 12:30:00 +0100 |
Modified | 2012-04-11 16:48:38 +0200 |
Product: | FieldTrip |
Component: | fileio |
Version: | unspecified |
Hardware: | PC |
Operating System: | Linux |
Importance: | P4 enhancement |
Assigned to: | Robert Oostenveld |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: |
Roemer van der Meij - 2011-12-22 12:30:41 +0100
This clogs the screen when the filetype cannot be determined, and easily causes 1000s of unnecessary if-elseif calls. My suggestion would be to make an exception like (keeping consistent with ft_getopt): dataformat = ft_getopt(varargin, 'dataformat', []); if isempty(dataformat) dataformat = ft_filetype(filename); end Or to switch back to the old style of using isfield. The above would be best I think. Shall I make this change?
Roemer van der Meij - 2011-12-22 12:31:36 +0100
CC: Robert (I of course meant over and over again when reading in data, not inside ft_getopt).
Boris Reuderink - 2011-12-22 13:13:30 +0100
Changed status to new, defect by developer. Robert, could you decide if you agree with the solution proposed by Roemer?
Robert Oostenveld - 2011-12-22 14:27:51 +0100
ft_filetype should be very fast in returning the value as long as the filename did not change. It should not get further than line 90 in the code, i.e. ... % these are for remembering the type on subsequent calls with the same input arguments persistent previous_argin previous_argout previous_pwd if nargin<2 % ensure that all input arguments are defined desired = []; end current_argin = {filename, desired, varargin{:}}; current_pwd = pwd; if isequal(current_argin, previous_argin) && isequal(current_pwd, previous_pwd) % don't do the detection again, but return the previous value from cache type = previous_argout{1}; return <----- UP TO HERE end If that does not happen, then that is because of an intermittent call to ft_filetype with another filename. Can you check, eg. add disp(['DEBUGGING ' filename]) at the begin and then see whether the filename alternates? If so, the problem might have to be resolved elsewhere.
Roemer van der Meij - 2011-12-22 14:39:50 +0100
Well, the ft_read_data call occurs at every trial. So at every trial, it tries to determine the filetype automatically when parsing the input into ft_read_data, and throws a warning. The filename doesn't change over calls (it's reported in the warning). I hadn't thought of the line 90 code to stop all the if-elseifs, so it doesn't take a lot of compuation time. But it's still unexpected behavior. If you have set your dataformat (like I do in the case of my own format), there is no reason to automatically determine the filetype. It only makes sense to do that when the dataformat is empty. That would avoid a warning for every trial that you read in, each taking up six lines in the command line. I doesn't happen often that a function is necessary during an ft_getopt, so it doesn't make sense to build-in functionality there to execute function calls, hence the idea for the simple workaround.
Robert Oostenveld - 2011-12-22 17:39:36 +0100
It being called many times should not be the issue. That is a design principle that I think requires discussion in the following FT meeting in the new year. But the warning being printed on the first call, whereas the call is only to determine the default value that remains unused anyway, is indeed unexpected behaviour. Similar occurrences should be in the code for other type-detections. I was able to find the following: manzana> grep getopt.*type\( *.m */*.m */*/*.m fileio/ft_flush_data.m:dataformat = ft_getopt(varargin, 'dataformat', ft_filetype(filename)); fileio/ft_flush_event.m:eventformat = ft_getopt(varargin, 'eventformat', ft_filetype(filename)); fileio/ft_flush_header.m:headerformat = ft_getopt(varargin, 'headerformat', headerformat = ft_filetype(filename)); fileio/ft_read_data.m:dataformat = ft_getopt(varargin, 'dataformat', ft_filetype(filename)); fileio/ft_read_event.m:eventformat = ft_getopt(varargin, 'eventformat', ft_filetype(filename)); fileio/ft_read_header.m:headerformat = ft_getopt(varargin, 'headerformat', ft_filetype(filename)); % the default is automatically detected fileio/ft_read_headshape.m:fileformat = ft_getopt(varargin,'format', ft_filetype(filename)); fileio/ft_read_mri.m:mriformat = ft_getopt(varargin, 'format', ft_filetype(filename)); fileio/ft_read_sens.m:fileformat = ft_getopt(varargin, 'fileformat', ft_filetype(filename)); fileio/ft_read_spike.m:spikeformat = ft_getopt(varargin, 'spikeformat', ft_filetype(filename)); fileio/ft_read_vol.m:fileformat = ft_getopt(varargin, 'fileformat', ft_filetype(filename)); fileio/ft_write_data.m:dataformat = ft_getopt(varargin, 'dataformat', ft_filetype(filename)); fileio/ft_write_event.m:eventformat = ft_getopt(varargin, 'eventformat', ft_filetype(filename)); fileio/ft_write_mri.m:dataformat = ft_getopt(varargin, 'dataformat', ft_filetype(filename)); But I still don't understand why you get many warnings? You should only get one. B.t.w. what is your file format that remains undetected?
Roemer van der Meij - 2011-12-23 10:41:28 +0100
Ah, I think the multiple warnings is due to the code at line 1000. If filetype is unknown, it doesn't save the argout and such as persistent variables. Is there a reason for this I am thinking of right now? Because, if the input stays the same, than the output should remain the same, regardless of whether the filetype is unknown or not right? The specific fileformat I am using is the one I implemented some two years ago, which is only usable for our archive and a collaboration we have right now (nmc_archive_k). It's a fileformat that has a too generic filetype to be determined automatically (and it's not necessary since it is specified by the user anyway).
Robert Oostenveld - 2011-12-23 10:52:00 +0100
the 'unknown' not being cached in the persistent variable is more a safety feature. The function is not only called when reading, but might also get called just prior to writing (to figure out the desired output format, which is then probably specified by the used, but ft_getopt sill would call ft_filetype to get a potential default). The file then does not yet exist, which causes the filetype detection to fail. ft_getopt and the writing code don't care, because the user has to specify it anyway. On the next (read) call the file will exist and the filetype detection could work, were it not that the 'unknown' would then be remembered from the previous call. So I think that a save change would be at the end - if the file exists and the type is not known, then remember it for the next call Upon the next call the file will be the same (assuming that its content did not change). Do you actually understand why there is the check on isempty(previous_argin)?
Robert Oostenveld - 2012-01-20 09:51:59 +0100
@Roemer, can you please read my comment 7 and see whether it resolves the issue. If so, I suggest to close the bug.
Roemer van der Meij - 2012-01-20 10:35:51 +0100
Hi Robert, Sorry, totally forgot about the bug. I now added another elseif at line 990 or so. If the file or dir already exist, and the type is unknown, it saves the current output. Multiple irrelevant warnings are no longer given, the fix works. Filename can also be a directory in case of some recording devices right? (CTF?). At the moment it checks for the existence of filename as a file or as a dir.
Robert Oostenveld - 2012-01-20 10:52:12 +0100
(In reply to comment #9) filename can indeed also be a directory. There are multiple dataformats (CTF is one of them) with a collection of well-defined files in a directory where the directory is used to specify the dataset.