Back to the main page.

Bug 2411 - implement matching pursuit decomposition

Status CLOSED WONTFIX
Reported 2013-12-04 11:15:00 +0100
Modified 2019-08-10 12:32:51 +0200
Product: FieldTrip
Component: core
Version: unspecified
Hardware: PC
Operating System: Mac OS
Importance: P3 normal
Assigned to: Haiteng Jiang
URL:
Tags:
Depends on:
Blocks:
See also:

Robert Oostenveld - 2013-12-04 11:15:33 +0100

On 4 Dec 2013 Pawel wrote: we have a working matching pursuit (with gabor function dictionary) Matlab implementation. I would like to make this algorithm more avaliable for neuroscience community, do you think adding it (as a method for time-frequency energy map estimation or a preprocessing step for inverse solution) to Fieldtrip Toolbox will be a good idea? If you are interested in such an idea, could you provide us any help with designing the code integration? (e.g. what function should we look for to assert the same input/output, where to place the code etc)


Robert Oostenveld - 2013-12-04 11:19:44 +0100

I suspect it would fit the best under the umbrella of ft_freqanalysis, i.e. taking raw data as input and a time-frequency representation as output. That would require it to be implemented in the same style as ft_specest_hilbert.m ft_specest_mtmconvol.m ft_specest_mtmfft.m ft_specest_tfr.m ft_specest_wavelet.m which are in the fieldtrip/specest subdirectory. Code on which it depends could go in fieldtrip/specest/private. These "specest" functions define the Application Programming Interface (API) between the high-level and the low-level code. See also http://fieldtrip.fcdonders.nl/development/specest for a description and please report here whether you think that might work. I can imagine that some bookeeping changes would be needed in ft_freqanalysis itself, but hopefully those changes should not be too large.


Haiteng Jiang - 2013-12-04 12:37:14 +0100

Hi Robert , I have interest in implementing since I actually have one publication on Matching pursuit.http://www.sciencedirect.com/science/article/pii/S0167876013000974. Let us discuss it in today's FT meeting later. Haiteng(In reply to comment #1)


Robert Oostenveld - 2013-12-05 14:37:30 +0100

Hi Pawel, We discussed it in a meeting. The plan is to make a directory fieldtrip/contrib/matchingpursuit and put your code in there. There should be one function in there called ft_specest_matchingpursuit, which would be called from ft_freqanalysis. My DCCN colleague Haiteng will help you get it implemented. Note that Haiteng is new to FT development, so he has to learn svn etc. along the way.


- 2014-05-13 18:44:47 +0200

Dear All, sorry but I had no time for this project for some time, now it is much more better, shall we get back to the topic now?


Haiteng Jiang - 2014-05-14 14:26:16 +0200

Hi Pawel, Thanks for getting back. At this moment, I am busy with implementing cross-frequency analysis function as well. As Robert proposed , it would fit the best under the umbrella of ft_freqanalysis. Could you please tell me the way you want to get started? I will devote more time on this in June since I have a hard deadline at the end of this month.


- 2014-05-15 10:32:45 +0200

(In reply to Haiteng Jiang from comment #5) Hi Haiteng, I know you are familiar with matching pursuit (MP) but let me explain the algorithm to you again so you can understand how I see the pros and cons of this approach. MP is an algorithm that provides sparse signal representation by means of, so called, atoms i.e. functions chosen from predefined set (dictionary). The dictionary is an overcomplete base for the signal space i.e. for the input signal having N samples the dictionary contains M >> N functions (shapes, waveforms) and some subsets of the dictionary are bases for the signal space. Therefore there is more than one way to express the input signal by means of this atoms. One can use a customized dictionary. Nevertheless we opt to use Gabor function set (sine oscilation multiplied by a Gaussian envelope). Why? First of all Gabor functions provide the best time-frequency resolution. Moreover you can compute time-frequency representation for each of selected atoms separately so to obtain cross-terms free time-frequency map. Each Gabor function is given by only 5 parameters (frequency and phase of sine, width and center of Gauss, and amplitude), so if your signal could be described well using K atoms you end up with a very sparse (compressed) representation i.e. you need only 5*K numbers to express your signal. Eventually this choice allows to take advantage of FFT during computation what makes the algorithm fast enough. MP is an iterative algorithm i.e. first it selects the best atom from the dictionary, that is adjusted to current signal (by means of amplitude and phase) and later on subtracted from the current signal, finally it repeats the procedure with obtained residuum. Now you can see the main drawbacks of MP. First of all -- greediness -- the selection criterion is purely energetic; second -- the stoping criterion problem i.e. how many atoms (iteration) you should compute, when to stop? E.g. is 99% of signal energy explained a good result? My collages like to compute a lot of iteration (like 200) and then filter out interesting atoms, but taking into account the greediness in some application the 100th iteration could be meaningless. You can apply MP to single channel signal (averaged or single trial) but also to multichannel or multitrial data. There are two flavours of multivariate MP. In the first one atoms are selected based on the averaged signal, in the next step they are adjusted to single trials (it is blind for non phase locked activity, sometimes it is called evoked MP). In the second one atoms are selected based on all single trials simultanously, but it is more time consuming. MP was used not only as a time-frequency metod but also as a inverse soultion preprocessing method and as a denoising method. I will take a look at ft_specest_* and wite you more.


Robert Oostenveld - 2014-05-15 11:31:43 +0200

(In reply to pawel.kordowski from comment #6) thanks for the clear explanation on the background. My suggestion is to (try to) implement a function in fieldtrip/specest with the name ft_specest_matchingpursuit. If possible it should have the same function specification as the others, i.e. function [spectrum,ntaper,freqoi] = ft_specest_mtmfft(dat, time, varargin) function [spectrum,freqoi,timeoi] = ft_specest_hilbert(dat, time, varargin) function [spectrum,freqoi,timeoi] = ft_specest_tfr(dat, time, varargin) function [spectrum,freqoi,timeoi] = ft_specest_wavelet(dat, time, varargin) function [spectrum,ntaper,freqoi,timeoi] = ft_specest_mtmconvol(dat, time, varargin) with dat and time as first two inputs, and all others as key-value pairs in the varargin. The output should be the decomposed "spectrum" and its description. See e.g. >> [a, b, c] = ft_specest_wavelet(randn(20,30), 1:30); >> whos a b c Name Size Bytes Class Attributes a 20x15x30 144000 double complex b 1x15 120 double c 1x30 240 double you see that the first output is a complex valued Nchans by Nfrequencies by Ntime. The 2nd output describes the frequency dimension, the 3rd the time dimension. The channel dimension does not change and needs no description. As I understand it, the frequency dimension would now be replaced with the "atom" or dictionary dimension. But each atom still could be described with a number (frequency), right? The high-level ft_freqanalysis function wraps around the present ft_specest_xxx functions, loops over trials that are present in the data, and computes the cross-spectrum if needed. Since not all atoms would be present in all trials, I would for the moment assume that trials where an atom was _not_ selected, it would be represented with a nan. That assumes single trial MP, though. To get the multitrial MP implemented with this (restricted) function interface, I suggest something like this in the high-level code: [avgdecomp,freqoi,timeoi] = ft_specest_matchingpursuit(dat, time, ...) where dat is the averaged data over trials, i.e. nchan*ntime, followed by for i=1:ntrial [trldecomp,freqoi,timeoi] = ft_specest_matchingpursuit(dat, time, 'avgdecomp', avgdecomp, ...) end i.e. where the avgdecomp would be used in the loop over trials to specify which atoms to use in each individual trial. Of course in the "..." the remaining specification of the MP parameters would be passed along. Do you think that this would work for the general outline of the function and how it would be called from the higher-level code?


Haiteng Jiang - 2014-05-15 15:55:58 +0200

Hi Pawel & Robert, Thanks for your helpful input. The pipeline looks nice. Two more cents from me: 1 Computation time issue: If we want to support multi-channel /multi-trial optional , I am quite suspicious that 200 iteration is enough to get convergence (e.g. 0.99)based on my experience. MP is actually very time-consuming in real M/EEG application. Nevertheless , let us test it first. 2 In order to denoise after MP, we need to remove 'noisy' atoms , which means another function (e.g., ft_rejectatoms) is required like ft_rejectcomponent.


- 2014-05-15 17:37:11 +0200

As I understand it, the frequency dimension would now be replaced with the "atom" or dictionary dimension. But each atom still could be described with a number (frequency), right? I'M NOT SURE. FROM MP POINT OF VIEW THE MOST NATURAL OUTPUT IS A MATRIX (LET'S CALLED IT THE BOOK) Kx5 CONTAINING ATOMS (REPRESENTED BY 5 REAL NUMBERS PER ITERATION: AMPLITUDE, FREQUENCY, WIDTH, POSITION AND PHASE) SELECTED IN EACH OF THE K ITERATIONS. SUCH AN OUTPUT IS NICE BECAUSE YOU CAN FILER OUT ATOMS OF INTEREST BASED ON E.G. FREQUENCY OR POSITION. BUT THEN THE ft_freqanalysis WOULD HAVE TO KNOW THAT MP DECOMPOSITION WAS PERFORMED AND BEHAVE IN DIFFERENT WAY I.E. COMPUTE CORSS-SPECTRUM BASED ON THE BOOK. IS IT OK? OR SHALL WE ADD ONE MORE FUNCTION IN BETWEEN? Since not all atoms would be present in all trials, I would for the moment assume that trials where an atom was _not_ selected, it would be represented with a nan. That assumes single trial MP, though. NOT ALL ATOMS FROM DICTIONARY ARE PRESENT IN ALL TRIALS, BUT ALL ATOMS FROM THE BOOK ARE PRESENT IN ALL TRIALS. THE DICTIONARY IS REALLY BIG, IT'S NOT EFFICIENT TO KEEP SO MUCH NANs, IT'S BETTER TO REMEMBER ONLY THE CHOSEN ATOMS. To get the multitrial MP implemented with this (restricted) function interface, I suggest something like this in the high-level code: [avgdecomp,freqoi,timeoi] = ft_specest_matchingpursuit(dat, time, ...) where dat is the averaged data over trials, i.e. nchan*ntime, followed by for i=1:ntrial [trldecomp,freqoi,timeoi] = ft_specest_matchingpursuit(dat, time, 'avgdecomp', avgdecomp, ...) end i.e. where the avgdecomp would be used in the loop over trials to specify which atoms to use in each individual trial. Of course in the "..." the remaining specification of the MP parameters would be passed along. IT IS OK IN CASE OF EVOKED MP, IN CASE OF INDUCED MP WE NEED ALL SINGLE TRIALS TO SELECT THE BEST ATOM IN EACH ITERATION. IS IT NOT BETTER TO USE ONE KEY-VALUED INPUT TO CHOOSE WHICH MP WE WANT? IN CASE OF MULTICHANNEL, MULTITRIAL SIGNAL YOU CAN PERFORM: 1) MP SEPARATELY FOR EACH CHANNEL AND EACH TRIAL (HERE YOU GET ONE BOOK PER TRIAL AND PER CHANNEL SO N_CH x N_TR DIFFERENT BOOKS) 2-5) INDUCED/EVOKED MP IN CHANNEL/TRIAL DIMENSION (4 OPTIONS) (HERE YOU GET ONLY ONE BOOK AND DIFFERENT AMPLITUDES OR PHASES FOR EACH TRIAL AND CHANNEL) 6-9) INDUCED/EVOKED MP IN ONLY ONE DIMENSION; SEPARATED DECOMPOSITION IN THE OTHER (AGAIN 4 OPTIONS) (HERE YOU GET ONE BOOK PER THE "SEPARATED" DIMENSION) MOREOVER 2-9 CAN BE COMPUTED WITH ADJUSTING ONLY AMPLITUDE OR BOTH AMPLITUDE AND PHASE, SO FINALLY THERE IS 13 OPTIONS (2-5 DOUBLED NOW) :-) I'M NOT SURE IF IT IS EASY TO PROVIDE ALL OPTIONS :-) (In reply to Robert Oostenveld from comment #7)


Robert Oostenveld - 2014-05-16 10:21:13 +0200

regarding the book with Kx5 (AMPLITUDE, FREQUENCY, WIDTH, POSITION AND PHASE) and regarding the proposed [datdecomp,freqoi,timeoi] = ft_specest_matchingpursuit(dat, time, ...) I would represent amplitude and phase together as a complex number in the datdecomp array, frequency in the freqoi vector and position in the timoi vector. The 3-D array datdecomp is Nchan*Nfreq*Ntime (or Nposition, since I guess that position refers to position along the time axis). However, I am not sure whether this would really work. Could you give me a single function like this: book = test_matchingpursuit(dat, ...) where dat is a 1*Ntime vector and book is a K*5 matrix? That would help me to understand the issue with regards to consistent data representation and efficiency.


Jan-Mathijs Schoffelen - 2016-02-19 11:53:28 +0100

is this ever going to happen?


Haiteng Jiang - 2016-02-21 12:10:08 +0100

(In reply to Jan-Mathijs Schoffelen from comment #11) Apologize for not getting back to this. I really had time pressure to finish my PhD on time after the initiation of this bug. I did some matching pursuit research during my master. This topic is not my main research interest any more. I am afraid that I don't want to continue. Therefore, Robert or Jan-Mathijs can ask someone else to take over. Apologize again!


Robert Oostenveld - 2016-02-21 14:12:48 +0100

Let's close this. If anyone is interested and willing to actively contribute, please reopen.


Robert Oostenveld - 2019-08-10 12:32:51 +0200

This closes a whole series of bugs that have been resolved (either FIXED/WONTFIX/INVALID) for quite some time. If you disagree, please file a new issue on https://github.com/fieldtrip/fieldtrip/issues.