Bug 2991 - implement support for the *.besa file format
Status | CLOSED FIXED |
Reported | 2015-10-21 20:29:00 +0200 |
Modified | 2019-08-10 12:32:31 +0200 |
Product: | FieldTrip |
Component: | external_besa |
Version: | unspecified |
Hardware: | PC |
Operating System: | Mac OS |
Importance: | P5 normal |
Assigned to: | Arjen Stolk |
Robert Oostenveld - 2015-10-21 20:29:34 +0200
Hi Todor, We came across some external users with ECoG data from Nihon Kohden that is stored in the *.besa file format. It seems that FieldTrip does not support this yet. Do you happen to have a MATLAB reader that you could share? Thanks, Robert
Robert Oostenveld - 2015-11-12 13:32:20 +0100
I had some email exchange with BESA and received this: On 12 Nov 2015, at 13:25, Robert Spangler wrote: "We decided to publish the file format description for our BESA format (*.besa). Furthermore, we are really interested in providing MATLAB routines to import/export *.besa files from BESA to Fieldtrip and vice versa, however we do not have the possibility to achieve this task since our developers are fully occupied at the moment. In case one of your users, or maybe one of your developers would be willing to start implementing these routines, we will provide all the support they need. The file format description can be downloaded from the link below: http://www.besa.de/downloads/file-formats/" @Arjen, I guess it is up to us to act upon this.
Robert Oostenveld - 2015-11-12 13:36:36 +0100
(In reply to Robert Oostenveld from comment #1) I had a quick look at the documentation. The documentation is well written in terms of details. Although it looks complex, the overall file format is not that hard since it is a "tagged file format" with blocks, each block with a mini header (with the ID and size) and the data. Decoding the structure of the file therefore is easy, decoding the content of (some of the) blocks will be more difficult.
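The "tagged file format" structure described above can be sketched as a simple block iterator. This is only an illustration, not the actual reader: the thread confirms (further down) that tags are 4 characters, but the 4-byte little-endian size field, the helper names, and the demo tags `AAAA` and `BBBB` are assumptions made for this example.

```python
import io
import struct


def iter_blocks(stream):
    """Yield (tag, payload) pairs from a tagged-block stream.

    Assumed layout (hypothetical, not taken from the format description):
    a 4-byte ASCII tag, a 4-byte little-endian unsigned payload size,
    then the payload itself.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated block header")
        tag = header[:4].decode("ascii")
        (size,) = struct.unpack("<I", header[4:])
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated payload in block %r" % tag)
        yield tag, payload


def make_block(tag, payload):
    """Build one demo block using the same assumed layout."""
    return tag.encode("ascii") + struct.pack("<I", len(payload)) + payload


# Two made-up blocks: one with a 2-byte payload, one empty.
demo = make_block("AAAA", b"\x01\x02") + make_block("BBBB", b"")
blocks = list(iter_blocks(io.BytesIO(demo)))
```

Walking the blocks this way needs no knowledge of their content, which matches the observation that decoding the structure is the easy part and decoding individual blocks is the harder part.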
Kris Anderson - 2015-11-14 05:44:57 +0100
Hi guys, I'm working on a Matlab importing script and have some questions concerning the documentation. Any idea how I could get in touch with someone at BESA for more info? 1 - Section 2.3 (CHNU) - An array of what kind of integers (int16, int32)? 2 - Section 2.3 (CHLS) - "If a non-positive value is found in the array, a value of "1.f" is used instead". Is 'f' the value just read, so the result would be something like 1.0 plus f divided by the minimum power of 10 greater than f? - Also, LSB does not affect floats (compressed or uncompressed), but does affect compressed int16s, correct? 3 - Section 2.3 (CHCU) - "Stored as an array <number of channels (CHNR)> series of 2-byte characters.". Just one character per channel, then? 4 - Section 2.3 (CHCM) - Same as above Lastly, I only have test data from a particular system that does not include many of the possible fields (events, most channel info, etc) in the file. The data I'm working with is not compressed, either. So far the script can read these files fine, but I'd like to make it work with all .besa files, generally. Could I possibly get some sample files from someone with a besa license? I'd be happy to share the finished script with BESA and fieldtrip. Thanks so much for getting documentation for the file format. I've been using an undocumented .exe file to convert from .besa to .edf before any analyses and having a way to directly convert will save a lot of time and space.
Todor Jordanov - 2015-11-16 09:34:18 +0100
Hi Kris, the specialist responsible for the besa file format is Robert Spangler (rspangler@besa.de). Please feel free to contact him for any questions about the format.
Arjen Stolk - 2015-11-16 09:37:10 +0100
(In reply to Todor Jordanov from comment #4) Hi Todor, Do you think you could get Robert to join this chat, such that we too can think along? Yours, Arjen
Robert Oostenveld - 2015-11-16 10:12:59 +0100
(In reply to Arjen Stolk from comment #5) Robert Spangler is CCed on this bugzilla thread, so he is automatically receiving emails that relate to this.
Robert Spangler - 2015-11-16 11:19:59 +0100
Hey Kris, If you have any questions regarding the file format, just post them here. I am CCed on this thread. In your recent post you asked: 1 - Section 2.3 (CHNU) - An array of what kind of integers (int16, int32)? This is an int16 array storing a unique number for each channel. 2 - Section 2.3 (CHLS) - "If a non-positive value is found in the array, a value of "1.f" is used instead". Is 'f' the value just read, so the result would be something like 1.0 plus f divided by the minimum power of 10 greater than f? - Also, LSB does not affect floats (compressed or uncompressed), but does affect compressed int16s, correct? This value is only set if int16 data are written in the data blocks and was used to convert the float values to int16 values while writing the data. For reading you need to multiply the int16 values in order to get the float values again: float value = int16*LSB If data are stored as float values, you can ignore this section. 3 - Section 2.3 (CHCU) - "Stored as an array <number of channels (CHNR)> series of 2-byte characters.". Just one character per channel, then? No, it can be multiple characters per channel, separated by '\0'. E.g.: mV\0µV\0nV\0 -> Channel 1: mV Channel 2: µV Channel 3: nV 4 - Section 2.3 (CHCM) - Same as above Yes, it's the same procedure as for CHCU. E.g.: Bad channel\0Paused during recording\0Dummy channel\0My favourite channel\0 -> Channel 1: Bad channel Channel 2: Paused during recording Channel 3: Dummy channel Channel 4: My favourite channel I can send you some demo data in *.edf and *.besa format if you like (via Dropbox). Or you can give me EDF files (that contain well defined characteristics) and I will convert them to *.besa. So you can do unit tests after reading both files. Cheers, Robert
Robert Spangler - 2015-11-16 11:23:28 +0100
Sorry. Just found a typo after submission: 1 - Section 2.3 (CHNU) - An array of what kind of integers (int16, int32)? This is an int32 array storing a unique number for each channel. Robert
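The answers in the two comments above (int32 channel numbers, '\0'-separated strings of 2-byte characters, and `float value = int16*LSB`) can be sketched as follows. This is a minimal illustration under assumptions: little-endian byte order and UTF-16 as the 2-byte character encoding are guesses, and all function names are hypothetical.

```python
import struct


def parse_channel_numbers(raw):
    """Decode a CHNU-style payload as int32 values (little-endian assumed)."""
    count = len(raw) // 4
    return list(struct.unpack("<%di" % count, raw[: count * 4]))


def split_channel_strings(raw, n_channels):
    """Split NUL-separated 2-byte characters into one string per channel.

    UTF-16 little-endian is an assumption; the thread only says "2-byte
    characters" separated by '\\0'.
    """
    parts = raw.decode("utf-16-le").split("\0")
    if parts and parts[-1] == "":
        parts.pop()  # drop the empty piece after the final terminator
    if len(parts) != n_channels:
        raise ValueError("expected %d strings, found %d" % (n_channels, len(parts)))
    return parts


def int16_to_float(samples, lsb):
    """Undo the int16 quantisation: float value = int16 * LSB."""
    return [s * lsb for s in samples]


numbers = parse_channel_numbers(struct.pack("<3i", 1, 2, 3))
units = split_channel_strings("mV\0µV\0nV\0".encode("utf-16-le"), 3)
scaled = int16_to_float([2, -4], 0.5)
```

The same splitting routine would apply to the CHCM comments, since they follow the same procedure as CHCU.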
Kris Anderson - 2015-11-17 01:49:13 +0100
Thanks Robert, Here is a dropbox link to a .bdf file: https://www.dropbox.com/sh/cpi4fidcceuvzlt/AADj2IobeRvDS1wfLuTIu_6Wa?dl=0 Could you convert this to both a compressed and uncompressed .besa format?
Robert Spangler - 2015-11-17 15:53:45 +0100
Hey Kris, I converted the *.bdf file using 3 different types of compression: - no compression - maximum compression - fast compression Output data was written as float arrays. Please note that I did not convert any events, since there seems to be an issue with one of the events. Have to look at the implementation of our BDF reader to check what is happening there. You can download the files here: https://www.dropbox.com/sh/0maxgeeyxzei1cf/AACxvfGsQv9YzA4ZVheQ3fKva?dl=0 Robert
Robert Oostenveld - 2015-11-17 17:28:25 +0100
(In reply to Robert Spangler from comment #10) Thanks for making progress on this. For my own reference (and for eventual inclusion in the Donders FT testing framework): I have copied the files to /home/common/matlab/fieldtrip/data/test/bug2991
Kris Anderson - 2015-11-22 02:06:35 +0100
In section 3.1.2.5, values for the prefix byte of various pre-compression schemes are defined. Is it possible that some schemes might be missing? I am stuck at the moment with a prefix value of 18 at one point in the (compressed) data. Everything up to that looks as expected. I just thought I'd ask because it looks like some possible scenarios aren't covered: - Second Scheme + first 2 entries int - Third Scheme + first 2 entries int - zlib on Second Scheme + first 2 entries int - zlib on Third Scheme + first 2 entries int
Kris Anderson - 2015-11-22 23:44:49 +0100
After going through the data, it appears that prefix 18 means zlib on Second Scheme + first 2 entries int.
Robert Spangler - 2015-11-23 12:13:06 +0100
There are two entries missing in the list: 18: zlib on Second Scheme + first 2 entries int 19: zlib on Third Scheme + first 2 entries int The following two options are not possible: - Second Scheme + first 2 entries int - Third Scheme + first 2 entries int Prefix byte values that are not mentioned in the file format description (1, 2, 10-12, 16, 20-28) are not used and could be handled by returning an error. The file format description will be updated accordingly!
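The handling of prefix bytes described above can be sketched as a small validation step. Only the values stated in this comment are encoded here (18, 19, and the unused set 1, 2, 10-12, 16, 20-28); every other value is left to the file format description, and the function name is hypothetical.

```python
# Prefix byte values that, per the comment above, are not used by the format.
UNUSED_PREFIXES = frozenset([1, 2, 10, 11, 12, 16]) | frozenset(range(20, 29))

# The two entries reported as missing from the published list.
ADDED_PREFIXES = {
    18: "zlib on Second Scheme + first 2 entries int",
    19: "zlib on Third Scheme + first 2 entries int",
}


def check_prefix(prefix):
    """Reject unused prefix bytes; describe the ones named in this thread.

    Values that are neither unused nor named here return a placeholder,
    because their meaning is defined only in the format description.
    """
    if prefix in UNUSED_PREFIXES:
        raise ValueError("prefix byte %d is not used by the format" % prefix)
    return ADDED_PREFIXES.get(prefix, "see file format description")


label_18 = check_prefix(18)
label_19 = check_prefix(19)
```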
Kris Anderson - 2015-11-23 22:56:20 +0100
Okay, these files are being read correctly. Could you upload a sample file with events? It doesn't have to match the .bdf I uploaded, just anything containing event blocks.
Robert Spangler - 2015-11-24 12:07:13 +0100
I added a couple of files that include events (triggers, comments, generic events, segment events) to the Dropbox folder: https://www.dropbox.com/sh/68c954j1chy2usy/AACa_hdH_7KBNF2j6YWjcVssa?dl=0
Kris Anderson - 2015-12-09 02:05:34 +0100
Created attachment 760 Preliminary readbesa matlab function
Kris Anderson - 2015-12-09 02:35:13 +0100
I've attached the function where I am at so far. All of the files can be read except for 'Segment - Generic.besa'. The first block has a prefix byte = 4 (Second scheme) and I am reading a value of 255 outside of an 'announcing byte' section, which is undefined. I've spent some time troubleshooting but can't find the problem. Going to keep trying to figure it out but I thought I'd post an update in the meantime and ask for any insight you guys might have. Some other notes: A) Page 39 Table 2. 'Third a' should have a delta=3, not 2 B) HSPC on Page 21 should be HSPD C) I'm a little confused about signed vs unsigned values for some elements. DATS on Page 33, for example. It specifies the number of samples written to a block of data, which should always be positive, but the doc says that it is a '32 bit integer value', not 32 bit unsigned integer. More than 2 billion samples is unlikely, but not sure what would happen in that case. D) There are some tags that are less than four characters described in the Event section (MPS, IMP, NR, VAL). Aren't all tags supposed to be 4 bytes? E) I don't understand what to do in the case a negative LSB is encountered in the Channel and Location block. Here is what the doc says: - If a non-positive value is found in the array, a value of "1.f" is used instead. What would 'f' be in this case?
Robert Oostenveld - 2015-12-09 07:39:11 +0100
(In reply to Kris Anderson from comment #18) Hi Kris, Thanks for the great work and progress you made. I have added the format FieldTrip-style to ft_filetype, which returns 'besa_besa' (manufacturer_subformat) and also added some preliminary code to ft_read_header, ft_read_data and ft_read_event. I am traveling at the moment and don't have test data with me, furthermore I have little time right now. Perhaps Arjen could have a go at testing it and extending the FT glue in the ft_read_xxx functions. One thing I realize needs to be dealt with is the passing of the data selection to the low level function (so that it can read a selection at a time). It might be that that is difficult to implement. Brute force (i.e. read everything, discard most) would be good enough to start with. If caching turns out to be needed to read small segments efficiently, then I can implement the internal caching (read everything, return a small section but keep the rest for the next read call), consistent with how I have done it for some other formats. mac011> svn commit Sending ft_filetype.m Sending ft_read_data.m Sending ft_read_event.m Sending ft_read_header.m Adding private/read_besa_besa.m Transmitting file data ..... Committed revision 10985.
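The caching idea mentioned above (read everything once, return a small section, keep the rest for the next read call) can be sketched like this. It is not FieldTrip's implementation: `read_segment`, the module-level cache, and the `fake_read_all` stand-in for a full-file reader are all hypothetical, and sample indices are 0-based here whereas FieldTrip counts from 1.

```python
_cache = {}  # filename -> full data, kept between calls


def read_segment(filename, begsample, endsample, read_all):
    """Return samples begsample..endsample (0-based, inclusive) per channel.

    read_all is a hypothetical callable that reads the whole file and
    returns one list of samples per channel. It is invoked only on the
    first request for a given filename.
    """
    if filename not in _cache:
        _cache[filename] = read_all(filename)
    data = _cache[filename]
    return [channel[begsample:endsample + 1] for channel in data]


calls = []


def fake_read_all(filename):
    """Stand-in for the brute-force reader; records how often it runs."""
    calls.append(filename)
    return [list(range(10)), list(range(10, 20))]


first = read_segment("demo.besa", 2, 4, fake_read_all)
second = read_segment("demo.besa", 5, 6, fake_read_all)
```

The trade-off is memory for speed: the whole recording stays in memory, which is acceptable for repeated small reads but may not suit very long recordings.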
Kris Anderson - 2015-12-09 23:03:33 +0100
It shouldn't be too difficult to only read part of the data, because the number of sample points in each data block is specified in the header. Though the minimum amount of data that can be read is one block, and the blocks can technically be any size. In practice they should be manageable. I made a separate function that reads the header so that can be read quickly and did a pull request on github.
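Because the header lists the number of samples in each data block, the blocks needed for a requested sample range can be worked out before any data is read. The sketch below shows that arithmetic only; the function name and the 0-based inclusive indexing are choices made for this example.

```python
from bisect import bisect_right
from itertools import accumulate


def blocks_for_range(block_sizes, begsample, endsample):
    """Return (first_block, last_block, skip) for a 0-based inclusive range.

    block_sizes holds the number of samples per data block, as listed in
    the header. skip is the number of leading samples to discard from the
    first block after reading it in full.
    """
    ends = list(accumulate(block_sizes))  # cumulative samples after each block
    total = ends[-1] if ends else 0
    if not (0 <= begsample <= endsample < total):
        raise ValueError("requested samples fall outside the recording")
    first = bisect_right(ends, begsample)
    last = bisect_right(ends, endsample)
    start_of_first = ends[first] - block_sizes[first]
    return first, last, begsample - start_of_first


selection = blocks_for_range([100, 100, 50], 150, 210)
```

Here samples 150 to 210 require blocks 1 and 2, with the first 50 samples of block 1 discarded, which reflects the point that one whole block is the smallest unit that can be read.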
Arjen Stolk - 2015-12-09 23:05:50 +0100
(In reply to Kris Anderson from comment #20) Nice work, Kris! As soon as Robert pulls the code in, we can test it out on some of our own previously recorded data.
Robert Oostenveld - 2015-12-09 23:50:10 +0100
(In reply to Arjen Stolk from comment #21) mac011> svn commit Sending fileio/private/read_besa_besa.m Adding fileio/private/read_besa_besa_header.m Transmitting file data .. Committed revision 10987. I merged the pull request into the original SVN copy, it will automatically end up in git.
Arjen Stolk - 2015-12-10 00:19:34 +0100
Thanks, Robert. Check 1: reading (supposedly) the same data (a besa file converted to edf). Seems that the header file is not consistent across edf and besa format yet (or reader functions). Will run a few more tests.
>> bhdr = ft_read_header('/Users/arjsto/Documents/Ecog/data/IR29/2015100810_0004.besa')
ehdr = ft_read_header('/Users/arjsto/Documents/Ecog/data/IR29/2015100810_0004.edf')
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2124)
bhdr =
  orig: [1x1 struct]
  nChans: 168
  Fs: 5000
  nSamples: 5363000
  nSamplesPre: []
  nTrials: 1
  label: {168x1 cell}
  chantype: {168x1 cell}
  chanunit: {168x1 cell}
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2124)
ehdr =
  Fs: 5000
  nChans: 169
  label: {169x1 cell}
  nSamples: 5363030
  nSamplesPre: 0
  nTrials: 1
  orig: [1x1 struct]
  chantype: {169x1 cell}
  chanunit: {169x1 cell}
Robert Spangler - 2015-12-14 12:24:02 +0100
(In reply to Kris Anderson from comment #18) Hey Kris, here are some notes regarding your questions from post #18: Some other notes: A) Page 39 Table 2. 'Third a' should have a delta=3, not 2 -> Correct. I updated the document. There is a new version available here: http://www.besa.de/downloads/file-formats/ B) HSPC on Page 21 should be HSPD -> HSPC is used for head surface point coordinates (in mm). HSPD is used for head surface point labels. C) I'm a little confused about signed vs unsigned values for some elements. DATS on Page 33, for example. It specifies the number of samples written to a block of data, which should always be positive, but the doc says that it is a '32 bit integer value', not 32 bit unsigned integer. More than 2 billion samples is unlikely, but not sure what would happen in that case. -> We also use this integer value internally to return error codes with negative values. Therefore, we chose a signed instead of an unsigned value. I agree that 2 billion samples should cover most of the cases, so a signed integer should be fine. D) There are some tags that are less than four characters described in the Event section (MPS, IMP, NR, VAL). Aren't all tags supposed to be 4 bytes? -> All tags consist of 4 characters. However, some tags (MPS, IMP, NR, VAL, ...) only use 2 or 3 characters to define the type. Remaining characters are filled with whitespace characters. E) I don't understand what to do in the case a negative LSB is encountered in the Channel and Location block. Here is what the doc says: - If a non-positive value is found in the array, a value of "1.f" is used instead. What would 'f' be in this case? -> This is a C++ statement (the documentation was written by developers). The .f tells the compiler to interpret the literal as a floating point number of type float. Without the .f the number gets interpreted as an integer. In C++ source code we need to do this to get a float value. In Matlab code, you do not have to do anything if the LSB is negative. Just interpret the data you read from file as float values. Robert
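Answers D and E above can be sketched in a few lines. Two assumptions are made here: that the whitespace padding is a trailing ASCII space, and that "a value of 1.f is used instead" means the scale factor becomes 1.0. The helper names are hypothetical.

```python
def pad_tag(name):
    """Pad a 2- or 3-character tag name to the full 4 characters.

    Trailing ASCII spaces are an assumption; the thread only says the
    remaining characters are filled with whitespace.
    """
    if not 1 <= len(name) <= 4:
        raise ValueError("tag names have between 1 and 4 characters")
    return name.ljust(4)


def strip_tag(raw):
    """Turn the 4 bytes read from file back into the short tag name."""
    return raw.decode("ascii").rstrip()


def effective_lsb(lsb):
    """Use 1.0 (the C++ literal "1.f") when the stored LSB is non-positive."""
    return lsb if lsb > 0 else 1.0


padded = pad_tag("NR")
name = strip_tag(b"MPS ")
scale = effective_lsb(-1.0)
```

With a scale factor of 1.0 the values read from file are left unchanged, which agrees with the advice to simply interpret them as float values.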
Kris Anderson - 2016-01-21 01:44:24 +0100
Hi all, I modified the read_besa_besa function to match the behavior of read_edf and submitted a pull request: https://github.com/fieldtrip/fieldtrip/pull/73 It looks like EDF is storing an extra channel with event information which BESA doesn't have. Event information is stored in the BESA header. Exporting the event information from a .besa file to fieldtrip still needs to be implemented, but continuous files are working. Not all compression formats have been tested with this script, so people should be careful using it.
Arjen Stolk - 2016-01-21 01:53:38 +0100
(In reply to Kris Anderson from comment #25) Great work, Kris. I'll run a couple of tests as soon as Robert pulls your git-request!
Robert Oostenveld - 2016-01-22 12:31:48 +0100
(In reply to Kris Anderson from comment #25) Thanks. I made a pull73 branch, merged yours and rebased it to master. Then I did a diff to master to determine the precise patch. Using that patch I patched the latest svn version. mac011> patch -p1 < ~/Desktop/patch73 patching file fileio/ft_filetype.m patching file fileio/ft_read_data.m patching file fileio/ft_read_header.m patching file fileio/private/read_besa_besa.m patching file fileio/private/read_besa_besa_header.m mac011> svn commit Sending fileio/ft_filetype.m Sending fileio/ft_read_data.m Sending fileio/ft_read_header.m Sending fileio/private/read_besa_besa.m Sending fileio/private/read_besa_besa_header.m Transmitting file data ..... Committed revision 11104.
Robert Oostenveld - 2016-01-22 12:32:46 +0100
please note that I am making progress with bug 3049 and that full migration from svn to github is getting closer...
Arjen Stolk - 2016-01-22 20:22:03 +0100
Thanks both. Ran a couple of quick tests. Apart from a 30 to 40 samples mismatch between EDF and BESA, and EDF adding another (annotation) channel, it seems to be fine. I ran a check to see whether my photodiode event detection produces the same samples-of-interest for both the EDF and BESA files, and it does. What still needs to be done is to write a wiki page on how to get started with besa in fieldtrip. I have made a start here: http://www.fieldtriptoolbox.org/getting_started/besa Could any of the besa guys and ladies maybe fill in the missing information on the header description (in particular the 'event' information), and background information about the besa company & data format? For an example of the latter, see introduction section of: http://www.fieldtriptoolbox.org/getting_started/edf Doesn't have to be that elaborate. Thanks, Arjen
DATASET 1:
>> hdr = ft_read_header([getenv('HOME') '/Projects/Ecog/data/IR30/2015102714_0003.edf' ])
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2135)
hdr =
  Fs: 5000
  nChans: 191
  label: {191x1 cell}
  nSamples: 9719040
  nSamplesPre: 0
  nTrials: 1
  orig: [1x1 struct]
  chantype: {191x1 cell}
  chanunit: {191x1 cell}
>> hdr_b = ft_read_header([getenv('HOME') '/Projects/Ecog/data/IR30/2015102714_0003.besa' ])
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2135)
hdr_b =
  orig: [1x1 struct]
  nChans: 190
  Fs: 5000
  nSamples: 9719000
  nSamplesPre: 0
  nTrials: 1
  label: {190x1 cell}
  chantype: {190x1 cell}
  chanunit: {190x1 cell}
DATASET 2:
>> hdr = ft_read_header([getenv('HOME') '/Projects/Ecog/data/IR29/2015100810_0004.edf' ])
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2135)
hdr =
  Fs: 5000
  nChans: 169
  label: {169x1 cell}
  nSamples: 5363030
  nSamplesPre: 0
  nTrials: 1
  orig: [1x1 struct]
  chantype: {169x1 cell}
  chanunit: {169x1 cell}
>> hdr_b = ft_read_header([getenv('HOME') '/Projects/Ecog/data/IR29/2015100810_0004.besa' ])
Warning: all channels must have unique labels, creating unique labels
> In ft_read_header (line 2135)
hdr_b =
  orig: [1x1 struct]
  nChans: 168
  Fs: 5000
  nSamples: 5363000
  nSamplesPre: 0
  nTrials: 1
  label: {168x1 cell}
  chantype: {168x1 cell}
  chanunit: {168x1 cell}
Robert Oostenveld - 2016-01-22 20:42:11 +0100
to all: I realized that we already have a BESA page at http://www.fieldtriptoolbox.org/integrating_with/integrating_with_besa that is part of this series http://www.fieldtriptoolbox.org/integrating_with which nowadays does not seem to be linked in the menu (and therefore almost impossible to find). We should avoid overlap. Actually, I suggest that - now that the "getting started" series becomes more and more elaborate - we move all 4 (besa, spm8, eeglab, loreta) to the getting started section and delete the old "integrating with" section. The 5th link is to a faq which is also linked from elsewhere.
Robert Oostenveld - 2016-01-22 21:02:40 +0100
(In reply to Robert Oostenveld from comment #30) I have merged them, i.e. there is now a single section with links to all the system-specific getting started pages. http://www.fieldtriptoolbox.org/getting_started/shared That section is shared by including it in these two pages http://www.fieldtriptoolbox.org/getting_started http://www.fieldtriptoolbox.org/reading_data where the 1st one is a top-level menu item and the 2nd one is located under "user documentation -> importing your data" I simply merged the old BESA section to the new page. The whole section should be reviewed and cleaned up. Next week I'll be doing an educational session together with Harald (from BESA, now also CC) in Finland and will discuss with him.
Arjen Stolk - 2016-01-22 23:12:41 +0100
Excellent!
Kris Anderson - 2016-01-27 01:45:13 +0100
Created attachment 768 .besa minus .edf converted
Kris Anderson - 2016-01-27 01:47:16 +0100
Here are some results from comparing data obtained from a Nihon Kohden system in .besa and converted .edf format: - The median difference between waveforms is 0.00002uA. Max difference is 0.006uA. See histogram of differences attached. This is probably floating point error. - .edf file contains 1 extra channel, an annotations channel that does not exist in the .besa file - .edf file has some extra samples at the end of the file, in one case, 5363030 samples vs 5363000 samples. The extra samples are just flat with a DC offset.
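A comparison of this kind can be sketched as follows: match channels by label, ignore extra trailing samples, and summarise the absolute differences. This is not the script used for the attached histogram. The function name, the toy data, and the `EDF Annotations` label are made up for illustration, and the sketch assumes unique channel labels and at least one shared channel.

```python
from statistics import median


def compare_recordings(a, b):
    """Compare two recordings given as {channel label: list of samples}.

    Only channels present in both are compared, and only over their
    common length, so an extra annotation channel or extra trailing
    samples in one file do not affect the difference statistics.
    """
    shared = [label for label in a if label in b]
    diffs = []
    for label in shared:
        n = min(len(a[label]), len(b[label]))  # ignore extra trailing samples
        diffs.extend(abs(x - y) for x, y in zip(a[label][:n], b[label][:n]))
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "median_abs_diff": median(diffs),
        "max_abs_diff": max(diffs),
    }


# Toy data: the second recording has one extra channel and two extra samples.
besa = {"C1": [1.0, 2.0, 3.0]}
edf = {"C1": [1.0, 2.5, 3.0, 0.0, 0.0], "EDF Annotations": [0.0] * 5}
report = compare_recordings(besa, edf)
```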
Robert Oostenveld - 2016-01-27 10:03:22 +0100
(In reply to Kris Anderson from comment #34) Hi Kris, Thanks for the great work. To me it seems that (apart from the documentation) the issue can be closed. I am in a meeting with Harald (CC, from BESA) and have suggested that he also review and improve http://www.fieldtriptoolbox.org/getting_started/besa best Robert
Arjen Stolk - 2016-01-28 06:25:06 +0100
Excellent. Thanks in advance, Harald. Arjen