Bug 2134 - ft_pre/postamble_provenance: find alternative for calculating hash

Status	NEW
Reported	2013-04-24 15:47:00 +0200
Modified	2019-04-02 13:46:36 +0200
Product:	FieldTrip
Component:	core
Version:	unspecified
Hardware:	PC
Operating System:	Windows
Importance:	P3 normal
Assigned to:
URL:
Tags:
Depends on:
Blocks:
See also:

Roemer van der Meij - 2013-04-24 15:47:14 +0200

Right now, input/output data is identified by a unique MD5 hash computed over all of the data (except data.cfg). To do this, the data is first 'serialized' into a separate variable, i.e. all the bytes that make up the data are copied into a linear (contiguous) representation. This is necessary for calculating the MD5 hash with CalcMD5, a function from the MATLAB File Exchange. This causes two problems:

1) CalcMD5 cannot handle input bigger than 2^31 bytes, meaning we don't have hashes for big input.
2) The serialization causes short memory spikes, as the data is temporarily re-represented.

Problem 1 could possibly be solved easily by finding another function to calculate the hash (perhaps also on the File Exchange). Problem 2 is more difficult. We discussed this briefly in the meeting (24-4-13), where it was suggested to calculate the hash on only a part of the data (hopefully enough to get a unique, stable identifier). However, the problem with this approach is that it is data-type specific. Another option brought forward was to 'sub-serialize', i.e. to take only every other byte, or every 10th byte for that matter.
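
To make the two ideas concrete, here is a minimal sketch in Python/NumPy, used as a stand-in for MATLAB. It is an illustration only, not FieldTrip code: the function names md5_chunked and md5_subsampled, the chunk size, and the step of 10 are all hypothetical. The first function feeds the bytes to the hasher in fixed-size pieces, so no single call sees more than one chunk. The second hashes only every step-th byte, as in the 'sub-serialize' suggestion. Whether the MATLAB hashing function in use supports this kind of incremental update is an open question here.

```python
import hashlib

import numpy as np


def md5_chunked(arr, chunk_bytes=1 << 20):
    """MD5 over all bytes of arr, passed to the hasher one chunk at a time."""
    # View the data as a flat byte array. For a C-contiguous input this is
    # a view, not a copy (a non-contiguous input is copied once here).
    flat = np.ascontiguousarray(arr).reshape(-1).view(np.uint8)
    hasher = hashlib.md5()
    for start in range(0, flat.size, chunk_bytes):
        # Each slice is a contiguous view of at most chunk_bytes bytes.
        hasher.update(flat[start:start + chunk_bytes])
    return hasher.hexdigest()


def md5_subsampled(arr, step=10):
    """MD5 over every step-th byte of arr (hypothetical 'sub-serialize')."""
    flat = np.ascontiguousarray(arr).reshape(-1).view(np.uint8)
    # The strided selection is copied, but the copy is only about
    # 1/step of the original size.
    sample = flat[::step].tobytes()
    return hashlib.md5(sample).hexdigest()
```

The chunked variant gives the same digest as hashing the fully serialized data in one call, so existing hashes would stay comparable. The sub-sampled variant trades that away: a change confined to bytes that are skipped does not alter the hash, so it is a weaker identifier.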

Jan-Mathijs Schoffelen - 2019-04-02 13:46:36 +0200

https://nl.mathworks.com/matlabcentral/fileexchange/31272-datahash
https://nl.mathworks.com/matlabcentral/fileexchange/25921-getmd5