Bug 2134 - ft_pre/postamble_provenance: find alternative for calculating hash
| Status | NEW |
| Reported | 2013-04-24 15:47:00 +0200 |
| Modified | 2019-04-02 13:46:36 +0200 |
| Product: | FieldTrip |
| Component: | core |
| Version: | unspecified |
| Hardware: | PC |
| Operating System: | Windows |
| Importance: | P3 normal |
| Assigned to: | |
| URL: | |
| Tags: | |
| Depends on: | |
| Blocks: | |
| See also: |
Roemer van der Meij - 2013-04-24 15:47:14 +0200
Right now, input/output data is identified by a unique md5 hash based on all data (except data.cfg). To compute it, the data is first 'serialized' into a separate variable, i.e. all the bytes that form the data are copied into a linear (contiguous) representation. This is necessary for calculating the md5 hash using CalcMD5, a function from the MATLAB File Exchange.

This causes several problems:
1) CalcMD5 cannot handle input bigger than 2^31 bytes, meaning we don't have hashes for big input.
2) The serialization causes short memory spikes, as the data is temporarily re-represented.

Problem 1 could possibly be solved easily by searching for another function to calculate the hash (perhaps also on the File Exchange).

Problem 2 is more difficult. We discussed this a little in the meeting (24-4-13), where it was suggested to calculate the hash on a part of the data (hopefully enough to get a unique, stable identifier). However, the problem with this approach is that it is data-type specific. Another option brought forward was to 'sub-serialize', i.e. only take every other byte, or every 10th byte for that matter.
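
To make the two ideas above concrete, here is a minimal sketch in Python (not MATLAB, and not FieldTrip code). It assumes the data is already available as a contiguous byte buffer. The function names `chunked_md5` and `strided_md5` and the parameters `chunk_bytes` and `step` are hypothetical, chosen only for illustration. The first function feeds the hash in fixed-size pieces, so no single call sees the whole input; the second hashes only every `step`-th byte, as in the 'sub-serialize' suggestion.

```python
import hashlib


def chunked_md5(buf, chunk_bytes=1 << 20):
    """MD5 over the full buffer, fed to the hasher in fixed-size chunks."""
    view = memoryview(buf).cast("B")  # byte-wise view, no copy of the data
    h = hashlib.md5()
    for start in range(0, len(view), chunk_bytes):
        # Slicing a memoryview does not copy; only one chunk is hashed at a time.
        h.update(view[start:start + chunk_bytes])
    return h.hexdigest()


def strided_md5(buf, step=10):
    """MD5 over every `step`-th byte only (the 'sub-serialize' idea)."""
    view = memoryview(buf).cast("B")
    h = hashlib.md5()
    # Include the total length so buffers of different size hash differently.
    h.update(len(view).to_bytes(8, "little"))
    # The strided slice is non-contiguous, so it is copied out (1/step of the data).
    h.update(view[::step].tobytes())
    return h.hexdigest()


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    print(chunked_md5(data, chunk_bytes=1000))
    print(strided_md5(data, step=10))
```

The chunked variant gives the same digest as hashing everything at once, so it would not change existing identifiers. The strided variant is cheaper but weaker: a change confined to bytes that are not sampled leaves the hash unchanged, so it may not be a reliable unique identifier.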