Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add md5 & some non-crypto hashes. Close #34 #44

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add md5 & some non-crypto hashes. Close #34 #44

wants to merge 1 commit into from

Conversation

dcbishop
Copy link

Adds md5, xxhash, spooky, murmur and 2 variants of crc32

@Kubuxu
Copy link
Member

Kubuxu commented Aug 15, 2016

As we are adding more and more hash functions:
can we start adding default and maximum lengths of the hashes alongside them?

Why the split on default and max: for example blake2b produces hash output from 1 to 32 bytes, and other hashes can only be truncated in which case the default and maximum will be the same.

This will make implementing multihash in various languages much easier.

@jbenet
Copy link
Member

jbenet commented Aug 15, 2016

Thanks @dcbishop

@jbenet
Copy link
Member

jbenet commented Aug 15, 2016

@Kubuxu

We could track that in the csv, yeah.

but the implementations should not try to check and validate maxes on each of the hashes, that will be annoying to track and a source for bugs across implementations. implementations should have a global max they want to impose.

0x66, Murmur3C
0x67, Murmur3F
0x75, crc32
0x76, crc32-C
Copy link
Member

@jbenet jbenet Aug 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should source more of the "non cryptographic hashes", at least the useful ones. List of more:

and

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a need for non-cryptographic hashes? I don't really think they have their place in the mutihash spec. I like the description in RFC 6920:

The various formats are designed to support, but not require, a
strong link to the referenced object, such that the referenced object
may be authenticated to the same degree as the reference to it.

Unless there is a use /not/ to reference object such that they can be authenticated by the hash, you could of course add MD5, CRC-32 and so on, but I would stay away from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawal we decided to do this some time ago #30 -- unfortunately, most cryptographically secure hash functions turn into hash functions that are not cryptographically secure. So the distinction would only be historical.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fantastic to have "multihash" being an IETF supported document, but adding hashes to the list like this would make it rather impossible.

The good thing by having the IANA registry is that algorithms in the registry can be clearly marked as being deprecated. This has happened to MD5 in almost all present algorithm registries. See for example here:
http://www.iana.org/assignments/dns-sec-alg-numbers/dns-sec-alg-numbers.xhtml

If going for your own registry, please have one or more extra fields describing these properties to the algorithms, with a possible source reference.

So you can add "status", and "reference" to the csv table.

@dcbishop
Copy link
Author

dcbishop commented Aug 15, 2016

@jbenet

It's worth considering that there are a lot of non-cryptographic hashes out there. Heaps when you include all the variants. CRC alone has 52.

A single byte isn't enough for them.

Might be worth considering a UTF8 like scheme where 0x80 and above indicate more bytes for the hash number. Although that would be backward incompatible as it would move into the size.

Or change the scheme somehow if it's not set in stone at this stage (although I see there are heaps of implementations).

Another approach would be a 0xFF code that indicates the hash is really wrapping another multihash, but with a different lookup table hashes, that would still specify the size to keep compatibility, but that would mean the size would get specified twice (once for the whole wrapped multihash, and again for the actual digest size). Maybe specify the whole 0xF0 and above.

Otherwise just limit to common ones.

https://en.wikipedia.org/wiki/List_of_hash_functions

@Kubuxu
Copy link
Member

Kubuxu commented Aug 15, 2016

@dcbishop We are already standardising varint encoding for multiformats multiformats/unsigned-varint#1

@jbenet I referenced the max of a hash because of discussion on other issue about what should happen if mulithash is/is requested to be longer than output of hash function.

@jbenet
Copy link
Member

jbenet commented Aug 15, 2016

A single byte isn't enough for them.

See #40 (merged). Intention was to do a varint eventually, and now's the time.

@jbenet
Copy link
Member

jbenet commented Aug 15, 2016

I referenced the max of a hash because of discussion on other issue about what should happen if mulithash is/is requested to be longer than output of hash function.

Hmmm. Right. This complicates implementations but we should evaluate it, i guess.

@dcbishop
Copy link
Author

@jbenet @Kubuxu

I've made a wiki page to get all the hashes in one place with some metadata on them.

@jbenet
Copy link
Member

jbenet commented Aug 15, 2016

@dcbishop can you put all that into pages in the repo itself, as markdown? i'm very negative on github wikis because they're much harder to keep up to date, there's little to no review, etc.

@jbenet
Copy link
Member

jbenet commented Aug 26, 2016

Thank you! That looks great!!

But please no wiki-- wiki does not notify on edits and does not lend itself
to review. Can we keep this in a markdown doc in the repo pls?

If you want collab to make sure things merge faster, we can do that. I know
I haven't been able to keep on top of all the PRs timelyly
On Mon, Aug 15, 2016 at 10:09 David C. Bishop notifications@github.com
wrote:

@jbenet https://github.com/jbenet @Kubuxu https://github.com/Kubuxu

I've made a wiki page
https://github.com/multiformats/multihash/wiki/Hashes to get all the
hashes in one place with some metadata on them.


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#44 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAIcoeEOX7OofhfexG-x3i4ew_TJrJ5Sks5qgHMcgaJpZM4JkSq4
.

@@ -10,3 +10,14 @@ code, name
0x19, shake-256
0x40, blake2b
0x41, blake2s
0x55, md5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

md5 is now 0xd5 as of #76

@KrishnaPG
Copy link

Came to this while looking for the multihash support for CRC32. Seems this PR adds it.

Can this be merged please?

Many hardware vendors, including FIPS, rely on CRC32 and standard way of representing them help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants