jp-hash - Hashing function based on Japanese Phonetics

What is JP-Hash?

JP-Hash is an algorithm which converts any piece of text or other datum into a textual digest, which has the following properties:

length between 8 and 21 characters.
consists mostly of lower-case letters.
includes one digit.
includes one non-alphanumeric character from the set !, #, @, $, %, ^, &, *, ? and /.
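These properties can be checked mechanically. The following Python sketch uses a hypothetical function name, has_jp_hash_shape, which is not part of the reference code. It tests only the shape described above, and it accepts any ASCII letters rather than checking capitalization. Passing the check does not mean a string is actually the JP-Hash of anything.

```python
import string

JP_SYMBOLS = "!#@$%^&*?/"  # the ten symbol characters listed above


def has_jp_hash_shape(s: str) -> bool:
    """Check the listed properties: length 8..21, exactly one digit,
    exactly one symbol from JP_SYMBOLS, and ASCII letters everywhere else."""
    if not 8 <= len(s) <= 21:
        return False
    digits = sum(c in string.digits for c in s)
    symbols = sum(c in JP_SYMBOLS for c in s)
    letters = sum(c in string.ascii_letters for c in s)
    return digits == 1 and symbols == 1 and letters == len(s) - 2


print(has_jp_hash_shape("Mina4gai@gashan"))  # an example hash from this document
```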

By amazing coincidence, these requirements are very similar to common requirements imposed on people who are creating or changing a password.

Additionally:

The digest string is built from Japanese syllables, written in romanized form. This means that many of the digests are memorable and pronounceable, and have a vibe to them that is pleasing to enthusiasts of things Japanese.

How do I get it?

Firefox users can use JP-Hash via a convenient add-on. It provides a toolbar button, and the equivalent Alt-J shortcut key, which replaces the content of an input field, or its current selection, with its JP-Hash.

See the source tree for the reference implementation source files. Code is given in TXR Lisp, C, and JavaScript (for the browser as well as Node.js).

The self-contained jp-hash.html file should load in any browser, providing a simple UI.

The Firefox add-on source is in the firefox subdirectory. This may be loaded as a temporary add-on. To install it that way, make a local replica of that firefox directory on the system where your browser is running. Then open up about:debugging, click on "This Firefox" in the left pane, and click the "Load Temporary Add-On ..." button in the right pane. Select the manifest.json file in the firefox directory.

What are the details of the algorithm?

First, the input is hashed via the standard SHA256 sum.
Next, the first 18 bytes of the digest are interpreted as an array of 9 (nine) 16-bit words, little endian. This array is referred to as word[0] through word[8].
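In Python, these first two steps can be sketched with the standard hashlib and struct modules. The helper name jp_words is hypothetical, and encoding text input as UTF-8 before hashing is an assumption of this sketch, not something stated here.

```python
import hashlib
import struct


def jp_words(data: bytes) -> list[int]:
    """Return word[0]..word[8]: the first 18 bytes of the SHA-256
    digest of data, read as nine little-endian unsigned 16-bit words."""
    digest = hashlib.sha256(data).digest()          # 32 bytes
    return list(struct.unpack("<9H", digest[:18]))  # "<" = little endian, H = uint16


# Text must first be turned into bytes; UTF-8 is assumed here.
word = jp_words("letmein".encode("utf-8"))
print(word)
```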

Six pseudo-Japanese syllables, here referred to as s[0] through s[5], are derived from word[0] through word[5] as follows: each of these word values is reduced to its remainder modulo 97, and the remainder is used as an index into the following array of 97 strings. The first letter of the first syllable, s[0], is then capitalized.

["a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
 "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
 "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
 "ta", "chi", "tsu", "te", "to", "da", "de", "do",
 "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
 "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
 "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
 "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
 "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
 "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
 "mya", "myu", "myo", "rya", "ryu", "ryo"]
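The syllable step can be sketched as follows, with the table copied from the array above; jp_syllables is a hypothetical name. One caution: the array as printed contains 100 strings, so with a modulus of 97 its last three entries (rya, ryu, ryo) can never be selected. This sketch follows the stated modulus, and the reference sources should be treated as authoritative on the exact table.

```python
# Syllable table copied from the array above (as printed, it has 100 entries).
SYLLABLES = [
    "a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
    "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
    "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
    "ta", "chi", "tsu", "te", "to", "da", "de", "do",
    "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
    "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
    "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
    "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
    "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
    "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
    "mya", "myu", "myo", "rya", "ryu", "ryo",
]
SYLLABLE_MODULUS = 97  # modulus stated in the text


def jp_syllables(word: list[int]) -> list[str]:
    """Derive s[0]..s[5] from word[0]..word[5], capitalizing s[0]."""
    s = [SYLLABLES[w % SYLLABLE_MODULUS] for w in word[:6]]
    s[0] = s[0].capitalize()  # e.g. "shi" -> "Shi"
    return s
```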

A digit dig is chosen using the modulo 10 remainder of word[6] as an index into the digits 0 through 9.
Similarly, a symbol sym is chosen using the modulo 10 remainder of word[7] as an index into the aforementioned list !, #, @, $, %, ^, &, *, ? and /.
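A sketch of the digit and symbol selection, keeping the symbol order given above. The function name jp_digit_and_symbol is hypothetical.

```python
JP_SYMBOLS = "!#@$%^&*?/"  # order as listed in the text; index 0 is "!"


def jp_digit_and_symbol(word: list[int]) -> tuple[str, str]:
    """Return (dig, sym), selected by word[6] and word[7] modulo 10."""
    dig = "0123456789"[word[6] % 10]
    sym = JP_SYMBOLS[word[7] % 10]
    return dig, sym
```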
The modulo 8 remainder of word[8] selects one of eight cases (0 to 7) for combining the above values into an output string. The last four of these cases insert the letter n into certain places of the string. The eight cases follow; each case gives a list of strings which are catenated in that order, with no intervening spaces or other separator characters:

Case 0: s[0] s[1] s[2] sym s[3] s[4] s[5] dig

Case 1: sym s[0] s[1] s[2] dig s[3] s[4] s[5]

Case 2: s[0] s[1] sym s[2] s[3] dig s[4] s[5]

Case 3: s[0] s[1] dig s[2] s[3] sym s[4] s[5]

Case 4: s[0] s[1] s[2] "n" sym s[3] s[4] s[5] dig

Case 5: sym s[0] s[1] s[2] dig s[3] s[4] s[5] "n"

Case 6: s[0] s[1] "n" sym s[2] s[3] dig s[4] s[5]

Case 7: s[0] s[1] dig s[2] s[3] sym s[4] s[5] "n"
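The eight cases can be written as a table of layouts. In this sketch, jp_assemble is a hypothetical name, s holds s[0] through s[5] with s[0] already capitalized, and case is word[8] modulo 8.

```python
def jp_assemble(s: list[str], dig: str, sym: str, case: int) -> str:
    """Catenate s[0]..s[5], dig and sym in the order given by case 0..7."""
    layouts = [
        [s[0], s[1], s[2], sym, s[3], s[4], s[5], dig],        # case 0
        [sym, s[0], s[1], s[2], dig, s[3], s[4], s[5]],        # case 1
        [s[0], s[1], sym, s[2], s[3], dig, s[4], s[5]],        # case 2
        [s[0], s[1], dig, s[2], s[3], sym, s[4], s[5]],        # case 3
        [s[0], s[1], s[2], "n", sym, s[3], s[4], s[5], dig],   # case 4
        [sym, s[0], s[1], s[2], dig, s[3], s[4], s[5], "n"],   # case 5
        [s[0], s[1], "n", sym, s[2], s[3], dig, s[4], s[5]],   # case 6
        [s[0], s[1], dig, s[2], s[3], sym, s[4], s[5], "n"],   # case 7
    ]
    return "".join(layouts[case])
```

For example, the example hash Mina4gai@gashan (listed further down) reads as case 7 with syllables Mi, na, ga, i, ga, sha, dig 4 and sym @, and jp_assemble(["Mi", "na", "ga", "i", "ga", "sha"], "4", "@", 7) reproduces it.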

How many JP-Hash digests are there?

Since there are six syllables chosen from a set of 97, plus two characters each chosen from a set of ten, the steps before the final combination yield a space of 83,297,200,492,900 possibilities (83.3 trillion, in American usage). The eight cases of the final step all yield distinct results, and so multiply the space eight-fold to 666,377,603,943,200 possibilities (666.4 trillion).
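This arithmetic can be checked directly with Python's arbitrary-precision integers:

```python
import math

syllables = 97 ** 6         # six syllables, each one of 97
base = syllables * 10 * 10  # times one digit and one symbol, ten choices each
total = base * 8            # times the eight layout cases

print(f"{base:,}")                # 83,297,200,492,900
print(f"{total:,}")               # 666,377,603,943,200
print(f"{math.log2(total):.2f}")  # entropy in bits if all digests are equally likely
```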

This is about the size of the space of strings consisting of all combinations of 10 lower-case English letters, plus one more character chosen from a set of five.

It's also similar to the size of the space of all strings of 7 printable ASCII characters followed by a digit.

It is also about the number of combinations expressed by a 49-bit integer. A random string in this space has about 49 bits of entropy.
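A quick numeric comparison of two of these spaces with the JP-Hash space, where 97**6 * 10 * 10 * 8 is the count given above:

```python
import math

jp_space = 97 ** 6 * 10 * 10 * 8  # all possible JP-Hash digests
ten_letters = 26 ** 10 * 5        # 10 lower-case letters, then one of five characters
int49 = 2 ** 49                   # values of a 49-bit integer

for name, n in [("JP-Hash", jp_space),
                ("10 letters + 1 of 5", ten_letters),
                ("49-bit integer", int49)]:
    print(f"{name:>20}: {n:.2e} ({math.log2(n):.2f} bits)")
```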

Are JP-Hash digests secure for password use?

JP-Hash is not being promoted as being fit for any specific purpose. In a security setting, each user must perform their own analysis to understand the security risks of using any tool in certain ways and with certain kinds of inputs, in relation to the value being protected. The user assumes all risk.

The following cautionary remarks are provided, with the understanding that they do not constitute a complete discussion:

If a JP-Hash is being used as a password, the most prudent assumption is that any attacker knows this, and is specifically attacking the space of possible JP-Hashes (which, at 49 bits of entropy, is not very large). To assume that the attacker doesn't know about JP-Hash is "security through obscurity".
If the attacker knows that JP-Hash is being used to produce passwords, which must be assumed, then weak input passwords remain vulnerable, in spite of generating "strong-looking" JP-Hash strings. Example: the JP-Hash Kera%bage9kerya appears to be of similar complexity to Jasho1mogo?sase. However, the former is the hash of the text letmein, whereas the latter is the hash of stark-theory-azimuth-goblet-13$17. An attacker who knows that the passwords are JP-Hashes can crack the Kera%bage9kerya password using a file of JP-Hashes of weak passwords, which will likely contain an entry for letmein, or, failing that, by a brute-force search of all lower-case strings up to seven characters long.
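For scale, a brute-force search like the one just mentioned enumerates inputs, not digests: each candidate costs one SHA-256 evaluation, and there are only about 8.4 billion lower-case strings of one to seven letters, roughly one part in 80,000 of the digest space.

```python
jp_space = 97 ** 6 * 10 * 10 * 8                 # all possible JP-Hash digests
short_lower = sum(26 ** k for k in range(1, 8))  # lower-case strings of length 1..7

print(f"{short_lower:,}")               # 8,353,082,582
print(f"{short_lower / jp_space:.1e}")  # fraction of the digest space
```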
A JP-Hash used as a password must also be regarded as an ordinary password from the perspective of attacks which are oblivious to the existence of JP-Hash. JP-Hashes are of variable length and may be as short as eight characters. For instance, Ai9ue/ou is a possible JP-Hash; it looks like a short password compared to Kyobyu9jakyu/chokon, and will succumb to a brute-force search of the eight-character space.
Converting a password phrase which has significantly more than 49 bits of entropy into a JP-Hash constitutes a degradation of security, independently of all other considerations.

Are JP-Hash digests secure message digests?

JP-Hash obviously contains too few bits to be suitable as a message digest for security purposes. It's possible that it may be used as an integrity checksum, perhaps comparable to a CRC48. However, it is produced by a slow, wasteful calculation whose result has undesirable properties like variable length.

Example Hashes

These examples come from the testvec file.

a --> Mina4gai@gashan
y --> Shaba%megyu2shize
Mike --> !Tosuda2bukyochon
Romeo --> Potsun&gaso5machi
Sierra --> Nodon&yanu6zuchi
Tango --> Gyoda#hosa6segi
Whiskey --> Muji?pyuna6gyage
sashimi --> Izu0gyubya/gyumyu
ramen --> Byumi$betsu0nyohe
soba --> Arushin^hyapyuryu2
futon --> Kyoriton#kyaseku1
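Putting the steps together, a complete Python sketch might look like the following; it can be run on inputs such as these and compared against the testvec file. It rests on two assumptions: text is hashed as its UTF-8 bytes, and syllables use the stated modulus of 97 with the table as printed. Because that table lists 100 strings, rya, ryu and ryo are unreachable here even though rya and ryu occur in the examples, so the reference sources take precedence wherever this sketch disagrees with them.

```python
import hashlib
import struct

# Syllable table as printed in this document (100 entries).
SYLLABLES = [
    "a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
    "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
    "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
    "ta", "chi", "tsu", "te", "to", "da", "de", "do",
    "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
    "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
    "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
    "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
    "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
    "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
    "mya", "myu", "myo", "rya", "ryu", "ryo",
]
SYLLABLE_MODULUS = 97   # modulus stated in the description
SYMBOLS = "!#@$%^&*?/"  # symbol order stated in the description


def jp_hash(text: str) -> str:
    """Sketch of JP-Hash as described above (assumes UTF-8 input bytes)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    word = struct.unpack("<9H", digest[:18])  # word[0]..word[8], little endian
    s = [SYLLABLES[w % SYLLABLE_MODULUS] for w in word[:6]]
    s[0] = s[0].capitalize()                  # capitalize the first syllable
    dig = str(word[6] % 10)
    sym = SYMBOLS[word[7] % 10]
    layouts = [
        [s[0], s[1], s[2], sym, s[3], s[4], s[5], dig],        # case 0
        [sym, s[0], s[1], s[2], dig, s[3], s[4], s[5]],        # case 1
        [s[0], s[1], sym, s[2], s[3], dig, s[4], s[5]],        # case 2
        [s[0], s[1], dig, s[2], s[3], sym, s[4], s[5]],        # case 3
        [s[0], s[1], s[2], "n", sym, s[3], s[4], s[5], dig],   # case 4
        [sym, s[0], s[1], s[2], dig, s[3], s[4], s[5], "n"],   # case 5
        [s[0], s[1], "n", sym, s[2], s[3], dig, s[4], s[5]],   # case 6
        [s[0], s[1], dig, s[2], s[3], sym, s[4], s[5], "n"],   # case 7
    ]
    return "".join(layouts[word[8] % 8])


for text in ["a", "y", "Mike", "futon"]:
    print(text, "-->", jp_hash(text))
```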

Other Implementations

Klaus Alexander Seiﬆrup has written a Python implementation. This requires Python 3.10+.

License

The JP-Hash reference code is offered under a one-clause variant of the BSD license. See the copyright headers in the source files.

If you publish altered versions of this algorithm, please don't call it JP-Hash, thanks! If it doesn't pass the testvec, it isn't JP-Hash.