## What is JP-Hash?

JP-Hash is an algorithm which converts any piece of text or other datum into a
textual digest, which has the following properties:

* length between 8 and 21 characters.

* consists mostly of lower-case letters.

* includes one digit.

* includes one non-alphanumeric character from the set
  `!`, `#`, `@`, `$`, `%`, `^`, `&`, `*`, `?` and `/`.

By amazing coincidence, these requirements are very similar to
common requirements imposed on people who are creating or
changing a password.

Additionally:

* The digest string is based on combinations of syllables from the Japanese
  language, written in romanized form. This means that many of the digests are
  memorable and pronounceable, and have a vibe to them that is pleasing to
  enthusiasts for things Japanese.
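The properties listed above can be turned into a quick shape check. The following is only a sketch: the function name `has_digest_shape` is made up for illustration, and the "at most one capital letter" rule is an assumption drawn from the capitalization of the first syllable described in the algorithm details. It does not verify that the letters form valid syllables.

```python
import string

SYMBOLS = "!#@$%^&*?/"   # the ten non-alphanumeric characters listed above


def has_digest_shape(text: str) -> bool:
    """Check only the password-policy-like properties, not the syllables."""
    digits = sum(c in string.digits for c in text)
    symbols = sum(c in SYMBOLS for c in text)
    letters = sum(c in string.ascii_letters for c in text)
    lower = sum(c in string.ascii_lowercase for c in text)
    return (
        8 <= len(text) <= 21              # length between 8 and 21
        and digits == 1                   # exactly one digit
        and symbols == 1                  # exactly one symbol from the set
        and letters == len(text) - 2      # everything else is a letter
        and letters - lower <= 1          # assumption: at most one capital
    )
```

For instance, `has_digest_shape("Mina4gai@gashan")` is true, while an ordinary word such as `letmein` fails on length, digit and symbol.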

## How do I get it?

Firefox users can use JP-Hash via a convenient
[add-on](https://addons.mozilla.org/firefox/addon/jp-hash/).
You get a toolbar button (and an Alt-J keyboard shortcut) which replaces the
content of an input field, or its current selection, with its JP-Hash.

See the [source tree](../tree) for reference implementation source files. Code
is given in TXR Lisp, C, and JavaScript (for both the browser and Node.js).

The self-contained [`jp-hash.html`](../tree/jp-hash.html) file should load in
any browser, providing a simple UI.

The Firefox add-on source is in the [`firefox`](../tree/firefox) subdirectory.
This may be loaded as a temporary add-on.  To install it that way, make a local
replica of that `firefox` directory on the system where your browser is
running. Then open up `about:debugging`, click on "This Firefox" in the left
pane, and click the "Load Temporary Add-On ..." button in the right pane.
Select the `manifest.json` file in the `firefox` directory.

## What are the details of the algorithm?

1.  First, the input is hashed using the standard SHA-256 function.

2.  Next, the first 18 bytes of the digest are interpreted as an array of 9
    (nine) 16-bit words, little endian.  This array is referred to as `word[0]`
    through `word[8]`.

3.  Six pseudo-Japanese syllables are derived from `word[0]` through `word[5]`
    as follows: each of these word values is reduced to the remainder modulo 100.
    Then, the remainder is used as an index into the following array of 100
    strings.  The first letter of the first syllable is then capitalized.
    These syllables are here referred to as `s[0]` through `s[5]`.

        ["a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
         "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
         "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
         "ta", "chi", "tsu", "te", "to", "da", "de", "do",
         "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
         "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
         "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
         "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
         "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
         "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
         "mya", "myu", "myo", "rya", "ryu", "ryo"]

4.  A digit `dig` is chosen using the modulo 10 remainder of `word[6]` as an
    index into the digits `0` through `9`.

5.  Similarly, a symbol `sym` is chosen using the modulo 10 remainder of
    `word[7]` as an index into the aforementioned list `!`, `#`, `@`, `$`, `%`,
    `^`, `&`, `*`, `?` and `/`.

6.  The modulo 8 value of `word[8]` is used to select one of eight cases (0 to
    7) for combining the above values into an output string.  The last four of
    these cases insert the `n` (letter n) character into certain places of the
    string.  The eight cases follow: each case gives a list of strings which are
    catenated in that order, with no intervening spaces or other separator
    characters:

    **Case 0**: `s[0] s[1] s[2] sym s[3] s[4] s[5] dig`

    **Case 1**: `sym s[0] s[1] s[2] dig s[3] s[4] s[5]`

    **Case 2**: `s[0] s[1] sym s[2] s[3] dig s[4] s[5]`

    **Case 3**: `s[0] s[1] dig s[2] s[3] sym s[4] s[5]`

    **Case 4**: `s[0] s[1] s[2] "n" sym s[3] s[4] s[5] dig`

    **Case 5**: `sym s[0] s[1] s[2] dig s[3] s[4] s[5] "n"`

    **Case 6**: `s[0] s[1] "n" sym s[2] s[3] dig s[4] s[5]`

    **Case 7**: `s[0] s[1] dig s[2] s[3] sym s[4] s[5] "n"`
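The six steps above can be sketched in Python as follows. This is not the reference implementation, just a minimal illustration using the standard `hashlib` and `struct` modules. The names `SYLLABLES`, `SYMBOLS` and `jp_hash` are chosen here for illustration. The table as listed in step 3 has 100 entries, so the sketch indexes it modulo its own length. The caller is assumed to supply the input as raw bytes.

```python
import hashlib
import struct

# Syllable table from step 3, in the same order (100 entries).
SYLLABLES = """
a i u e o ya yu yo wa
ka ki ku ke ko ga gi gu ge go
sa shi su se so za ji zu ze zo
ta chi tsu te to da de do
na ni nu ne no ha hi fu he ho
pa pi pu pe po ba bi bu be bo
ma mi mu me mo ra ri ru re ro
kya kyu kyo gya gyu gyo sha shu sho
ja ju jo cha chu cho nya nyu nyo
hya hyu hyo pya pyu pyo bya byu byo
mya myu myo rya ryu ryo
""".split()

SYMBOLS = "!#@$%^&*?/"


def jp_hash(data: bytes) -> str:
    """Sketch of steps 1 to 6; `data` is the raw input bytes."""
    digest = hashlib.sha256(data).digest()          # step 1
    # Step 2: first 18 bytes as nine little-endian unsigned 16-bit words.
    word = struct.unpack("<9H", digest[:18])
    # Step 3: six syllables, first one capitalized.
    s = [SYLLABLES[w % len(SYLLABLES)] for w in word[:6]]
    s[0] = s[0].capitalize()
    dig = str(word[6] % 10)                         # step 4
    sym = SYMBOLS[word[7] % 10]                     # step 5
    layouts = (                                     # step 6, cases 0 to 7
        (s[0], s[1], s[2], sym, s[3], s[4], s[5], dig),
        (sym, s[0], s[1], s[2], dig, s[3], s[4], s[5]),
        (s[0], s[1], sym, s[2], s[3], dig, s[4], s[5]),
        (s[0], s[1], dig, s[2], s[3], sym, s[4], s[5]),
        (s[0], s[1], s[2], "n", sym, s[3], s[4], s[5], dig),
        (sym, s[0], s[1], s[2], dig, s[3], s[4], s[5], "n"),
        (s[0], s[1], "n", sym, s[2], s[3], dig, s[4], s[5]),
        (s[0], s[1], dig, s[2], s[3], sym, s[4], s[5], "n"),
    )
    return "".join(layouts[word[8] % 8])


print(jp_hash(b"a"))
```

For the single-byte input `a` this prints `Mina4gai@gashan`, which agrees with the first entry of the example hashes. Any serious use should be checked against the full `testvec` file rather than this one value.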


## How many JP-Hash digests are there?

Since there are six syllables chosen from a set of 100, plus two characters each
from a set of ten, the initial steps yield a space of 100,000,000,000,000 (100
(American) trillion). The 8 cases in step (6) all yield distinct results, and
so multiply the space eight-fold to 800,000,000,000,000 possibilities (800
trillion).

This is about the size of the space of strings consisting of all combinations
of 10 lower-case English letters, plus one more character chosen from a set of five.

It's also similar to the size of the space of all strings of 7 printable ASCII
characters followed by a digit.

It is also about the number of combinations expressed by a 49-bit integer.
A random string in this space has about that many bits of entropy.
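The counting argument can be reproduced with a few lines of arithmetic. The sketch below takes the table size from the syllable listing in the algorithm section (100 entries) and assumes 95 printable ASCII characters. The variable names are illustrative only.

```python
import math

syllables = 100 ** 6          # six syllables, each from the 100-entry table
digit_and_symbol = 10 * 10    # one digit and one symbol, ten choices each
layouts = 8                   # the eight cases of step 6

total = syllables * digit_and_symbol * layouts
bits = math.log2(total)

# Comparison spaces mentioned in the text.
ten_letters_plus_one_of_five = 26 ** 10 * 5
seven_ascii_plus_digit = 95 ** 7 * 10    # assumes 95 printable ASCII chars

print(f"JP-Hash space:               {total:,} ({bits:.1f} bits)")
print(f"10 letters + 1 of 5:         {ten_letters_plus_one_of_five:,}")
print(f"7 printable ASCII + 1 digit: {seven_ascii_plus_digit:,}")
```

All three figures land within the same order of magnitude, between 2^49 and 2^50.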

## Are JP-Hash digests secure for password use?

JP-Hash is not being promoted as being fit for any specific purpose. In a
security setting, each user must perform their own analysis to understand the
security risks of using any tool in certain ways and with certain kinds of
inputs, in relation to the value being protected. The user assumes all risk.

The following cautionary remarks are provided, with the understanding
that they do not constitute a complete discussion:

* If a JP-Hash is being used as a password, the most prudent assumption is that
  any attacker knows this, and is specifically attacking the space of possible
  JP-Hashes (which, at 49 bits of entropy, is not very large).
  To assume that the attacker doesn't know about JP-Hash is "security through
  obscurity".

* If the attacker knows that JP-Hash is being used as a password,
  which must be assumed, then weak passwords are vulnerable, in spite
  of generating "strong-looking" JP-Hash strings.
  Example: the JP-Hash `Kera%bage9kerya` appears to be of similar complexity to
  `Jasho1mogo?sase`.  However, the former is the hash of the text `letmein`,
  whereas the latter is the hash of `stark-theory-azimuth-goblet-13$17`.  An
  attacker who knows that the passwords are JP-Hashes can crack the
  `Kera%bage9kerya` password by using a file of JP-Hashes of weak passwords
  which will likely contain an entry for `letmein`, or, failing that, by a
  brute-force search of the space of lower-case strings up to seven
  characters long.

* A JP-Hash used as a password must also be regarded as an ordinary
  password from the perspective of attacks which are oblivious to the
  existence of JP-Hash.  JP-Hashes are of variable length and may be as short
  as eight characters. For instance `Ai9ue/ou` is a possible JP-Hash which
  looks like a short password compared to `Kyobyun/jakyu9choko`, and will
  succumb to a brute-force search of the eight-character space.

* Converting, to a JP-Hash, a password phrase which has significantly more than
  49 bits of entropy constitutes a degradation of security independently of all
  other considerations.
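To put rough numbers on the remarks above, the following sketch compares the sizes of the search spaces involved, expressed in bits. It assumes the 100-entry syllable table from the algorithm section and 95 printable ASCII characters. The five-word passphrase drawn from a 7776-word list is a purely hypothetical example of an input with more entropy than a JP-Hash can carry.

```python
import math


def bits(n: int) -> float:
    """Size of a search space of n equally likely items, in bits."""
    return math.log2(n)


jp_space = 8 * 100 ** 6 * 10 * 10                     # all possible JP-Hashes
lowercase_upto7 = sum(26 ** k for k in range(1, 8))   # inputs like "letmein"
printable8 = 95 ** 8                                  # any 8-char ASCII string
passphrase_bits = 5 * math.log2(7776)                 # hypothetical passphrase

for label, b in [
    ("lower-case inputs up to 7 chars", bits(lowercase_upto7)),
    ("all JP-Hash digests", bits(jp_space)),
    ("8 printable ASCII characters", bits(printable8)),
    ("hypothetical 5-word passphrase", passphrase_bits),
]:
    print(f"{label:34s} {b:5.1f} bits")
```

The weak-input space is about 33 bits, far below the digest space, which is why hashing a weak password does not rescue it. The hypothetical passphrase exceeds the digest space, so converting it to a JP-Hash discards entropy.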

## Are JP-Hash digests secure message digests?

* JP-Hash obviously contains too few bits to be suitable as a message
  digest for security purposes. It's possible that it may be used as
  an integrity checksum, perhaps comparable to a CRC48. However, it is produced
  by a slow, wasteful calculation whose result has undesirable properties like
  variable length.

## Example Hashes

These examples come from the `testvec` file.

```
a --> Mina4gai@gashan
y --> Shaba%megyu2shize
Mike --> !Tosuda2bukyochon
Romeo --> Potsun&gaso5machi
Sierra --> Nodon&yanu6zuchi
Tango --> Gyoda#hosa6segi
Whiskey --> Muji?pyuna6gyage
sashimi --> Izu0gyubya/gyumyu
ramen --> Byumi$betsu0nyohe
soba --> Arushin^hyapyuryu2
futon --> Kyoriton#kyaseku1
```
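The layout case used by each example can be read back from the positions of the symbol, the digit and the optional `n`. The sketch below relies on the observation that no syllable in the table ends in `n`, so an `n` directly before the symbol or at the very end can only be the inserted one. The function name `layout_case` is illustrative, and the input is assumed to be a well-formed digest.

```python
SYMBOLS = "!#@$%^&*?/"


def layout_case(digest: str) -> int:
    """Return the case number (0 to 7) of a well-formed JP-Hash digest."""
    sym_at = next(i for i, c in enumerate(digest) if c in SYMBOLS)
    dig_at = next(i for i, c in enumerate(digest) if c.isdigit())
    if sym_at == 0:                      # symbol first: cases 1 and 5
        return 5 if digest.endswith("n") else 1
    if dig_at == len(digest) - 1:        # digit last: cases 0 and 4
        return 4 if digest[sym_at - 1] == "n" else 0
    if sym_at < dig_at:                  # symbol before digit: cases 2 and 6
        return 6 if digest[sym_at - 1] == "n" else 2
    return 7 if digest.endswith("n") else 3   # digit before symbol


for d in ["Mina4gai@gashan", "Shaba%megyu2shize", "!Tosuda2bukyochon"]:
    print(d, "-> case", layout_case(d))
```

Applied to the vectors above, `Mina4gai@gashan` is case 7, `Shaba%megyu2shize` is case 2 and `!Tosuda2bukyochon` is case 5.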

## Other Implementations

Klaus Alexander Seiﬆrup has written a
[Python implementation](https://codeberg.org/kas/jphash).
This requires Python 3.10+.

## License

The JP-Hash reference code is offered under a one-clause variant of the BSD
license. See the copyright headers in the source files.

If you publish altered versions of this algorithm, please don't call it
JP-Hash, thanks! If it doesn't pass the `testvec`, it isn't JP-Hash.