## What is JP-Hash?

JP-Hash is an algorithm which converts any piece of text or other datum into a
textual digest, which has the following properties:

* length between 8 and 21 characters.

* consists mostly of lower-case letters; only the first letter is upper-case.

* includes one digit.

* includes one non-alphanumeric character from the set
  `!`, `#`, `@`, `$`, `%`, `^`, `&`, `*`, `?` and `/`.

By amazing coincidence, these properties are very similar to
common requirements imposed on people who are creating or
changing a password.
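
The surface properties above are easy to check mechanically. The following Python sketch is only illustrative: `looks_like_jp_hash` is a hypothetical helper, it treats the listed properties as the complete rule set, and it also assumes that the first letter is upper-case and the rest lower-case, as step 3 of the algorithm implies. Passing this check says nothing about whether a string is the JP-Hash of any actual input.

```python
import string

SYMBOLS = "!#@$%^&*?/"  # the ten permitted non-alphanumeric characters


def looks_like_jp_hash(h: str) -> bool:
    """Check the surface shape of a JP-Hash digest (not its origin)."""
    digits = sum(c in string.digits for c in h)
    symbols = sum(c in SYMBOLS for c in h)
    letters = [c for c in h if c in string.ascii_letters]
    return (
        8 <= len(h) <= 21
        and digits == 1
        and symbols == 1
        and len(letters) == len(h) - 2          # nothing else is allowed
        and letters[0].isupper()                # first letter capitalized
        and all(c.islower() for c in letters[1:])
    )


print(looks_like_jp_hash("Mina4gai@gashan"))   # True
print(looks_like_jp_hash("mina4gai@gashan"))   # False: no capital letter
```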

Additionally:

* The digest string is based on combinations of syllables from the Japanese
  language, written in romanized form. This means that many of the digests are
  memorable and pronounceable, and have a vibe to them that is pleasing to
  enthusiasts of things Japanese.

## How do I get it?

See the reference implementation source files. Code is given in
TXR Lisp, C, and JavaScript (for the browser as well as Node.js).

The self-contained `jp-hash.html` file should load in any browser,
providing a simple UI.

## What are the details of the algorithm?

1.  First, the input is hashed with the standard SHA-256 hash function.

2.  Next, the first 18 bytes of the digest are interpreted as an array of 9
    (nine) 16-bit words, little endian.  This array is referred to as `word[0]`
    through `word[8]`.

3.  Six pseudo-Japanese syllables are derived from `word[0]` through `word[5]`
    as follows: each of these word values is reduced to the remainder modulo 97.
    Then, the remainder is used as an index into the following array of 97
    strings.  The first letter of the first syllable is then capitalized.
    These syllables are referred to below as `s[0]` through `s[5]`.

        ["a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
         "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
         "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
         "ta", "chi", "tsu", "te", "to", "da", "de", "do",
         "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
         "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
         "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
         "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
         "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
         "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
         "mya", "myu", "myo", "rya", "ryu", "ryo"]

4.  A digit `dig` is chosen using the modulo 10 remainder of `word[6]` as an
    index into the digits `0` through `9`.

5.  Similarly, a symbol `sym` is chosen using the modulo 10 remainder of
    `word[7]` as an index into the aforementioned list `!`, `#`, `@`, `$`, `%`,
    `^`, `&`, `*`, `?` and `/`.

6.  The modulo 8 remainder of `word[8]` selects one of eight cases (0 to 7) for
    combining the above values into an output string.  The last four of these
    cases insert the letter `n` at certain places in the string.
    The eight cases follow; each case gives a list of strings which are
    catenated in that order, with no intervening spaces or other separator
    characters:

    **Case 0**: `s[0] s[1] s[2] sym s[3] s[4] s[5] dig`

    **Case 1**: `sym s[0] s[1] s[2] dig s[3] s[4] s[5]`

    **Case 2**: `s[0] s[1] sym s[2] s[3] dig s[4] s[5]`

    **Case 3**: `s[0] s[1] dig s[2] s[3] sym s[4] s[5]`

    **Case 4**: `s[0] s[1] s[2] "n" sym s[3] s[4] s[5] dig`

    **Case 5**: `sym s[0] s[1] s[2] dig s[3] s[4] s[5] "n"`

    **Case 6**: `s[0] s[1] "n" sym s[2] s[3] dig s[4] s[5]`

    **Case 7**: `s[0] s[1] dig s[2] s[3] sym s[4] s[5] "n"`
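
The steps above can be sketched in Python as follows. This is a hedged transcription of the prose, not the reference implementation: it assumes the 16-bit words are unsigned and that text input is encoded as UTF-8 before hashing (the text does not say), and the names `words_from_digest`, `assemble` and `jp_hash_sketch` are hypothetical. The syllable table is copied exactly as printed in step 3, but as printed it contains 100 entries, so with the stated modulus of 97 the last three (`rya`, `ryu`, `ryo`) can never be chosen. Since `ryu` does appear in the `soba` example, the printed table and the modulus cannot both match the reference source, and this sketch should not be expected to reproduce `testvec`.

```python
import hashlib
import struct

# Step 3 table, copied exactly as printed (see the note above on its size).
SYLLABLES = [
    "a", "i", "u", "e", "o", "ya", "yu", "yo", "wa",
    "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
    "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
    "ta", "chi", "tsu", "te", "to", "da", "de", "do",
    "na", "ni", "nu", "ne", "no", "ha", "hi", "fu", "he", "ho",
    "pa", "pi", "pu", "pe", "po", "ba", "bi", "bu", "be", "bo",
    "ma", "mi", "mu", "me", "mo", "ra", "ri", "ru", "re", "ro",
    "kya", "kyu", "kyo", "gya", "gyu", "gyo", "sha", "shu", "sho",
    "ja", "ju", "jo", "cha", "chu", "cho", "nya", "nyu", "nyo",
    "hya", "hyu", "hyo", "pya", "pyu", "pyo", "bya", "byu", "byo",
    "mya", "myu", "myo", "rya", "ryu", "ryo",
]
SYLLABLE_MODULUS = 97           # as stated in step 3
SYMBOLS = "!#@$%^&*?/"          # step 5, in index order 0..9

# Step 6 layouts; "n" stands for the literal letter n.
CASES = [
    ("s0", "s1", "s2", "sym", "s3", "s4", "s5", "dig"),
    ("sym", "s0", "s1", "s2", "dig", "s3", "s4", "s5"),
    ("s0", "s1", "sym", "s2", "s3", "dig", "s4", "s5"),
    ("s0", "s1", "dig", "s2", "s3", "sym", "s4", "s5"),
    ("s0", "s1", "s2", "n", "sym", "s3", "s4", "s5", "dig"),
    ("sym", "s0", "s1", "s2", "dig", "s3", "s4", "s5", "n"),
    ("s0", "s1", "n", "sym", "s2", "s3", "dig", "s4", "s5"),
    ("s0", "s1", "dig", "s2", "s3", "sym", "s4", "s5", "n"),
]


def words_from_digest(digest: bytes) -> tuple:
    """Step 2: nine little-endian 16-bit words (assumed unsigned) from 18 bytes."""
    return struct.unpack("<9H", digest[:18])


def assemble(words) -> str:
    """Steps 3 to 6: build the output string from word[0] .. word[8]."""
    s = [SYLLABLES[w % SYLLABLE_MODULUS] for w in words[:6]]
    s[0] = s[0][0].upper() + s[0][1:]        # capitalize the first letter
    parts = {f"s{i}": syl for i, syl in enumerate(s)}
    parts["dig"] = str(words[6] % 10)        # step 4
    parts["sym"] = SYMBOLS[words[7] % 10]    # step 5
    parts["n"] = "n"
    return "".join(parts[key] for key in CASES[words[8] % 8])


def jp_hash_sketch(text: str) -> str:
    """Steps 1 to 6, hashing the UTF-8 encoding of text (an assumption)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return assemble(words_from_digest(digest))


print(assemble(range(9)))          # Aiu*eoya6
print(jp_hash_sketch("letmein"))
```

With the words 0 through 8, `word[8] % 8` selects case 0, so the syllables `a i u e o ya`, digit `6` and symbol `*` are laid out as `Aiu*eoya6`.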


## How many JP-Hash digests are there?

Since there are six syllables chosen from a set of 97, plus two characters each
chosen from a set of ten, the initial steps yield a space of 83,297,200,492,900
possibilities (83.3 trillion, in American usage). The 8 cases in step 6 all
yield distinct results, and so multiply the space eight-fold to
666,377,603,943,200 possibilities (666.4 trillion).
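
The arithmetic can be reproduced directly. This Python fragment just multiplies the counts given in the text; like the text, it relies on the eight layout cases never producing the same string.

```python
SYLLABLE_CHOICES = 97   # step 3
DIGIT_CHOICES = 10      # step 4
SYMBOL_CHOICES = 10     # step 5
LAYOUT_CASES = 8        # step 6

base = SYLLABLE_CHOICES ** 6 * DIGIT_CHOICES * SYMBOL_CHOICES
total = base * LAYOUT_CASES

print(f"{base:,}")    # 83,297,200,492,900
print(f"{total:,}")   # 666,377,603,943,200
```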

This is about the size of the space of all strings of 10 lower-case English
letters, plus one more character chosen from a set of five.

It's also similar to the size of the space of all strings of 7 printable ASCII
characters followed by a digit.

It is also about the number of values expressible by a 49-bit integer.
A string chosen uniformly at random from any of these spaces has about 49 bits
of entropy.
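
To see how close these comparisons are, the following Python fragment prints each space size and its ratio to the JP-Hash space. It counts printable ASCII as the 95 characters from space through tilde, which is an assumption about what "printable" means here; both six- and seven-character variants are listed for comparison.

```python
import math

JP_HASH_SPACE = 97 ** 6 * 10 * 10 * 8

# Candidate spaces of similar size; 95 = printable ASCII, space through tilde.
comparisons = {
    "10 lower-case letters + 1 of 5 characters": 26 ** 10 * 5,
    "6 printable ASCII + 1 digit": 95 ** 6 * 10,
    "7 printable ASCII + 1 digit": 95 ** 7 * 10,
    "49-bit integer": 2 ** 49,
}

for label, size in comparisons.items():
    print(f"{label}: {size:,} (ratio to JP-Hash space {size / JP_HASH_SPACE:.2f})")

print(f"log2 of JP-Hash space: {math.log2(JP_HASH_SPACE):.2f}")  # 49.24
```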

## Are JP-Hash digests secure for password use?

JP-Hash is not being promoted as being fit for any specific purpose. In a
security setting, each user must perform their own analysis to understand the
security risks of using any tool in certain ways and with certain kinds of
inputs, in relation to the value being protected. The user assumes all risk.

The following cautionary remarks are provided, with the understanding
that they do not constitute a complete discussion:

* If a JP-Hash is being used as a password, the most prudent assumption is that
  any attacker knows this, and is specifically attacking the space of possible
  JP-Hashes (which, at 49 bits of entropy, is not very large).
  To assume that the attacker doesn't know about JP-Hash is "security through
  obscurity".

* If the attacker knows that JP-Hash is being used to produce passwords,
  which must be assumed, then weak input phrases are vulnerable, in spite
  of producing "strong-looking" JP-Hash strings.
  Example: the JP-Hash `Kera%bage9kerya` appears to be of similar complexity to
  `Jasho1mogo?sase`.  However, the former is the hash of the text `letmein`,
  whereas the latter is the hash of `stark-theory-azimuth-goblet-13$17`.  An
  attacker who knows that the passwords are JP-Hashes can crack the
  `Kera%bage9kerya` password by using a file of JP-Hashes of weak passwords,
  which will likely contain an entry for `letmein`, or, failing that, by a
  brute-force search of all lower-case strings up to seven characters long.
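
Both attacks described above simply hash guesses and compare outputs, so their cost depends on the space of inputs, not on how complex the output looks. In this Python sketch, `hash_fn` is a placeholder for any JP-Hash implementation (for instance, the reference code), and `build_table`, `brute_force` and the other helpers are hypothetical names. The last line shows that all lower-case strings of up to seven characters amount to about 8.4 billion guesses.

```python
import itertools
import string
from typing import Callable, Iterable, Optional


def build_table(candidates: Iterable[str],
                hash_fn: Callable[[str], str]) -> dict:
    """Precompute hash -> input for a list of weak passwords."""
    return {hash_fn(word): word for word in candidates}


def lowercase_strings(max_len: int):
    """Yield every lower-case ASCII string of length 1 .. max_len."""
    for length in range(1, max_len + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def brute_force(target: str, hash_fn: Callable[[str], str],
                max_len: int) -> Optional[str]:
    """Return the first lower-case string whose hash equals target, if any."""
    for guess in lowercase_strings(max_len):
        if hash_fn(guess) == target:
            return guess
    return None


def lowercase_space(max_len: int) -> int:
    """How many guesses lowercase_strings(max_len) produces."""
    return sum(26 ** k for k in range(1, max_len + 1))


print(f"{lowercase_space(7):,}")   # 8,353,082,582
```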

* A JP-Hash used as a password must also be regarded as an ordinary
  password from the perspective of attacks which are oblivious to the
  existence of JP-Hash.  JP-Hashes are of variable length and may be as short
  as eight characters. For instance, `Ai9ue/ou` is a possible JP-Hash which
  looks like a short password compared to `Kyobyu9jakyu/chokon`, and will
  succumb to a brute-force search of the eight-character space.

* Converting a password phrase which has significantly more than 49 bits of
  entropy into a JP-Hash constitutes a degradation of security, independently
  of all other considerations.

## Are JP-Hash digests secure message digests?

* JP-Hash obviously contains too few bits to be suitable as a message
  digest for security purposes. It may be usable as an integrity checksum,
  perhaps comparable to a CRC48. However, it is produced by a slow, wasteful
  calculation whose result has undesirable properties such as variable length.

## Example Hashes

These examples come from the `testvec` file.

```
a --> Mina4gai@gashan
y --> Shaba%megyu2shize
Mike --> !Tosuda2bukyochon
Romeo --> Potsun&gaso5machi
Sierra --> Nodon&yanu6zuchi
Tango --> Gyoda#hosa6segi
Whiskey --> Muji?pyuna6gyage
sashimi --> Izu0gyubya/gyumyu
ramen --> Byumi$betsu0nyohe
soba --> Arushin^hyapyuryu2
futon --> Kyoriton#kyaseku1
```
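
Because every syllable in the step-3 table ends in a vowel, the layout case from step 6 can be read back from a digest, which also illustrates why the eight cases never yield the same string. `layout_case` is a hypothetical helper, not part of the reference code, and it assumes its input is a well-formed digest (otherwise `next` raises `StopIteration`).

```python
SYMBOLS = "!#@$%^&*?/"
DIGITS = "0123456789"


def layout_case(h: str) -> int:
    """Infer the step-6 case (0-7) of a well-formed JP-Hash string.

    Relies on every syllable ending in a vowel, so an "n" just before the
    symbol, or at the very end, must be the inserted letter n.
    """
    sym_pos = next(i for i, c in enumerate(h) if c in SYMBOLS)
    dig_pos = next(i for i, c in enumerate(h) if c in DIGITS)
    if sym_pos == 0:                  # cases 1 and 5 start with the symbol
        base, has_n = 1, h.endswith("n")
    elif dig_pos == len(h) - 1:       # cases 0 and 4 end with the digit
        base, has_n = 0, h[sym_pos - 1] == "n"
    elif sym_pos < dig_pos:           # cases 2 and 6: symbol before digit
        base, has_n = 2, h[sym_pos - 1] == "n"
    else:                             # cases 3 and 7: digit before symbol
        base, has_n = 3, h.endswith("n")
    return base + 4 if has_n else base


EXAMPLES = """\
a --> Mina4gai@gashan
Mike --> !Tosuda2bukyochon
soba --> Arushin^hyapyuryu2"""

for line in EXAMPLES.splitlines():
    text, digest = line.split(" --> ")
    print(f"{text}: case {layout_case(digest)}")
```

For these three sample lines it reports cases 7, 5 and 4 respectively.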

## License

The JP-Hash reference code is offered under a one-clause variant of the BSD
license. See the copyright headers in the source files.

If you publish altered versions of this algorithm, please don't call it
JP-Hash, thanks! If it doesn't pass the `testvec`, it isn't JP-Hash.