BIP39 Add German Wordlist #1071

SebastianFloKa · 2021-02-22T10:52:48Z

The BIP-0039 German wordlist is based on the spelling rules defined in the German Duden and was checked against several quality criteria by native speakers. Words were selected and checked manually to ensure they are sufficiently common and positive. Tools were used to ensure sufficient Levenshtein distance between words, to prevent conflicts with other BIP-0039 wordlists, and to eliminate homophones inside the wordlist.

There was a first attempt (#721) and a second attempt (#942) for a BIP-0039 German wordlist. This third attempt intends to combine the requirements of the Bitcoin community in the German-speaking area with the must-have requirements for BIP-0039 wordlists, such as Levenshtein distance and no homophones.

Special considerations:

Words can be uniquely determined by typing the first 4 characters.
Words contain between 3 and 8 letters.
No words with only 1 letter of difference (no Levenshtein substitution, addition or permutation distance lower than 2).
No words already used in other official BIP-0039 wordlists.
No accents or special characters: no Ä, Ö, Ü, ß.
All caps, to account for German nouns not being written in lowercase and to keep the character set to the 26 letters A-Z.
Orthography based on the German spelling reform of 2006 and on the German Duden 2021.
Only singular nouns and pluralia tantum (nouns for which no singular exists).
If a homophone for a word exists, only one of these words is allowed in the wordlist, on condition that the grammatical gender ensures unambiguous spelling.
No offensive words and no words implying negative, sad or bad feelings.
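
A minimal sketch of how the purely structural rules above (A-Z only, 3 to 8 letters, unique first 4 characters) could be checked automatically. The function name `check_structure` and the sample words are assumptions made for illustration; this is not the tooling used for this PR.

```python
import re
from collections import defaultdict

def check_structure(words):
    """Return (word, problem) pairs for violations of the structural rules."""
    problems = []
    by_prefix = defaultdict(list)
    for w in words:
        # Rule: only the 26 letters A-Z, written in all caps
        if not re.fullmatch(r"[A-Z]+", w):
            problems.append((w, "characters outside A-Z"))
        # Rule: between 3 and 8 letters
        if not 3 <= len(w) <= 8:
            problems.append((w, "length outside 3-8"))
        by_prefix[w[:4]].append(w)
    # Rule: the first 4 characters must identify a word uniquely
    for prefix, group in by_prefix.items():
        if len(group) > 1:
            for w in group:
                problems.append((w, "prefix " + prefix + " not unique"))
    return problems

# Hypothetical sample, not the proposed wordlist
sample = ["AMPEL", "OSTERN", "OSTEREI", "WALNUSS", "BRÜCKE", "ARTISTIKER"]
for word, problem in check_structure(sample):
    print(word, "->", problem)
```

Homophones, word class and "positive meaning" cannot be checked this way and still need a manual review.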

SebastianFloKa · 2021-02-22T11:47:07Z

Thanks @DavidMStraub for starting the first attempt and @cr for the second attempt at a BIP-0039 German wordlist. I hope you will join this PR, whose main difference is the enforcement of Levenshtein distance (addition, substitution & permutation not lower than 2).

Supplementary to the basic requirements some more considerations:

This proposal follows @DavidMStraub's requirement of nominative nouns. In addition, countries, cities, persons, names etc. were excluded.
@thomasklemm requested a change to more commonly used words; this should be the case now.
@cr requested avoiding collisions with other released BIP-0039 wordlists, which has been taken into consideration.
To bring a cultural specialty into BIP-0039, the proposal is written in all caps. Writing nouns in lowercase letters conflicts with the common sense of the German language. Studies also show that the readability of handwritten text in all caps is significantly better, so this lowers the risk of losing money. A positive side effect is that the number of characters used drops from 52 to 26. This is an advantage not only for self-filled cold wallets.
Going the extra mile, even Levenshtein “additions” of fewer than 3 letters at the beginning of a word were restricted by excluding words with a related meaning (example: Lanze & Pflanze, Sekt & Insekt, etc. are in the list; Mut & Unmut are not).
@rodasmith set out some requirements for avoiding homophones. The current list goes even further by excluding words completely if a homophone exists as a noun with the same grammatical gender (Miene & Mine, Verse & Ferse, Hund & Hunt, Graph & Graf, etc.). Basis: https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Homophone
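
For readers unfamiliar with the Levenshtein requirement, here is a small sketch of a pairwise distance check. It uses the standard Levenshtein distance (insertion, deletion, substitution), so a swap of two neighbouring letters counts as 2; whether the author's own tools treat "permutation" differently is not stated in this thread. The function names and the sample words are assumptions.

```python
from itertools import combinations

def levenshtein(a, b):
    """Standard edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if letters match
            ))
        prev = cur
    return prev[-1]

def close_pairs(words, min_distance=2):
    """All pairs whose distance is below the required minimum."""
    return [(a, b) for a, b in combinations(words, 2)
            if levenshtein(a, b) < min_distance]

# Hypothetical sample built from words mentioned in this discussion
print(close_pairs(["AMPEL", "AMSEL", "DURST", "WURST", "WALNUSS"]))
```

On a full list of 2048 words this compares roughly two million pairs, which is still quick enough for a one-off check.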

thomasklemm

Great work @SebastianFloKa, thanks for opening this PR with an alternative list. I know that a lot of work has gone into it already from #721, and again thanks to everyone who participated in the previous two attempts for a German wordlist. Hope some of you native German speakers could go through the list here too and leave some comments!

Reviewed the wordlist until line 1000 so far, going through the rest later.

bip-0039/german.txt

rodasmith · 2021-02-22T18:29:19Z

ACK. This list does not include any homophones. LGTM

thomasklemm

Looked through the rest of the list, very good work IMO 👍 Just some minor notes on some words.

bip-0039/german.txt

thomasklemm · 2021-02-22T21:15:49Z

Very well-prepared word selection @SebastianFloKa, LGTM 👍 Went through the entire word list word by word, left minor comments on a few words.

If you have the chance and especially if you're a native German speaker, please jump in for a review too.

SebastianFloKa · 2021-03-02T21:39:37Z

Checking with https://www.korrekturen.de/rechtschreibpruefung.shtml, the following words (besides the ones already mentioned: "Gumpe", "Tidehub", "Trebe" & "Zuseher") are flagged even though they are all properly listed in https://www.duden.de/. Among other reasons, this seems to be partly because some words are more common in Austria or Switzerland. I personally think it's good to have some words from different parts of the German-speaking region as long as they are understood everywhere. Open to discussion.

Allrad
Bauchweh
Gemahl
Kapriole
Kassier
Kubik
Oktagon
Petersil
Vorkehr
Zuhause

@thomasklemm in particular, and maybe @neox5 wants to have a look as well: shall we replace all of the above words, or would you say we can / should keep some of them?

SebastianFloKa · 2021-03-02T22:03:43Z

If we replace all the words highlighted by @thomasklemm except for "Fresko" & "Tidehub", as well as all 10 words flagged by the spellchecker (Allrad, ..., Zuhause), there are 31 words to be replaced in the next loop.
Having worked on this project for 2 1/2 years now with changing "special considerations", many words fell out of the list over time. Therefore a big "rerun" against Levenshtein collisions, other wordlists, homophones etc. was made, and a bunch of acceptable words was found. Here are 31 proposals:

BART
BEIN
BLECH
BUSCH
FUNKE
GELD
HALLO
HARZ
HECHT
HOLZ
KREUZ
KURS
LIEBE
LUST
MUSE
NATUR
PORTO
PROBE
PUMPE
PUNKT
RASEN
REIHE
REST
RIND
RITUS
RUHM
STROH
TALER
TREUE
WANNE
WUNDER

Because the words are interdependent, some backup words might be necessary:

AKKU
ALGE
BELAG
BUNDHOSE
DEMO
DOKU
ENDE
EURO
FANG
FIEBER
MATHE
NETTO
PORE
SEEHUND
SOLD
SPESEN
VIEH
WEITE
WOGE
VISIER

So if you prefer to replace some other words from the initial list with the above backup words, that is fine as well.

TZocker · 2021-03-08T20:17:08Z

Suggestions:

Amsel
Ostern
Fernweh
Simulant
Fern
Walnuss
Lorbeere
Misteln
Wichtel
Holz
Zunge
Zug
Mettigel
Maihock
Mai
Kraut
Wurst

DivineDominion · 2021-03-10T06:31:28Z

If replacements are needed, I'd like to suggest Drossel, so we have all of the well-known bird names of "Amsel, Drossel, Fink und Star". Would definitely prefer these over nautical vocabulary :)

nisc · 2021-03-10T19:22:48Z

Guys thank you for your service, but I can't hide that I'm mostly following this conversation because it reliably makes me giggle.

PS: After thinking it through, I would probably not include any of the Breze* words. There are too many regional variations, which will lead at least to confusion, but maybe even to emotion and anger ("why did they dare to include this inferior spelling in my seed phrase?").

SebastianFloKa · 2021-04-02T16:48:38Z

Thanks @TZocker @nisc @DivineDominion for joining and for your input. A new proposal implementing it, along with the initial feedback from @thomasklemm, will follow soon.

SebastianFloKa · 2021-04-04T19:05:54Z

Would definitely prefer these over nautical vocabulary
@DivineDominion Are there any specific words you'd like to see replaced? Not sure which one is meant with "nautical".

DivineDominion · 2021-04-06T08:12:37Z

@SebastianFloKa "Luv" and maybe "Tidehub", as pointed out by others in comments above

SebastianFloKa · 2021-04-06T16:17:33Z

@TZocker I checked your proposals against the criteria:

Suggestions:

Amsel --> NOK - Levenshtein substitution collision with AMPEL
Ostern --> NOK - identification by first 4 letters not ensured against "Osterei"
Fernweh --> OK - we have "Heimweh" already, but OK to go for Fernweh as well.
Simulant --> OK
Fern --> NOK - not a noun
Walnuss --> OK
Lorbeere --> OK, typically used as plural, but OK.
Misteln --> NOK - plural, not singular / singular has a Levenshtein collision
Wichtel --> NOK - Levenshtein substitution collision with Wachtel
Holz --> OK - already in proposal list above
Zunge --> NOK - Levenshtein substitution collision with Junge
Zug --> NOK - Levenshtein addition collision on the first 3 letters (Zugriff, Zugzwang, etc.)
Mettigel --> NOK - not listed in the German Duden
Maihock --> NOK - not listed in the German Duden
Mai --> NOK - Levenshtein addition collision on the first 3 letters
Kraut --> NOK - Levenshtein substitution collision with Kraft
Wurst --> NOK - Levenshtein substitution collision with Durst
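
The verdicts above could be reproduced roughly with a sketch like the following, which vets one candidate against words that are already accepted and names the type of collision. The helper names, the reason strings and the `accepted_excerpt` list are assumptions for illustration; the Duden lookup and the noun check are manual steps and are not covered.

```python
def one_substitution(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def one_addition(a, b):
    """True if the longer word becomes the shorter one by dropping one letter."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def vet(candidate, accepted):
    """Return the list of reasons why a candidate collides with accepted words."""
    c = candidate.upper()
    reasons = []
    for w in accepted:
        if c == w:
            reasons.append("already listed")
        elif one_substitution(c, w):
            reasons.append("substitution collision with " + w)
        elif one_addition(c, w):
            reasons.append("addition collision with " + w)
        elif c[:4] == w[:4]:
            reasons.append("first 4 letters shared with " + w)
        elif w.startswith(c) or c.startswith(w):
            reasons.append("prefix collision with " + w)
    return reasons

# Hypothetical excerpt of accepted words, taken from the replies above
accepted_excerpt = ["AMPEL", "OSTEREI", "WACHTEL", "JUNGE", "ZUGRIFF", "KRAFT", "DURST"]
for candidate in ["Amsel", "Ostern", "Walnuss", "Zug", "Wurst"]:
    print(candidate, "-->", vet(candidate, accepted_excerpt) or "OK")
```

An empty result only means that no structural collision was found against the given excerpt.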

@DivineDominion
Besides "Amsel" (see "Ampel" above), "Star" also shows a Levenshtein substitution collision ("Stau"). "Fink" is already in the list, "Drossel" is OK to add. Will remove "Luv" & "Tidehub" completely.

@nisc all "Breze*" words will be removed

Co-authored-by: Thomas Klemm <github@tklemm.eu>

@thomasklemm

Improvement loop mainly based on feedback of @thomasklemm but also @TZocker & @DivineDominion & @nisc

neox5 · 2021-04-06T16:59:55Z

Suggestions:
Daumen
Nagel
Schrift
Orange
Triangel

If you could share your tools for checking, I would do the checks by myself! So you don't have to do all the work by yourself 😉

thomasklemm · 2021-04-08T09:18:34Z

@SebastianFloKa Thanks for incorporating all the feedback to the word list. IMO it's really good work, has had many iterations already and can get merged.

@SebastianFloKa You should see a "Resolve conversation" button next to each individual conversation and can close the ones that are now resolved (only the PR author and repo maintainers seem to see it according to https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations, so I can't mark my own comments as resolved).

To all German native speakers reading this PR: It would be really good if you can take the time to go through the list and leave your comments or a 👍 on the PR.

SebastianFloKa · 2021-04-08T13:47:58Z

@neox5 thanks

Suggestions:

Daumen - NOK - Levenshtein substitution collision with Gaumen
Nagel - NOK - Levenshtein substitution collision with Nadel & first 4 letters same as Nagetier
Schrift - NOK - first 4 letters same as Schrank
Orange - NOK - collision with existing word in BIP0039 English wordlist

Regarding the tool:

There are actually many separate tools, and it would require a significant redesign to make them usable for others (they need background understanding of abbreviations). But as far as I know, there are ready-to-use solutions shared in other wordlist conversations.
Attached is a list of backup words that probably fulfill the requirements in case you want to replace certain words. To be confirmed in individual cases.
210408-BIP39-German-Wordlist_backupwords.txt
Will replace "Zunahme" (close to Zuname) with "Zufluss" & long word "Artistik" with shorter "Artikel"

"Zunahme" too close to "Zuname" and other improvements

b068931cc450442b63f5b3d276ea4297 · 2021-04-10T15:34:01Z

I had already finished a draft of it independently in December and unfortunately only now managed to publish it. However, it also contains words that appear in other word lists. Maybe the comparison helps anyway: https://github.com/dys2p/wordlists-de/blob/main/de_2048.md

b068931cc450442b63f5b3d276ea4297 · 2021-04-10T16:07:14Z

I just had a quick look at the list, you could shorten some words even more e.g.
GOLDADER -> GOLD
REISFELD -> REIS or REISE
REICHTUM -> REICH

SebastianFloKa · 2021-04-10T21:32:25Z

@b068931cc450442b63f5b3d276ea4297 Hi, thanks for participating.

Unfortunately you are right, your list contains many collisions with other BIP0039 wordlists (278 collisions in total). Besides this, it contains

7 collisions regarding uniqueness of the first 4 letters (baum/baumarkt, dringend/drinnen, etc.),
some words for which a homophone with the same grammatical gender exists (Leib/Laib, Lied/Lid, etc.),
and, the main point, many Levenshtein errors. I haven't checked in detail, but they show up quite obviously (bett/brett, bezug/bezog, anzug/abzug, etc.).
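
Counting collisions with other wordlists comes down to a case-insensitive set intersection. A small sketch follows; the function name and the tiny excerpts are assumptions ("orange" is the English collision mentioned earlier in this thread, while `other_excerpt` is invented), and a real run would load the complete published lists instead.

```python
def shared_words(candidate_list, other_lists):
    """Map each colliding word to the names of the lists that already contain it."""
    candidates = {w.strip().lower() for w in candidate_list}
    collisions = {}
    for name, words in other_lists.items():
        existing = {w.strip().lower() for w in words}
        for w in sorted(candidates & existing):
            collisions.setdefault(w, []).append(name)
    return collisions

# Tiny illustrative excerpts, not the full wordlists
other_lists = {
    "english_excerpt": ["abandon", "orange", "zoo"],
    "other_excerpt": ["orange", "beispiel"],
}
german_candidates = ["ORANGE", "WALNUSS", "DROSSEL"]

found = shared_words(german_candidates, other_lists)
print(len(found), "collision(s):", found)
```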

I just had a quick look at the list, you could shorten some words even more e.g.

GOLDADER --> GOLD --> NOK - levenshtein substitution error with GELD
REISFELD --> REIS --> NOK - levenshtein addition error with PREIS
--> or REISE --> NOK - collision with our "extra mile requirement" to avoid levenshtein addition on the first letters if too similar regarding the meaning of a word (ABREISE)
REICHTUM -> REICH --> NOK - levenshtein substitution error with TEICH

Question: Do you have the ability to filter your list for singular nouns only and share it? Then a countercheck might make sense.

b068931cc450442b63f5b3d276ea4297 · 2021-04-11T17:14:06Z

@SebastianFloKa Thank you. In which form do we want to do this best and why do you actually only want nouns?

@b068931cc450442b63f5b3d276ea4297

Improvement loop related to input of @b068931cc450442b63f5b3d276ea4297: replacing uncommon / difficult words + reducing "homophone risky words" + reducing amount of words starting with AB*** Thanks for approving if OK or leaving comments if NOK. Also @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith

SebastianFloKa · 2021-04-12T13:53:46Z

@SebastianFloKa Thank you. In which form do we want to do this best

See latest proposal - leave comments in case you disagree or approve if OK.

and why do you actually only want nouns?

There are advantages for brainwallets, but these can be neglected as brainwallets aren't recommended except for special situations. Mainly it's the same reason why the effort regarding Levenshtein distance is made: to reduce room for misinterpretation, which might cause loss of money.

Somebody aware of this special consideration would very likely recognize if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally.
Somebody not aware of this special consideration might get cautious if a "non singular noun" occurs whereas the other 23 words are singular nouns (even if a few words in the wordlist exist as verbs etc. as well). It's not guaranteed that the error will be recognized, of course, but chances are increased.
The same goes for reconstructing certain words from a partly damaged wallet: it significantly reduces choices if you need to search for a singular noun only.

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297
Thumbs up (or approve changes) in case you agree to latest modifications in the list (or leave comments if not) and we could then go for a third-party-check concerning levenshtein distance etc.

b068931cc450442b63f5b3d276ea4297 · 2021-04-12T17:37:27Z

@SebastianFloKa I think that there should be another fundamental discussion about whether it makes sense to omit verbs and adjectives or not. The other word lists also work with adjectives and verbs and the omission only unnecessarily restricts the possible words.

Somebody aware of this special consideration would very likely recognize if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally.
Somebody not aware of this special consideration might get cautious if a "non singular noun" occurs whereas the other 23 words are singular nouns (even if a few words in the wordlist exist as verbs etc. as well). It's not guaranteed that the error will be recognized, of course, but chances are increased.

By making the words as familiar as possible and known to everyone, you probably also reduce the risk of people making mistakes. Words like kurz, lang, rot, blau, laufen, gehen, stehen are known to elementary school students, while words like akazie, amnestie, anagramm, annexion and anode (these are only examples) are far less known.

Same for reconstructing certain words from a partly damaged wallet: It significantly reduces choices if you need to search for a singular noun only.

This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn't matter if there were only nouns or nouns, verbs and adjectives.

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith What do you guys say, should only nouns be used or also adjectives and verbs in their base form?

rodasmith · 2021-04-12T19:21:28Z

I'm satisfied with the outcome of the earlier conversation in #942 that concluded to use nouns only, avoiding confusion around capitalization. Here's an excerpt from that conversation:

One of the advantages of not having inflections would be for the "standard user" to reduce the risk of accidentally misinterpreting a word. Example: Your seed starts with "FUCHS, GELB, LAUFEN" and then follows an "ANGLE" (1st pers. singular). Some people might either overlook this and assume the noun "ANGEL" is intended, or at least be confused about whether there might be a typo (no matter if all caps or not). Another constellation could be when two more expected words are 1 Levenshtein addition or subtraction away. Example: "FANGE" (1st pers. singular) with the expected "FANG" (noun) or "FANGEN" (verb) alongside. This is the advantage I see behind reducing the number of word classes (and limiting to infinitives): it reduces the risk of misinterpretation when copying, writing down, communicating or reading a seed.

The decision seemed good then and I don't see any reason to revisit it.

b068931cc450442b63f5b3d276ea4297 · 2021-04-12T19:52:50Z

to use nouns only, avoiding confusion around capitalization.

Capitalization is not a problem because the words in the lists are always all lowercase (except @SebastianFloKa who writes everything in capital letters). I think this is also an important point that should be discussed again. I do not share the opinion of @SebastianFloKa:

I think there's a misunderstanding. I meant to say "all caps" instead of "capital letters", sorry for this. I think our intentions are very close together as I also don't want people to mix lower case and upper case, agree 100%. But writing nouns in lower case is quite uncommon for German speakers, whereas filling out templates in upper case (all caps) is much more common (official documents etc.). From your example:
klage farbe anzahl initiative stieg banane seide holt gesagt ahnen
KLAGE FARBE ANZAHL INITIATIVE STIEG BANANE SEIDE HOLT GESAGT AHNEN

All previous lists are in lowercase and with nouns, verbs and adjectives.

TZocker · 2021-04-12T22:18:55Z

@SebastianFloKa In my opinion it is not that decisive whether verbs etc. are used as well; we should follow the other BIPs. That gives us further alternatives. Levenshtein collisions will rule out a lot. Mnemonic sentences would then be possible....

Sorry, I hadn't understood the Levenshtein part when I made my suggestion. Sorry....
I would also follow your remark on Lorbeere and use the plural there.

I would rather focus on reducing the vocabulary to the level of a 12-year-old.
And avoid Germanized words such as trainer/viper etc., so as not to conflict with other BIPs.
Likewise, prefer native animals. And likewise, reduce old-fashioned terms.

Words such as Zwinger and Ritze should perhaps still be replaced (ambiguity).

Kind regards

SebastianFloKa · 2021-04-19T21:28:05Z

This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn't matter if there were only nouns or nouns, verbs and adjectives.

Somebody reconstructing a partially destroyed wallet will appreciate fewer choices of word categories when it comes to guessing hard-to-read words (e.g. after a house fire). Not a super important advantage, true, but mentioned anyway.

we should follow the other BIPs.

Well, simply doing the same thing would mean an enormous amount of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean to focus on the positive progress of other wordlists.
It was the needs of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution, rather than accepting the realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value people’s cultural background in orthography (like writing nouns in “capital Latin letters”) more than simply following given structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase Latin characters” for all). Asian wordlists, for example, deviate from “lowercase Latin characters” for exactly that reason.

Q: Is there any advantage for people (people, not the IT behind it, which can handle capital letters) in the German language area in writing words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

Q: Do you require adjectives and/or verbs in the list, or is it because we might not find sufficient easy nouns?
I’m asking because I’m generally open to adjectives & verbs (they were included in the first two proposals 2 years ago); I just had the impression that going for nouns only has quite some advantages for the community / users. And particularly when taking Levenshtein into consideration, many verbs drop out anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc.

About certain words:
Foreign words:
@TZocker Picking out all foreign words is almost impossible; even Onkel is actually a foreign word. Words sounding too “foreignish” were already eliminated.
Trainer: a borderline word. I would have said this is still OK as it is listed in the Duden as very common. But TBD.
Viper: listed in the Duden as “mittelhochdeutsch”, pronounced the German way (differently from English) and with a Latin background from long ago. For me OK. TBD
Zwinger / Ritze: I'm not aware of an inappropriate meaning, particularly for Zwinger, but OK to discuss/change.

Generally:
Q: Isn’t it acceptable if once in a while somebody looks up a word in case they don’t remember its exact meaning or definition, if they are really interested? Even including verbs & adjectives, it's impossible to ensure that really everybody will be aware of the exact meaning of every single word of the list.
The quality of words is a subjective topic. Example: I would have said Akazie is less risky to spell incorrectly than e.g. Pyjama from your list. But generally it's true, your list contains fewer uncommon words. Therefore the proposal was and is to highlight our “no-go words” and try to replace them. Thanks to your supplementary words, @b068931cc450442b63f5b3d276ea4297, we even have some more backup words to work with. I will update the backup words list soon. Actually, 3 of the words you mentioned are also among the ten worst words in the current proposal, from my subjective perspective.
210419-10 worst words in german wordlist.txt
But if there are too many more, yes, we might have to think about adjectives & verbs.

Proposal:
@b068931cc450442b63f5b3d276ea4297 Could you and others imagine that we keep going through the critical words step by step, filter out the really unacceptable ones and try to replace them with better ones from your list + my backup list?

b068931cc450442b63f5b3d276ea4297 · 2021-04-20T19:23:10Z

Somebody reconstructing a partially destroyed wallet will appreciate fewer choices of word categories when it comes to guessing hard-to-read words (e.g. after a house fire). Not a super important advantage, true, but mentioned anyway.

No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

Well, simply doing the same thing would mean an enormous amount of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean to focus on the positive progress of other wordlists.
It was the needs of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution, rather than accepting the realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value people’s cultural background in orthography (like writing nouns in “capital Latin letters”) more than simply following given structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase Latin characters” for all). Asian wordlists, for example, deviate from “lowercase Latin characters” for exactly that reason.

Q: Is there any advantage for people (people, not the IT behind it, which can handle capital letters) in the German language area in writing words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

According to my understanding, the advantage is that you don't have to worry about upper and lower case and therefore write everything in lower case in such lists. This is the case with the other bip39 lists, with diceware lists like https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases and many other projects.

Q: Do you require adjectives and/or verbs in the list, or is it because we might not find sufficient easy nouns?
I’m asking because I’m generally open to adjectives & verbs (they were included in the first two proposals 2 years ago); I just had the impression that going for nouns only has quite some advantages for the community / users. And particularly when taking Levenshtein into consideration, many verbs drop out anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc.

Both. Including them in the list matches the word-choice conventions of the other languages, and it enlarges the pool of words that most 12-year-olds know.

Proposal:
We go through the list like this and comment behind it those we consider critical/inappropriate and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

luke-jr · 2021-04-25T19:35:38Z

@slush0 @prusnak @voisine @ebfull

b068931cc450442b63f5b3d276ea4297 · 2021-04-25T19:50:42Z

Neither the list nor the discussion about it is closed from my point of view. If this list gets merged in this form, it would be a missed opportunity.

SebastianFloKa · 2021-04-25T20:45:17Z

thanks @luke-jr and other BIP39 authors + responsibles and "welcome"

@b068931cc450442b63f5b3d276ea4297 no worries, "proposed BIP modification" doesn't mean it's merged.

No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where e.g. the first letter of a word is illegible, as well as some at the end or in the center? A normal user doesn't have a tool to filter for words with certain letters in certain positions. This means the user will have to guess possible words. So it's easier for them to search for a noun only instead of nouns, verbs and adjectives. It's not a must-have or the most important feature, but a small advantage.

Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

no, running my own - but this might be good to work with.

We go through the list like this and comment behind it those we consider critical/inappropriate and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

What do you mean by advertisement? Generally it's OK to go through the list and select inappropriate words, of course. As for lower case, I'm personally not convinced yet; not sure about the others. It feels very strange for people from the German language area to write nouns in lower case, plus the other reasons (people write more legibly in all caps etc.). This will also be part of the BIP39 authors' decision later. I'm fine to continue step by step (as we have been doing for years now); just let me replace the 10 above-mentioned words with other nouns first (I need a bit of time) and then go through the list again.

b068931cc450442b63f5b3d276ea4297 · 2021-04-29T07:34:39Z

Thank you.

Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where e.g. the first letter of a word is illegible, as well as some at the end or in the center? A normal user doesn't have a tool to filter for words with certain letters in certain positions. This means the user will have to guess possible words. So it's easier for them to search for a noun only instead of nouns, verbs and adjectives. It's not a must-have or the most important feature, but a small advantage.

I haven't, but whether it's a noun, verb or adjective doesn't matter at all. Since it is 1 of 2048 that are in the list.

Sorry, I meant verbs and wanted to write an example with werben/Werbung (advertisement) first. With your list, I have already submitted as a pull request what I would remove and what I would add if necessary.

I am currently working on another list, which could help if we want to add verbs and adjectives.

For me it feels strange to see and write everything in capital letters. Even when we write normally, most of the letters used in any normal sentence are lowercase. I also find the comparison with official forms a bit far-fetched; I think every person writes much more lowercase in letters, messengers and everywhere else, and finds it rather strange when someone writes everything in capital letters with caps lock.

Eliminated the words with highest complexity and replaced with simpler ones.

SebastianFloKa · 2021-05-04T14:34:19Z

I haven't, but whether it's a noun, verb or adjective doesn't matter at all. Since it is 1 of 2048 that are in the list.

Of course each word in the list is 1 of 2048, but in my example the "word pool" for the user is not the list but all words. Let's take an example: a steel wallet went through a house fire and some words are not completely readable anymore, e.g. for one word the second letter is readable as "L", the third is "A", the fourth is "T", and the first letter and the ending are unknown (?LATT???). The user has two options: A) go through the complete list line by line and check whether a word might fit, or, the much more realistic scenario, B) "guess" which word could be meant. In our case the noun "BLATT" might come to mind, and you will check in the wordlist directly under "B" whether this is one possible solution. If verbs & adjectives are also included, there are more choices to look up, and it will be more time consuming to figure out which one is intended: "glatt, platt, flattern, etc.".
Again: this is only a minor advantage in favor of "nouns only", supplementary to the other ones mentioned before (so this alone wouldn't justify going for nouns only).

The idea of limiting complexity to a certain age (e.g. 12-year-olds) sounds nice, but I couldn't find a source for a correlation between "age" and "words", which means it will remain our subjective decision which words to accept.

Having a few words at a 16-year-old level would statistically mean that every once in a while a newly created wallet includes one or a few words that the user would need to look up (if the user is even interested). So far we said this disadvantage is worth all the advantages gained by nouns-only; it makes sense to go through the history of this to get an understanding. But if the community disagrees and requests many words to be replaced and not only a few, I'm open to reworking the list accordingly, of course.

What's your positions on this? Or do you want a survey?

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297

b068931cc450442b63f5b3d276ea4297 · 2021-05-05T10:25:31Z

If I can still read "?LATT???" from the letters I open the list with the 2048 words, press Ctrl+F and enter "LATT". It really doesn't matter to which word category the word belongs. I don't have to go through line by line, and even if I do, it's easier than picking out a much larger number of nouns from the Duden, for example.
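
The "Ctrl+F" approach can also be written as a small script, which additionally allows fixing the position of the readable letters. This is only a sketch: the function names, the pattern syntax ("?" for exactly one unknown letter, "*" for any number of unknown letters) and the mixed excerpt of nouns, adjectives and verbs are assumptions and do not come from the proposed list.

```python
import re

def ctrl_f(fragment, words):
    """Plain substring search, like Ctrl+F over the wordlist."""
    return [w for w in words if fragment in w]

def positional(pattern, words):
    """Match a damaged word: '?' is one unknown letter, '*' any number of them."""
    regex = "".join(
        "[A-Z]" if ch == "?" else ("[A-Z]*" if ch == "*" else re.escape(ch))
        for ch in pattern
    )
    return [w for w in words if re.fullmatch(regex, w)]

# Hypothetical excerpt for illustration only
excerpt = ["BLATT", "PLATTE", "GLATT", "FLATTERN", "LATTE", "BLUME", "ABLAGE"]

print(ctrl_f("LATT", excerpt))        # every word containing LATT
print(positional("?LATT*", excerpt))  # one unknown first letter, open ending
```

Either way the search space is the 2048 list entries, whatever their word class.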

I see no advantages but many disadvantages in choosing a list of nouns only.

The 12 years was just an example. The simpler and more widespread the words are, the better. You can also look at "basic vocabulary" and "extended basic vocabulary", just like the linguistic levels A-B.

So far we said this disadvantage is worth all the advantages gained by nouns-only ...

No, I think you are the only one who says/writes that.

nisc · 2021-05-06T07:10:13Z

If I can still read "?LATT???" from the letters I open the list with the 2048 words, press Ctrl+F and enter "LATT".

I think it's a tough call. Most people today wouldn't know how BIP39 works and that there's a pre-defined list of 2048 words, with each word in the 24-word mnemonic representing 11 bits of a 256+8 bit seed ("What is a Bit?").
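
The bit counts mentioned above follow from the BIP39 layout, in which each word encodes 11 bits and the checksum is one bit per 32 bits of entropy. A small sketch of that arithmetic (the function name is hypothetical; strictly speaking, the 264 bits are entropy plus checksum):

```python
BITS_PER_WORD = 11  # 2**11 == 2048 words in a list


def mnemonic_layout(word_count):
    """Return (entropy_bits, checksum_bits, total_bits) for a mnemonic length."""
    total_bits = word_count * BITS_PER_WORD
    # checksum = entropy / 32 and entropy + checksum = total,
    # so checksum = total / 33.
    checksum_bits = total_bits // 33
    entropy_bits = total_bits - checksum_bits
    return entropy_bits, checksum_bits, total_bits


print(mnemonic_layout(24))
print(mnemonic_layout(12))
```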

Other people wouldn't realize that there's a pattern, i.e., that the seed only includes nouns.

I slightly prefer the nouns only version. I think more people see the only-nouns pattern than the 264 bits.

In the end it really doesn't matter too much, though. If people lose a lot of money, they'll seek help. Someone will be able to explain it to them.

luke-jr · 2021-07-02T21:33:20Z

For now, the author(s) of BIP 39 have decided not to accept any further word lists into BIP 39 itself, and encourage adding new ones to the WLIPs repo here: https://github.com/p2w34/wlips

peterhgruber · 2022-01-08T19:53:29Z

Thanks for the effort. Two considerations:

I strongly advise all lowercase. I understand that German nouns in lowercase might look unfamiliar, but uppercase has distinct and really bad disadvantages. First, words in all caps are much harder to read (our mind reads word contours more than individual letters), and second, from a practical point of view, writing all caps, e.g. on an iPhone, is a hassle.
Is it really necessary to exclude words that appear on wordlists in other languages? This leads to choosing "Fotograf" over "Foto" (I assume). If it were necessary (e.g. because many wallet apps have no language setting), then one would need to be stricter, i.e. exclude all words that have an identical counterpart in any wordlist when only the first four letters are considered (thus excluding "Fotograf" as well).
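
The difference between the two exclusion rules discussed above (identical word versus identical first four letters) can be sketched as follows. This is an illustrative sketch only: the function names and both sample lists are hypothetical and are not taken from any official BIP-0039 wordlist.

```python
def exact_collisions(candidates, other_words):
    """Candidates that appear verbatim (case-insensitive) in another list."""
    others = {w.lower() for w in other_words}
    return sorted(w for w in candidates if w.lower() in others)


def prefix_collisions(candidates, other_words, prefix_len=4):
    """Candidates whose first `prefix_len` letters match a word in another list."""
    prefixes = {w.lower()[:prefix_len] for w in other_words}
    return sorted(w for w in candidates if w.lower()[:prefix_len] in prefixes)


# Hypothetical samples for illustration.
german_candidates = ["FOTO", "FOTOGRAF", "BLATT"]
other_language = ["foto", "casa"]

print(exact_collisions(german_candidates, other_language))
print(prefix_collisions(german_candidates, other_language))
```

Under the exact rule only "FOTO" is excluded, while the stricter four-letter rule would exclude "FOTOGRAF" as well.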

joshuakraemer · 2022-10-29T15:20:05Z

Thanks as well! I, as a German, would much prefer all uppercase instead of all lowercase. All lowercase doesn't conform to the rules of orthography, and traditionally uppercase letters are used if only one case is allowed (e.g. in forms or crosswords). Word contours will be wrong with all lowercase, as nouns are normally written with a capital at the beginning. Anyway, in the case of this word list, correctly reading every single letter is probably more important than quickly reading whole words. Maybe all uppercase is even advantageous for this purpose.

PeterTheOne · 2023-10-30T11:07:38Z

Why exclude umlauts and ß? They are part of the language. They could of course be considered equal to their non-umlaut counterparts (äöü -> aou and ß -> ss), as is the case in other languages' wordlists. It just seems like an arbitrary constraint.
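
The mapping suggested above (äöü -> aou and ß -> ss) could be sketched like this. It is an assumption-laden illustration, not something the PR implements: the names `UMLAUT_MAP`, `fold`, and `folded_duplicates` are hypothetical, and the sample words are chosen only to show the effect.

```python
# Hypothetical folding table following the mapping proposed in the comment.
UMLAUT_MAP = str.maketrans({
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U",
    "ß": "ss",
})


def fold(word):
    """Replace umlauts and ß with their plain-letter counterparts."""
    return word.translate(UMLAUT_MAP)


def folded_duplicates(words):
    """Group words that become identical after folding (possible ambiguity)."""
    groups = {}
    for w in words:
        groups.setdefault(fold(w).lower(), []).append(w)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(fold("Straße"))
print(folded_duplicates(["Bär", "Bar", "Tür"]))
```

One consequence of treating the letters as equal is that two distinct words can fold to the same spelling, so a list allowing umlauts would need a check like `folded_duplicates`.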

SebastianFloKa added 2 commits February 22, 2021 09:50

BIP39 Add German Wordlist

c10c822

bip-0039 special considerations german wordlist

7d41aa6

This was referenced Feb 22, 2021

Add German word list for BIP0039 #721

Closed

Adding BIP-39 wordlist in German (2nd try) #942

Closed

thomasklemm reviewed Feb 22, 2021

SebastianFloKa and others added 2 commits April 6, 2021 18:30

Update bip-0039/german.txt

7541786

Co-authored-by: Thomas Klemm <github@tklemm.eu>

Update german.txt

c9b4386

Improvement loop mainly based on feedback of @thomasklemm but also @TZocker & @DivineDominion & @nisc

thomasklemm mentioned this pull request Apr 8, 2021

Allowing initiator of a conversation to resolve it isaacs/github#1952

Open

nisc approved these changes Apr 8, 2021

thomasklemm approved these changes Apr 8, 2021

Update german.txt

63b7107

"Zunahme" too close to "Zuname" and other improvements

thomasklemm approved these changes Apr 12, 2021

rodasmith approved these changes Apr 12, 2021

nisc approved these changes Apr 20, 2021

luke-jr added the Proposed BIP modification label Apr 25, 2021

Update german.txt

d97608b

Eliminated the words with highest complexity and replaced with simpler ones.

bitcoin deleted a comment from daniel3997 Jun 13, 2021

phuong1143 approved these changes Jun 14, 2021

luke-jr closed this Jul 2, 2021

ngima mentioned this pull request Dec 17, 2021

Add German dictionary valora-inc/react-native-bip39#8

Closed
