SebastianFloKa

The BIP-0039 German wordlist is based on the spelling rules defined in the German Duden and was checked against several quality criteria by native speakers. Words were selected and checked manually to ensure they are sufficiently common and positive. Tools were used to ensure sufficient Levenshtein distance between words, to prevent conflicts with other BIP-0039 wordlists, and to eliminate homophones within the wordlist.

There was a first attempt (#721) and a second attempt (#942) at a BIP-0039 German wordlist. This third attempt intends to combine the requirements of the Bitcoin community in the German-speaking area with the must-have requirements for BIP-0039 wordlists, such as Levenshtein distance and no homophones.

Special considerations:

  1. Words can be uniquely determined by typing the first 4 characters.
  2. Words contain between 3 and 8 letters.
  3. No two words differ by only 1 letter (no Levenshtein distance lower than 2, counting substitution, addition and permutation).
  4. No words already used in other official BIP-0039 wordlists.
  5. No accents or special characters: no Ä, Ö, Ü, ß.
  6. All caps, to account for the fact that German nouns are not written in lowercase and to keep the character set to the 26 letters A-Z.
  7. Orthography based on the German spelling reform of 2006 and the German Duden 2021.
  8. Only singular nouns and pluralia tantum (nouns for which no singular exists).
  9. If a homophone for a word exists, only one of the two words is allowed in the wordlist, on condition that the grammatical gender makes the spelling unambiguous.
  10. No offensive words and no words implying negative, sad or bad feelings.
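
As an illustration only, the purely mechanical rules (1, 2, 5 and 6) could be checked with a few lines of Python. This is a sketch under assumptions, not the tooling used for this PR: the function name `structural_problems` and the sample words are made up for the example.

```python
import re
from collections import defaultdict

# Rules 2, 5 and 6: 3 to 8 letters, taken from A-Z only.
ALLOWED = re.compile(r"[A-Z]{3,8}")


def structural_problems(words):
    """Return a list of human-readable rule violations for a candidate list."""
    problems = []
    for word in words:
        if not ALLOWED.fullmatch(word):
            problems.append(f"{word}: must be 3-8 letters from A-Z")
    # Rule 1: the first 4 characters must identify a word uniquely.
    by_prefix = defaultdict(list)
    for word in words:
        by_prefix[word[:4]].append(word)
    for prefix, group in sorted(by_prefix.items()):
        if len(group) > 1:
            problems.append(f"{prefix}: shared by {', '.join(group)}")
    return problems


# Hypothetical sample, not an excerpt of the proposed list.
sample = ["OSTERN", "OSTEREI", "WALNUSS", "STRAßE"]
for line in structural_problems(sample):
    print(line)
```

Rules 7 to 10 (orthography, word class, homophones, tone) still need a dictionary and human judgment; a script like this only catches the formal violations.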

@SebastianFloKa
Author

Thanks @DavidMStraub for starting the first attempt and @cr for the second attempt at a BIP-0039 German wordlist. I hope you will join this PR, whose main difference is the enforcement of Levenshtein distance (addition, substitution & permutation not lower than 2).

In addition to the basic requirements, some further considerations:

  • This proposal follows @DavidMStraub's requirement of nominative nouns. In addition, countries, cities, persons, names etc. were excluded.
  • @thomasklemm requested a change to more commonly used words; this should be the case now.
  • @cr requested avoiding collisions with other released BIP-0039 wordlists, which has been taken into consideration.
  • To bring a cultural specialty into BIP-0039, the proposal is written in all caps. Writing nouns in lowercase letters conflicts with the conventions of the German language. Studies also show that handwritten text in all caps is significantly more readable, so this lowers the risk of losing money. A positive side effect is that the number of characters used is reduced from 52 to 26. This is an advantage not only for self-filled cold wallets.
  • Going the extra mile, additions of fewer than 3 letters at the beginning of a word were also avoided where the two words are related in meaning (for example, Lanze & Pflanze and Sekt & Insekt are in the list, but Mut & Unmut are not).
  • @rodasmith set out some requirements for avoiding homophones. The current list goes even further by excluding a word completely if a homophone exists as a noun with the same grammatical gender (Miene & Mine, Verse & Ferse, Hund & Hunt, Graph & Graf, etc.). Basis: https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Homophone
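
For readers who want to reproduce the distance rule, here is a minimal sketch. It assumes that "permutation" means swapping two adjacent letters, so it uses the optimal string alignment variant of the Damerau-Levenshtein distance rather than plain Levenshtein. The names `osa_distance` and `too_close` are invented for this example and are not the tools used for the PR.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of two adjacent letters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def too_close(words, minimum=2):
    """Yield every pair of words whose distance is below `minimum`."""
    ordered = sorted(words)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if osa_distance(first, second) < minimum:
                yield first, second


print(osa_distance("SEKT", "INSEKT"))
print(list(too_close(["AMPEL", "AMSEL", "FINK"])))
```

The pairwise loop is quadratic, which is still fine for 2048 words (about two million comparisons). Note that Sekt & Insekt and Mut & Unmut both have distance 2, so the "related meaning" exclusion described above is a manual decision that a distance check cannot make.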


@thomasklemm thomasklemm left a comment

Great work @SebastianFloKa, thanks for opening this PR with an alternative list. I know that a lot of work has gone into it already from #721, and again thanks to everyone who participated in the previous two attempts for a German wordlist. Hope some of you native German speakers could go through the list here too and leave some comments!

Reviewed the wordlist until line 1000 so far, going through the rest later.

@rodasmith

ACK. This list does not include any homophones. LGTM


@thomasklemm thomasklemm left a comment

Looked through the rest of the list, very good work IMO 👍 Just some minor notes on some words.

@thomasklemm

thomasklemm commented Feb 22, 2021

Very well-prepared word selection @SebastianFloKa, LGTM 👍 Went through the entire word list word by word, left minor comments on a few words.

If you have the chance and especially if you're a native German speaker, please jump in for a review too.

@SebastianFloKa
Author

SebastianFloKa commented Mar 2, 2021

Checking with https://www.korrekturen.de/rechtschreibpruefung.shtml, the following words (besides the already mentioned "Gumpe", "Tidehub", "Trebe" & "Zuseher") are flagged even though they are all properly listed in https://www.duden.de/. Among other reasons, this seems to be partly because some words are more common in Austria or Switzerland. I personally think it's good to have some words from different parts of the German language region as long as they are understood everywhere - open to discussion.

Allrad
Bauchweh
Gemahl
Kapriole
Kassier
Kubik
Oktagon
Petersil
Vorkehr
Zuhause

@thomasklemm in particular, and maybe @neox5 wants to have a look as well: shall we replace all of the above words, or would you say we can / should keep some of them?

@SebastianFloKa
Author

If we replace all the words highlighted by @thomasklemm except for "Fresko" & "Tidenhub", as well as all 10 words flagged by the spellchecker (Allrad, ..., Zuhause), there are 31 words to be replaced in the next loop.
Having worked on this project for 2 1/2 years now with changing "special considerations", many words fell out of the list over time. Therefore a big "rerun" against Levenshtein collisions, other wordlists, homophones etc. was made and a number of acceptable words were found. Here are 31 proposals:

BART
BEIN
BLECH
BUSCH
FUNKE
GELD
HALLO
HARZ
HECHT
HOLZ
KREUZ
KURS
LIEBE
LUST
MUSE
NATUR
PORTO
PROBE
PUMPE
PUNKT
RASEN
REIHE
REST
RIND
RITUS
RUHM
STROH
TALER
TREUE
WANNE
WUNDER

Because the words depend on each other (adding one can rule out another), it might be necessary to have some backup words:

AKKU
ALGE
BELAG
BUNDHOSE
DEMO
DOKU
ENDE
EURO
FANG
FIEBER
MATHE
NETTO
PORE
SEEHUND
SOLD
SPESEN
VIEH
WEITE
WOGE
VISIER

So if you prefer to replace some other words from the initial list with the above backup words, that is fine as well.

@TZocker

TZocker commented Mar 8, 2021

Suggestions:

Amsel
Ostern
Fernweh
Simulant
Fern
Walnuss
Lorbeere
Misteln
Wichtel
Holz
Zunge
Zug
Mettigel
Maihock
Mai
Kraut
Wurst

@DivineDominion

If replacements are needed, I'd like to suggest Drossel, so we have all of the well-known bird names of "Amsel, Drossel, Fink und Star". Would definitely prefer these over nautical vocabulary :)

@nisc

nisc commented Mar 10, 2021

Guys thank you for your service, but I can't hide that I'm mostly following this conversation because it reliably makes me giggle.

PS: After thinking it through, I would probably not include any of the Breze* words. There are too many regional variations, which will lead at least to confusion, but maybe even to emotion and anger ("why did they dare to include this inferior spelling in my seed phrase?").

@SebastianFloKa
Author

Thanks @TZocker @nisc @DivineDominion for joining and for your input. A new proposal that incorporates it, together with the initial comments from @thomasklemm, will follow soon.

@SebastianFloKa
Author

Would definitely prefer these over nautical vocabulary
@DivineDominion Are there any specific words you'd like to see replaced? Not sure which one is meant with "nautical".

@DivineDominion

@SebastianFloKa "Luv" and maybe "Tidehub", as pointed out by others in comments above

@SebastianFloKa
Author

@TZocker I checked your proposals against the criteria:

Suggestions:

Amsel --> NOK - Levenshtein substitution collision with AMPEL
Ostern --> NOK - identification by first 4 letters not ensured against "Osterei"
Fernweh --> OK - we have "Heimweh" already, but OK to go for Fernweh as well.
Simulant --> OK
Fern --> Not a noun
Walnuss --> OK
Lorbeere --> OK, typically used as plural, but OK.
Misteln --> NOK - plural not singular / singular with Levenshtein collision
Wichtel --> NOK - Levenshtein substitution collision with Wachtel
Holz --> OK - already in proposal list above
Zunge --> NOK - Levenshtein substitution collision with Junge
Zug --> NOK - Levenshtein addition first 3 letters collision (Zugriff, Zugzwang, etc.)
Mettigel --> NOK - not listed in the German Duden
Maihock --> NOK - not listed in the German Duden
Mai --> NOK - Levenshtein addition first 3 letters collision
Kraut --> NOK - Levenshtein substitution collision with Kraft
Wurst --> NOK - Levenshtein substitution collision with Durst

@DivineDominion
Besides "Amsel" (see "Ampel" above), "Star" also shows a Levenshtein substitution collision ("Stau"). "Fink" is already in the list, "Drossel" is OK to add. Will remove "Luv" & "Tidehub" completely.

@nisc all "Breze*" words will be removed
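
Checks like the ones above can be approximated without computing distances pair by pair: generate every string that is one edit away from the candidate and intersect that set with the existing list. The sketch below is an assumption-laden illustration; `single_edit_variants`, `vet` and the small `existing` set are hypothetical and not the author's tools.

```python
import string

LETTERS = string.ascii_uppercase


def single_edit_variants(word: str) -> set:
    """All strings reachable from `word` by one deletion, insertion,
    substitution or swap of two adjacent letters."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in LETTERS}
    insertions = {left + c + right for left, right in splits for c in LETTERS}
    return (deletions | swaps | substitutions | insertions) - {word}


def vet(candidate: str, wordlist: set) -> list:
    """Return the reasons, if any, why `candidate` cannot join `wordlist`."""
    candidate = candidate.upper()
    reasons = [f"one edit away from {w}"
               for w in sorted(single_edit_variants(candidate) & wordlist)]
    reasons += [f"same first 4 letters as {w}" for w in sorted(wordlist)
                if w != candidate and w[:4] == candidate[:4]]
    return reasons


# Hypothetical excerpt built from the words named in this thread.
existing = {"AMPEL", "OSTEREI", "WACHTEL", "JUNGE", "KRAFT", "DURST"}
for word in ["Amsel", "Ostern", "Walnuss", "Wurst"]:
    print(word, vet(word, existing) or "OK")
```

The Duden lookup, the noun check and the homophone check are not covered here; they need external data.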

SebastianFloKa and others added 2 commits April 6, 2021 18:30
Co-authored-by: Thomas Klemm <github@tklemm.eu>
Improvement loop mainly based on feedback of @thomasklemm but also @TZocker & @DivineDominion & @nisc
@neox5

neox5 commented Apr 6, 2021

Suggestions:
Daumen
Nagel
Schrift
Orange
Triangel

If you could share your tools for checking, I would do the checks by myself! So you don't have to do all the work by yourself 😉

@thomasklemm

@SebastianFloKa Thanks for incorporating all the feedback to the word list. IMO it's really good work, has had many iterations already and can get merged.

@SebastianFloKa You should see a "Resolve conversation" button next to each individual conversation and can close the ones that are now resolved (only the PR author and repo maintainers seem to see it according to https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations, so I can't mark my own comments as resolved).

To all German native speakers reading this PR: It would be really good if you can take the time to go through the list and leave your comments or a 👍 on the PR.

@SebastianFloKa
Author

@neox5 thanks

Suggestions:

Daumen - NOK - Levenshtein substitution collision with Gaumen
Nagel - NOK - Levenshtein substitution collision with Nadel & same first 4 letters as Nagetier
Schrift - NOK - same first 4 letters as Schrank
Orange - NOK - collision with an existing word in the BIP0039 English wordlist

Regarding the tool:

  • There are actually many separate tools, and it would require a significant redesign to make them usable for others (they assume background understanding of abbreviations). But as far as I know, there are ready-to-use solutions shared in other wordlist conversations.

  • Attached is a list of backup words that probably fulfill the requirements in case you want to replace certain words - to be confirmed in individual cases.
    210408-BIP39-German-Wordlist_backupwords.txt

  • Will replace "Zunahme" (close to Zuname) with "Zufluss" & the long word "Artistik" with the shorter "Artikel"
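
As a rough illustration of the cross-list check that ruled out "Orange", the following sketch compares candidates against other wordlists while ignoring case. It is not one of the tools mentioned above; `cross_list_collisions` is a made-up name and the excerpts are tiny placeholders, not the real 2048-word files.

```python
def cross_list_collisions(candidates, other_lists):
    """Map each colliding candidate to the names of the lists that contain it.

    `other_lists` maps a list name to an iterable of its words. The comparison
    ignores case because this proposal is all caps while the others are lowercase.
    """
    collisions = {}
    for name, words in other_lists.items():
        lowered = {w.lower() for w in words}
        for candidate in candidates:
            if candidate.lower() in lowered:
                collisions.setdefault(candidate, []).append(name)
    return collisions


# Placeholder excerpts; real use would load the complete published wordlists.
others = {"english": ["orange", "zoo"], "placeholder": ["example"]}
print(cross_list_collisions(["ORANGE", "DAUMEN"], others))
```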

"Zunahme" too close to "Zuname" and other improvements
@b068931cc450442b63f5b3d276ea4297

I had already finished a draft of it independently in December and unfortunately only now managed to publish it. However, it also contains words that appear in other word lists. Maybe the comparison helps anyway: https://github.com/dys2p/wordlists-de/blob/main/de_2048.md

@b068931cc450442b63f5b3d276ea4297

I just had a quick look at the list, you could shorten some words even more e.g.
GOLDADER -> GOLD
REISFELD -> REIS or REISE
REICHTUM -> REICH

@SebastianFloKa
Author

@b068931cc450442b63f5b3d276ea4297 Hi, thanks for participating.

Unfortunately you are right: your list contains many collisions with other BIP0039 wordlists (278 collisions in total). Besides this, it contains

  • 7 violations of the unambiguity of the first 4 letters (baum/baumarkt, dringend/drinnen, etc.),
  • some words for which a homophone with the same grammatical gender exists (Leib/Laib, Lied/Lid, etc.),
  • and, as the main point, many Levenshtein errors. I haven't checked in detail, but they show up quite obviously (bett/brett, bezug/bezog, anzug/abzug, etc.).

I just had a quick look at the list, you could shorten some words even more e.g.

GOLDADER --> GOLD --> NOK - levenshtein substitution error with GELD
REISFELD --> REIS --> NOK - levenshtein addition error with PREIS
--> or REISE --> NOK - collision with our "extra mile requirement" to avoid levenshtein addition on the first letters if too similar regarding the meaning of a word (ABREISE)
REICHTUM -> REICH --> NOK - levenshtein substitution error with TEICH

Question: Do you have the ability to filter your list for singular nouns only and share it? Then a countercheck might make sense.

@b068931cc450442b63f5b3d276ea4297

@SebastianFloKa Thank you. In which form do we want to do this best and why do you actually only want nouns?

Improvement loop related to input of @b068931cc450442b63f5b3d276ea4297: replacing uncommon / difficult words + reducing "homophone risky words" + reducing amount of words starting with AB***
Thanks for approving if OK or leaving comments if NOK. Also @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith
@SebastianFloKa
Author

@SebastianFloKa Thank you. In which form do we want to do this best

See latest proposal - leave comments in case you disagree or approve if OK.

and why do you actually only want nouns?

There are advantages for brainwallets, but these can be neglected as brainwallets aren't recommended except for special situations. Mainly it's the same reason why the effort regarding Levenshtein distance is made: to reduce the room for misinterpretation, which might cause loss of money.

  • Somebody aware of this special consideration would very likely notice if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally.
  • Somebody not aware of this special consideration might become cautious if a "non-singular noun" occurs while the other 23 words are singular nouns (even if a few words in the wordlist also exist as verbs etc.). It is not guaranteed that the error will be recognized, of course, but the chances are increased.
  • The same goes for reconstructing certain words from a partly damaged wallet: it significantly reduces the choices if you only need to search for singular nouns.

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297
Thumbs up (or approve changes) in case you agree to latest modifications in the list (or leave comments if not) and we could then go for a third-party-check concerning levenshtein distance etc.

@b068931cc450442b63f5b3d276ea4297

b068931cc450442b63f5b3d276ea4297 commented Apr 12, 2021

@SebastianFloKa I think that there should be another fundamental discussion about whether it makes sense to omit verbs and adjectives or not. The other word lists also work with adjectives and verbs and the omission only unnecessarily restricts the possible words.

Somebody aware of this special consideration would very likely notice if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally.
Somebody not aware of this special consideration might become cautious if a "non-singular noun" occurs while the other 23 words are singular nouns (even if a few words in the wordlist also exist as verbs etc.). It is not guaranteed that the error will be recognized, of course, but the chances are increased.

By making the words as familiar as possible and known to everyone, you probably also reduce the risk of people making mistakes. Words like kurz, lang, rot, blau, laufen, gehen, stehen are known to elementary school students while words (are only examples) like akazie, amnestie, anagramm, annexion and anode are far less known.

Same for reconstructing certain words from a partly damaged wallet: It significantly reduces choices if you need to search for a singular noun only.

This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn't matter if there were only nouns or nouns, verbs and adjectives.

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith What do you guys say, should only nouns be used or also adjectives and verbs in their base form?

@rodasmith

I'm satisfied with the outcome of the earlier conversation in #942 that concluded to use nouns only, avoiding confusion around capitalization. Here's an excerpt from that conversation:

One of the advantages of not having inflections would be to reduce the risk of the "standard user" accidentally misinterpreting a word. Example: your seed starts with "FUCHS, GELB, LAUFEN" and then an "ANGLE" (1st person singular) follows. Some people might either overlook this and assume the noun "ANGEL" is intended, or at least be confused about whether there might be a typo (no matter if all caps or not). Another constellation could be when two further plausible words are 1 Levenshtein addition or subtraction away. Example: "FANGE" (1st person singular) with the plausible "FANG" (noun) or "FANGEN" (verb) alongside. This is the advantage I see in reducing the number of word classes (and limiting to infinitives): it reduces the risk of misinterpretation when copying, writing down, communicating or reading a seed.

The decision seemed good then and I don't see any reason to revisit it.

@b068931cc450442b63f5b3d276ea4297

to use nouns only, avoiding confusion around capitalization.

Capitalization is not a problem because the words in the lists are always all lowercase (except @SebastianFloKa who writes everything in capital letters). I think this is also an important point that should be discussed again. I do not share the opinion of @SebastianFloKa:

I think there's a misunderstanding. I meant to say "all caps" instead of "capital letters", sorry for this. I think our intentions are very close, as I also don't want people to mix lowercase and uppercase - agree 100%. But writing nouns in lowercase is quite uncommon for German speakers, whereas filling out templates in uppercase (all caps) is much more common (official documents etc.). From your example:
klage farbe anzahl initiative stieg banane seide holt gesagt ahnen
KLAGE FARBE ANZAHL INITIATIVE STIEG BANANE SEIDE HOLT GESAGT AHNEN

All previous lists are in lowercase and with nouns, verbs and adjectives.

@TZocker

TZocker commented Apr 12, 2021

@SebastianFloKa I don't think it is all that decisive whether verbs etc. are used as well; we should follow the other BIPs. That would give us further alternatives. Levenshtein collisions will rule out a lot. Mnemonic sentences would then be possible....

Sorry, with my suggestions I hadn't understood the Levenshtein part. Sorry....
I would also follow your remark on Lorbeere and use the plural there.

I would rather focus on reducing the vocabulary to the level of a 12-year-old.
And avoid Germanized loanwords such as trainer/viper etc. so as not to conflict with other BIPs.
Likewise, prefer native animals. And also reduce the number of old-fashioned terms.

Words such as Zwinger and Ritze should perhaps still be replaced (ambiguity).

Kind regards

@SebastianFloKa
Author

This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn't matter if there were only nouns or nouns, verbs and adjectives.

Somebody reconstructing a partially destroyed wallet will appreciate having fewer word categories to choose from when it comes to guessing hard-to-read words (e.g. after a house fire). Not a super important advantage, true, but mentioned anyway.

we should follow the other BIPs.

Well, simply doing the same thing would mean an enormous number of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean focusing on the positive progress of other wordlists.
It was the needs of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution, rather than accepting the realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value people's cultural background in orthography (like writing nouns in “capital Latin letters”) more than simply following structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase Latin characters” for everything). Asian wordlists, for example, deviate from “lowercase Latin characters” for exactly that reason.

Q: Is there any advantage for people (people, not the IT behind it, which can handle capital letters) in the German language area in writing words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

Q: Do you require adjectives and/or verbs in the list, or is it because we might not find enough easy nouns?
I’m asking because I’m generally open to adjectives & verbs (they were included in the first two proposals 2 years ago); I just had the impression that going for nouns only has quite some advantages for the community / users. And particularly when taking Levenshtein into consideration, many verbs drop out anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc.

About certain words:
Foreign words:
@TZocker Picking out all foreign words is almost impossible, even Onkel is actually a foreign word. Words sounding too “foreignish” were already eliminated.
Trainer: is a borderline word, I would have said this is still OK as it is mentioned in the Duden as very common. But TBD.
Viper: Is mentioned in the Duden as “mittelhochdeutsch”, pronounced German (differently pronounced in English) and with latin background long ago. For me OK – TBD
Zwinger / Ritze: Not aware of inappropriate meaning, particularly Zwinger, but OK to discuss/change.

Generally:
Q: Isn’t it acceptable if once in a while somebody has to look up a word because he/she doesn’t remember its exact meaning or definition, if he/she is really interested? Even with verbs & adjectives included, it's impossible to ensure that everybody is aware of the exact meaning of every single word in the list.
The quality of words is a subjective topic. Example: I would have said Akazie is less risky to spell incorrectly than e.g. Pyjama from your list. But generally it's true that your list contains fewer uncommon words. Therefore the proposal was and is to highlight our “no-go words” and try to replace them. Thanks to @b068931cc450442b63f5b3d276ea4297's supplementary words, we even have some more backup words to work with. I will update the backup word list soon. Actually, 3 of the words you mentioned are also among the ten worst words in the current proposal, from my subjective perspective.
210419-10 worst words in german wordlist.txt
But if there are too many more, yes, we might have to think about adjectives & verbs.

Proposal:
@b068931cc450442b63f5b3d276ea4297 Could you and the others imagine that we keep going through the critical words step by step, filter out the really unacceptable ones and try to replace them with better ones from your list + my backup list?

@b068931cc450442b63f5b3d276ea4297

Somebody reconstructing a partially destroyed wallet will appreciate having fewer word categories to choose from when it comes to guessing hard-to-read words (e.g. after a house fire). Not a super important advantage, true, but mentioned anyway.

No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

Well, simply doing the same thing would mean an enormous number of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean focusing on the positive progress of other wordlists.
It was the needs of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution, rather than accepting the realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value people's cultural background in orthography (like writing nouns in “capital Latin letters”) more than simply following structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase Latin characters” for everything). Asian wordlists, for example, deviate from “lowercase Latin characters” for exactly that reason.

Q: Is there any advantage for people (people, not the IT behind it, which can handle capital letters) in the German language area in writing words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

According to my understanding, the advantage is that you don't have to worry about upper and lower case and therefore write everything in lower case in such lists. This is the case with the other bip39 lists, with diceware lists like https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases and many other projects.

Q: Do you require adjectives and/or verbs in the list, or is it because we might not find enough easy nouns?
I’m asking because I’m generally open to adjectives & verbs (they were included in the first two proposals 2 years ago); I just had the impression that going for nouns only has quite some advantages for the community / users. And particularly when taking Levenshtein into consideration, many verbs drop out anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc.

Both. Including them corresponds to the word-choice conventions of the other languages, and it increases the pool of words that most 12-year-olds know.

Proposal:
We go through the list like this and comment behind it those we consider critical/inappropriate and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

@luke-jr
Member

luke-jr commented Apr 25, 2021

@b068931cc450442b63f5b3d276ea4297

Neither the list, nor the discussion about it is closed from my point of view. If this list get merged in this form, it would be a missed opportunity.

@SebastianFloKa
Author

Thanks @luke-jr and the other BIP39 authors and maintainers, and "welcome".

@b068931cc450442b63f5b3d276ea4297 no worries, "proposed BIP modification" doesn't mean it's merged.

No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where, say, the first letter of a word is illegible, as well as some letters at the end or in the center? A normal user doesn't have a tool to filter for words with certain letters in certain positions. That means the user will have to guess possible words. So it's easier for him to search for nouns only instead of nouns, verbs and adjectives. It's not a must-have or the most important feature, but a small advantage.

Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

no, running my own - but this might be good to work with.

We go through the list like this and comment behind it those we consider critical/inappropriate and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

What do you mean by advertisement? Generally it's OK to go through the list and select inappropriate words, of course. As for lowercase, I'm personally not convinced yet, and I'm not sure about the others. It feels very strange for people from the German language area to write nouns in lowercase, plus the other reasons (people write more legibly in all caps etc.) - this will also be part of the BIP39 authors' decision later on. I'm fine to continue step by step (as we have for years now); just let me replace the 10 words mentioned above with other nouns first (I need a bit of time) and then go through the list again.

@b068931cc450442b63f5b3d276ea4297

Thank you.

Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where, say, the first letter of a word is illegible, as well as some letters at the end or in the center? A normal user doesn't have a tool to filter for words with certain letters in certain positions. That means the user will have to guess possible words. So it's easier for him to search for nouns only instead of nouns, verbs and adjectives. It's not a must-have or the most important feature, but a small advantage.

I haven't, but whether it's a noun, verb or adjective doesn't matter at all. Since it is 1 of 2048 that are in the list.

Sorry, I meant verbs and wanted to write an example with werben/Werbung (advertisement) first. With your list, I have already submitted as a pull request what I would remove and what I would add if necessary.

I am currently working on another list, which could help if we want to add verbs and adjectives.

For me it feels strange to see and write everything in capital letters. Even when we write normally, most of the letters used in any normal sentence are lowercase. The comparison with application forms also seems a bit far-fetched to me; I think everybody writes far more lowercase in letters, messengers and everywhere else, and finds it rather strange when someone writes everything in capital letters with caps lock on.

Eliminated the words with highest complexity and replaced with simpler ones.
@SebastianFloKa
Author

I haven't, but whether it's a noun, verb or adjective doesn't matter at all. Since it is 1 of 2048 that are in the list.

Of course each word in the list is 1 of 2048, but in my example the "word pool" for the user is not the list but all words. Let's take an example: a steel wallet went through a house fire and some words are no longer completely readable, e.g. for one word the second letter is readable as "L", the third as "A", the fourth as "T", and the first letter and the ending are unknown (?LATT???). The user has two options: A) go through the complete list line by line and check whether a word might fit, or, the much more realistic scenario, B) "guess" which word could be meant. In our case the noun "BLATT" might come to mind and you would check the wordlist directly under "B" to see whether it is a possible solution. If verbs & adjectives are also included, there are more choices to look up and it will be more time-consuming to figure out which one is intended: "glatt, platt, flattern, etc.".
Again: this is only a minor advantage in favor of "nouns only", in addition to the others mentioned before (so this alone wouldn't justify going for nouns-only).

The idea of limiting complexity to a certain age (e.g. 12-year-olds) sounds nice, but I couldn't find a source for a correlation between "age" and "words", which means it will remain our subjective decision which words to accept.

Having a few words at a 16-year-old level would statistically mean that every once in a while a newly created wallet includes one or a few words that the user would need to look up (if he/she is even interested). So far we said this disadvantage is worth all the advantages gained by nouns-only; it makes sense to go through the history of this discussion to understand why. But if the community disagrees and requests that many words be replaced, not just a few, I'm of course open to reworking the list accordingly.

What are your positions on this? Or do you want a survey?

@thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297

@b068931cc450442b63f5b3d276ea4297

If I can still read "?LATT???" from the letters I open the list with the 2048 words, press Ctrl+F and enter "LATT". It really doesn't matter to which word category the word belongs. I don't have to go through line by line, and even if I do, it's easier than picking out a much larger number of nouns from the Duden, for example.
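
The Ctrl+F idea can be written down as a small filter. This is a sketch only: `matching_words` is a hypothetical helper, the excerpt is made up, and the notation is an assumption in which `?` stands for exactly one illegible letter and `*` for an illegible run of unknown length (so the reading "?LATT???" from the example becomes "?LATT*").

```python
import re


def matching_words(pattern: str, wordlist):
    """Return the words that fit a damaged reading.

    `?` matches exactly one letter, `*` matches any run of letters
    (including none); every other character must match literally.
    """
    regex = "".join(
        "[A-Z]" if ch == "?" else "[A-Z]*" if ch == "*" else re.escape(ch)
        for ch in pattern.upper()
    )
    compiled = re.compile(regex)
    return [w for w in wordlist if compiled.fullmatch(w.upper())]


# Hypothetical excerpt mixing nouns, adjectives and a verb.
excerpt = ["BLATT", "GLATT", "PLATT", "FLATTERN", "LATTE", "BLECH"]
print(matching_words("?LATT*", excerpt))
```

Against a fixed 2048-word list the search returns the same candidates whatever their word class; the disagreement in this thread is about users who guess from memory instead of searching the list.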

I see no advantages but many disadvantages in choosing a list of nouns only.

The 12 years was just an example. The simpler and more widespread the words are, the better. You can also look at "basic vocabulary" and "extended basic vocabulary", just like the linguistic levels A-B.

So far we said this disadvantage is worth all the advantages gained by nouns-only ...

No I think you are the only one who says/writes that.

@nisc

nisc commented May 6, 2021

If I can still read "?LATT???" from the letters I open the list with the 2048 words, press Ctrl+F and enter "LATT".

I think it's a tough call. Most people today wouldn't know how BIP39 works and that there's a pre-defined list of 2048 words, with each word in the 24-word mnemonic representing 11 bits of a 256+8 bit seed ("What is a Bit?").

Other people wouldn't realize that there's a pattern, i.e., that the seed only includes nouns.

I slightly prefer the nouns only version. I think more people see the only-nouns pattern than the 264 bits.

In the end it really doesn't matter too much, though. If people lose a lot of money, they'll seek help. Someone will be able to explain it to them.
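
The numbers in this comment follow from the BIP-0039 layout: a list of 2048 = 2^11 words gives 11 bits per word, and the checksum adds one bit per 32 bits of entropy. The short sketch below just does that arithmetic; the function name `mnemonic_layout` is invented for the example.

```python
BITS_PER_WORD = 11                    # 2**11 == 2048 words in a list
WORDLIST_SIZE = 2 ** BITS_PER_WORD


def mnemonic_layout(entropy_bits: int):
    """Return (checksum_bits, word_count) for a BIP-0039 mnemonic."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("BIP-0039 allows 128 to 256 bits in steps of 32")
    checksum_bits = entropy_bits // 32
    word_count = (entropy_bits + checksum_bits) // BITS_PER_WORD
    return checksum_bits, word_count


for bits in (128, 160, 192, 224, 256):
    print(bits, mnemonic_layout(bits))
```

For 256 bits of entropy this gives an 8-bit checksum and 24 words, i.e. the 256+8 = 264 bits mentioned above.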

@bitcoin bitcoin deleted a comment from daniel3997 Jun 13, 2021
@bitcoin bitcoin deleted a comment from daniel3997 Jun 13, 2021
@luke-jr
Member

luke-jr commented Jul 2, 2021

For now, the author(s) of BIP 39 have decided not to accept any further word lists into BIP 39 itself, and encourage adding new ones to the WLIPs repo here: https://github.com/p2w34/wlips

@peterhgruber

thanks for the effort. Two considerations

  1. I strongly advise all lowercase. I understand that German nouns in lowercase might look unfamiliar, but uppercase has distinct and really bad disadvantages. First, words in all caps are much harder to read (as our mind reads word contours more than individual letters), and second, from a practical point of view, writing all caps e.g. on an iPhone is a hassle.
  2. Is it really necessary to exclude words that appear on wordlists in other languages? This leads to choosing "Fotograf" over "Foto" (I assume). If it were necessary (e.g. because many wallet apps have no language setting), then one would need to be stricter, i.e. exclude all words that have an identical counterpart in any wordlist when only the first four letters are considered (thus excluding "Fotograf" as well).
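
The stricter rule described in point 2 could be sketched as a prefix comparison across lists. Everything here is illustrative: `prefix_collisions` is a made-up helper and the `foreign` excerpt is hypothetical, so it says nothing about which words the real lists contain.

```python
def prefix_collisions(candidates, other_words, length=4):
    """Pair each candidate with foreign words sharing its first `length` letters."""
    by_prefix = {}
    for word in other_words:
        by_prefix.setdefault(word.lower()[:length], []).append(word)
    return {
        c: by_prefix[c.lower()[:length]]
        for c in candidates
        if c.lower()[:length] in by_prefix
    }


# Hypothetical excerpt of another language's list.
foreign = ["foto", "orange"]
print(prefix_collisions(["FOTOGRAF", "DROSSEL"], foreign))
```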

@joshuakraemer

Thanks as well! I, as a German, would much prefer all uppercase instead of all lowercase. All lowercase doesn't conform to the rules of orthography, and traditionally uppercase letters are used if only one case is allowed (e.g. in forms or crosswords). Word contours will be wrong with all lowercase, as nouns are normally written with a capital at the beginning. Anyway, in the case of this word list, correctly reading every single letter is probably more important than quickly reading whole words. Maybe all uppercase is even advantageous for this purpose.

@PeterTheOne

Why exclude umlauts and ß? They are part of the language. They could of course be considered equal to their non-umlaut counterparts (äöü -> aou and ß -> ss), as is the case in other languages' wordlists. It just seems like an arbitrary constraint.
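
A minimal sketch of the folding suggested here (äöü -> aou, ß -> ss), assuming words are compared in lowercase. The helper names `fold` and `fold_collisions` are hypothetical. It also shows one thing such a scheme would have to watch for: two different words can become identical after folding.

```python
from collections import defaultdict

# Folding table as described in the comment: äöü -> aou, ß -> ss.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold(word: str) -> str:
    """Lowercase a word and replace umlauts and ß with plain ASCII letters."""
    return word.lower().translate(FOLD)


def fold_collisions(words):
    """Group words that become identical once umlauts and ß are folded."""
    groups = defaultdict(list)
    for word in words:
        groups[fold(word)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(fold("Straße"))
print(fold_collisions(["Bär", "Bar", "Käse"]))
```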
