Closed
Description
There are two issues with wordcloud when listening to classical music (cf. attached capture).
- Tags tend to contain many short words or abbreviations that are meaningless out of context. For example: "iv" (fourth movement), "ma" (as in "allegro ma non troppo"), "d" (as in "Fugue in d minor"), "op" (as in "Opus 2"), and any digit or number, all of which are very common (e.g. "Symphony no. 5"; also notice the frequency of "no" in the attached capture).
- Some terms use accented characters, mainly from French, which are not rendered properly (e.g. "étude" becomes "tude", "exécution" becomes "excution"). Although I haven't checked, this would also be the case with widely used German titles containing accented characters (e.g. "Verklärte Nacht", "Die Zauberflöte", "Götterdämmerung").
Proposed solutions:
- Easy: add an option to ignore numbers and words shorter than a given length. More involved: add an option to ignore Italian musical terms (cf. http://www.musictheory.org.uk/res-musical-terms/italian-musical-terms.php).
- Add support for accented characters/Unicode.
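To make the two proposals concrete, here is a minimal sketch in Python of what the word extraction could look like. It is not taken from the project's code: the function name `extract_words`, the `min_length` and `stop_words` parameters, and the short `ITALIAN_TERMS` list are all assumptions made up for illustration. The idea is to match runs of Unicode letters (so "étude" stays whole and digits are never matched), then drop short words and anything on a stop list.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop list; a real one would be built
# from a fuller glossary of Italian musical terms.
ITALIAN_TERMS = frozenset({"allegro", "andante", "adagio", "presto", "non", "troppo", "molto"})

# [^\W\d_]+ matches runs of Unicode letters only: word characters
# minus digits and the underscore.
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_words(text, min_length=3, stop_words=frozenset()):
    """Return lowercase words from text, skipping numbers, short words and stop words."""
    # Normalize to NFC so that "e" + combining accent becomes a single "é"
    # and is not split apart by the regular expression.
    text = unicodedata.normalize("NFC", text)
    words = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        if len(word) < min_length:
            continue  # drops "iv", "ma", "d", "op", "no", ...
        if word in stop_words:
            continue  # drops terms such as "allegro"
        words.append(word)
    return words
```

With these assumptions, `extract_words("Étude no. 5, Op. 2 in d minor: IV. Allegro ma non troppo", stop_words=ITALIAN_TERMS)` keeps only "étude" and "minor", and German titles such as "Die Zauberflöte" come through with their umlauts intact. Both the minimum length and the stop list would ideally be user-configurable options, since a threshold of 3 also discards legitimate short words.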
Wonderful stuff otherwise!