Labels
- Dataset: Bugs, issues or improvements connected to the MCV datasets and linked processes
- Demographics:accent: Issues related to demographics of a contributor - accent
Description
Is your feature request related to a problem? Please describe.
- Accents can be chosen from a predefined list or entered as free-form text, and a contributor can select more than one.
- In the metadata files, multiple accents are separated by a comma (,).
- However, both predefined and free-form accent values can themselves contain commas, and many do.
- As a result, the individual accent values cannot be separated reliably without writing a dedicated parser.
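To illustrate the ambiguity, here is a minimal Python sketch. The field value is built from two preset names quoted in this issue; the helper `split_outside_parens` is a hypothetical heuristic written for this example, not part of any Common Voice tooling. It shows that a plain comma split fragments the values, and that even a parenthesis-aware split fails for presets whose commas are not inside parentheses.

```python
# Two preset accents joined with a comma, as the metadata currently does.
two_presets = (
    "España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha),"
    "España: Sur peninsular (Andalucia, Extremadura, Murcia)"
)

# Naive approach: split on every comma.
naive_parts = [part.strip() for part in two_presets.split(",")]
print(len(naive_parts))  # 6 fragments instead of 2 accents


def split_outside_parens(field: str) -> list[str]:
    """Split on commas that are not inside parentheses (heuristic only)."""
    parts, current, depth = [], [], 0
    for ch in field:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma is treated as the accent separator.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


# Works when every inner comma is wrapped in parentheses...
print(len(split_outside_parens(two_presets)))  # 2

# ...but a single preset with top-level commas is still broken apart.
rioplatense = "Rioplatense: Argentina, Uruguay, este de Bolivia, Paraguay"
print(len(split_outside_parens(rioplatense)))  # 4 fragments for 1 accent
```

Free-form accents make this worse, since their punctuation cannot be predicted at all.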
Describe the solution you'd like
Use a dedicated separator character such as "|" or "@" to separate multiple accent values.
Additional context
Here is a partial list of the accent data, where many entries contain comma-separated locations (this is common for accents):
309 en preset southatlandtic South Atlantic (Falkland Islands, Saint Helena)
310 en preset african Southern African (South Africa, Zimbabwe, Namibia)
312 en preset bermuda West Indies and Bermuda (Bahamas, Bermuda, Jamaica, Trinidad)
314 es preset nortepeninsular España: Norte peninsular (Asturias, Castilla y León, Cantabria, País Vasco, Navarra, Aragón, La Rioja, Guadalajara, Cuenca)
315 es preset centrosurpeninsular España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha)
316 es preset surpeninsular España: Sur peninsular (Andalucia, Extremadura, Murcia)
320 es preset caribe Caribe: Cuba, Venezuela, Puerto Rico, República Dominicana, Panamá, Colombia caribeña, México caribeño, Costa del golfo de México
321 es preset andino Andino-Pacífico: Colombia, Perú, Ecuador, oeste de Bolivia y Venezuela andina
322 es preset rioplatense Rioplatense: Argentina, Uruguay, este de Bolivia, Paraguay
Here is the accent value from one recording by a user who selected multiple accents:
España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha),España: Norte peninsular (Asturias, Castilla y León, Cantabria, País Vasco, Navarra, Aragón, La Rioja, Guadalajara, Cuenca)
This could be split easily if it were written like this:
España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha)|España: Norte peninsular (Asturias, Castilla y León, Cantabria, País Vasco, Navarra, Aragón, La Rioja, Guadalajara, Cuenca)
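With such a separator, consumers of the metadata would only need a one-line split. The following Python sketch assumes "|" is the chosen separator; `split_accents` and `join_accents` are hypothetical helper names used for illustration. The writer side rejects values that already contain the separator, which is one possible way to keep free-form input from reintroducing the ambiguity.

```python
SEPARATOR = "|"  # assumed separator, as proposed in this issue


def join_accents(accents: list[str], sep: str = SEPARATOR) -> str:
    """Serialize accent values, refusing any value that contains the separator."""
    for accent in accents:
        if sep in accent:
            raise ValueError(f"accent value contains separator {sep!r}: {accent!r}")
    return sep.join(accents)


def split_accents(field: str, sep: str = SEPARATOR) -> list[str]:
    """Recover the individual accent values from a metadata field."""
    return [part.strip() for part in field.split(sep) if part.strip()]


accents = [
    "España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha)",
    "España: Norte peninsular (Asturias, Castilla y León, Cantabria, "
    "País Vasco, Navarra, Aragón, La Rioja, Guadalajara, Cuenca)",
]

field = join_accents(accents)
recovered = split_accents(field)
print(len(recovered))        # 2
print(recovered == accents)  # True
```

Commas inside the values no longer matter, because only the separator is treated as a boundary.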
ftyers