Encoding not handled correctly for natural earth data #739

@jtbraun


Prior to version 3.x of the natural earth data, the strings inside the *.dbf files were encoded as Windows-1252 as documented here: http://www.naturalearthdata.com/features/

Starting with the 3.x versions, the *.dbf files are encoded with UTF-8, as mentioned here: nvkelso/natural-earth-vector#89

At some point the zip files began including a .cpg file (like ne_10m_admin_0_map_subunits.cpg), whose contents specify the character encoding (UTF-8 in the example given).

In my opinion, since cartopy.io.shapereader.natural_earth() does the magic downloading of the natural earth data, it should also look for and unzip/cache the *.cpg file and the *.VERSION.txt file. It should take the encoding from the *.cpg file when one exists; if it doesn't, it should read the version and compare it against 3.x, assuming Windows-1252 for earlier versions and UTF-8 from 3.x onward.
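The lookup order described above could be sketched roughly as follows. This is only an illustration, not cartopy code: `guess_dbf_encoding` is a hypothetical helper name, it assumes the *.cpg and *.VERSION.txt files have been unzipped next to the *.shp file, and falling back to Windows-1252 when neither file is present is my own assumption.

```python
import os
import re


def guess_dbf_encoding(shp_path):
    """Guess the text encoding of the .dbf that sits next to shp_path.

    Hypothetical helper, not part of cartopy or pyshp.
    """
    base = os.path.splitext(shp_path)[0]

    # 1. Prefer the .cpg file, whose contents name the encoding (e.g. "UTF-8").
    cpg_path = base + ".cpg"
    if os.path.exists(cpg_path):
        with open(cpg_path, "r", encoding="ascii", errors="ignore") as f:
            name = f.read().strip()
        if name:
            return name

    # 2. Otherwise use the major version: 3.x and later are UTF-8.
    version_path = base + ".VERSION.txt"
    if os.path.exists(version_path):
        with open(version_path, "r", encoding="ascii", errors="ignore") as f:
            match = re.match(r"\s*(\d+)", f.read())
        if match and int(match.group(1)) >= 3:
            return "utf-8"

    # 3. Assumption: anything else is treated as pre-3.x data.
    return "cp1252"
```

The value read from the *.cpg file is returned as-is, so a caller would still need to check that it is a codec name Python recognises before using it.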

Then, pyshp (shapefile.py) needs to be modified to allow the encoding to be specified. Today it auto-assumes utf-8 when sys.version_info[0] == 3, and assumes nothing (passes the bytes back and forth undecoded) when sys.version_info[0] != 3. (see GeospatialPython/pyshp#46)
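To illustrate why a fixed assumption is a problem, here is a small standalone sketch that uses only the standard library, not pyshp itself. `decode_field` is a hypothetical helper that decodes the raw bytes of a padded *.dbf text field with a caller-supplied encoding, and the sample value is made up for the demonstration.

```python
def decode_field(raw, encoding):
    """Decode the raw bytes of a .dbf character field (hypothetical helper).

    Trailing spaces and NUL bytes used as field padding are stripped first.
    """
    return raw.rstrip(b"\x00 ").decode(encoding)


# The same name as it would be stored by 3.x data (UTF-8) and by older data
# (Windows-1252), padded with spaces as in a fixed-width field.
utf8_bytes = "C\u00f4te d'Ivoire".encode("utf-8") + b"   "
cp1252_bytes = "C\u00f4te d'Ivoire".encode("cp1252") + b"   "

# Decoding with the matching encoding round-trips the name.
right = decode_field(utf8_bytes, "utf-8")

# Decoding UTF-8 bytes as Windows-1252 silently produces mojibake.
garbled = decode_field(utf8_bytes, "cp1252")

# Decoding Windows-1252 bytes as UTF-8 fails outright for this value.
try:
    decode_field(cp1252_bytes, "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Both failure modes show up depending on which side guesses wrong, which is why the encoding has to be passed in rather than inferred from the Python version.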
