-
Notifications
You must be signed in to change notification settings - Fork 467
Description
Environment
Python Version:: Python3.10.6 (and 3.11.0rc1 behaves the same)
Nikola Version: v8.2.3 (and latest git clone of commit 483ffd8 Wed Mar 1 22:52:02 2023 behaves the same)
Operating System: Linux Pop_OS 22.04 LTS (derived from Ubuntu 22.04)
Description:
Hi, good people! I am happy to look into contributing a PR to fix this myself, but wanted to discuss it first before wading in. Thanks!
Symptom
Adding a WebP image (with '.webp' extension) to my gallery causes nikola build
to fail partway through, with a traceback (below). This happens even on a fresh site, created using nikola init -dq site
.
After the failure, running nikola serve
displays the gallery correctly, including all the WebP images, but the gallery RSS feed is empty. Because this is so close to working, I'm hopeful that a small fix might be possible.
I notice that galleries with an empty file named "image.webp" provoke this error, while gallery files named "image", with no extension, or "image.nonsense", are correctly ignored. So I infer that webp images are expected to work in galleries. (and the section below, "Any other image types we should handle?", seems to corroborate.)
The expected behavior is:
- WebP images ought to "just work", both at site generation time, and at gallery viewing time, along with their gallery RSS feed.
- For any gallery image file with an unrecognized mime type, options for how we should behave are enumerated below in section "Be robust to files with unrecognized mime type".
The Traceback
$ nikola build
Scanning posts........done!
. render_galleries:output/galleries/demo/sp.webp
. render_galleries:output/galleries/demo/index.html
. render_galleries:output/galleries/demo/rss.xml
TaskError - taskid:render_galleries:output/galleries/demo/rss.xml
PythonAction Error
Traceback (most recent call last):
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/doit/action.py", line 461, in execute
returned_value = self.py_callable(*self.args, **kwargs)
File "/home/jhartley/Dropbox/jon/data/code/3rdparty/nikola/nikola/nikola/plugins/task/galleries.py", line 773, in gallery_rss
data = rss_obj.to_xml(encoding='utf-8')
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 39, in to_xml
self.write_xml(f, encoding)
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 34, in write_xml
self.publish(handler)
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 380, in publish
item.publish(handler)
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 440, in publish
self.enclosure.publish(handler)
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 221, in publish
_element(handler, "enclosure", None,
File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 47, in _element
handler.startElement(name, d)
File "/usr/lib/python3.10/xml/sax/saxutils.py", line 170, in startElement
self._write(' %s=%s' % (name, quoteattr(value)))
File "/usr/lib/python3.10/xml/sax/saxutils.py", line 60, in quoteattr
data = escape(data, entities)
File "/usr/lib/python3.10/xml/sax/saxutils.py", line 27, in escape
data = data.replace("&", "&")
AttributeError: 'NoneType' object has no attribute 'replace'
The problem stems from slightly earlier, nikola/plugins/task/galleries.py, line 753:
mimetypes.guess_type(img)[0]
Where 'img' is the filename of the WebP image, ending with extension '.webp'. .guess_type()
doesn't recognize the MIME type of this, and returns (None, None). The first of those None values causes problems a little later in the code (see the traceback), where it is used in XML/RSS generation.
Analysis
The docs for the standard library function .guess_type()
say that it recognizes the MIME types that are registered with IANA. They list 'image/webp' in a draft RFC, so this isn't yet an official MIME type.
Sure enough, 'webp' is not included in the files consulted by the Python mimetypes package, which can be listed using:
>>> mimetypes.knownfiles
['/etc/mime.types', '/etc/httpd/mime.types', '/etc/httpd/conf/mime.types', '/etc/apache/mime.types', '/etc/apache2/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/lib/netscape/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/etc/mime.types']
So it's not surprising that guess_type() returns None.
Possible Solutions
I know nothing, so these might be bad ideas. Each checklist item is described below.
- Inject a hardcoded 'image/web' mimetype at runtime, before .guess_type() gets called (Merged as #3674)
- Add a test of gallery RSS content (Merged as #3676)
- Be robust to files with unrecognized mime types
- Check other image types we should handle. (Done in #3672.)
- Review other calls to
mimetype.guess_type()
. - Review whether "producing an RSS enclosure" is duplicated code.
- Call
mimetypes.init()
(Rejected) - Consider manually adding the webp mime type to my machine (Rejected)
- Consider automatically adding the webp mime type on Nikola install (Rejected)
Inject a hardcoded 'image/web' mimetype at runtime, before .guess_type() gets called
It sounds like the mimetypes module has ways to inject custom mimetypes into its internal list. We could add ".webp" -> "image/webp" before .guess_type()
gets called.
Add a test of gallery RSS content
A new test of RSS content would be useful, currently I think this is only verified in the CI baseline tests. Once such a test exists in the Python test suite, it (or a minor variation on it) could be used to verify the proposed fallback for unknown mimetypes (see the next item below).
- The new test is in Add a test for gallery RSS content #3676.
Be robust to files with unrecognized mime type
Regardless of whether we decide to support WebP images here or not, we should be robust to .guess_type()
returning (None, None) for files with unrecognized mimetype. There must be a more graceful way to handle this than ending the build with an unhandled exception. But what should that be?
- Fall back to a more generic "image" mimetype? Does such a thing exist? Or even more generally, an octet stream? This does exist, but presumably would not display in a browser. What would the practical upshot be for gallery viewers?
- Display a placeholder broken image instead? What would be the best way to do this?
- Don't include this image in the gallery at all?
- Log the error so the site owner can diagnose what's going on?
Check other image types we should handle.
Are there other images types that might work in a gallery but are impacted by this problem? If I'm going to look at this, I might as well systematically test them.
- Done: I tested all other image types in galleries in Testing supported image formats in a gallery #3672.
Although that did uncover some other problems, tracked there, none of the other image types break the build like this. They all have a recognized mimetype.
Review other calls to mimetype.guess_type()
The code contains three calls to mimetype.guess_type()
:
- nikola/plugins/task/galleries.py, line 753: The code under discussion, for producing RSS enclosures for images in a gallery.
- nikola/nikola.py, line 374: Produces RSS enclosures for posts.
- nikola/plugins/command/auto/__init__.py, line 574: This code is doing something else, not related to RSS as far as I can see.
Do calls (2) and (3) also need guarding against (None, None) return values?
Review whether "producing an RSS enclosure" is duplicated code.
Do calls (1) and (2) above represent duplicated code, to produce an RSS enclosure, that should be extracted and shared in a function somehow? On first glance I don't think so, but plan to review the idea and would be interested to hear any opinions.
Rejected ideas
Rejected idea: Call mimetypes.init()
My first glance at the mimetypes docs pages made me think we ought to be calling mimetypes.init()
before calling .guess_type()
. But later I noticed the docs says:
If the module has not been initialized, [functions] will call init() if they rely on the information init() sets up.
So we don't need to call .init()
explicitly.
Rejected idea: Consider manually adding the webp mime type to my machine
I see StackOverflow questions about not-entirely-dissimilar problems in other software, which are also caused by the lack of a known MIME type for WebP images. A suggested solution there is to add the 'image/webp' MIME type to one of the 'knownfiles' listed above.
This sounds like a temporary workaround for users who hit this today. But editing these files presumably conflicts with the distributions management of them, which will have bad side effects (e.g. updates could overwrite the user's changes.)
Rejected idea: Consider automatically adding the webp mime type on Nikola install
Instead of suggesting to users that they change the contents of the 'knownfiles' listed above, we could do it automatically for them on Nikola install. This sounds like a bad idea which could cause problems for us in the future (e.g. when our changes are overwritten by an automated update of the changed file, leaving the user in a broken state again.)
Also, this would probably require admin/root privs, which most Nikola installs (hopefully) do not have.
Thanks! ❤️
Huge thanks for all the amazing Nikola goodness! I've been using and loving it for years!