Commit 5949b6c

fix: do not treat escaped <a> elements as hyperlinks in HTM-053
Fix the regex used to report "file:" hyperlinks as `HTM-053` (informative) to only consider HTML elements and not plain text. This regex-based parsing is still brittle, but we'll refactor this whole package later. For now this simple fix will do.

Fixes #1182
1 parent 5ee72e7 commit 5949b6c

7 files changed, with 56 additions and 1 deletion
src/main/java/com/adobe/epubcheck/ctc/FileLinkSearch.java

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
  * ========================================================<br/>
  */
 public class FileLinkSearch extends TextSearch {
-  private static final Pattern fileLinkPattern = Pattern.compile("href=[\"']file://");
+  private static final Pattern fileLinkPattern = Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");
 
   public FileLinkSearch(EPUBVersion version, ZipFile zip, Report report)
   {
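
The effect of the pattern change can be sketched in isolation. This is a minimal illustration and not part of the commit: the class name `FileLinkPatternDemo` and the helper `found` are hypothetical, and only the two pattern strings are taken from the diff.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two pattern strings come from the diff.
class FileLinkPatternDemo {
  // Pattern before the fix: matches href="file:// anywhere, including escaped text.
  static final Pattern OLD = Pattern.compile("href=[\"']file://");
  // Pattern after the fix: requires a literal <a ...> start tag before href.
  static final Pattern NEW = Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");

  // Returns true if the pattern occurs anywhere in the given markup.
  static boolean found(Pattern pattern, String markup) {
    return pattern.matcher(markup).find();
  }

  public static void main(String[] args) {
    // A real anchor element, with an attribute before href.
    String element = "<a class=\"external\" href=\"file:///C:/path/file.pdf\">link</a>";
    // The same markup escaped as character data, as reported in issue #1182.
    String escaped = "&lt;a class=\"external\" href=\"file:///C:/path/file.pdf\"&gt;link&lt;/a&gt;";

    System.out.println("old, element: " + found(OLD, element)); // true
    System.out.println("old, escaped: " + found(OLD, escaped)); // true (the false positive)
    System.out.println("new, element: " + found(NEW, element)); // true
    System.out.println("new, escaped: " + found(NEW, escaped)); // false
  }
}
```

The optional group `([^<>]*\s)?` lets other attributes appear between `<a` and `href`, while excluding `<` and `>` keeps the match inside a single start tag.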

src/test/resources/epub3/content-publication.feature

Lines changed: 6 additions & 0 deletions
@@ -66,6 +66,12 @@ Feature: EPUB 3 ▸ Content Documents ▸ Full Publication Checks
     When checking EPUB 'content-xhtml-link-to-local-file-valid'
     Then info HTM-053 is reported
     And no errors or warnings are reported
+
+  Scenario: Do not report escaped hyperlinks to resources in the local file system
+    See issue #1182
+    When checking EPUB 'content-xhtml-link-to-local-file-escaped-valid'
+    Then info HTM-053 is reported 0 times
+    And no errors or warnings are reported
 
   Scenario: Report a hyperlink to a resource missing from the publication
     When checking EPUB 'content-xhtml-link-to-missing-doc-error'
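
The new scenario covers the escaped-text case only. The commit message calls the regex-based parsing still brittle without listing cases, so the inputs below are assumptions derived from reading the new pattern, not cases from the commit or its tests. The class name `FileLinkPatternLimits` and the helper `found` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the commit.
class FileLinkPatternLimits {
  static final Pattern NEW = Pattern.compile("<a\\s([^<>]*\\s)?href=[\"']file://");

  // Returns true if the new pattern occurs anywhere in the given markup.
  static boolean found(String markup) {
    return NEW.matcher(markup).find();
  }

  public static void main(String[] args) {
    // An anchor inside an XML comment is not a real element, yet it matches.
    System.out.println(found("<!-- <a href=\"file:///tmp/x\">x</a> -->")); // true
    // A '>' inside an earlier attribute value stops [^<>]* before href is reached.
    System.out.println(found("<a title=\"a>b\" href=\"file:///tmp/x\">x</a>")); // false
    // Whitespace around '=' is legal in XML but not accepted by the pattern.
    System.out.println(found("<a href = \"file:///tmp/x\">x</a>")); // false
  }
}
```

Cases like these are what a parser-based check would handle and a text search cannot, which is consistent with the planned refactoring mentioned in the commit message.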
@@ -0,0 +1,12 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<head>
+  <meta charset="utf-8"/>
+  <title>Minimal EPUB</title>
+</head>
+<body>
+  <h1>Loomings</h1>
+  <p>Call me Ishmael.</p>
+  &lt;a class="external" href="file:///C:/path/file.pdf"&gt;link to local file&lt;/a&lt;
+</body>
+</html>
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
+<head>
+  <meta charset="utf-8"/>
+  <title>Minimal Nav</title>
+</head>
+<body>
+  <nav epub:type="toc">
+    <ol>
+      <li><a href="content_001.xhtml">content 001</a></li>
+    </ol>
+  </nav>
+</body>
+</html>
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<package xmlns="http://www.idpf.org/2007/opf" version="3.0" xml:lang="en" unique-identifier="q">
+  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
+    <dc:title id="title">Minimal EPUB 3.0</dc:title>
+    <dc:language>en</dc:language>
+    <dc:identifier id="q">NOID</dc:identifier>
+    <meta property="dcterms:modified">2017-06-14T00:00:01Z</meta>
+  </metadata>
+  <manifest>
+    <item id="content_001" href="content_001.xhtml" media-type="application/xhtml+xml"/>
+    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
+  </manifest>
+  <spine>
+    <itemref idref="content_001" />
+  </spine>
+</package>
@@ -0,0 +1,6 @@
+<?xml version="1.0" encoding="UTF-8" ?>
+<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
+  <rootfiles>
+    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
+  </rootfiles>
+</container>
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+application/epub+zip
