CHANGES.rst (10274B)
Change Log
----------

1.1
~~~

UNRELEASED

Breaking changes:

* Drop support for Python 3.3. (#358)
* Drop support for Python 3.4. (#421)

Deprecations:

* Deprecate the ``html5lib`` sanitizer (``html5lib.serialize(sanitize=True)`` and
  ``html5lib.filters.sanitizer``). We recommend users migrate to `Bleach
  <https://github.com/mozilla/bleach>`_. Please let us know if Bleach doesn't suffice for your
  use. (#443)

Other changes:

* Try to import from ``collections.abc`` to remove DeprecationWarning and ensure
  ``html5lib`` keeps working in future Python versions. (#403)
* Drop optional ``datrie`` dependency. (#442)


1.0.1
~~~~~

Released on December 7, 2017

Breaking changes:

* Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!)
* Remove ``utils/spider.py``. (#353) (Thank you, Jon Dufresne!)

Features:

* Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most,
  Will Kahn-Greene!)
* Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!)
* Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!)
* Support Python 3.6. (#333) (Thank you, Jon Dufresne!)
* Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!)
* Improve testing and CI and add code coverage. (#323, #334) (Thank you, Jon
  Dufresne, John Vandenberg, Sam Sneddon, Will Kahn-Greene!)
* Semver-compliant version number.

Bug fixes:

* Add support for setuptools < 18.5 to support environment markers. (Thank you,
  John Vandenberg!)
* Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!)
* Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank
  you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!)
* Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will
  Kahn-Greene!)
* Include license file in generated wheel package.
  (#350) (Thank you, Jon Dufresne!)
* Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!)
* Allow uppercase hex characters in CSS colour check. (#377) (Thank you,
  Komal Dembla, Hugo!)


1.0
~~~

Released and unreleased on December 7, 2017. Badly packaged release.


0.999999999/1.0b10
~~~~~~~~~~~~~~~~~~

Released on July 15, 2016

* Fix attribute order going to the tree builder to be document order
  instead of reverse document order(!).


0.99999999/1.0b9
~~~~~~~~~~~~~~~~

Released on July 14, 2016

* **Added ordereddict as a mandatory dependency on Python 2.6.**

* Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all``
  extras that will do the right thing based on the specific
  interpreter implementation.

* Now requires the ``mock`` package for the testsuite.

* Cease supporting DATrie under PyPy.

* **Remove PullDOM support, as this hasn't ever been properly
  tested, doesn't entirely work, and as far as I can tell is
  completely unused by anyone.**

* Move testsuite to ``py.test``.

* **Fix #124: move to webencodings for decoding the input byte stream;
  this makes html5lib compliant with the Encoding Standard, and
  introduces a required dependency on webencodings.**

* **Cease supporting Python 3.2 (in both CPython and PyPy forms).**

* **Fix comments containing double-dash with lxml 3.5 and above.**

* **Use scripting disabled by default (as we don't implement
  scripting).**

* **Fix #11, avoiding the XSS bug potentially caused by serializer
  allowing attribute values to be escaped out of in old browser versions,
  changing the quote_attr_values option on serializer to take one of
  three values, "always" (the old True value), "legacy" (the new option,
  and the new default), and "spec" (the old False value, and the old
  default).**

* **Fix #72 by rewriting the sanitizer to apply only to treewalkers
  (instead of the tokenizer); as such, this will require amending all
  callers of it to use it via the treewalker API.**

* **Drop support of charade, now that chardet is supported once more.**

* **Replace the charset keyword argument on parse and related methods
  with a set of keyword arguments: override_encoding, transport_encoding,
  same_origin_parent_encoding, likely_encoding, and default_encoding.**

* **Move filters._base, treebuilder._base, and treewalkers._base to .base
  to clarify their status as public.**

* **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
  sanitizer.htmlsanitizer module and move that to sanitizer. This means
  anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
  code changes.**

* **Rename treewalkers.lxmletree to .etree_lxml and
  treewalkers.genshistream to .genshi to have a consistent API.**

* Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer,
  utils) to be underscore prefixed to clarify their status as private.
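
As a rough migration sketch for two of the 1.0b9 changes listed above (the encoding keyword arguments and the string-valued ``quote_attr_values`` option), callers might write something like the following. It assumes ``html5lib`` 1.0 or later is installed; the sample markup and the variable names are illustrative only.

```python
import html5lib

# Bytes input: the encoding is no longer passed as charset=...
data = "<p>caf\u00e9</p>".encode("utf-8")

parser = html5lib.HTMLParser()
# transport_encoding carries an encoding declared out of band,
# for example by an HTTP Content-Type header.
document = parser.parse(data, transport_encoding="utf-8")
detected = parser.documentEncoding
text = "".join(document.itertext())

# quote_attr_values now takes "always", "legacy" or "spec"
# instead of True/False.
fragment = html5lib.parseFragment("<a title=x>y</a>")
always_quoted = html5lib.serialize(fragment, quote_attr_values="always")
```

Code that previously passed ``quote_attr_values=True`` would pass ``"always"`` instead, and code that passed ``False`` would pass ``"spec"``.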


0.9999999/1.0b8
~~~~~~~~~~~~~~~

Released on September 10, 2015

* Fix #195: fix the sanitizer to drop broken URLs (it threw an
  exception between 0.9999 and 0.999999).


0.999999/1.0b7
~~~~~~~~~~~~~~

Released on July 7, 2015

* Fix #189: fix the sanitizer to allow relative URLs again (as it did
  prior to 0.9999/1.0b5).


0.99999/1.0b6
~~~~~~~~~~~~~

Released on April 30, 2015

* Fix #188: fix the sanitizer to not throw an exception when sanitizing
  bogus data URLs.


0.9999/1.0b5
~~~~~~~~~~~~

Released on April 29, 2015

* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
  this sounds, this has no known security implications. No known version
  of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
  Chrome (1 to current), or Opera (12 to current) will run any script
  provided in these attributes.

* Pass error message to the ParseError exception in strict parsing mode.

* Allow data URIs in the sanitizer, with a whitelist of content-types.

* Add support for Python implementations that don't support lone
  surrogates (read: Jython). Fixes #2.

* Remove localization of error messages. This functionality was totally
  unused (and untested that everything was localizable), so we may as
  well follow numerous browsers in not supporting translating technical
  strings.

* Expose treewalkers.pprint as a public API.

* Add a documentEncoding property to HTML5Parser, fix #121.


0.999
~~~~~

Released on December 23, 2013

* Fix #127: add work-around for CPython issue #20007: .read(0) on
  http.client.HTTPResponse drops the rest of the content.

* Fix #115: lxml treewalker can now deal with fragments containing, at
  their root level, text nodes with non-ASCII characters on Python 2.


0.99
~~~~

Released on September 10, 2013

* No library changes from 1.0b3; released as 0.99 as pip has changed
  behaviour from 1.4 to avoid installing pre-release versions per
  PEP 440.


1.0b3
~~~~~

Released on July 24, 2013

* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
  implementation using it should be moved to
  ``NonRecursiveTreeWalker``, as everything bundled with html5lib has
  for years.

* Fix #67 so that ``BufferedStream`` correctly returns a bytes
  object, thereby fixing any case where html5lib is passed a
  non-seekable RawIOBase-like object.


1.0b2
~~~~~

Released on June 27, 2013

* Removed reordering of attributes within the serializer. There is now
  an ``alphabetical_attributes`` option which preserves the previous
  behaviour through a new filter. This allows attribute order to be
  preserved through html5lib if the tree builder preserves order.

* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
  ``treeadapters.sax.to_sax`` which is generic and supports any
  treewalker; it also resolves all known bugs with ``dom2sax``.

* Fix treewalker assertions on hitting bytes strings on
  Python 2. Previous to 1.0b1, treewalkers coped with mixed
  bytes/unicode data on Python 2; this reintroduces this prior
  behaviour on Python 2. Behaviour is unchanged on Python 3.


1.0b1
~~~~~

Released on May 17, 2013

* Implementation updated to implement the `HTML specification
  <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
  2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).

* Python 3.2+ supported in a single codebase using the ``six`` library.

* Removed support for Python 2.5 and older.

* Removed the deprecated Beautiful Soup 3 treebuilder.
  ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
  since it doesn't support namespaces, foreign content like SVG and
  MathML is parsed incorrectly.

* Removed ``simpletree`` from the package. The default tree builder is
  now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
  available, and ``xml.etree.ElementTree`` otherwise).

* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
  output was well-formed XML, and hence provided little of use.

* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
  longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
  return the default DOM treebuilder, which uses ``xml.dom.minidom``.

* Optional heuristic character encoding detection now based on
  ``charade`` for Python 2.6 - 3.3 compatibility.

* Optional ``Genshi`` treewalker support fixed.

* Many bugfixes, including:

  * #33: null in attribute value breaks XML AttValue;

  * #4: nested, indirect descendant, <button> causes infinite loop;

  * `Google Code 215
    <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
    detect seekable streams;

  * `Google Code 206
    <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
    support for <video preload=...>, <audio preload=...>;

  * `Google Code 205
    <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
    support for <video poster=...>;

  * `Google Code 202
    <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
    file breaks InputStream.

* Source code is now mostly PEP 8 compliant.

* Test harness has been improved and now depends on ``nose``.

* Documentation updated and moved to https://html5lib.readthedocs.io/.
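
The ``etree`` fallback described in the 1.0b1 notes above (prefer ``xml.etree.cElementTree``, otherwise use ``xml.etree.ElementTree``) follows a common import pattern. The sketch below uses only the standard library and is an illustration of that pattern, not html5lib's own code; the sample markup is made up.

```python
# Prefer the C-accelerated module where it exists, else use the
# pure-Python one. cElementTree was removed in Python 3.9, so the
# fallback branch is what runs on current interpreters.
try:
    import xml.etree.cElementTree as ElementTree
except ImportError:
    import xml.etree.ElementTree as ElementTree

# Both modules expose the same API, so callers need not care which
# one was imported.
root = ElementTree.fromstring("<p>Hello <b>world</b></p>")
text = "".join(root.itertext())
```

Because both modules share one interface, the rest of the code can use the ``ElementTree`` name without checking which implementation is behind it.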


0.95
~~~~

Released on February 11, 2012


0.90
~~~~

Released on January 17, 2010


0.11.1
~~~~~~

Released on June 12, 2008


0.11
~~~~

Released on June 10, 2008


0.10
~~~~

Released on October 7, 2007


0.9
~~~

Released on March 11, 2007


0.2
~~~

Released on January 8, 2007