.. _rendering-overview:

Rendering Overview
==================

This document is an overview of the steps to render a webpage, and how HTML
gets transformed and broken down, step by step, into commands that can execute
on the GPU.

If you're coming into the graphics team with not a lot of background
in browsers, start here :)

.. contents::

High level overview
-------------------

.. image:: RenderingOverviewSimple.png
   :width: 100%

Layout
~~~~~~
Starting at the left in the above image, we have a document
represented by a DOM - a Document Object Model. A JavaScript engine
executes JS code, either to make changes to the DOM, or to respond to
events generated by the DOM (or both).

The DOM is a high-level description, and we don't know what to draw or
where until it is combined with a Cascading Style Sheet (CSS).
Combining these two and figuring out what, where and how to draw
things is the responsibility of the Layout team. The
DOM is converted into a hierarchical Frame Tree, which nests visual
elements (boxes). Each element points to some node in a Style Tree
that describes what it should look like -- color, transparency, etc.
The result is that now we know exactly what to render where, what goes
on top of what (layering and blending), and at what pixel coordinate.
This is the Display List.

The Display List is a lightweight data structure because it's shallow
-- it mostly points back to the Frame Tree. This poses two problems.
First, we want to cross process boundaries at this point:
everything up until now happens in a Content Process (of which there are
several), while actual GPU rendering happens in a GPU Process (on some
platforms). Second, everything up until now was written in C++, but
WebRender is written in Rust.
Thus the shallow Display List needs to
be serialized into a completely self-contained binary blob that will
survive Interprocess Communication (IPC) and a language switch (C++ to
Rust). The result is the WebRender Display List.

WebRender
~~~~~~~~~

The GPU process receives the WebRender Display List blob and
deserializes it into a Scene. This Scene contains more than the
strictly visible elements; for example, to anticipate scrolling, we
might have several paragraphs of text extending past the visible page.

For a given viewport, the Scene gets culled and stripped down to a
Frame. This is also where we start preparing data structures for GPU
rendering, for example getting some font glyphs into an atlas for
rasterizing text.

The final step takes the Frame and submits commands to the GPU to
actually render it. The GPU will execute the commands and composite
the final page.

Software
~~~~~~~~

The above is the new WebRender-enabled way to do things. But in the
schematic you'll note a second branch towards the bottom: this is the
legacy code path, which uses neither WebRender nor Rust. In this
case, the Display List is converted into a Layer Tree. The purpose of
this tree is to avoid having to re-render absolutely
everything when the page needs to be refreshed. For example, when
scrolling we should be able to redraw the page by mostly shifting
things around. However, that requires those 'things' to still be around
from the last time we drew the page. In other words, visual elements that
are likely to be static and reusable need to be drawn into their own
private "page" (a cache). Then we can recombine (composite) all of
these when redrawing the actual page.

Figuring out which elements would be good candidates for this, and
striking a balance between good performance and excessive memory
use, is the purpose of the Layer Tree.
Each 'layer' is a cached image
of some element(s). This logic also takes occlusion into account: e.g.,
don't allocate and render a layer for elements that are known to be
completely obscured by something in front of them.

Redrawing the page by combining the Layer Tree with any newly
rasterized elements is the job of the Compositor.

Even when a layer cannot be reused in its entirety, it is likely
that only a small part of it was invalidated. Thus there is an
elaborate system for tracking dirty rectangles, starting an update by
copying the area that can be salvaged, and then redrawing only what
cannot.

In fact, this idea can be extended to delta-tracking of display lists
themselves. Traversing the layout tree and building a display list is
also not cheap, so the code tries to partially invalidate and rebuild
the display list incrementally when possible. This optimization is
used both with and without WebRender.

Asynchronous Panning And Zooming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Earlier we mentioned that a Scene might contain more elements than are
strictly necessary for rendering what's visible (the Frame). The
reason for that is Asynchronous Panning and Zooming, or APZ for short.
The browser will feel much more responsive if scrolling & zooming can
short-circuit all of these data transformations and IPC boundaries,
and instead directly update an offset of some layer and recomposite.
(Think of late-latching in a VR context.)

This simple idea introduces a lot of complexity: how much extra do you
rasterize, and in which direction? How much memory can we afford?
What about JavaScript that responds to scroll events and perhaps does
something 'interesting' with the page in return? What about nested
frames or nested scrollbars? What if we scroll so much that we go
past the boundaries of the Scene that we know about?
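The short-circuit idea can be sketched in a few lines. This is a hypothetical, single-axis model for illustration only (``ScrollFrame`` and its fields are invented here, not WebRender's actual types): the compositor side accumulates an async scroll offset and applies it at composite time, and only when we scroll past the pre-rasterized Scene do we need to go back to Content.

```rust
// A minimal sketch (hypothetical types, not the real APZ code) of the
// short-circuit: the compositor keeps an async scroll offset per scroll
// frame and applies it at composite time, without a round trip through
// layout or the content process.

struct ScrollFrame {
    content_offset: f32, // what layout last told us (one axis, for brevity)
    async_offset: f32,   // accumulated on the compositor thread
    scene_extent: f32,   // how much pre-rasterized content the Scene covers
}

impl ScrollFrame {
    // Called directly from input handling on the compositor side.
    fn scroll_by(&mut self, delta: f32) {
        self.async_offset += delta;
    }

    // The offset actually used when compositing the cached content.
    fn composite_offset(&self) -> f32 {
        self.content_offset + self.async_offset
    }

    // If we've scrolled past what was rasterized, we must ask Content
    // for a new "display port" (or risk showing blank checkerboard).
    fn needs_new_display_port(&self) -> bool {
        self.composite_offset() > self.scene_extent
    }
}
```

The point of the sketch is that ``scroll_by`` touches nothing but compositor-side state; layout, display-list building, and IPC are all bypassed until ``needs_new_display_port`` fires.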
See :ref:`apz` for all that and more.

A Few More Details
~~~~~~~~~~~~~~~~~~

Here's another schematic which basically repeats the previous one, but
shows a little bit more detail. Note that the direction is reversed
-- the data flow starts at the right. Sorry about that :)

.. image:: RenderingOverviewDetail.png
   :width: 100%

Some things to note:

- there are multiple content processes, currently 4 of them. This is
  for security reasons (sandboxing), stability (isolating crashes) and
  performance (multi-core machines);
- ideally each "webpage" would run in its own process for security;
  this is being developed under the term 'fission';
- there is only a single GPU process, if there is one at all;
  some platforms have it as part of the Parent;
- not shown here is the Extension process that isolates WebExtensions;
- for non-WebRender, rasterization happens in the Content Process, and
  we send entire Layers to the GPU/Compositor process (via shared
  memory, only using actual IPC for metadata like width & height);
- if the GPU process crashes (because of a bug or a driver issue) we can
  simply restart it, resend the display list, and the browser itself
  doesn't crash;
- the browser UI is just another set of DOM+JS, albeit one that runs
  with elevated privileges. That is, its JS can do things that
  normal JS cannot. It lives in the Parent Process, which then uses
  IPC to get it rendered, same as regular Content (the IPC arrow also
  goes to the WebRender Display List but is omitted to reduce clutter);
- UI events get routed to APZ first, to minimize latency.
  By running
  inside the GPU process, we may have access to data such
  as rasterized clipping masks that enable finer-grained hit testing;
- the GPU process talks back to the content process; in particular,
  when APZ scrolls out of bounds, it asks Content to enlarge/shift the
  Scene with a new "display port";
- even in the non-WebRender case, we still use the GPU for compositing
  when we can.

WebRender In Detail
-------------------

Converting a display list into GPU commands is broken down into a
number of steps and intermediate data structures.

.. image:: RenderingOverviewTrees.png
   :width: 75%
   :align: center

..

*Each element in the picture tree points to exactly one node in the spatial
tree. Only a few of these links are shown for clarity (the dashed lines).*

The Picture Tree
~~~~~~~~~~~~~~~~

The incoming display list uses "stacking contexts". For example, to
render some text with a drop shadow, a display list will contain three
items:

- "enable shadow" with some parameters such as shadow color, blur size, and offset;
- the text item;
- "pop all shadows" to deactivate shadows.

WebRender will break this down into two distinct elements, or
"pictures". The first represents the shadow, so it contains a copy of the
text item, but modified to use the shadow's color and to shift the
text by the shadow's offset. The second picture contains the original text
to draw on top of the shadow.

The fact that the first picture, the shadow, needs to be blurred is a
"compositing" property of the picture, which we'll deal with later.

Thus, the stack-based display list gets converted into a list of pictures
-- or more generally, a hierarchy of pictures, since items are nested
as per the original HTML.

Example visual elements are a TextRun, a LineDecoration, or an Image
(like a .png file).
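To make the drop-shadow breakdown concrete, here is a toy sketch. Every type and name here (``DisplayItem``, ``Picture``, ``build_pictures``) is invented for illustration; the real WebRender structures are far richer. It shows the essential move: the shadow picture gets a recolored, offset copy of the text, and the original text becomes a second picture drawn on top.

```rust
// Hypothetical, simplified types -- not WebRender's actual display list.

#[derive(Clone, Debug, PartialEq)]
enum DisplayItem {
    PushShadow { color: &'static str, offset: (f32, f32) },
    Text { content: &'static str, color: &'static str, origin: (f32, f32) },
    PopAllShadows,
}

#[derive(Debug, PartialEq)]
struct Picture {
    items: Vec<DisplayItem>,
    needs_blur: bool, // a "compositing" property, applied later
}

fn build_pictures(display_list: &[DisplayItem]) -> Vec<Picture> {
    let mut shadow: Option<(&'static str, (f32, f32))> = None;
    let mut shadow_items = Vec::new();
    let mut content_items = Vec::new();

    for item in display_list {
        match item {
            DisplayItem::PushShadow { color, offset } => shadow = Some((*color, *offset)),
            DisplayItem::Text { content, origin, .. } => {
                if let Some((shadow_color, (dx, dy))) = shadow {
                    // Copy of the text item, recolored and shifted by the offset.
                    shadow_items.push(DisplayItem::Text {
                        content: *content,
                        color: shadow_color,
                        origin: (origin.0 + dx, origin.1 + dy),
                    });
                }
                content_items.push(item.clone());
            }
            DisplayItem::PopAllShadows => shadow = None,
        }
    }

    let mut pictures = Vec::new();
    if !shadow_items.is_empty() {
        // The shadow picture draws first, underneath; blurring it is
        // deferred to the render task tree.
        pictures.push(Picture { items: shadow_items, needs_blur: true });
    }
    pictures.push(Picture { items: content_items, needs_blur: false });
    pictures
}
```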
Compared to 3D rendering, the picture tree is similar to a scenegraph: it's a
parent/child hierarchy of all the drawable elements that make up the "scene", in
this case the webpage. One important difference is that the transformations are
stored in a separate tree, the spatial tree.

The Spatial Tree
~~~~~~~~~~~~~~~~

The nodes in the spatial tree represent coordinate transforms. Every time the
DOM hierarchy needs child elements to be transformed relative to their parent,
we add a new Spatial Node to the tree. All those child elements will then point
to this node as their "local space" reference (aka coordinate frame). In
traditional 3D terms, it's a scenegraph containing only transform nodes.

The nodes are called frames, as in "coordinate frame":

- a Reference Frame corresponds to a ``<div>``;
- a Scrolling Frame corresponds to a scrollable part of the page;
- a Sticky Frame corresponds to some fixed-position CSS style.

Each element in the picture tree then points to a spatial node inside this tree,
so by walking up and down the tree we can find the absolute position of where
each element should render (traversing down) and how large each element needs to
be (traversing up). Originally the transform information was part of the
picture tree, as in a traditional scenegraph, but visual elements and their
transforms were split apart for technical reasons.

Some of these nodes are dynamic. A scroll frame can obviously scroll, but a
Reference Frame might also use a property binding to enable a live link with
JavaScript, for dynamic updates of (currently) the transform and opacity.

Axis-aligned transformations (scales and translations) are considered "simple",
and are conceptually combined into a single "CoordinateSystem". When we
encounter a non-axis-aligned transform, we start a new CoordinateSystem.
We
start in CoordinateSystem 0 at the root, and would bump this to CoordinateSystem
1 when we encounter a Reference Frame with a rotation or 3D transform, for
example. This would then be the CoordinateSystem index for all its children,
until we run into another (nested) non-simple transform, and so on. Roughly
speaking, as long as we're in the same CoordinateSystem, the transform stack is
simple enough that we have a reasonable chance of being able to flatten it. That
lets us directly rasterize text at its final scale, for example, optimizing
away some of the intermediate pictures (offscreen textures).

The layout code positions elements relative to their parent. Thus, to position
an element on the actual page, we need to walk the Spatial Tree all the way to
the root and apply each transform; the result is a ``LayoutToWorldTransform``.

One final step transforms from World to Device coordinates, which deals with
DPI scaling and the like.

.. csv-table::
   :header: "WebRender term", "Rough analogy"

   Spatial Tree, Scenegraph -- transforms only
   Picture Tree, Scenegraph -- drawables only (grouping)
   Spatial Tree Rootnode, World Space
   Layout space, Local/Object Space
   Picture, RenderTarget (sort of; see RenderTask below)
   Layout-To-World transform, Local-To-World transform
   World-To-Device transform, World-To-Clipspace transform


The Clip Tree
~~~~~~~~~~~~~

Finally, we also have a Clip Tree, which contains Clip Shapes. For
example, a rounded-corner div will produce a clip shape, and since
divs can be nested, you end up with another tree. By pointing at a Clip Shape,
visual elements will be clipped against that shape plus all parent shapes above
it in the Clip Tree.

As with CoordinateSystems, a chain of simple 2D clip shapes can be collapsed
into something that can be handled in the vertex shader, at very little extra
cost.
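For the simplest case, purely rectangular clips, that collapsing step amounts to intersecting the rectangles along the chain. The sketch below uses invented types (``ClipRect``, ``collapse_clip_chain``) to show the idea; the single resulting rectangle is something a vertex shader can apply directly.

```rust
// Hypothetical sketch: collapse a chain of axis-aligned rectangular
// clips into one rectangle. Non-rectangular clips (rounded corners
// under a non-simple transform, etc.) would break the chain and force
// a rasterized mask instead.

#[derive(Clone, Copy, Debug, PartialEq)]
struct ClipRect { x0: f32, y0: f32, x1: f32, y1: f32 }

impl ClipRect {
    fn intersect(self, o: ClipRect) -> ClipRect {
        ClipRect {
            x0: self.x0.max(o.x0),
            y0: self.y0.max(o.y0),
            x1: self.x1.min(o.x1),
            y1: self.y1.min(o.y1),
        }
    }
}

// The chain is given leaf-first here: the element's own clip shape,
// then each parent shape up to the root of the Clip Tree.
fn collapse_clip_chain(chain: &[ClipRect]) -> Option<ClipRect> {
    let (first, rest) = chain.split_first()?;
    Some(rest.iter().fold(*first, |acc, r| acc.intersect(*r)))
}
```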
More complex clips must be rasterized into a mask first, which we then
sample from to ``discard`` in the pixel shader as needed.

In summary, at the end of scene building the display list has turned into
a picture tree, plus a spatial tree that tells us what goes where
relative to what, plus a clip tree.

RenderTask Tree
~~~~~~~~~~~~~~~

Now, in a perfect world we could simply traverse the picture tree and start
drawing things: one drawcall per picture to render its contents, plus one
drawcall to draw the picture into its parent. However, recall that the first
picture in our example is a "text shadow" that needs to be blurred. We can't
just rasterize blurry text directly, so we need a number of steps or "render
passes" to get the intended effect:

.. image:: RenderingOverviewBlurTask.png
   :align: right
   :height: 400px

- rasterize the text into an offscreen rendertarget;
- apply one or more downscaling passes until the blur radius is reasonable;
- apply a horizontal Gaussian blur;
- apply a vertical Gaussian blur;
- use the result as an input for whatever comes next, or blit it to
  its final position on the page (or more generally, on the containing
  parent surface/picture).

In the general case, which passes we need and how many of them depends
on how the picture is supposed to be composited (CSS filters, SVG
filters, effects) and on its parameters (a very large vs. a small blur
radius, say).

Thus, we walk the picture tree and build a render task tree: each high-level
abstraction like "blur me" gets broken down into the necessary
render passes to get the effect. The result is again a tree, because a
render pass can have multiple input dependencies (e.g. blending).

(Compared to games, this has echoes of the Frostbite Framegraph in that it
dynamically builds up a renderpass DAG and dynamically allocates storage
for the outputs.)
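The pass expansion for the blur example can be sketched as follows. The task names and the downscale threshold are made up for illustration; in the real render task tree each task would be a node depending on the previous one's output.

```rust
// Hypothetical sketch of expanding "blur this picture" into passes.

#[derive(Debug, PartialEq)]
enum RenderTask {
    Rasterize,      // draw the picture's contents offscreen
    Downscale,      // halve the resolution, halving the effective radius
    HorizontalBlur, // separable Gaussian blur, pass 1
    VerticalBlur,   // separable Gaussian blur, pass 2
}

// Invented limit: the largest radius the blur shader handles in one pass.
const MAX_SHADER_RADIUS: f32 = 8.0;

fn plan_blur(mut radius: f32) -> Vec<RenderTask> {
    let mut tasks = vec![RenderTask::Rasterize];
    // Downscale until the blur radius is cheap enough for the shader.
    while radius > MAX_SHADER_RADIUS {
        tasks.push(RenderTask::Downscale);
        radius /= 2.0;
    }
    tasks.push(RenderTask::HorizontalBlur);
    tasks.push(RenderTask::VerticalBlur);
    tasks
}
```

A small radius yields just rasterize + two blur passes, while a large one inserts as many downscales as needed first, matching the bullet list above.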
If there are complicated clip shapes that need to be rasterized first,
so their output can be sampled as a texture for clip/discard
operations, those presumably also end up in this tree as dependencies.

Once we have the entire tree of dependencies, we analyze it to see
which tasks can be combined into a single pass for efficiency. We
ping-pong rendertargets when we can, but sometimes the dependencies
cut across more than one level of the rendertask tree, and some
copying is necessary.

Once we've figured out the passes and allocated storage for anything
we wish to persist in the texture cache, we finally start rendering.

When rasterizing the elements into the Picture's offscreen texture, we
position them by walking the transform hierarchy as far up as the picture's
transform node, resulting in a ``Layout To Picture`` transform. The picture
then goes onto the page using a ``Picture To World`` coordinate transform.

Caching
```````

Just as with layers in the software rasterizer, it is not always necessary to
redraw absolutely everything when parts of a document change. The WebRender
equivalent of layers is Slices -- a grouping of pictures that are expected to
render and update together. Slices are created automatically based on
heuristics and layout hints/flags.

Implementation-wise, slices reuse a lot of the existing machinery for Pictures;
in fact they're implemented as a "virtual picture" of sorts. The similarities
make sense: both need to allocate offscreen textures in a cache, both will
position and render all their children into it, and both then draw themselves
into their parent as part of the parent's draw.

If a slice isn't expected to change much, we give it a TileCacheInstance.
It is
itself made up of Tiles, where each tile tracks what's in it, what's
changing, and whether it needs to be invalidated and redrawn as a result.
Thus the "damage" from changes can be localized to single tiles, while we
salvage the rest of the cache. If tiles keep seeing a lot of invalidations,
they will recursively divide themselves in a quad-tree-like structure to try
and localize the invalidations. (And conversely, they'll recombine children
if nothing has invalidated them for a while.)

Interning
`````````

To spot invalidated tiles, we need a fast way to compare a tile's contents
from the previous frame with the current frame. To speed this up, we use
interning; similar to string interning, this means that each ``TextRun``,
``Decoration``, ``Image`` and so on is registered in a repository (a
``DataStore``) and subsequently referred to by its unique ID. Cache contents
can then be encoded as a list of IDs (one such list per internable element
type). Diffing is then just a fast list comparison.

Callbacks
`````````

GPU text rendering assumes that the individual font glyphs are already
available in a texture atlas. Likewise, SVG is not rendered on
the GPU. Both inputs are prepared during scene building: glyph
rasterization via a thread pool from within Rust itself, and SVG via
opaque callbacks (back to C++) that produce blobs.
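To make the interning-based diffing from the Interning section concrete, here is a minimal sketch. The types (``DataStore``, ``Tile``) are invented for illustration and far simpler than WebRender's real ones: each drawable is registered once and referred to by ID, so deciding whether a tile changed is a cheap comparison of ID lists rather than a deep compare of contents.

```rust
// Hypothetical sketch of interning-based tile invalidation.

use std::collections::HashMap;

#[derive(Default)]
struct DataStore {
    ids: HashMap<String, u64>, // item contents -> unique ID
    next_id: u64,
}

impl DataStore {
    // Registering the same contents twice yields the same ID.
    fn intern(&mut self, item: &str) -> u64 {
        if let Some(&id) = self.ids.get(item) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(item.to_string(), id);
        id
    }
}

struct Tile {
    prev_ids: Vec<u64>, // what the cached tile was rendered from
}

impl Tile {
    // Compare this frame's interned IDs against last frame's; remember
    // the new list either way.
    fn needs_redraw(&mut self, current_ids: &[u64]) -> bool {
        let changed = self.prev_ids != current_ids;
        self.prev_ids = current_ids.to_vec();
        changed
    }
}
```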