ranking.rst (12579B)
1 ======= 2 Ranking 3 ======= 4 5 Before results appear in the UrlbarView, they are fetched from providers. 6 7 Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_ 8 implements its own internal ranking and returns sorted results. 9 10 Externally all the results are ranked by the `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerStandard.sys.mjs>`_ 11 according to an hardcoded list of groups and sub-groups. 12 13 .. NOTE:: Preferences can influence the groups order, for example by putting 14 Firefox Suggest before Search Suggestions. 15 16 The Places provider returns history and bookmark results using an internal 17 ranking algorithm called **Frecency**. 18 19 20 Frecency implementation 21 ======================= 22 23 Frecency is a term derived from `frequency` and `recency`, its scope is to 24 provide a ranking algorithm that gives importance both to how often a page is 25 accessed and when it was last visited. Additionally, it accounts for the type of 26 each visit through a bonus system. 27 28 Frecency uses a double exponential decay. 29 30 Given a url, we identify a reference time, t\ :sub:`ref`, that will be either 31 the last visit date, or the last bookmark addition date (for pages not having 32 visits). 33 34 All the dates we use will be expressed as the number of days from the epoch. 35 36 Since calculating a score from all of the visits every time a new visit is added 37 would be expensive, a sample of the last 38 ``places.frecency.pages.numSampledVisits`` visits is used. 39 40 For each visit a *weight* is determined out of four buckets: 41 42 1. Very High (``places.frecency.pages.veryHighWeight``): This is used when a 43 page is in the high bucket and has interesting interaction data. 44 2. High (``places.frecency.pages.highWeight``): This is for bookmarked or typed 45 pages. Due to the current definition of typed, this includes URLs clicked on 46 the Firefox UI (not the web), except searches. 47 3. Medium (``places.frecency.pages.mediumWeight``): This is for normal browsing, 48 like clicking links on pages or downloads. 49 4. Low (``places.frecency.pages.lowWeight``): This is for non-typed redirect 50 sources, visits across HTML frames (non top level) or reloads. 51 52 The score for the single visit is then calculated as: 53 54 .. math:: 55 score_{visit} = weight_{visit} * e^{-\lambda(t_{ref}-t_{visit})} 56 57 With half-life defined as: 58 59 .. math:: 60 \lambda = \frac{\ln 2}{30} 61 62 The total score is the average score, multiplied by the total number of visits: 63 64 .. math:: 65 score_{total} = \frac{\sum score_{visit}}{samplecount} * visitcount 66 67 To keep into account recent, this score is decayed daily using the already 68 defined half-life. To avoid having to multiply each score daily by an 69 exponential, we instead store in the database (`moz_places`) the number of days 70 from epoch after which the calculated score would become 1, by decaying it every 71 day. 72 73 .. math:: 74 frecency = t_{ref} + \frac{\ln(score)}{\lambda} 75 76 Interactions 77 ------------ 78 79 Page view duration, keypress count, and scroll distance are tracked when a page 80 loads. All data is stored locally and never transmitted. 81 82 A page visit is classified as **interesting** if either criterion is met: 83 84 - Page view time ≥ ``places.frecency.pages.interactions.viewTimeSeconds``. 85 - Page view time ≥ ``places.frecency.pages.interactions.viewTimeIfManyKeypressesSeconds`` 86 AND keypress count ≥ ``places.frecency.pages.interactions.manyKeypresses``. 87 88 When a visit is classified as interesting, its scoring bucket is promoted. For 89 example, a visit initially categorized in the medium bucket is promoted to the 90 high bucket if it has an interesting interaction. 91 92 Interactions are stored separately from visits. To associate them, the query 93 checks whether a visit was recorded within ``places.frecency.pages.interactions.maxVisitGapSeconds`` 94 of a recorded interaction. If a matching visit exists within this time window, 95 the interaction is paired with it. Otherwise, the interaction becomes a 96 **virtual visit**. 97 98 A single visit can generate multiple interactions. When a user switches away 99 from a tab, the current interaction ends if the user does not return within 100 ``browser.places.interactions.breakupIfNoUpdatesForSeconds``. Returning to the 101 tab after this threshold creates a new interaction, even though the original 102 visit remains the same. 103 104 How frecency for a page is calculated 105 ------------------------------------- 106 107 .. mermaid:: 108 :align: center 109 :caption: Frecency calculation flow 110 111 flowchart TD 112 start([Page URL Input]) 113 subgraph gather ["Gather"] 114 visits[Gather recent visits<br/>up to sample limit] 115 interactions[Gather interesting<br/>interactions] 116 pair[Match interactions<br/>with visits by timestamp] 117 virtual{Unpaired<br/>interaction?} 118 generate[Create virtual visit<br/>from interaction] 119 combine[Combine visits<br/>with virtual visits] 120 hasVisits{Any visits<br/>collected?} 121 isBookmarked{Is URL<br/>bookmarked?} 122 bookmarkFallback[High<br/>Bucket] 123 noData[frecency = 0] 124 end 125 subgraph classify ["Classify"] 126 visitType{Visit<br/>Type?} 127 bookmark_typed[Bookmark/Typed] 128 regular_click[Regular Link] 129 redirect_auto[Redirect/Sponsored] 130 bookmarkInteraction{Interesting<br/>interaction?} 131 regularInteraction{Interesting<br/>interaction?} 132 score1[Very High<br/>Bucket] 133 score2[High<br/>Bucket] 134 score3[Medium<br/>Bucket] 135 score4[Low<br/>Bucket] 136 end 137 calcDecay[Apply exponential decay<br/>based on visit age] 138 aggregate[Sum decayed scores and<br/> normalize by visit count] 139 project[frecency = days until<br/>score decays to 1] 140 %% Main 141 start --> visits & interactions 142 visits --> pair 143 interactions --> pair 144 pair --> virtual 145 virtual -->|Yes| generate 146 virtual -->|No| combine 147 generate --> combine 148 combine --> hasVisits 149 hasVisits -->|Yes| visitType 150 hasVisits -->|No| isBookmarked 151 isBookmarked -->|Yes| bookmarkFallback 152 isBookmarked -->|No| noData 153 bookmarkFallback --> calcDecay 154 %% Classification 155 visitType -->|Bookmark/Typed| bookmark_typed 156 visitType -->|Regular| regular_click 157 visitType -->|Redirect| redirect_auto 158 bookmark_typed --> bookmarkInteraction 159 regular_click --> regularInteraction 160 redirect_auto --> score4 161 bookmarkInteraction -->|Yes| score1 162 bookmarkInteraction -->|No| score2 163 regularInteraction -->|Yes| score2 164 regularInteraction -->|No| score3 165 %% Decay 166 score1 & score2 & score3 & score4 --> calcDecay 167 calcDecay --> aggregate 168 %% Final 169 aggregate --> project 170 %% Styling 171 style score1 fill:#4caf50,stroke:#2e7d32,stroke-width:3px,color:#fff 172 style score2 fill:#8bc34a,stroke:#558b2f,stroke-width:2px 173 style score3 fill:#ffc107,stroke:#f57f17,stroke-width:2px 174 style bookmarkFallback fill:#8bc34a,stroke:#558b2f,stroke-width:2px 175 style score4 fill:#ff9800,stroke:#e65100,stroke-width:2px 176 style project fill:#2196f3,stroke:#0d47a1,stroke-width:3px,color:#fff 177 style noData fill:#2196f3,stroke:#0d47a1,stroke-width:3px,color:#fff 178 179 1. **Gather visits and interactions:** For a given URL, collect recent visits 180 (up to the sample limit) and any interesting interactions. 181 2. **Match and create virtual visits:** Pair each interesting interaction with 182 its corresponding visit by timestamp. If an interaction has no paired visit, 183 create a "virtual visit" from that interaction. Combine all actual visits 184 with these virtual visits for scoring. 185 3. **Check visits exist:** If no virtual or non-virtual visits were collected, 186 check whether the URL is bookmarked. If bookmarked, assign it to the High 187 bucket, otherwise set frecency to 0. 188 4. **Classify visits into buckets:** Categorize each visit based on its type and 189 whether it has an interesting interaction. 190 5. **Apply exponential decay:** Reduce each visit's score based on its age 191 using the time difference between the most recent visit in the sample vs. 192 the visit timestamp. 193 6. **Aggregate scores:** Sum all decayed scores and normalize by dividing by the 194 total visit count to get an average decayed score. 195 7. **Project to frecency value:** Convert the aggregated score into a timestamp 196 representing the number of days from the Unix epoch until the score would 197 decay to 1. This final value is the store frecency value. 198 199 When frecency for a page is calculated 200 -------------------------------------- 201 202 Operations that may influence the frecency score are: 203 204 * Adding visits 205 * Removing visits 206 * Adding bookmarks 207 * Removing bookmarks 208 * Changing the url of a bookmark 209 * An interesting interaction is recorded 210 211 Frecency is recalculated: 212 213 * Immediately, when a new visit is added. The user expectation here is that the 214 page appears in search results after being visited. This is also valid for 215 any History API that allows to add visits. 216 * In background on idle times, in any other case. In most cases having a 217 temporarily stale value is not a problem, the main concern would be privacy 218 when removing history of a page, but removing whole history will either 219 completely remove the page or, if it's bookmarked, it will still be relevant. 220 In this case, when a change influencing frecency happens, the ``recalc_frecency`` 221 database field for the page is set to ``1``. 222 223 Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module. 224 The Recalculator is notified when ``PlacesUtils.history.shouldStartFrecencyRecalculation`` 225 value changes from false to true, that means there's values to recalculate. 226 A DeferredTask is armed, that will look for a user idle opportunity 227 in the next 5 minutes, otherwise it will run when that time elapses. 228 Once all the outdated values have been recalculated 229 ``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false 230 until the next operation invalidating a frecency. 231 The recalculation task is also armed on the ``idle-daily`` notification. 232 233 When the task is executed, it recalculates frecency of a chunk of pages. If 234 there are more pages left to recalculate, the task is re-armed. After frecency 235 of a page is recalculated, its ``recalc_frecency`` field is set back to ``0``. 236 237 Adaptive Input History 238 ====================== 239 240 Input History (also known as Adaptive History) is a feature that allows to 241 find urls that the user previously picked. To do so, it associates search strings 242 with picked urls. 243 244 Adaptive results are usually presented before frecency derived results, making 245 them appear as having an infinite frecency. 246 247 When the user types a given string, and picks a result from the address bar, that 248 relation is stored and increases a use_count field for the given string. 249 The use_count field asymptotically approaches a max of ``10`` (the update is 250 done as ``use_count * .9 + 1``). 251 252 On querying, all the search strings that start with the input string are matched, 253 a rank is calculated per each page as ``ROUND(MAX(use_count) * (1 + (input = :search_string)), 1)``, 254 so that results perfectly matching the search string appear at the top. 255 Results with the same rank are additionally sorted by descending frecency. 256 257 On daily idles, when frecency is decayed, also input history gets decayed, in 258 particular the use_count field is multiplied by a decay rate of ``0.975``. 259 After decaying, any entry that has a ``use_count < 0.975^90 (= 0.1)`` is removed, 260 thus entries are removed if unused for 90 days. 261 262 Why legacy frecency scoring was replaced 263 ======================================== 264 265 :doc:`ranking-legacy` had the following issues: 266 267 1. The score depended on the time at which it is calculated, making the user 268 experience more unstable. For example removing a bookmark (affects the score 269 negatively) will recalculate the score to a value higher than another 270 untouched bookmarked url, just because it was recalculated later. 271 2. All the scores must be decayed every day to keep recency into account. It is 272 an expensive operation. 273 3. It has far too many coefficients, making it impossible to study for 274 optimizing those coefficients. 275 4. Changing the coefficients doesn't recalculate all the scores, making 276 experimentation more complicated.