================
Ranking (Legacy)
================

.. NOTE:: This documentation is kept for historical purposes.
   The frecency algorithm was changed in Firefox 147. For current documentation,
   see `ranking.rst <https://firefox-source-docs.mozilla.org/browser/urlbar/ranking.html>`_.

Before results appear in the UrlbarView, they are fetched from providers.

Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_
implements its own internal ranking and returns sorted results.

The `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerStandard.sys.mjs>`_
then ranks the results from all providers according to a hardcoded list of
groups and sub-groups.

.. NOTE:: Preferences can influence the order of the groups, for example by
   putting Firefox Suggest before Search Suggestions.
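
Group-based ordering can be pictured with the following minimal Python sketch. It is not the UrlbarMuxer code: the group names, the ``Result`` class and the ``rank_by_groups`` helper are hypothetical, and the groups are flattened to a single level. A stable sort by group position keeps each provider's internal order within a group, and passing a different group order mimics a preference that changes it.

.. code-block:: python

   from dataclasses import dataclass

   # Hypothetical, flattened group order; the real muxer uses a hardcoded
   # list of groups and sub-groups.
   DEFAULT_GROUP_ORDER = ("heuristic", "search_suggestion", "suggest", "general")


   @dataclass
   class Result:
       title: str
       group: str


   def rank_by_groups(results, group_order=DEFAULT_GROUP_ORDER):
       """Sort results by the position of their group in ``group_order``.

       ``sorted`` is stable, so results of the same group keep the order in
       which their provider returned them. Unknown groups go last.
       """
       position = {group: i for i, group in enumerate(group_order)}
       return sorted(results, key=lambda r: position.get(r.group, len(group_order)))


   results = [
       Result("page from history", "general"),
       Result("Firefox Suggest result", "suggest"),
       Result("search suggestion", "search_suggestion"),
       Result("another page from history", "general"),
   ]
   # A preference could, for example, put Firefox Suggest before search suggestions.
   suggest_first = ("heuristic", "suggest", "search_suggestion", "general")

   print([r.title for r in rank_by_groups(results)])
   print([r.title for r in rank_by_groups(results, suggest_first)])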

The Places provider, which is responsible for returning history and bookmark
results, uses an internal ranking algorithm called Frecency.

Frecency implementation
=======================

Frecency is a term derived from `frequency` and `recency`. Its purpose is to
provide a ranking algorithm that gives importance both to how often a page is
accessed and to when it was last visited.
Additionally, it accounts for the type of each visit through a bonus system.

To account for `recency`, a bucketing system is implemented.
A visit gets the weight of the most recent bucket whose cutoff it falls within:

- Up to 4 days old: weight 100 (``places.frecency.firstBucketCutoff/Weight``)
- Up to 14 days old: weight 70 (``places.frecency.secondBucketCutoff/Weight``)
- Up to 31 days old: weight 50 (``places.frecency.thirdBucketCutoff/Weight``)
- Up to 90 days old: weight 30 (``places.frecency.fourthBucketCutoff/Weight``)
- Anything older: weight 10 (``places.frecency.defaultBucketWeight``)
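
A minimal Python sketch of the bucket lookup, using the default cutoffs and weights listed above. Treating each cutoff as inclusive ("up to") and the ``bucket_weight`` name are assumptions for illustration; the prefs are not read here.

.. code-block:: python

   # Default (cutoff_in_days, weight) pairs from the list above, ordered from
   # the most recent bucket to the oldest one.
   BUCKETS = ((4, 100), (14, 70), (31, 50), (90, 30))
   DEFAULT_BUCKET_WEIGHT = 10


   def bucket_weight(age_in_days: float) -> int:
       """Return the weight of the first bucket whose cutoff covers the age.

       Cutoffs are treated as inclusive here, which is an assumption.
       """
       for cutoff, weight in BUCKETS:
           if age_in_days <= cutoff:
               return weight
       return DEFAULT_BUCKET_WEIGHT


   print([bucket_weight(days) for days in (1, 10, 20, 60, 365)])  # [100, 70, 50, 30, 10]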

To account for `frequency`, the total number of visits to a page is used to
calculate the final score.

The type of each visit is taken into account using specific bonuses:

Default bonus
  Any unknown type gets a default bonus. This is expected to be unused.
  Pref ``places.frecency.defaultVisitBonus`` current value: 0.
Embed
  Used for embedded/framed visits not due to user actions. Today these visits
  are stored in memory and never participate in frecency calculation, so this
  bonus is currently unused.
  Pref ``places.frecency.embedVisitBonus`` current value: 0.
Framed Link
  Used for cross-frame visits due to user action.
  Pref ``places.frecency.framedLinkVisitBonus`` current value: 0.
Download
  Used for download visits. It’s important to support link coloring for these
  visits, but they are not necessarily useful address bar results (the Downloads
  view can do a better job with these), so their frecency can be low.
  Pref ``places.frecency.downloadVisitBonus`` current value: 0.
Reload
  Used for reload visits (refreshing the same page). The bonus is low because it
  should not be possible to influence frecency by reloading a page many times.
  Pref ``places.frecency.reloadVisitBonus`` current value: 0.
Redirect Source
  Used when the page redirects to another one.
  The value is low because we give more importance to the final destination,
  which is what the user actually visits, especially for permanent redirects.
  Pref ``places.frecency.redirectSourceVisitBonus`` current value: 25.
Temporary Redirect
  Used for visits resulting from a temporary redirect (for example, HTTP 307).
  Pref ``places.frecency.tempRedirectVisitBonus`` current value: 40.
Permanent Redirect
  Used for visits resulting from a permanent redirect (for example, HTTP 301).
  This is the presumed new destination for a URL, so the bonus is higher than
  for a temporary redirect. In this case it may be advisable to just use the
  bonus of the source visit.
  Pref ``places.frecency.permRedirectVisitBonus`` current value: 50.
Bookmark
  Used for visits generated from bookmark views.
  Pref ``places.frecency.bookmarkVisitBonus`` current value: 75.
Link
  Used for normal visits, for example when clicking on a link.
  Pref ``places.frecency.linkVisitBonus`` current value: 100.
Typed
  Intended for pages typed by the user; in practice it is used when the user
  picks a URL from the UI (history views or the Address Bar).
  Pref ``places.frecency.typedVisitBonus`` current value: 2000.
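
The default bonuses can be summarized as a lookup table. In this Python sketch the visit-type keys and the ``visit_bonus`` helper are hypothetical names, not Places identifiers. Since a visit's points scale with ``bonus / 100``, the table also shows relative weights: for a page that is not bookmarked, a typed visit counts 20 times as much as a link visit of the same age.

.. code-block:: python

   # Default visit bonuses from the list above. The keys are hypothetical
   # visit-type names chosen for readability.
   VISIT_BONUS = {
       "embed": 0,
       "framed_link": 0,
       "download": 0,
       "reload": 0,
       "redirect_source": 25,
       "temp_redirect": 40,
       "perm_redirect": 50,
       "bookmark": 75,
       "link": 100,
       "typed": 2000,
   }
   DEFAULT_VISIT_BONUS = 0  # places.frecency.defaultVisitBonus


   def visit_bonus(visit_type: str) -> int:
       """Return the bonus for a visit type, falling back to the default bonus."""
       return VISIT_BONUS.get(visit_type, DEFAULT_VISIT_BONUS)


   # Ratio between a typed visit and a link visit of the same age.
   print(visit_bonus("typed") / visit_bonus("link"))  # 20.0
   print(visit_bonus("some_unknown_type"))  # 0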

The bonuses above are applied to visits. In addition, there are a few bonuses
applied when a page has not been visited at all; both of these bonuses can be
applied at the same time:

Unvisited bookmarked page
  Used for pages that are bookmarked but unvisited.
  Pref ``places.frecency.unvisitedBookmarkBonus`` current value: 140.
Unvisited typed page
  Used for pages that were typed and are now bookmarked (otherwise they would
  be orphans).
  Pref ``places.frecency.unvisitedTypedBonus`` current value: 200.
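
For an unvisited page the two bonuses simply add up. A Python sketch with a hypothetical ``unvisited_bonus`` helper:

.. code-block:: python

   UNVISITED_BOOKMARK_BONUS = 140  # places.frecency.unvisitedBookmarkBonus
   UNVISITED_TYPED_BONUS = 200  # places.frecency.unvisitedTypedBonus


   def unvisited_bonus(bookmarked: bool, typed: bool) -> int:
       """Sum the bonuses that apply to a page with no visits."""
       bonus = 0
       if bookmarked:
           bonus += UNVISITED_BOOKMARK_BONUS
       if typed:
           bonus += UNVISITED_TYPED_BONUS
       return bonus


   print(unvisited_bonus(bookmarked=True, typed=False))  # 140
   print(unvisited_bonus(bookmarked=True, typed=True))  # 340

The helper only sums bonuses. Since unvisited pages that are not bookmarked get a frecency of ``0`` anyway, the typed bonus matters in practice only together with the bookmarked one.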

Two special frecency values are also defined:

- ``-1`` represents a just inserted entry in the database, whose score has not
  been calculated yet.
- ``0`` represents an entry for which a new value should not be calculated,
  because it has poor user value among search results (e.g. ``place:`` queries).

Finally, because calculating a score from all of the visits every time a new
visit is added would be expensive, only a sample of the 10 most recent visits
(pref ``places.frecency.numVisits``) is used.
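
Selecting the sample amounts to keeping the newest visits. This Python sketch uses ``heapq.nlargest`` on visit dates; the ``sample_visits`` name is hypothetical.

.. code-block:: python

   import heapq
   from datetime import datetime, timedelta

   NUM_VISITS = 10  # places.frecency.numVisits


   def sample_visits(visit_dates, num_visits=NUM_VISITS):
       """Return the ``num_visits`` most recent visit dates, newest first."""
       return heapq.nlargest(num_visits, visit_dates)


   now = datetime(2024, 1, 31)
   # 15 daily visits; only the 10 newest are kept.
   visits = [now - timedelta(days=d) for d in range(15)]
   sample = sample_visits(visits)
   print(len(sample), sample[0] == now, sample[-1] == now - timedelta(days=9))
   # 10 True True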

How frecency for a page is calculated
-------------------------------------

.. mermaid::
   :align: center
   :caption: Frecency calculation flow

   flowchart TD
       start[URL]
       a0{Has visits?}
       a1[Get last 10 visits]
       a2[bonus = unvisited bookmark bonus + unvisited typed bonus]
       a3{bonus > 0?}
       end0[frecency = 0]
       end1["frecency = age_bucket_weight * (bonus / 100)"]
       a4[Sum points of all sampled visits]
       a5{points > 0?}
       end2[frecency = -1]
       end3["frecency = visit_count * (points / sample_size)"]
       subgraph sub [For each visit]
           sub0[bonus = visit_type_bonus]
           sub1{bookmarked?}
           sub2[add bookmark bonus]
           sub3["score = age_bucket_weight * (bonus / 100)"]
           sub0 --> sub1
           sub1 -- yes --> sub2
           sub1 -- no --> sub3
           sub2 --> sub3
       end
       start --> a0
       a0 -- no --> a2
       a2 --> a3
       a3 -- no --> end0
       a3 -- yes --> end1
       a0 -- yes --> a1
       a1 --> sub
       sub --> a4
       a4 --> a5
       a5 -- no --> end2
       a5 -- yes --> end3

1. If the page is visited, get a sample of the ``NUM_VISITS`` most recent visits.
2. For each visit, get a transition bonus depending on the visit type.
3. If the page is bookmarked, add an additional bookmark bonus to the bonus.
4. If the bonus is positive, get a bucket weight depending on the visit date.
5. Calculate points for the visit as ``age_bucket_weight * (bonus / 100)``.
6. Sum the points of all the sampled visits.
7. If the points sum is zero, return a ``-1`` frecency; the page will still
   appear in the UI. Otherwise, frecency is ``visitCount * points / sample_size``,
   where ``sample_size`` is the number of sampled visits (at most ``NUM_VISITS``).
8. If the page is unvisited and not bookmarked, or it’s a bookmarked place query,
   return a ``0`` frecency to hide it from the UI.
9. Otherwise, for an unvisited bookmarked page, start from the unvisited
   bookmark bonus.
10. If it’s also a typed page, add the unvisited typed bonus.
11. Frecency is ``age_bucket_weight * (bonus / 100)``.
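
The steps above can be combined into a single Python sketch. It hardcodes only a subset of the default bonuses, assumes that the bookmark bonus added to each visit is the Bookmark visit bonus (75), and takes the bucket weight for unvisited pages as a parameter, because the text does not say which date it is based on. Following the flowchart, the points sum is divided by the number of sampled visits. No rounding is applied, and the ``Visit`` and ``frecency`` names are hypothetical.

.. code-block:: python

   from dataclasses import dataclass

   # Defaults from this document (only a subset of the visit bonuses).
   BUCKETS = ((4, 100), (14, 70), (31, 50), (90, 30))
   DEFAULT_BUCKET_WEIGHT = 10
   VISIT_BONUS = {"reload": 0, "bookmark": 75, "link": 100, "typed": 2000}
   BOOKMARK_BONUS = 75  # assumption: the Bookmark visit bonus
   UNVISITED_BOOKMARK_BONUS = 140
   UNVISITED_TYPED_BONUS = 200
   NUM_VISITS = 10


   @dataclass
   class Visit:
       visit_type: str  # hypothetical type names, keys of VISIT_BONUS
       age_in_days: float


   def bucket_weight(age_in_days):
       for cutoff, weight in BUCKETS:
           if age_in_days <= cutoff:
               return weight
       return DEFAULT_BUCKET_WEIGHT


   def frecency(visits, visit_count=None, bookmarked=False, typed=False,
                is_place_query=False, unvisited_bucket_weight=100):
       """Sketch of the legacy frecency steps for a single page."""
       if visits:
           # Steps 1-6: score the most recent visits.
           if visit_count is None:
               visit_count = len(visits)
           sample = sorted(visits, key=lambda v: v.age_in_days)[:NUM_VISITS]
           points = 0.0
           for visit in sample:
               bonus = VISIT_BONUS.get(visit.visit_type, 0)
               if bookmarked:
                   bonus += BOOKMARK_BONUS
               if bonus > 0:
                   # Same as age_bucket_weight * (bonus / 100).
                   points += bucket_weight(visit.age_in_days) * bonus / 100
           # Step 7.
           if points == 0:
               return -1
           return visit_count * points / len(sample)
       # Step 8: unvisited and not bookmarked, or a place query.
       if not bookmarked or is_place_query:
           return 0
       # Steps 9-11.
       bonus = UNVISITED_BOOKMARK_BONUS
       if typed:
           bonus += UNVISITED_TYPED_BONUS
       return unvisited_bucket_weight * bonus / 100


   print(frecency([Visit("link", 1)]))  # 100.0
   print(frecency([Visit("typed", 2), Visit("link", 20)]))  # 2050.0
   print(frecency([Visit("reload", 1)]))  # -1
   print(frecency([], bookmarked=True, typed=True))  # 340.0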

When frecency for a page is calculated
--------------------------------------

Operations that may influence the frecency score are:

* Adding visits
* Removing visits
* Adding bookmarks
* Removing bookmarks
* Changing the URL of a bookmark

Frecency is recalculated:

* Immediately, when a new visit is added. The user expects the page to appear
  in search results after visiting it. This also applies to any History API
  that allows adding visits.
* In the background, during idle time, in all other cases. In most cases a
  temporarily stale value is not a problem. The main concern would be privacy
  when removing the history of a page, but removing all of its history either
  removes the page completely or, if it's bookmarked, leaves it still relevant.
  In these cases, when a change influencing frecency happens, the
  ``recalc_frecency`` database field for the page is set to ``1``.
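
The deferred path can be illustrated with Python's ``sqlite3`` module. The ``pages`` table below is a hypothetical, minimal stand-in that keeps only the columns needed here, and ``invalidate_frecency`` is a hypothetical helper: instead of recomputing the score, it only flags the page, so the stale value stays in place until the background recalculation.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(":memory:")
   # Hypothetical, minimal table: only the columns this example needs.
   conn.execute(
       "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, "
       "frecency INTEGER, recalc_frecency INTEGER NOT NULL DEFAULT 0)"
   )
   conn.execute(
       "INSERT INTO pages (url, frecency) VALUES (?, ?)",
       ("https://example.com/", 100),
   )


   def invalidate_frecency(conn, page_id):
       """Flag a page for background recalculation; the stale score is kept."""
       conn.execute("UPDATE pages SET recalc_frecency = 1 WHERE id = ?", (page_id,))


   # For example, after removing a bookmark for page 1:
   invalidate_frecency(conn, 1)
   print(conn.execute("SELECT frecency, recalc_frecency FROM pages").fetchone())
   # (100, 1)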

Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module.
The recalculator is notified when the value of
``PlacesUtils.history.shouldStartFrecencyRecalculation`` changes from false to
true, which means there are values to recalculate.
A DeferredTask is then armed; it looks for a user idle opportunity within the
next 5 minutes and otherwise runs when that time elapses.
Once all the outdated values have been recalculated,
``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false
until the next operation invalidates a frecency.
The recalculation task is also armed on the ``idle-daily`` notification.
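
The "idle opportunity or timeout" behavior can be approximated with ``asyncio``. This is only a Python analogy, not the DeferredTask API: ``run_when_idle_or_timeout`` and the ``idle_event`` signal are hypothetical, and the timeouts in the usage example are shortened so it finishes quickly.

.. code-block:: python

   import asyncio

   IDLE_TIMEOUT_S = 5 * 60  # "the next 5 minutes"


   async def run_when_idle_or_timeout(task, idle_event, timeout=IDLE_TIMEOUT_S):
       """Run ``task`` once ``idle_event`` is set, or when ``timeout`` expires."""
       try:
           await asyncio.wait_for(idle_event.wait(), timeout)
           reason = "idle"
       except asyncio.TimeoutError:
           reason = "timeout"
       task()
       return reason


   async def demo():
       ran = []
       idle = asyncio.Event()
       # Nobody signals idle: the task runs when the (shortened) timeout expires.
       first = await run_when_idle_or_timeout(lambda: ran.append(1), idle, timeout=0.01)
       # The user goes idle: the task runs right away.
       idle.set()
       second = await run_when_idle_or_timeout(lambda: ran.append(2), idle, timeout=10)
       return first, second, ran


   print(asyncio.run(demo()))  # ('timeout', 'idle', [1, 2])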

When the task runs, it recalculates the frecency of a chunk of pages. If
there are more pages left to recalculate, the task is re-armed. After the
frecency of a page is recalculated, its ``recalc_frecency`` field is set back
to ``0``.
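
The chunked, re-armed recalculation can be sketched with ``sqlite3`` as well. The ``pages`` table, the ``CHUNK_SIZE`` value and the ``recompute`` placeholder (standing in for the real frecency calculation) are hypothetical; the ``while`` loop plays the role of re-arming the task while flagged pages remain.

.. code-block:: python

   import sqlite3

   CHUNK_SIZE = 2  # hypothetical; kept small so the example needs two passes


   def recompute(visit_count):
       """Placeholder for the real frecency calculation."""
       return visit_count * 100


   def recalculate_chunk(conn, chunk_size=CHUNK_SIZE):
       """Recalculate one chunk of flagged pages.

       Returns True if flagged pages remain, i.e. the task should be re-armed.
       """
       rows = conn.execute(
           "SELECT id, visit_count FROM pages WHERE recalc_frecency = 1 LIMIT ?",
           (chunk_size,),
       ).fetchall()
       for page_id, visit_count in rows:
           conn.execute(
               "UPDATE pages SET frecency = ?, recalc_frecency = 0 WHERE id = ?",
               (recompute(visit_count), page_id),
           )
       remaining = conn.execute(
           "SELECT COUNT(*) FROM pages WHERE recalc_frecency = 1"
       ).fetchone()[0]
       return remaining > 0


   conn = sqlite3.connect(":memory:")
   conn.execute(
       "CREATE TABLE pages (id INTEGER PRIMARY KEY, visit_count INTEGER, "
       "frecency INTEGER, recalc_frecency INTEGER NOT NULL DEFAULT 0)"
   )
   conn.executemany(
       "INSERT INTO pages (visit_count, frecency, recalc_frecency) VALUES (?, ?, ?)",
       [(3, -1, 1), (1, -1, 1), (5, -1, 1), (2, 200, 0)],
   )

   passes = 1
   while recalculate_chunk(conn):  # re-arm while work is left
       passes += 1
   print(passes)  # 2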

Frecency is also decayed daily during the ``idle-daily`` notification, by
multiplying all the scores by a decay rate of ``0.975`` (a half-life of roughly
28 days). This ensures that entries not receiving new visits or bookmarks
gradually lose relevance.
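
The effect of the daily decay can be checked numerically. In this Python sketch ``decay`` is a hypothetical helper that applies the multiplication once per day; solving ``0.975 ** n = 0.5`` gives a half-life of about 27.4 days, close to the four weeks quoted above.

.. code-block:: python

   import math

   DECAY_RATE = 0.975  # multiplier applied once per idle-daily


   def decay(frecency: float, days: int, rate: float = DECAY_RATE) -> float:
       """Return the score after ``days`` daily decays with no new visits."""
       return frecency * rate ** days


   half_life_days = math.log(0.5) / math.log(DECAY_RATE)
   print(round(half_life_days, 1))  # 27.4
   print(decay(1000, 28) < 500 < decay(1000, 27))  # True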