
Handling the evolution of Sync payloads

(Note that this document has been written in the format of an application-services ADR, but the relevant teams decided that the best home for it is ultimately mozilla-central.)

Technical Story:

Context and Problem Statement

Sync exists on all platforms (Desktop, Android, iOS) and all channels (Nightly, Beta, Release, ESR), and is heavily used across all Firefox features. Whenever feature changes or requests potentially involve schema changes, there are few good options for ensuring Sync doesn't break for any specific client. Because Sync data is shared between all channels, we need to make sure each client can handle the new data and that all channels can support the new schema. Issues such as credit card sync failing on the Android and Desktop release channels after a schema change on Desktop Nightly are examples of what we can run into. This document describes our decision on how we will support payload evolution over time.

Note that even though this document exists in the application-services repository, it should be considered to apply to all sync implementations, whether in this repository, in mozilla-central, or anywhere else.

Definitions

is not yet understood by "recent" versions. The most common example would be a Nightly version of Firefox with a new feature not yet on the release channel.

or have support for new features in "new" versions, but which we still want to support without breakage and without the user perceiving data-loss. This is typically accepted to mean the current ESR version or later, but taking into account the slow update when new ESRs are released.

Decision Drivers

might be considered acceptable if absolutely necessary.

"old" versions just because they might have a problem in the future. That is, we want to avoid a policy which dictates that versions more than (say) 2 years old will break when syncing "just in case".

asks coming down the line which require this capability.

Considered Options

do not know about and (b) never changing the semantics of existing data.

Decision Outcome

Chosen option: A backwards compatible schema policy, because it is very flexible and is the only option that meets the decision drivers.

Pros and Cons of the Options

A backwards compatible schema policy

A summary of this option is a policy by which:

does not understand. The next time that engine needs to upload that record to the storage server, it must arrange to add all such "unknown" fields back into the payload.

passwords engine would identify the "root" of the payload, addresses and creditcards would identify the entry sub-object in the payload, while the history engine would probably identify both the root of the payload and the visits array.

that "new" clients must support both new fields and fields which are considered deprecated by these "new" clients because they are still used by "recent" versions.
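The round-tripping of "unknown" fields described above can be sketched as follows. This is a minimal illustration rather than the code any Sync engine actually uses: the field names in `KNOWN_FIELDS` and the helper names `split_incoming` and `build_outgoing` are assumptions made for the example, and it only handles an engine whose "unknown" fields live at the root of the payload.

```python
# Hypothetical subset of fields that this client version understands.
KNOWN_FIELDS = {"id", "hostname", "username", "password"}


def split_incoming(payload: dict) -> tuple[dict, dict]:
    """Split an incoming payload into understood and "unknown" fields."""
    known = {k: v for k, v in payload.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in payload.items() if k not in KNOWN_FIELDS}
    return known, unknown


def build_outgoing(local_record: dict, unknown: dict) -> dict:
    """Rebuild the payload for upload, adding the "unknown" fields back."""
    outgoing = dict(unknown)       # start from what we could not interpret
    outgoing.update(local_record)  # fields we understand take precedence
    return outgoing


# A record written by a newer client, carrying a field we do not know about.
incoming = {
    "id": "abc",
    "hostname": "https://example.com",
    "username": "user",
    "password": "old",
    "someNewField": {"x": 1},
}
known, unknown = split_incoming(incoming)
known["password"] = "new"  # local edit made by this older client
outgoing = build_outgoing(known, unknown)
```

The client would need to persist `unknown` alongside its local copy of the record so that it is still available whenever the record is next uploaded. An engine whose "unknown" fields live in a sub-object would apply the same split to that sub-object.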

The pros and cons:

specifically is to support the round-tripping of "unknown" fields, in the hope that by the time actual schema changes are proposed, this round-trip capability will then be on all "recent" versions)

some evolution tasks become complicated. For example, consider a hypothetical change from separate "street/city/state" fields to a free-form "address" field. New Firefox versions would need to populate both the new and the old fields when writing to the server, and handle the case where only the old fields were updated when they see an incoming record written by a "recent" or "old" version of Firefox. However, this should be rare.

is informal and requires good judgement as changes are proposed.
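To make the hypothetical "street/city/state" to "address" change more concrete, here is one way a new client might write both sets of fields and notice that an older client has since edited only the old ones. Everything here is an assumption made for illustration: the `address_snapshot` field, the helper names, and the comma-joined fallback are not part of any real schema. The sketch relies on older clients round-tripping the fields they do not understand.

```python
OLD_FIELDS = ("street", "city", "state")


def _join_old(payload: dict) -> str:
    """Best-effort free-form rendering of the old structured fields."""
    return ", ".join(payload.get(field, "") for field in OLD_FIELDS)


def write_record(address: str, street: str, city: str, state: str) -> dict:
    """New client: populate both the new and the old fields."""
    payload = {"street": street, "city": city, "state": state,
               "address": address}
    # Hypothetical extra field: what the old fields looked like at write time.
    payload["address_snapshot"] = _join_old(payload)
    return payload


def read_address(payload: dict) -> str:
    """New client: prefer "address" unless the old fields changed since."""
    joined = _join_old(payload)
    if "address" not in payload or payload.get("address_snapshot") != joined:
        # Written or edited by a client that only knows the old fields.
        return joined
    return payload["address"]


record = write_record("1 Main St\nSpringfield IL",
                      "1 Main St", "Springfield", "IL")
# An older client edits the city and round-trips the fields it doesn't know.
edited = dict(record, city="Shelbyville")
```

With this approach `read_address(record)` yields the free-form text, while `read_address(edited)` falls back to the structured fields because the snapshot no longer matches.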

A policy which prevents "recent" clients from syncing, or editing data

Proposals which fit into this category might have been implemented by (say) adding a version number to the schema. A client which did not fully understand the schema would then either not sync the record, or sync it but not allow editing it, or similar.

This was rejected because:

still need most of the chosen option anyway - specifically, we could still never deprecate fields etc.

to do in a satisfactory way.

allowing a Nightly to sync would effectively break Release/Mobile Firefox versions.
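For comparison, the rejected version-number approach might have looked roughly like the sketch below. The `schemaVersion` field, the `CLIENT_SCHEMA_VERSION` constant, and the `Handling` names are hypothetical and exist only to illustrate the idea.

```python
from enum import Enum

# Hypothetical: the newest schema version this client fully understands.
CLIENT_SCHEMA_VERSION = 2


class Handling(Enum):
    READ_WRITE = "read-write"  # sync and allow local edits
    READ_ONLY = "read-only"    # sync but do not allow editing
    SKIP = "skip"              # do not sync the record at all


def classify(payload: dict, allow_read_only: bool = True) -> Handling:
    """Decide how a client treats a record based on its schema version."""
    version = payload.get("schemaVersion", 1)  # assume 1 when absent
    if version <= CLIENT_SCHEMA_VERSION:
        return Handling.READ_WRITE
    return Handling.READ_ONLY if allow_read_only else Handling.SKIP
```

Under such a scheme, a single write from a newer client is enough to make a record read-only or invisible on every older client, which is the kind of breakage the rejection reasons above describe.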

A formal schema-driven process.

Ideally we could formally describe schemas, but we can't come up with anything here which works with the constraints of supporting older clients - we simply can't update older released Firefoxes so they know how to work with the new schemas. We also couldn't come up with a solution where a schema is downloaded dynamically which also allowed the semantics (as opposed to simply validity) of new fields to be described.

Consider the sync payloads frozen and never change them.

A process where payloads are frozen was rejected because:

(i.e., freezing "old" schemas but then creating "new" schemas only understood by newer clients) could not be conceived in a way that still met the requirements, particularly around data-loss for older clients. For example, adding a credit card on a Nightly version but having it be completely unavailable on a release Firefox isn't acceptable.

Use separate collections for new data

We could store the new data in a separate collection. For example, we could define a bookmarks2 collection in which each record has the same guid as a record in bookmarks and carries any new fields. Newer clients would use both collections to sync.

The pros and cons:

We can currently write to a single collection in an atomic way, but don't have a way to write to multiple collections.

For example if we added a new collection for a single field, then the attacker could guess if that field was set or not based on the size of the encrypted record.

for example adding a field to a history record visit.
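As an illustration of the separate-collection idea, a newer client would have to join the two collections by guid when reading. The sketch below assumes records are plain dictionaries keyed by an `id` field holding the guid; the function name and the `orphans` return value are hypothetical.

```python
def merge_collections(
    bookmarks: list[dict], bookmarks2: list[dict]
) -> tuple[dict[str, dict], list[str]]:
    """Join base records with their extra fields, keyed by guid.

    Returns the merged records and the guids found only in bookmarks2.
    """
    extras = {record["id"]: record for record in bookmarks2}
    merged = {}
    for record in bookmarks:
        combined = dict(extras.pop(record["id"], {}))
        combined.update(record)  # base fields take precedence
        merged[record["id"]] = combined
    # Whatever is left has no base record, e.g. after a partial upload.
    orphans = sorted(extras)
    return merged, orphans


base = [{"id": "g1", "title": "Example"}]
extra = [{"id": "g1", "newField": 42}, {"id": "g2", "newField": 7}]
merged, orphans = merge_collections(base, extra)
```

The `orphans` list hints at the atomicity problem noted above: because the two collections cannot be written together, a client can observe one half of a logical record without the other.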

Links

which may be of interest for historical context, but should not be considered part of this ADR.