
XPCShell JSON Data Format Documentation

This document describes the JSON file formats created by fetch-xpcshell-data.js.

Overview

The script generates two types of JSON files for each date or try commit:

  1. Test timing data
  2. Resource usage data

Both formats use string tables and index-based lookups to minimize file size.


Test Timing Data Format

Top-Level Structure

{
  "metadata": { ... },
  "tables": { ... },
  "taskInfo": { ... },
  "testInfo": { ... },
  "testRuns": [ ... ]
}

metadata

Contains information about the data collection:

{
  "date": "2025-10-14",              // Date of the data (for date-based queries)
  "revision": "abc123...",           // Try commit revision (for try-based queries)
  "pushId": 12345,                   // Treeherder push ID (for try-based queries)
  "startTime": 1760400000,           // Unix timestamp (seconds) used as base for relative timestamps
  "generatedAt": "2025-10-15T14:24:33.451Z",  // ISO timestamp when file was created
  "jobCount": 3481,                  // Number of jobs fetched
  "processedJobCount": 3481          // Number of jobs successfully processed
}

tables

String tables for efficient storage. All strings are deduplicated and stored once, sorted by frequency (most frequently used first for better compression):

{
  "jobNames": [                      // Job names (e.g., "test-linux1804-64/opt-xpcshell")
    "test-linux1804-64/opt-xpcshell",
    "test-macosx1015-64/debug-xpcshell",
    ...
  ],
  "testPaths": [                     // Test file paths (e.g., "dom/indexedDB/test/unit")
    "dom/indexedDB/test/unit",
    "toolkit/components/extensions/test/xpcshell",
    ...
  ],
  "testNames": [                     // Test filenames (e.g., "test_foo.js")
    "test_foo.js",
    "test_bar.js",
    ...
  ],
  "repositories": [                  // Repository names
    "mozilla-central",
    "autoland",
    "try",
    ...
  ],
  "statuses": [                      // Test run statuses
    "PASS-PARALLEL",
    "PASS-SEQUENTIAL",
    "SKIP",
    "FAIL-PARALLEL",
    "TIMEOUT-SEQUENTIAL",
    "CRASH",
    "EXPECTED-FAIL",
    ...
  ],
  "taskIds": [                       // TaskCluster task IDs with retry (always includes .retryId)
    "YJJe4a0CRIqbAmcCo8n63w.0",      // Retry 0
    "XPPf5b1DRJrcBndDp9o74x.1",      // Retry 1
    ...
  ],
  "messages": [                      // Test messages (for SKIP and FAIL statuses)
    "skip-if: os == 'linux'",
    "disabled due to bug 123456",
    "Expected 5, got 10",              // Failure message
    ...
  ],
  "crashSignatures": [               // Crash signatures (only for crashed tests)
    "mozilla::dom::Something::Crash",
    "EMPTY: no crashing thread identified",
    ...
  ],
  "components": [                    // Bugzilla components (Product :: Component format)
    "Core :: Storage: IndexedDB",
    "Testing :: XPCShell Harness",
    "Firefox :: General",
    ...
  ],
  "commitIds": [                     // Commit IDs from repository (extracted from profile.meta.sourceURL)
    "f37a6863f87aeeb870b16223045ea7614b1ba0a7",
    "abc123def456789012345678901234567890abcd",
    ...
  ]
}
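As an illustration of how such a frequency-sorted table can be produced, here is a minimal sketch (buildStringTable is a hypothetical helper, not part of fetch-xpcshell-data.js):

```javascript
// Build a deduplicated string table sorted most-frequent-first, plus the
// index array that encodes the original values against that table.
function buildStringTable(values) {
  // Count occurrences of each distinct string.
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  // Most frequent strings first, so the most common values get the
  // smallest indices, which compresses better.
  const table = [...counts.keys()].sort(
    (a, b) => counts.get(b) - counts.get(a)
  );
  // Map each string to its table index for encoding.
  const indexOf = new Map(table.map((s, i) => [s, i]));
  return { table, ids: values.map((v) => indexOf.get(v)) };
}
```

For example, buildStringTable(["b", "a", "b", "b", "a", "c"]) yields the table ["b", "a", "c"] and the index array [0, 1, 0, 0, 1, 2].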

taskInfo

Maps task IDs to their associated job names, repositories, and commit IDs. These are parallel arrays indexed by taskIdId:

{
  "repositoryIds": [0, 1, 0, 2, ...],  // Index into tables.repositories
  "jobNameIds": [0, 0, 1, 1, ...],     // Index into tables.jobNames
  "commitIds": [0, 1, 0, null, ...]    // Index into tables.commitIds (null if not available)
}

Example lookup:

const taskIdId = 5;
const taskId = tables.taskIds[taskIdId];           // "YJJe4a0CRIqbAmcCo8n63w.0"
const repository = tables.repositories[taskInfo.repositoryIds[taskIdId]];  // "mozilla-central"
const jobName = tables.jobNames[taskInfo.jobNameIds[taskIdId]];           // "test-linux1804-64/opt-xpcshell"
const commitIdIdx = taskInfo.commitIds[taskIdId];
const commitId = commitIdIdx !== null ? tables.commitIds[commitIdIdx] : null;  // "f37a6863f87a..." or null

testInfo

Maps test IDs to their test paths, names, and components. These are parallel arrays indexed by testId:

{
  "testPathIds": [0, 0, 1, 2, ...],    // Index into tables.testPaths
  "testNameIds": [0, 1, 2, 3, ...],    // Index into tables.testNames
  "componentIds": [5, 5, 12, null, ...] // Index into tables.components (null if unknown)
}

Example lookup:

const testId = 10;
const testPath = tables.testPaths[testInfo.testPathIds[testId]];  // "dom/indexedDB/test/unit"
const testName = tables.testNames[testInfo.testNameIds[testId]];  // "test_foo.js"
const fullPath = testPath ? `${testPath}/${testName}` : testName;
const componentId = testInfo.componentIds[testId];
const component = componentId !== null ? tables.components[componentId] : "Unknown";  // "Core :: Storage: IndexedDB"

testRuns

A 2D sparse array structure: testRuns[testId][statusId]

Each testRuns[testId][statusId] contains data for all runs of that test with that specific status. If a test never had a particular status, that array position contains null:

[
  // testId 0
  [
    // statusId 0 (e.g., "PASS-PARALLEL")
    {
      "taskIdIds": [5, 12, 18, ...],       // Indices into tables.taskIds
      "durations": [1234, 1456, 1289, ...], // Test durations in milliseconds
      "timestamps": [0, 15, 23, ...]        // Differentially compressed timestamps (seconds relative to metadata.startTime)
    },
    // statusId 1 - this test never had that status
    null,
    // statusId 2 (e.g., "SKIP")
    {
      "taskIdIds": [45, 67, ...],
      "durations": [0, 0, ...],
      "timestamps": [100, 200, ...],
      "messageIds": [5, 5, ...]            // Present for SKIP and FAIL statuses - indices into tables.messages (null if no message)
    },
    // statusId 3 (e.g., "FAIL-PARALLEL")
    {
      "taskIdIds": [78, ...],
      "durations": [1234, ...],
      "timestamps": [250, ...],
      "messageIds": [12, ...]              // Present for SKIP and FAIL statuses - indices into tables.messages (null if no message)
    },
    // statusId 4 (e.g., "CRASH")
    {
      "taskIdIds": [89, ...],
      "durations": [5678, ...],
      "timestamps": [300, ...],
      "crashSignatureIds": [2, ...],       // Only present for CRASH status - indices into tables.crashSignatures (null if none)
      "minidumps": ["12345678-abcd-1234-abcd-1234567890ab", ...]   // Only present for CRASH status - minidump IDs or null
    }
  ],
  // testId 1
  [ ... ],
  ...
]

Timestamp decompression:

// Timestamps are differentially compressed
let currentTime = metadata.startTime;  // Base timestamp in seconds
const decompressedTimestamps = statusGroup.timestamps.map(diff => {
    currentTime += diff;
    return currentTime;
});

Example: Get all runs of a specific test:

const testId = 10;
const testGroup = testRuns[testId];

for (let statusId = 0; statusId < testGroup.length; statusId++) {
    const statusGroup = testGroup[statusId];
    if (!statusGroup) continue;  // This test never had this status

    const status = tables.statuses[statusId];
    console.log(`Status: ${status}, Runs: ${statusGroup.taskIdIds.length}`);

    // Decompress timestamps
    let currentTime = metadata.startTime;
    for (let i = 0; i < statusGroup.taskIdIds.length; i++) {
        currentTime += statusGroup.timestamps[i];
        const taskId = tables.taskIds[statusGroup.taskIdIds[i]];
        const duration = statusGroup.durations[i];
        console.log(`  Task: ${taskId}, Duration: ${duration}ms, Time: ${currentTime}`);
    }
}

Resource Usage Data Format

Top-Level Structure

{
  "jobNames": [ ... ],
  "repositories": [ ... ],
  "machineInfos": [ ... ],
  "jobs": { ... }
}

Lookup Tables

{
  "jobNames": [                      // Base job names without chunk numbers
    "test-linux1804-64/opt-xpcshell",
    "test-macosx1015-64/debug-xpcshell",
    ...
  ],
  "repositories": [                  // Repository names
    "mozilla-central",
    "autoland",
    ...
  ],
  "machineInfos": [                  // Machine specifications (memory in GB, rounded to 1 decimal)
    {
      "logicalCPUs": 8,
      "physicalCPUs": 4,
      "mainMemory": 15.6             // GB
    },
    {
      "logicalCPUs": 16,
      "physicalCPUs": 8,
      "mainMemory": 31.4
    },
    ...
  ]
}

jobs

Parallel arrays containing resource usage data for each job, sorted by start time:

{
  "jobNameIds": [0, 0, 1, 1, ...],                              // Indices into jobNames array
  "chunks": [1, 2, 1, 2, ...],                                  // Chunk numbers (null if job name has no chunk)
  "taskIds": ["YJJe4a0CRIqbAmcCo8n63w", "XPPf5b1DRJrcBndDp9o74x.1", ...], // Task IDs (format: "taskId" for retry 0, "taskId.retryId" for retry > 0)
  "repositoryIds": [0, 0, 1, 1, ...],                           // Indices into repositories array
  "startTimes": [0, 150, 23, 45, ...],       // Differentially compressed timestamps (seconds)
  "machineInfoIds": [0, 0, 1, 1, ...],       // Indices into machineInfos array
  "maxMemories": [1234567890, ...],          // Maximum memory used (bytes)
  "idleTimes": [12345, ...],                 // Time with <50% of one core used (milliseconds)
  "singleCoreTimes": [45678, ...],           // Time using ~1 core (0.75-1.25 cores, milliseconds)
  "cpuBuckets": [                            // CPU usage time distribution (milliseconds per bucket)
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],  // Job 0: [0-10%, 10-20%, ..., 90-100%]
    [150, 250, 350, 450, 550, 650, 750, 850, 950, 1050],  // Job 1
    ...
  ]
}

CPU Buckets Explanation:

Each entry in cpuBuckets is an array of 10 values. Bucket i holds the total time, in milliseconds, that the job spent with overall CPU usage in the range [i*10%, (i+1)*10%) of the machine's total capacity.

Idle Time Calculation:

idleTimes counts the milliseconds during which less than 50% of a single core was in use. As a fraction of total capacity this depends on the core count; for a 16-core machine it is usage below 3.125%.

Single Core Time Calculation:

singleCoreTimes counts the milliseconds during which roughly one core was in use, defined as 0.75-1.25 cores. For a 16-core machine: 4.6875% - 7.8125% of total capacity.
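Because the idle and single-core thresholds are expressed in cores, the equivalent share of total CPU capacity depends on the machine's core count. A small sketch of that conversion (coreThresholds is illustrative, not a function from the script):

```javascript
// Convert the per-core thresholds into percentages of total CPU capacity
// for a machine with the given number of logical CPUs.
function coreThresholds(logicalCPUs) {
  return {
    // "Idle": less than 50% of one core in use.
    idleBelowPercent: (0.5 / logicalCPUs) * 100,
    // "Single core": roughly one core in use (0.75-1.25 cores).
    singleCoreLowPercent: (0.75 / logicalCPUs) * 100,
    singleCoreHighPercent: (1.25 / logicalCPUs) * 100,
  };
}

// For a 16-core machine this gives idle below 3.125% and a
// single-core band of 4.6875%-7.8125%.
```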

Start Time Decompression:

let currentTime = 0;  // Start times are relative to each other
const decompressedStartTimes = jobs.startTimes.map(diff => {
    currentTime += diff;
    return currentTime;
});

Example: Get full information for a job:

const jobIndex = 5;
const jobName = jobNames[jobs.jobNameIds[jobIndex]];
const chunk = jobs.chunks[jobIndex];  // May be null
const fullJobName = chunk !== null ? `${jobName}-${chunk}` : jobName;
const taskId = jobs.taskIds[jobIndex];
const repository = repositories[jobs.repositoryIds[jobIndex]];
const machineInfo = machineInfos[jobs.machineInfoIds[jobIndex]];

// Decompress start time
let currentTime = 0;
for (let i = 0; i <= jobIndex; i++) {
    currentTime += jobs.startTimes[i];
}
const startTime = currentTime;  // seconds since epoch

const maxMemoryGB = jobs.maxMemories[jobIndex] / (1024 * 1024 * 1024);
const idleTimeSeconds = jobs.idleTimes[jobIndex] / 1000;
const singleCoreTimeSeconds = jobs.singleCoreTimes[jobIndex] / 1000;
const cpuDistribution = jobs.cpuBuckets[jobIndex];
const totalTime = cpuDistribution.reduce((sum, val) => sum + val, 0);
const idlePercent = (idleTimeSeconds * 1000 / totalTime) * 100;

Data Compression Techniques

The format uses several compression techniques to minimize file size:

  1. String tables: every string is deduplicated and stored exactly once.
  2. Frequency sorting: table entries are ordered most-frequent-first, so the most common values get the smallest indices.
  3. Index-based lookups: data arrays store small integers instead of repeated strings.
  4. Parallel arrays: related per-task and per-test fields share a single index instead of being stored as objects.
  5. Differential timestamp compression: timestamps and start times are stored as deltas from the previous value.
  6. Sparse arrays: testRuns entries for statuses a test never had are stored as null.
  7. Retry suffix omission: resource usage data drops the ".0" suffix for retry-0 task IDs.
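The differential compression can be sketched from the encoding side as the inverse of the decompression loops shown earlier (diffEncode and diffDecode are illustrative helpers, not the script's actual functions):

```javascript
// Encode absolute values as deltas from the previous value, starting from
// a base. Sorted inputs yield small, compressible deltas.
function diffEncode(absoluteValues, base = 0) {
  let previous = base;
  return absoluteValues.map((value) => {
    const delta = value - previous;
    previous = value;
    return delta;
  });
}

// Decode deltas back into absolute values by running a cumulative sum.
function diffDecode(deltas, base = 0) {
  let current = base;
  return deltas.map((delta) => (current += delta));
}
```

Round-tripping [1760400000, 1760400015, 1760400023] with base 1760400000 produces the deltas [0, 15, 8] and recovers the original values.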

Index File Format

The index.json file lists all available dates:

{
  "dates": [
    "2025-10-15",
    "2025-10-14",
    "2025-10-13",
    ...
  ]
}

Dates are sorted in descending order (newest first).


Notes

- Test timing data: always includes the retry suffix (e.g., "YJJe4a0CRIqbAmcCo8n63w.0")
- Resource usage data: omits ".0" for retry 0 (e.g., "YJJe4a0CRIqbAmcCo8n63w") and includes the suffix for retries > 0 (e.g., "YJJe4a0CRIqbAmcCo8n63w.1")
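Code that correlates the two files therefore has to normalize the task ID formats first. A hypothetical helper pair (not part of the script):

```javascript
// Convert a resource-usage task ID to the timing-data format, which
// always carries a ".retryId" suffix.
function toTimingTaskId(resourceTaskId) {
  return resourceTaskId.includes(".")
    ? resourceTaskId
    : `${resourceTaskId}.0`;
}

// Convert a timing-data task ID to the resource-usage format, which
// drops the ".0" suffix for retry 0.
function toResourceTaskId(timingTaskId) {
  return timingTaskId.endsWith(".0")
    ? timingTaskId.slice(0, -2)
    : timingTaskId;
}
```

This assumes TaskCluster task IDs themselves never contain a dot, which holds for their usual base64url-style form.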


Aggregated Files Format

When running with --days N where N > 1, two aggregated files are generated:

  1. xpcshell-issues-with-taskids.json - a detailed file that keeps per-failure task IDs and minidumps
  2. xpcshell-issues.json - a small file that keeps only aggregated counts, for fast dashboard loading

In both, the data structure is optimized for the sequential access patterns used by the dashboards.

Detailed File (xpcshell-issues-with-taskids.json)

Differences from Daily Files

1. Metadata Changes

{
  "metadata": {
    "startDate": "2025-11-12",           // First date in the range (earliest)
    "endDate": "2025-12-02",             // Last date in the range (most recent)
    "days": 21,                          // Number of days aggregated
    "startTime": 1762905600,             // Unix timestamp for startDate at 00:00:00 UTC
    "generatedAt": "...",
    "totalTestCount": 4506,              // Total number of unique tests
    "testsWithFailures": 3614,           // Number of tests that had at least one non-passing run
    "aggregatedFrom": [...]              // Array of source filenames
  }
}

Additional fields compared to daily files: startDate, endDate, days, totalTestCount, testsWithFailures, and aggregatedFrom. The daily date field is replaced by the startDate/endDate range.

2. Passing Test Runs Are Aggregated

Daily files store individual runs for all statuses:

{
  "taskIdIds": [123, 456, 789],
  "durations": [1500, 1600, 1550],
  "timestamps": [3600, 3600, 7200]
}

Aggregated file stores only counts per hour for passing statuses (status starts with "PASS"):

{
  "counts": [150, 200, 180, 145, ...],
  "hours": [0, 5, 1, 2, 8, ...]
}

Where:

- counts[i] is the number of passing runs in the corresponding hour
- hours is a differentially compressed list of hour offsets relative to metadata.startTime

Decompressing hours:

let currentHour = 0;
const absoluteHours = [];
for (const delta of hours) {
  currentHour += delta;
  absoluteHours.push(currentHour);
}
// absoluteHours[i] is now the hour number (0 = startTime, 1 = startTime + 1 hour, etc.)

Example: Calculate pass rate for a test on day 5:

const testId = 0;
const day = 5; // 5 days after startDate

// Find pass status
const passStatusId = data.tables.statuses.findIndex(s => s.startsWith("PASS"));
const passGroup = data.testRuns[testId]?.[passStatusId];

// Count passes in day 5 (hours 120-143)
const dayStartHour = day * 24;
const dayEndHour = (day + 1) * 24;
let passCount = 0;
let currentHour = 0;
if (passGroup) {
  for (let i = 0; i < passGroup.hours.length; i++) {
    currentHour += passGroup.hours[i];
    if (currentHour >= dayStartHour && currentHour < dayEndHour) {
      passCount += passGroup.counts[i];
    }
  }
}

// For the fail count, non-passing runs are also bucketed by hour:
// decompress failGroup.hours the same way and, for buckets whose hour
// falls in [dayStartHour, dayEndHour), add failGroup.taskIdIds[i].length.

3. All Test Runs Aggregated by Hour

Both passing and non-passing test runs are aggregated by hour. The difference is in what data is preserved:

Passing tests (status starts with "PASS"):

{
  "counts": [150, 200, 180],
  "hours": [0, 5, 1]
}

Non-passing tests (FAIL, CRASH, TIMEOUT, SKIP, etc.):

{
  "taskIdIds": [
    [45, 67],      // Task IDs that failed in hour 0 with message 23
    [89, 12, 56],  // Task IDs that failed in hour 5 with message 23
    [34]           // Task IDs that failed in hour 6 with message 24
  ],
  "hours": [0, 5, 1],
  "messageIds": [23, 23, 24],
  "crashSignatureIds": [5, 5, 6],
  "minidumps": [
    ["abc123", "def456"],    // Minidumps for crashes in hour 0
    ["ghi789", null, "jkl"],  // Minidumps for crashes in hour 5
    [null]                    // Minidumps for crashes in hour 6
  ]
}

Key differences from daily files:

- taskIdIds is an array of arrays: one inner array of task IDs per hour bucket
- runs are grouped by hour and by message (or crash signature) rather than stored individually
- per-run durations and fine-grained timestamps are dropped; only hour granularity remains

4. String Tables Are Merged

All string tables are merged and deduplicated across all input days. A string that appears in multiple daily files appears only once in the aggregated file.
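Merging tables this way requires remapping every stored index. One possible sketch (mergeStringTables is illustrative, not the script's actual code):

```javascript
// Merge several per-day string tables into one deduplicated table, and
// return, for each input table, an array mapping its old indices to the
// merged indices.
function mergeStringTables(tables) {
  const merged = [];
  const position = new Map();
  // remaps[d][i] is the merged index of string i from day d's table.
  const remaps = tables.map((table) =>
    table.map((s) => {
      if (!position.has(s)) {
        position.set(s, merged.length);
        merged.push(s);
      }
      return position.get(s);
    })
  );
  return { merged, remaps };
}
```

For example, merging [["a", "b"], ["b", "c"]] yields the table ["a", "b", "c"] with remaps [[0, 1], [1, 2]]; every index stored in a day's data is then rewritten through its remap array.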

5. TaskInfo Only Contains Failed Tasks

Since passing runs don't store taskIdIds, the taskInfo object only contains mappings for tasks that appear in non-passing test runs. This significantly reduces the size of these arrays.

6. Platform-Irrelevant Tests Are Filtered

SKIP tests with messages starting with "run-if" are filtered out during aggregation. These represent tests that are not relevant on certain platforms (e.g., "run-if = os == 'win'") and are not actual issues. The dashboard would filter these out anyway, so excluding them reduces file size.

Use Cases

Show pass/fail trends over time: decompress the hours arrays for each status group and plot the hourly counts (passing) or the number of task IDs per bucket (non-passing).

Investigate specific failures: read taskIdIds, messageIds, crashSignatureIds, and minidumps from a non-passing status group, then resolve them through taskInfo and the string tables.

Calculate overall pass rate:

const testId = 0;
const passStatusId = data.tables.statuses.findIndex(s => s.startsWith("PASS"));
const failStatusId = data.tables.statuses.findIndex(s => s.startsWith("FAIL"));

// Total passes
const totalPasses = data.testRuns[testId]?.[passStatusId]?.counts.reduce((a, b) => a + b, 0) ?? 0;

// Total fails - count all taskIds across all buckets
const failGroup = data.testRuns[testId]?.[failStatusId];
const totalFails = failGroup?.taskIdIds.reduce((sum, arr) => sum + arr.length, 0) ?? 0;

const passRate = totalPasses / (totalPasses + totalFails);

Small File (xpcshell-issues.json)

This file omits task IDs and minidumps to minimize file size for fast dashboard loading.

Differences from xpcshell-issues-with-taskids.json

1. No taskInfo or taskIds

The taskInfo object and tables.taskIds array are completely omitted since all runs are aggregated.

2. Reduced String Tables

Only includes tables needed for aggregated data:

{
  "tables": {
    "testPaths": [...],
    "testNames": [...],
    "statuses": [...],
    "messages": [...],           // Kept for failure details
    "crashSignatures": [...],    // Kept for crash details
    "components": [...]
    // No jobNames, repositories, or taskIds
  }
}

3. No Task IDs - Only Counts

All status groups use counts instead of task ID arrays:

{
  "counts": [5, 12, 8, 3],
  "hours": [0, 5, 1, 2],
  "messageIds": [23, 23, 24, 24],           // For failures with different messages
  "crashSignatureIds": [5, 6, 5, 6]         // For crashes with different signatures
  // Note: taskIdIds and minidumps are NOT included in this file
}

Failures with different messages or crash signatures are bucketed separately, preserving distinct failure modes.

Task IDs and minidumps are omitted to reduce size. They are available in the detailed file.

Example: A test that fails 5 times in hour 10 with message A and 3 times with message B will have two entries:

{
  "counts": [5, 3],
  "hours": [10, 0],  // Both in same hour, so second delta is 0
  "messageIds": [23, 24]
}
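To summarize such buckets per failure message, the counts can be totaled by messageId. A minimal sketch (countsByMessage is a hypothetical helper):

```javascript
// Total the run counts of a small-file status group per resolved message
// string. statusGroup has parallel counts/messageIds arrays as shown
// above; messages is tables.messages.
function countsByMessage(statusGroup, messages) {
  const totals = new Map();
  statusGroup.counts.forEach((count, i) => {
    const msg = messages[statusGroup.messageIds[i]] ?? "(no message)";
    totals.set(msg, (totals.get(msg) ?? 0) + count);
  });
  return totals;
}
```

Applied to the example above, the 5 runs with messageId 23 and the 3 runs with messageId 24 come back as two separate totals, preserving the distinct failure modes.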