
XPCShell JSON Data Format Documentation

This document describes the JSON file formats created by fetch-xpcshell-data.js.

Overview

The script generates two types of JSON files for each date or try commit:

  1. Test timing data
  2. Resource usage data

Both formats use string tables and index-based lookups to minimize file size.


Test Timing Data Format

Top-Level Structure

{
  "metadata": { ... },
  "tables": { ... },
  "taskInfo": { ... },
  "testInfo": { ... },
  "testRuns": [ ... ]
}

metadata

Contains information about the data collection:

{
  "date": "2025-10-14",              // Date of the data (for date-based queries)
  "revision": "abc123...",           // Try commit revision (for try-based queries)
  "pushId": 12345,                   // Treeherder push ID (for try-based queries)
  "startTime": 1760400000,           // Unix timestamp (seconds) used as base for relative timestamps
  "generatedAt": "2025-10-15T14:24:33.451Z",  // ISO timestamp when file was created
  "jobCount": 3481,                  // Number of jobs fetched
  "processedJobCount": 3481          // Number of jobs successfully processed
}

tables

String tables for efficient storage. All strings are deduplicated and stored once, sorted by frequency (most frequently used first for better compression):

{
  "jobNames": [                      // Job names (e.g., "test-linux1804-64/opt-xpcshell")
    "test-linux1804-64/opt-xpcshell",
    "test-macosx1015-64/debug-xpcshell",
    ...
  ],
  "testPaths": [                     // Test file paths (e.g., "dom/indexedDB/test/unit")
    "dom/indexedDB/test/unit",
    "toolkit/components/extensions/test/xpcshell",
    ...
  ],
  "testNames": [                     // Test filenames (e.g., "test_foo.js")
    "test_foo.js",
    "test_bar.js",
    ...
  ],
  "repositories": [                  // Repository names
    "mozilla-central",
    "autoland",
    "try",
    ...
  ],
  "statuses": [                      // Test run statuses
    "PASS-PARALLEL",
    "PASS-SEQUENTIAL",
    "SKIP",
    "FAIL-PARALLEL",
    "TIMEOUT-SEQUENTIAL",
    "CRASH",
    "EXPECTED-FAIL",
    ...
  ],
  "taskIds": [                       // TaskCluster task IDs with retry (always includes .retryId)
    "YJJe4a0CRIqbAmcCo8n63w.0",      // Retry 0
    "XPPf5b1DRJrcBndDp9o74x.1",      // Retry 1
    ...
  ],
  "messages": [                      // Test messages (for SKIP and FAIL statuses)
    "skip-if: os == 'linux'",
    "disabled due to bug 123456",
    "Expected 5, got 10",              // Failure message
    ...
  ],
  "crashSignatures": [               // Crash signatures (only for crashed tests)
    "mozilla::dom::Something::Crash",
    "EMPTY: no crashing thread identified",
    ...
  ],
  "components": [                    // Bugzilla components (Product :: Component format)
    "Core :: Storage: IndexedDB",
    "Testing :: XPCShell Harness",
    "Firefox :: General",
    ...
  ],
  "commitIds": [                     // Commit IDs from repository (extracted from profile.meta.sourceURL)
    "f37a6863f87aeeb870b16223045ea7614b1ba0a7",
    "abc123def456789012345678901234567890abcd",
    ...
  ]
}
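As an illustration of how such a frequency-sorted table can be produced, here is a minimal sketch (buildStringTable is a hypothetical helper, not part of fetch-xpcshell-data.js):

```javascript
// Build a deduplicated string table sorted most-frequent-first, plus the
// index array that encodes the original values against that table.
function buildStringTable(values) {
  // Count occurrences of each distinct string.
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  // Most frequent strings first, so the most common values get the
  // smallest indices, which compresses better.
  const table = [...counts.keys()].sort(
    (a, b) => counts.get(b) - counts.get(a)
  );
  // Map each string to its table index for encoding.
  const indexOf = new Map(table.map((s, i) => [s, i]));
  return { table, ids: values.map((v) => indexOf.get(v)) };
}
```

For example, buildStringTable(["b", "a", "b", "b", "a", "c"]) yields the table ["b", "a", "c"] and the index array [0, 1, 0, 0, 1, 2].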

taskInfo

Maps task IDs to their associated job names, repositories, and commit IDs. These are parallel arrays indexed by taskIdId:

{
  "repositoryIds": [0, 1, 0, 2, ...],  // Index into tables.repositories
  "jobNameIds": [0, 0, 1, 1, ...],     // Index into tables.jobNames
  "commitIds": [0, 1, 0, null, ...]    // Index into tables.commitIds (null if not available)
}

Example lookup:

const taskIdId = 5;
const taskId = tables.taskIds[taskIdId];           // "YJJe4a0CRIqbAmcCo8n63w.0"
const repository = tables.repositories[taskInfo.repositoryIds[taskIdId]];  // "mozilla-central"
const jobName = tables.jobNames[taskInfo.jobNameIds[taskIdId]];           // "test-linux1804-64/opt-xpcshell"
const commitIdIdx = taskInfo.commitIds[taskIdId];
const commitId = commitIdIdx !== null ? tables.commitIds[commitIdIdx] : null;  // "f37a6863f87a..." or null

testInfo

Maps test IDs to their test paths, names, and components. These are parallel arrays indexed by testId:

{
  "testPathIds": [0, 0, 1, 2, ...],    // Index into tables.testPaths
  "testNameIds": [0, 1, 2, 3, ...],    // Index into tables.testNames
  "componentIds": [5, 5, 12, null, ...] // Index into tables.components (null if unknown)
}

Example lookup:

const testId = 10;
const testPath = tables.testPaths[testInfo.testPathIds[testId]];  // "dom/indexedDB/test/unit"
const testName = tables.testNames[testInfo.testNameIds[testId]];  // "test_foo.js"
const fullPath = testPath ? `${testPath}/${testName}` : testName;
const componentId = testInfo.componentIds[testId];
const component = componentId !== null ? tables.components[componentId] : "Unknown";  // "Core :: Storage: IndexedDB"

testRuns

A 2D sparse array structure: testRuns[testId][statusId]

Each testRuns[testId][statusId] contains data for all runs of that test with that specific status. If a test never had a particular status, that array position contains null:

[
  // testId 0
  [
    // statusId 0 (e.g., "PASS-PARALLEL")
    {
      "taskIdIds": [5, 12, 18, ...],       // Indices into tables.taskIds
      "durations": [1234, 1456, 1289, ...], // Test durations in milliseconds
      "timestamps": [0, 15, 23, ...]        // Differentially compressed timestamps (seconds relative to metadata.startTime)
    },
    // statusId 1 - this test never had that status
    null,
    // statusId 2 (e.g., "SKIP")
    {
      "taskIdIds": [45, 67, ...],
      "durations": [0, 0, ...],
      "timestamps": [100, 200, ...],
      "messageIds": [5, 5, ...]            // Present for SKIP and FAIL statuses - indices into tables.messages (null if no message)
    },
    // statusId 3 (e.g., "FAIL-PARALLEL")
    {
      "taskIdIds": [78, ...],
      "durations": [1234, ...],
      "timestamps": [250, ...],
      "messageIds": [12, ...]              // Present for SKIP and FAIL statuses - indices into tables.messages (null if no message)
    },
    // statusId 4 (e.g., "CRASH")
    {
      "taskIdIds": [89, ...],
      "durations": [5678, ...],
      "timestamps": [300, ...],
      "crashSignatureIds": [2, ...],       // Only present for CRASH status - indices into tables.crashSignatures (null if none)
      "minidumps": ["12345678-abcd-1234-abcd-1234567890ab", ...]   // Only present for CRASH status - minidump IDs or null
    }
  ],
  // testId 1
  [ ... ],
  ...
]

Timestamp decompression:

// Timestamps are differentially compressed
let currentTime = metadata.startTime;  // Base timestamp in seconds
const decompressedTimestamps = statusGroup.timestamps.map(diff => {
    currentTime += diff;
    return currentTime;
});

Example: Get all runs of a specific test:

const testId = 10;
const testGroup = testRuns[testId];

for (let statusId = 0; statusId < testGroup.length; statusId++) {
    const statusGroup = testGroup[statusId];
    if (!statusGroup) continue;  // This test never had this status

    const status = tables.statuses[statusId];
    console.log(`Status: ${status}, Runs: ${statusGroup.taskIdIds.length}`);

    // Decompress timestamps
    let currentTime = metadata.startTime;
    for (let i = 0; i < statusGroup.taskIdIds.length; i++) {
        currentTime += statusGroup.timestamps[i];
        const taskId = tables.taskIds[statusGroup.taskIdIds[i]];
        const duration = statusGroup.durations[i];
        console.log(`  Task: ${taskId}, Duration: ${duration}ms, Time: ${currentTime}`);
    }
}

Resource Usage Data Format

Top-Level Structure

{
  "jobNames": [ ... ],
  "repositories": [ ... ],
  "machineInfos": [ ... ],
  "jobs": { ... }
}

Lookup Tables

{
  "jobNames": [                      // Base job names without chunk numbers
    "test-linux1804-64/opt-xpcshell",
    "test-macosx1015-64/debug-xpcshell",
    ...
  ],
  "repositories": [                  // Repository names
    "mozilla-central",
    "autoland",
    ...
  ],
  "machineInfos": [                  // Machine specifications (memory in GB, rounded to 1 decimal)
    {
      "logicalCPUs": 8,
      "physicalCPUs": 4,
      "mainMemory": 15.6             // GB
    },
    {
      "logicalCPUs": 16,
      "physicalCPUs": 8,
      "mainMemory": 31.4
    },
    ...
  ]
}

jobs

Parallel arrays containing resource usage data for each job, sorted by start time:

{
  "jobNameIds": [0, 0, 1, 1, ...],                              // Indices into jobNames array
  "chunks": [1, 2, 1, 2, ...],                                  // Chunk numbers (null if job name has no chunk)
  "taskIds": ["YJJe4a0CRIqbAmcCo8n63w", "XPPf5b1DRJrcBndDp9o74x.1", ...], // Task IDs (format: "taskId" for retry 0, "taskId.retryId" for retry > 0)
  "repositoryIds": [0, 0, 1, 1, ...],                           // Indices into repositories array
  "startTimes": [0, 150, 23, 45, ...],       // Differentially compressed timestamps (seconds)
  "machineInfoIds": [0, 0, 1, 1, ...],       // Indices into machineInfos array
  "maxMemories": [1234567890, ...],          // Maximum memory used (bytes)
  "idleTimes": [12345, ...],                 // Time with <50% of one core used (milliseconds)
  "singleCoreTimes": [45678, ...],           // Time using ~1 core (0.75-1.25 cores, milliseconds)
  "cpuBuckets": [                            // CPU usage time distribution (milliseconds per bucket)
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],  // Job 0: [0-10%, 10-20%, ..., 90-100%]
    [150, 250, 350, 450, 550, 650, 750, 850, 950, 1050],  // Job 1
    ...
  ]
}

CPU Buckets Explanation:

Each entry in cpuBuckets is an array of 10 values. Bucket i holds the total time, in milliseconds, that the job spent with overall CPU usage in the range [i*10%, (i+1)*10%) of the machine's total capacity.

Idle Time Calculation:

idleTimes counts the milliseconds during which less than 50% of a single core was in use. As a fraction of total capacity this depends on the core count; for a 16-core machine it is usage below 3.125%.

Single Core Time Calculation:

singleCoreTimes counts the milliseconds during which roughly one core was in use, defined as 0.75-1.25 cores. For a 16-core machine: 4.6875% - 7.8125% of total capacity.
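Because the idle and single-core thresholds are expressed in cores, the equivalent share of total CPU capacity depends on the machine's core count. A small sketch of that conversion (coreThresholds is illustrative, not a function from the script):

```javascript
// Convert the per-core thresholds into percentages of total CPU capacity
// for a machine with the given number of logical CPUs.
function coreThresholds(logicalCPUs) {
  return {
    // "Idle": less than 50% of one core in use.
    idleBelowPercent: (0.5 / logicalCPUs) * 100,
    // "Single core": roughly one core in use (0.75-1.25 cores).
    singleCoreLowPercent: (0.75 / logicalCPUs) * 100,
    singleCoreHighPercent: (1.25 / logicalCPUs) * 100,
  };
}

// For a 16-core machine this gives idle below 3.125% and a
// single-core band of 4.6875%-7.8125%.
```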

Start Time Decompression:

let currentTime = 0;  // Start times are relative to each other
const decompressedStartTimes = jobs.startTimes.map(diff => {
    currentTime += diff;
    return currentTime;
});

Example: Get full information for a job:

const jobIndex = 5;
const jobName = jobNames[jobs.jobNameIds[jobIndex]];
const chunk = jobs.chunks[jobIndex];  // May be null
const fullJobName = chunk !== null ? `${jobName}-${chunk}` : jobName;
const taskId = jobs.taskIds[jobIndex];
const repository = repositories[jobs.repositoryIds[jobIndex]];
const machineInfo = machineInfos[jobs.machineInfoIds[jobIndex]];

// Decompress start time
let currentTime = 0;
for (let i = 0; i <= jobIndex; i++) {
    currentTime += jobs.startTimes[i];
}
const startTime = currentTime;  // seconds since epoch

const maxMemoryGB = jobs.maxMemories[jobIndex] / (1024 * 1024 * 1024);
const idleTimeSeconds = jobs.idleTimes[jobIndex] / 1000;
const singleCoreTimeSeconds = jobs.singleCoreTimes[jobIndex] / 1000;
const cpuDistribution = jobs.cpuBuckets[jobIndex];
const totalTime = cpuDistribution.reduce((sum, val) => sum + val, 0);
const idlePercent = (idleTimeSeconds * 1000 / totalTime) * 100;

Data Compression Techniques

The format uses several compression techniques to minimize file size:

  1. String tables: every string is deduplicated and stored exactly once.
  2. Frequency sorting: table entries are ordered most-frequent-first, so the most common values get the smallest indices.
  3. Index-based lookups: data arrays store small integers instead of repeated strings.
  4. Parallel arrays: related per-task and per-test fields share a single index instead of being stored as objects.
  5. Differential timestamp compression: timestamps and start times are stored as deltas from the previous value.
  6. Sparse arrays: testRuns entries for statuses a test never had are stored as null.
  7. Retry suffix omission: resource usage data drops the ".0" suffix for retry-0 task IDs.
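The differential compression can be sketched from the encoding side as the inverse of the decompression loops shown earlier (diffEncode and diffDecode are illustrative helpers, not the script's actual functions):

```javascript
// Encode absolute values as deltas from the previous value, starting from
// a base. Sorted inputs yield small, compressible deltas.
function diffEncode(absoluteValues, base = 0) {
  let previous = base;
  return absoluteValues.map((value) => {
    const delta = value - previous;
    previous = value;
    return delta;
  });
}

// Decode deltas back into absolute values by running a cumulative sum.
function diffDecode(deltas, base = 0) {
  let current = base;
  return deltas.map((delta) => (current += delta));
}
```

Round-tripping [1760400000, 1760400015, 1760400023] with base 1760400000 produces the deltas [0, 15, 8] and recovers the original values.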

Index File Format

The index.json file lists all available dates:

{
  "dates": [
    "2025-10-15",
    "2025-10-14",
    "2025-10-13",
    ...
  ]
}

Dates are sorted in descending order (newest first).


Notes

- Test timing data: always includes the retry suffix (e.g., "YJJe4a0CRIqbAmcCo8n63w.0")
- Resource usage data: omits ".0" for retry 0 (e.g., "YJJe4a0CRIqbAmcCo8n63w") and includes the suffix for retries > 0 (e.g., "YJJe4a0CRIqbAmcCo8n63w.1")
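Code that correlates the two files therefore has to normalize the task ID formats first. A hypothetical helper pair (not part of the script):

```javascript
// Convert a resource-usage task ID to the timing-data format, which
// always carries a ".retryId" suffix.
function toTimingTaskId(resourceTaskId) {
  return resourceTaskId.includes(".")
    ? resourceTaskId
    : `${resourceTaskId}.0`;
}

// Convert a timing-data task ID to the resource-usage format, which
// drops the ".0" suffix for retry 0.
function toResourceTaskId(timingTaskId) {
  return timingTaskId.endsWith(".0")
    ? timingTaskId.slice(0, -2)
    : timingTaskId;
}
```

This assumes TaskCluster task IDs themselves never contain a dot, which holds for their usual base64url-style form.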


Aggregated Files Format

When running with --days N where N > 1, two aggregated files are generated:

  1. xpcshell-issues-with-taskids.json - a detailed file that keeps per-failure task IDs and minidumps
  2. xpcshell-issues.json - a small file that keeps only aggregated counts, for fast dashboard loading

In both, the data structure is optimized for the sequential access patterns used by the dashboards.

Detailed File (xpcshell-issues-with-taskids.json)

Differences from Daily Files

1. Metadata Changes

{
  "metadata": {
    "startDate": "2025-11-12",           // First date in the range (earliest)
    "endDate": "2025-12-02",             // Last date in the range (most recent)
    "days": 21,                          // Number of days aggregated
    "startTime": 1762905600,             // Unix timestamp for startDate at 00:00:00 UTC
    "generatedAt": "...",
    "totalTestCount": 4506,              // Total number of unique tests
    "testsWithFailures": 3614,           // Number of tests that had at least one non-passing run
    "aggregatedFrom": [...]              // Array of source filenames
  }
}

Additional fields compared to daily files: startDate, endDate, days, totalTestCount, testsWithFailures, and aggregatedFrom. The daily date field is replaced by the startDate/endDate range.

2. Passing Test Runs Are Aggregated

Daily files store individual runs for all statuses:

{
  "taskIdIds": [123, 456, 789],
  "durations": [1500, 1600, 1550],
  "timestamps": [3600, 3600, 7200]
}

Aggregated file stores only counts per hour for passing statuses (status starts with "PASS"):

{
  "counts": [150, 200, 180, 145, ...],
  "hours": [0, 5, 1, 2, 8, ...]
}

Where:

- counts[i] is the number of passing runs in the corresponding hour
- hours is a differentially compressed list of hour offsets relative to metadata.startTime

Decompressing hours:

let currentHour = 0;
const absoluteHours = [];
for (const delta of hours) {
  currentHour += delta;
  absoluteHours.push(currentHour);
}
// absoluteHours[i] is now the hour number (0 = startTime, 1 = startTime + 1 hour, etc.)

Example: Calculate pass rate for a test on day 5:

const testId = 0;
const day = 5; // 5 days after startDate

// Find pass status
const passStatusId = data.tables.statuses.findIndex(s => s.startsWith("PASS"));
const passGroup = data.testRuns[testId]?.[passStatusId];

// Count passes in day 5 (hours 120-143)
const dayStartHour = day * 24;
const dayEndHour = (day + 1) * 24;
let passCount = 0;
let currentHour = 0;
if (passGroup) {
  for (let i = 0; i < passGroup.hours.length; i++) {
    currentHour += passGroup.hours[i];
    if (currentHour >= dayStartHour && currentHour < dayEndHour) {
      passCount += passGroup.counts[i];
    }
  }
}

// For the fail count, non-passing runs are also bucketed by hour:
// decompress failGroup.hours the same way and, for buckets whose hour
// falls in [dayStartHour, dayEndHour), add failGroup.taskIdIds[i].length.

3. All Test Runs Aggregated by Hour

Both passing and non-passing test runs are aggregated by hour. The difference is in what data is preserved:

Passing tests (status starts with "PASS"):

{
  "counts": [150, 200, 180],
  "hours": [0, 5, 1]
}

Non-passing tests (FAIL, CRASH, TIMEOUT, SKIP, etc.):

{
  "taskIdIds": [
    [45, 67],      // Task IDs that failed in hour 0 with message 23
    [89, 12, 56],  // Task IDs that failed in hour 5 with message 23
    [34]           // Task IDs that failed in hour 6 with message 24
  ],
  "hours": [0, 5, 1],
  "messageIds": [23, 23, 24],
  "crashSignatureIds": [5, 5, 6],
  "minidumps": [
    ["abc123", "def456"],    // Minidumps for crashes in hour 0
    ["ghi789", null, "jkl"],  // Minidumps for crashes in hour 5
    [null]                    // Minidumps for crashes in hour 6
  ]
}

Key differences from daily files:

- taskIdIds is an array of arrays: one inner array of task IDs per hour bucket
- runs are grouped by hour and by message (or crash signature) rather than stored individually
- per-run durations and fine-grained timestamps are dropped; only hour granularity remains

4. String Tables Are Merged

All string tables are merged and deduplicated across all input days. A string that appears in multiple daily files appears only once in the aggregated file.
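Merging tables this way requires remapping every stored index. One possible sketch (mergeStringTables is illustrative, not the script's actual code):

```javascript
// Merge several per-day string tables into one deduplicated table, and
// return, for each input table, an array mapping its old indices to the
// merged indices.
function mergeStringTables(tables) {
  const merged = [];
  const position = new Map();
  // remaps[d][i] is the merged index of string i from day d's table.
  const remaps = tables.map((table) =>
    table.map((s) => {
      if (!position.has(s)) {
        position.set(s, merged.length);
        merged.push(s);
      }
      return position.get(s);
    })
  );
  return { merged, remaps };
}
```

For example, merging [["a", "b"], ["b", "c"]] yields the table ["a", "b", "c"] with remaps [[0, 1], [1, 2]]; every index stored in a day's data is then rewritten through its remap array.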

5. TaskInfo Only Contains Failed Tasks

Since passing runs don't store taskIdIds, the taskInfo object only contains mappings for tasks that appear in non-passing test runs. This significantly reduces the size of these arrays.

6. Platform-Irrelevant Tests Are Filtered

SKIP tests with messages starting with "run-if" are filtered out during aggregation. These represent tests that are not relevant on certain platforms (e.g., "run-if = os == 'win'") and are not actual issues. The dashboard would filter these out anyway, so excluding them reduces file size.

Use Cases

Show pass/fail trends over time: decompress the hours arrays for each status group and plot the hourly counts (passing) or the number of task IDs per bucket (non-passing).

Investigate specific failures: read taskIdIds, messageIds, crashSignatureIds, and minidumps from a non-passing status group, then resolve them through taskInfo and the string tables.

Calculate overall pass rate:

const testId = 0;
const passStatusId = data.tables.statuses.findIndex(s => s.startsWith("PASS"));
const failStatusId = data.tables.statuses.findIndex(s => s.startsWith("FAIL"));

// Total passes
const totalPasses = data.testRuns[testId]?.[passStatusId]?.counts.reduce((a, b) => a + b, 0) ?? 0;

// Total fails - count all taskIds across all buckets
const failGroup = data.testRuns[testId]?.[failStatusId];
const totalFails = failGroup?.taskIdIds.reduce((sum, arr) => sum + arr.length, 0) ?? 0;

const passRate = totalPasses / (totalPasses + totalFails);

Small File (xpcshell-issues.json)

This file omits task IDs and minidumps to minimize file size for fast dashboard loading.

Differences from xpcshell-issues-with-taskids.json

1. No taskInfo or taskIds

The taskInfo object and tables.taskIds array are completely omitted since all runs are aggregated.

2. Reduced String Tables

Only includes tables needed for aggregated data:

{
  "tables": {
    "testPaths": [...],
    "testNames": [...],
    "statuses": [...],
    "messages": [...],           // Kept for failure details
    "crashSignatures": [...],    // Kept for crash details
    "components": [...]
    // No jobNames, repositories, or taskIds
  }
}

3. No Task IDs - Only Counts

All status groups use counts instead of task ID arrays:

{
  "counts": [5, 12, 8, 3],
  "hours": [0, 5, 1, 2],
  "messageIds": [23, 23, 24, 24],           // For failures with different messages
  "crashSignatureIds": [5, 6, 5, 6]         // For crashes with different signatures
  // Note: taskIdIds and minidumps are NOT included in this file
}

Failures with different messages or crash signatures are bucketed separately, preserving distinct failure modes.

Task IDs and minidumps are omitted to reduce size. They are available in the detailed file.

Example: A test that fails 5 times in hour 10 with message A and 3 times with message B will have two entries:

{
  "counts": [5, 3],
  "hours": [10, 0],  // Both in same hour, so second delta is 0
  "messageIds": [23, 24]
}
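To summarize such buckets per failure message, the counts can be totaled by messageId. A minimal sketch (countsByMessage is a hypothetical helper):

```javascript
// Total the run counts of a small-file status group per resolved message
// string. statusGroup has parallel counts/messageIds arrays as shown
// above; messages is tables.messages.
function countsByMessage(statusGroup, messages) {
  const totals = new Map();
  statusGroup.counts.forEach((count, i) => {
    const msg = messages[statusGroup.messageIds[i]] ?? "(no message)";
    totals.set(msg, (totals.get(msg) ?? 0) + count);
  });
  return totals;
}
```

Applied to the example above, the 5 runs with messageId 23 and the 3 runs with messageId 24 come back as two separate totals, preserving the distinct failure modes.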