benchmark_stats.ts

Benchmark-specific statistical analysis.
Uses the general stats utilities from stats.ts for timing/performance analysis.
All timing values are in nanoseconds.

Declarations

6 declarations

benchmark_stats_compare

(a: BenchmarkStatsComparable, b: BenchmarkStatsComparable, options?: BenchmarkCompareOptions | undefined): BenchmarkComparison

Compare two benchmark results for practical and statistical significance.
Effect magnitude is classified by the percentage difference between the means,
and Welch's t-test supplies statistical confidence. Cohen's d is computed as an
informational metric but does not drive classification: its conventional
thresholds (0.2/0.5/0.8) are calibrated for social science and produce false
positives in benchmarking, where within-run variance is tight.
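
To make the contrast between the two metrics concrete, here is a minimal sketch of the textbook formulas for Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cohen's d (pooled standard deviation). The names `WelchInputSketch`, `welch_t_sketch`, and `cohens_d_sketch` are hypothetical and are not part of benchmark_stats.ts. The library's actual computation may differ, and turning `t` into a p-value (which needs a t-distribution CDF) is left out.

```typescript
// Hypothetical input shape: the subset of the documented
// BenchmarkStatsComparable fields that these formulas need.
interface WelchInputSketch {
	mean_ns: number;
	std_dev_ns: number;
	sample_size: number;
}

// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
function welch_t_sketch(a: WelchInputSketch, b: WelchInputSketch): {t: number; df: number} {
	const va = a.std_dev_ns ** 2 / a.sample_size; // squared standard error of mean a
	const vb = b.std_dev_ns ** 2 / b.sample_size; // squared standard error of mean b
	const t = (a.mean_ns - b.mean_ns) / Math.sqrt(va + vb);
	const df = (va + vb) ** 2 / (va ** 2 / (a.sample_size - 1) + vb ** 2 / (b.sample_size - 1));
	return {t, df};
}

// Cohen's d using the pooled standard deviation of both samples.
function cohens_d_sketch(a: WelchInputSketch, b: WelchInputSketch): number {
	const pooled_variance =
		((a.sample_size - 1) * a.std_dev_ns ** 2 + (b.sample_size - 1) * b.std_dev_ns ** 2) /
		(a.sample_size + b.sample_size - 2);
	return (a.mean_ns - b.mean_ns) / Math.sqrt(pooled_variance);
}
```

With means of 1000 ns and 1005 ns, a standard deviation of 10 ns, and 100 samples each, these formulas give |d| = 0.5 ("medium" on Cohen's scale) for a difference of only 0.5% between the means, which is the kind of false positive the description refers to.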

`a`

first benchmark stats (or any object with required properties)

type BenchmarkStatsComparable

`b`

second benchmark stats (or any object with required properties)

type BenchmarkStatsComparable

`options?`

comparison options

type BenchmarkCompareOptions | undefined

optional

returns

BenchmarkComparison

comparison result with significance, effect size, and recommendation

examples

const comparison = benchmark_stats_compare(result_a.stats, result_b.stats);
if (comparison.significant) {
  console.log(`${comparison.faster} is ${comparison.speedup_ratio.toFixed(2)}x faster`);
}

BenchmarkCompareOptions

Options for benchmark comparison.

`alpha`

Significance level for hypothesis testing (default: 0.05)

type number

`min_percent_difference`

Minimum percentage difference to consider practically meaningful, expressed as a ratio (0.10 = 10%).
Below this threshold, a difference is classified as 'negligible' and
`significant` is forced to false, regardless of p-value.
This prevents the t-test's oversensitivity at large sample sizes from
flagging system-level noise (thermal throttling, OS scheduling, cache pressure)
as a meaningful difference.
Effect magnitude thresholds scale from this value:
negligible < min, small < min*3, medium < min*5, large >= min*5.
Default: 0.10 (10%).

type number
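
The threshold rules described for `min_percent_difference` can be sketched as follows. This illustrates the documented rules and is not the library's source: `MagnitudeSketch`, `classify_magnitude_sketch`, and `is_significant_sketch` are hypothetical names, and the strict `p_value < alpha` comparison is an assumption.

```typescript
// Hypothetical stand-in for the documented EffectMagnitude categories.
type MagnitudeSketch = 'negligible' | 'small' | 'medium' | 'large';

// Classify a percentage difference (as a ratio) using thresholds that
// scale from min_percent_difference: min, min*3, min*5.
function classify_magnitude_sketch(
	percent_difference: number,
	min_percent_difference = 0.1,
): MagnitudeSketch {
	const diff = Math.abs(percent_difference);
	if (diff < min_percent_difference) return 'negligible';
	if (diff < min_percent_difference * 3) return 'small';
	if (diff < min_percent_difference * 5) return 'medium';
	return 'large';
}

// A difference counts as significant only when it passes both gates:
// the t-test (assumed here to be p_value < alpha) and the practical threshold.
function is_significant_sketch(
	p_value: number,
	percent_difference: number,
	alpha = 0.05,
	min_percent_difference = 0.1,
): boolean {
	const magnitude = classify_magnitude_sketch(percent_difference, min_percent_difference);
	return magnitude !== 'negligible' && p_value < alpha;
}
```

Under the defaults, a 2% difference with a very small p-value is still reported as not significant, because it falls below the 10% practical threshold.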

BenchmarkComparison

Result from comparing two benchmark stats.

`faster`

Which benchmark is faster ('a', 'b', or 'equal' if difference is negligible)

type 'a' | 'b' | 'equal'

`speedup_ratio`

How much faster the winner is (e.g., 1.5 means 1.5x faster)

type number
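
One plausible reading of `speedup_ratio` is the slower mean divided by the faster mean. The sketch below assumes that definition, which the source does not confirm. `speedup_sketch` is a hypothetical helper, and it ignores the 'equal' case, which depends on the negligibility threshold.

```typescript
// Assumed definition: ratio of the slower mean to the faster mean,
// so the result is always >= 1 (1.5 means the winner is 1.5x faster).
function speedup_sketch(
	mean_a_ns: number,
	mean_b_ns: number,
): {faster: 'a' | 'b'; speedup_ratio: number} {
	if (mean_a_ns <= mean_b_ns) {
		return {faster: 'a', speedup_ratio: mean_b_ns / mean_a_ns};
	}
	return {faster: 'b', speedup_ratio: mean_a_ns / mean_b_ns};
}
```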

`significant`

Whether the difference is both statistically and practically significant

type boolean

`p_value`

P-value from Welch's t-test (lower values are stronger evidence that the means differ)

type number

`percent_difference`

Percentage difference between means as a ratio (0.05 = 5%, 1.0 = 100%)

type number

`effect_size`

Cohen's d effect size (informational — not used for classification)

type number

`effect_magnitude`

Interpretation of practical significance based on percentage difference

type EffectMagnitude

`ci_overlap`

Whether the 95% confidence intervals overlap

type boolean
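
An overlap check on two closed intervals, such as the `confidence_interval_ns` pairs, can be sketched as below. `intervals_overlap_sketch` is a hypothetical name, and treating touching endpoints as overlapping is an assumption.

```typescript
// Two closed intervals [lo, hi] overlap when each one starts
// no later than the other one ends.
function intervals_overlap_sketch(a: [number, number], b: [number, number]): boolean {
	return a[0] <= b[1] && b[0] <= a[1];
}
```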

`recommendation`

Human-readable interpretation of the comparison

type string

BenchmarkStats

Complete statistical analysis of timing measurements.
Includes outlier detection, descriptive statistics, and performance metrics.
All timing values are in nanoseconds.

`mean_ns`

Mean (average) time in nanoseconds

type number

readonly

`p50_ns`

50th percentile (median) time in nanoseconds

type number

readonly

`std_dev_ns`

Standard deviation in nanoseconds

type number

readonly

`min_ns`

Minimum time in nanoseconds

type number

readonly

`max_ns`

Maximum time in nanoseconds

type number

readonly

`p75_ns`

75th percentile in nanoseconds

type number

readonly

`p90_ns`

90th percentile in nanoseconds

type number

readonly

`p95_ns`

95th percentile in nanoseconds

type number

readonly

`p99_ns`

99th percentile in nanoseconds

type number

readonly

`cv`

Coefficient of variation (std_dev / mean)

type number

readonly

`confidence_interval_ns`

95% confidence interval for the mean in nanoseconds

type [number, number]

readonly
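
The source does not say how the interval is computed. A common normal-approximation form is mean ± z × (std_dev / √n) with z = 1.96 for 95%, sketched below under that assumption. `confidence_interval_sketch` is a hypothetical helper, and the library may use a t-based critical value instead.

```typescript
// Normal-approximation confidence interval for the mean.
// z = 1.96 corresponds to a two-sided 95% interval.
function confidence_interval_sketch(
	mean_ns: number,
	std_dev_ns: number,
	sample_size: number,
	z = 1.96,
): [number, number] {
	const margin = z * (std_dev_ns / Math.sqrt(sample_size)); // z times the standard error
	return [mean_ns - margin, mean_ns + margin];
}
```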

`outliers_ns`

Array of detected outlier values in nanoseconds

type Array<number>

readonly

`outlier_ratio`

Ratio of outliers to total samples

type number

readonly

`sample_size`

Number of samples after outlier removal

type number

readonly

`raw_sample_size`

Original number of samples (before outlier removal)

type number

readonly

`ops_per_second`

Operations per second (NS_PER_SEC / mean_ns)

type number

readonly
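
The stated formula, NS_PER_SEC / mean_ns, amounts to the following. `NS_PER_SEC_SKETCH` and `ops_per_second_sketch` are hypothetical local names, and the value of the constant is assumed to be 10^9.

```typescript
// Nanoseconds in one second (assumed value of the documented NS_PER_SEC).
const NS_PER_SEC_SKETCH = 1_000_000_000;

// Convert a mean time per operation in nanoseconds to operations per second.
function ops_per_second_sketch(mean_ns: number): number {
	return NS_PER_SEC_SKETCH / mean_ns;
}
```

For example, a mean of 2000 ns per operation corresponds to 500,000 operations per second.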

`failed_iterations`

Number of failed iterations (NaN, Infinity, or negative values)

type number

readonly
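
Counting failed iterations as described (NaN, Infinity, or negative values) could look like the sketch below. `partition_timings_sketch` is a hypothetical helper, and treating a timing of exactly 0 as valid is an assumption.

```typescript
// Split raw timings into valid samples and a count of failed iterations.
// A timing fails when it is NaN, +/-Infinity, or negative.
function partition_timings_sketch(timings_ns: number[]): {
	valid: number[];
	failed_iterations: number;
} {
	const valid = timings_ns.filter((t) => Number.isFinite(t) && t >= 0);
	return {valid, failed_iterations: timings_ns.length - valid.length};
}
```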

`constructor`

type new (timings_ns: number[]): BenchmarkStats

`timings_ns`

type number[]

`toString`

Format stats as a human-readable string.

type (): string

returns string

BenchmarkStatsComparable

Minimal stats interface for comparison.
This allows comparing stats from different sources (e.g., loaded baselines).

`mean_ns`

type number

`std_dev_ns`

type number

`sample_size`

type number

`confidence_interval_ns`

type [number, number]
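
Because the interface is minimal, a baseline saved as JSON can be turned into a comparable object with a small validating parser. The sketch below assumes the baseline was stored with the four field names listed above. `BaselineComparableSketch` and `parse_baseline_sketch` are hypothetical and are not part of benchmark_stats.ts.

```typescript
// Local mirror of the documented BenchmarkStatsComparable fields.
interface BaselineComparableSketch {
	mean_ns: number;
	std_dev_ns: number;
	sample_size: number;
	confidence_interval_ns: [number, number];
}

// Parse a JSON string and check that it has the four required fields.
function parse_baseline_sketch(json: string): BaselineComparableSketch {
	const raw: unknown = JSON.parse(json);
	if (typeof raw !== 'object' || raw === null) {
		throw new Error('baseline must be a JSON object');
	}
	const record = raw as Record<string, unknown>;
	const mean_ns = record.mean_ns;
	const std_dev_ns = record.std_dev_ns;
	const sample_size = record.sample_size;
	const ci = record.confidence_interval_ns;
	if (
		typeof mean_ns !== 'number' ||
		typeof std_dev_ns !== 'number' ||
		typeof sample_size !== 'number' ||
		!Array.isArray(ci) ||
		ci.length !== 2 ||
		typeof ci[0] !== 'number' ||
		typeof ci[1] !== 'number'
	) {
		throw new Error('baseline is missing required stats fields');
	}
	return {mean_ns, std_dev_ns, sample_size, confidence_interval_ns: [ci[0], ci[1]]};
}
```

An object returned this way has the documented shape, so it should be accepted wherever a `BenchmarkStatsComparable` is expected, for example as the `a` or `b` argument of `benchmark_stats_compare`.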

EffectMagnitude

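Effect magnitude category ('negligible', 'small', 'medium', or 'large').
In BenchmarkComparison it is assigned from the percentage difference between means;
Cohen's d is reported separately as `effect_size` and does not drive the classification.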