benchmark_stats.ts

Benchmark-specific statistical analysis. Uses the general stats utilities from stats.ts for timing/performance analysis. All timing values are in nanoseconds.

Declarations

6 declarations

benchmark_stats_compare

(a: BenchmarkStatsComparable, b: BenchmarkStatsComparable, options?: BenchmarkCompareOptions | undefined): BenchmarkComparison

Compare two benchmark results for practical and statistical significance. Effect magnitude is classified by percentage difference, and Welch's t-test provides statistical confidence. Cohen's d is computed as an informational metric but does not drive classification: its conventional thresholds (0.2/0.5/0.8) are calibrated for social science and produce false positives in benchmarking, where within-run variance is tight.

a

first benchmark stats (or any object with required properties)

b

second benchmark stats (or any object with required properties)

options?

comparison options

type BenchmarkCompareOptions | undefined
optional

returns

BenchmarkComparison

comparison result with significance, effect size, and recommendation

examples

const comparison = benchmark_stats_compare(result_a.stats, result_b.stats);
if (comparison.significant) {
  console.log(`${comparison.faster} is ${comparison.speedup_ratio.toFixed(2)}x faster`);
}
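To make the statistics named above concrete, here is a minimal sketch of Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cohen's d, computed from summary statistics only. The names `ComparableSketch`, `welch_t_sketch`, and `cohens_d_sketch` are hypothetical and are not part of benchmark_stats.ts. The pooled-standard-deviation form of Cohen's d is an assumption about which variant is used, and the p-value step (which needs the Student's t CDF) is omitted.

```typescript
// Hypothetical shape: only the summary fields these formulas need.
interface ComparableSketch {
  mean_ns: number;
  std_dev_ns: number;
  sample_size: number;
}

// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
// Unlike Student's t-test, this does not assume equal variances.
function welch_t_sketch(a: ComparableSketch, b: ComparableSketch): {t: number; df: number} {
  const va = a.std_dev_ns ** 2 / a.sample_size; // squared standard error of a
  const vb = b.std_dev_ns ** 2 / b.sample_size; // squared standard error of b
  const t = (a.mean_ns - b.mean_ns) / Math.sqrt(va + vb);
  const df =
    (va + vb) ** 2 / (va ** 2 / (a.sample_size - 1) + vb ** 2 / (b.sample_size - 1));
  return {t, df};
}

// Cohen's d with a pooled standard deviation (assumed variant).
function cohens_d_sketch(a: ComparableSketch, b: ComparableSketch): number {
  const pooled = Math.sqrt(
    ((a.sample_size - 1) * a.std_dev_ns ** 2 + (b.sample_size - 1) * b.std_dev_ns ** 2) /
      (a.sample_size + b.sample_size - 2),
  );
  return (a.mean_ns - b.mean_ns) / pooled;
}
```

Note how tight variance inflates Cohen's d: a 1% difference in means with a 0.5% standard deviation already gives |d| = 2, which the conventional thresholds would label "large".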

BenchmarkCompareOptions

Options for benchmark comparison.

alpha

Significance level for hypothesis testing (default: 0.05)

type number

min_percent_difference

Minimum percentage difference to consider practically meaningful, expressed as a ratio. Below this threshold, differences are classified as 'negligible' and significant is forced to false, regardless of the p-value. This prevents the t-test's oversensitivity at large sample sizes from flagging system-level noise (thermal throttling, OS scheduling, cache pressure) as a meaningful difference.

Effect magnitude thresholds scale from this value: negligible < min, small < min*3, medium < min*5, large >= min*5.

Default: 0.10 (10%).

type number
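The threshold scaling described for min_percent_difference can be sketched as follows. The names `EffectMagnitudeSketch`, `classify_magnitude_sketch`, and `is_significant_sketch` are hypothetical, and combining the p-value with the magnitude gate in this exact way is an assumption based on the option descriptions rather than the library's actual code.

```typescript
// Hypothetical stand-in for the module's EffectMagnitude type.
type EffectMagnitudeSketch = 'negligible' | 'small' | 'medium' | 'large';

// negligible < min, small < min*3, medium < min*5, large >= min*5
function classify_magnitude_sketch(
  percent_difference: number,
  min_percent_difference = 0.1,
): EffectMagnitudeSketch {
  const d = Math.abs(percent_difference);
  if (d < min_percent_difference) return 'negligible';
  if (d < min_percent_difference * 3) return 'small';
  if (d < min_percent_difference * 5) return 'medium';
  return 'large';
}

// Assumed gate: a negligible difference is never significant, whatever the p-value.
function is_significant_sketch(
  p_value: number,
  percent_difference: number,
  alpha = 0.05,
  min_percent_difference = 0.1,
): boolean {
  const magnitude = classify_magnitude_sketch(percent_difference, min_percent_difference);
  return magnitude !== 'negligible' && p_value < alpha;
}
```

With the default of 0.10, the bands are: under 10% negligible, under 30% small, under 50% medium, and 50% or more large.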

BenchmarkComparison

Result from comparing two benchmark stats.

faster

Which benchmark is faster ('a', 'b', or 'equal' if difference is negligible)

type 'a' | 'b' | 'equal'

speedup_ratio

How much faster the winner is (e.g., 1.5 means 1.5x faster)

type number

significant

Whether the difference is both statistically and practically significant

type boolean

p_value

P-value from Welch's t-test (lower = more confident the difference is real)

type number

percent_difference

Percentage difference between means as a ratio (0.05 = 5%, 1.0 = 100%)

type number

effect_size

Cohen's d effect size (informational only; not used for classification)

type number

effect_magnitude

Interpretation of practical significance based on percentage difference

type EffectMagnitude

ci_overlap

Whether the 95% confidence intervals overlap

type boolean

recommendation

Human-readable interpretation of the comparison

type string
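As an illustration of the ci_overlap field, the following sketch checks whether two closed intervals overlap. `IntervalSketch` and `intervals_overlap_sketch` are hypothetical names, and treating touching endpoints as an overlap is an assumption.

```typescript
// Hypothetical alias for a [lower, upper] confidence interval in nanoseconds.
type IntervalSketch = [number, number];

// Two closed intervals overlap when each one starts before the other ends.
function intervals_overlap_sketch(a: IntervalSketch, b: IntervalSketch): boolean {
  return a[0] <= b[1] && b[0] <= a[1];
}
```

Non-overlapping 95% intervals are a strong visual hint that two means differ, but overlapping intervals do not by themselves rule out a statistically significant difference, which is why the p-value is reported separately.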

BenchmarkStats

Complete statistical analysis of timing measurements. Includes outlier detection, descriptive statistics, and performance metrics. All timing values are in nanoseconds.

mean_ns

Mean (average) time in nanoseconds

type number

readonly

p50_ns

50th percentile (median) time in nanoseconds

type number

readonly

std_dev_ns

Standard deviation in nanoseconds

type number

readonly

min_ns

Minimum time in nanoseconds

type number

readonly

max_ns

Maximum time in nanoseconds

type number

readonly

p75_ns

75th percentile in nanoseconds

type number

readonly

p90_ns

90th percentile in nanoseconds

type number

readonly

p95_ns

95th percentile in nanoseconds

type number

readonly

p99_ns

99th percentile in nanoseconds

type number

readonly

cv

Coefficient of variation (std_dev / mean)

type number

readonly

confidence_interval_ns

95% confidence interval for the mean in nanoseconds

type [number, number]

readonly

outliers_ns

Array of detected outlier values in nanoseconds

type Array<number>

readonly

outlier_ratio

Ratio of outliers to total samples

type number

readonly

sample_size

Number of samples after outlier removal

type number

readonly

raw_sample_size

Original number of samples (before outlier removal)

type number

readonly

ops_per_second

Operations per second (NS_PER_SEC / mean_ns)

type number

readonly

failed_iterations

Number of failed iterations (NaN, Infinity, or negative values)

type number

readonly

constructor

type new (timings_ns: number[]): BenchmarkStats

timings_ns
type number[]

toString

Format stats as a human-readable string.

type (): string

returns string
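A few of the derived fields above follow directly from their descriptions. The sketch below shows one way cv, ops_per_second, and failed_iterations could be computed from raw timings. `derived_metrics_sketch`, `DerivedMetricsSketch`, and `NS_PER_SEC_SKETCH` are hypothetical names, the sample (n - 1) standard deviation is an assumption, and outlier removal is deliberately left out.

```typescript
const NS_PER_SEC_SKETCH = 1e9; // nanoseconds per second

interface DerivedMetricsSketch {
  failed_iterations: number;
  mean_ns: number;
  std_dev_ns: number;
  cv: number;
  ops_per_second: number;
}

function derived_metrics_sketch(timings_ns: number[]): DerivedMetricsSketch {
  // Failed iterations are NaN, Infinity, or negative values.
  const valid = timings_ns.filter((t) => Number.isFinite(t) && t >= 0);
  const failed_iterations = timings_ns.length - valid.length;

  const mean_ns = valid.reduce((sum, t) => sum + t, 0) / valid.length;
  // Sample variance (n - 1 denominator); an assumption, not confirmed by the docs.
  const variance =
    valid.reduce((sum, t) => sum + (t - mean_ns) ** 2, 0) / (valid.length - 1);
  const std_dev_ns = Math.sqrt(variance);

  return {
    failed_iterations,
    mean_ns,
    std_dev_ns,
    cv: std_dev_ns / mean_ns, // coefficient of variation
    ops_per_second: NS_PER_SEC_SKETCH / mean_ns,
  };
}
```

This sketch returns NaN for the mean and standard deviation when fewer than two valid samples remain; a real implementation would need to handle that case explicitly.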

BenchmarkStatsComparable

Minimal stats interface for comparison. This allows comparing stats from different sources (e.g., loaded baselines).

mean_ns

type number

std_dev_ns

type number

sample_size

type number

confidence_interval_ns

type [number, number]
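Because the comparison only needs these four fields, a baseline stored as JSON can be turned into a comparable object without reconstructing a full BenchmarkStats. The sketch below assumes a baseline file whose keys match the field names exactly; `BenchmarkStatsComparableSketch` mirrors the interface above, and `parse_baseline_sketch` is a hypothetical helper, not part of the module.

```typescript
// Local mirror of the four documented fields, so the sketch is self-contained.
interface BenchmarkStatsComparableSketch {
  mean_ns: number;
  std_dev_ns: number;
  sample_size: number;
  confidence_interval_ns: [number, number];
}

// Parse and validate a stored baseline (assumed JSON layout).
function parse_baseline_sketch(json: string): BenchmarkStatsComparableSketch {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const {mean_ns, std_dev_ns, sample_size, confidence_interval_ns: ci} = raw;
  if (
    typeof mean_ns !== 'number' ||
    typeof std_dev_ns !== 'number' ||
    typeof sample_size !== 'number'
  ) {
    throw new Error('baseline is missing a numeric stats field');
  }
  if (
    !Array.isArray(ci) ||
    ci.length !== 2 ||
    typeof ci[0] !== 'number' ||
    typeof ci[1] !== 'number'
  ) {
    throw new Error('baseline confidence_interval_ns must be [number, number]');
  }
  return {mean_ns, std_dev_ns, sample_size, confidence_interval_ns: [ci[0], ci[1]]};
}
```

An object validated this way could then be passed as either argument to benchmark_stats_compare alongside the stats of a fresh run.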

EffectMagnitude
