API Reference¶

`jsharpe` ¶

JSharpe: Sharpe Ratio Analysis and Statistical Testing.

This package provides comprehensive tools for Sharpe ratio analysis, including statistical significance testing, multiple testing corrections, and portfolio optimization utilities.

Key features

Sharpe ratio variance estimation under non-Gaussian returns
Minimum track record length computation
Probabilistic Sharpe Ratio (PSR) calculation
False Discovery Rate (FDR) control for multiple strategy testing
Family-Wise Error Rate (FWER) corrections (Bonferroni, Šidák, Holm)
Minimum variance portfolio optimization

Example

import numpy as np from jsharpe.sharpe import sharpe_ratio_variance, probabilistic_sharpe_ratio

Compute variance of a Sharpe ratio estimate¶

var = sharpe_ratio_variance(SR=0.5, T=24)

Compute the Probabilistic Sharpe Ratio¶

psr = probabilistic_sharpe_ratio(SR=0.5, SR0=0, T=24)

`FDR_critical_value(q, SR0, SR1, sigma0, sigma1, p_H1)` ¶

Compute critical value for FDR control in hypothesis testing.

Given a mixture model where H ~ Bernoulli(p_H1) determines whether X follows N(SR0, sigma0^2) or N(SR1, sigma1^2), finds the critical value c such that P[H=0 | X > c] = q.

Parameters:

Name	Type	Description	Default
`q`	`float`	Desired False Discovery Rate in (0, 1).	required
`SR0`	`float`	Mean under null hypothesis (must be < SR1).	required
`SR1`	`float`	Mean under alternative hypothesis.	required
`sigma0`	`float`	Standard deviation under null hypothesis (must be > 0).	required
`sigma1`	`float`	Standard deviation under alternative hypothesis (must be > 0).	required
`p_H1`	`float`	Prior probability of alternative hypothesis in (0, 1).	required

Returns:

Type	Description
`float`	Critical value c. Returns -inf if solution is outside [-10, 10],
`float`	or nan if no solution exists.

Raises:

Type	Description
`AssertionError`	If parameters are out of valid ranges.

Example

c = FDR_critical_value(q=0.2, SR0=0, SR1=0.5, sigma0=0.2, sigma1=0.3, p_H1=0.1) c > 0 # Critical value should be positive True

Source code in jsharpe/sharpe/corrections.py

def FDR_critical_value(q: float, SR0: float, SR1: float, sigma0: float, sigma1: float, p_H1: float) -> float:
    """Compute critical value for FDR control in hypothesis testing.

    Given a mixture model where H ~ Bernoulli(p_H1) determines whether
    X follows N(SR0, sigma0^2) or N(SR1, sigma1^2), finds the critical
    value c such that P[H=0 | X > c] = q.

    Args:
        q: Desired False Discovery Rate in (0, 1).
        SR0: Mean under null hypothesis (must be < SR1).
        SR1: Mean under alternative hypothesis.
        sigma0: Standard deviation under null hypothesis (must be > 0).
        sigma1: Standard deviation under alternative hypothesis (must be > 0).
        p_H1: Prior probability of alternative hypothesis in (0, 1).

    Returns:
        Critical value c. Returns -inf if solution is outside [-10, 10],
        or nan if no solution exists.

    Raises:
        AssertionError: If parameters are out of valid ranges.

    Example:
        >>> c = FDR_critical_value(q=0.2, SR0=0, SR1=0.5, sigma0=0.2, sigma1=0.3, p_H1=0.1)
        >>> c > 0  # Critical value should be positive
        True
    """
    assert SR0 < SR1
    assert 0 < q < 1
    assert 0 < p_H1 < 1
    assert sigma0 > 0
    assert sigma1 > 0

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="invalid value encountered in scalar divide")
        warnings.filterwarnings("ignore", message="divide by zero encountered in scalar divide")

        def f(c: float) -> float:
            """Compute the posterior probability P[H=0 | X > c].

            Args:
                c: Candidate critical value.

            Returns:
                Posterior false discovery probability at threshold c.
            """
            a = 1 / (
                1
                + scipy.stats.norm.sf((c - SR1) / sigma1) / scipy.stats.norm.sf((c - SR0) / sigma0) * p_H1 / (1 - p_H1)
            )
            return float(np.where(np.isfinite(a), a, 0))

        if f(-10) < q:  # Solution outside of the search interval
            return float(-np.inf)

        if (f(-10) - q) * (f(10) - q) > 0:  # No solution, for instance if σ₀≫σ₁ and q small
            return float(np.nan)

        return float(scipy.optimize.brentq(lambda c: f(c) - q, -10, 10))

`adjusted_p_values_bonferroni(ps)` ¶

Adjust p-values using Bonferroni correction for FWER control.

Multiplies each p-value by the number of tests M, capping at 1. This is the most conservative multiple testing correction.

Parameters:

Name	Type	Description	Default
`ps`	`ndarray`	Array of unadjusted p-values.	required

Returns:

Type	Description
`ndarray`	Array of Bonferroni-adjusted p-values, each in [0, 1].

Example

p_vals = np.array([0.01, 0.03, 0.05]) adj_p = adjusted_p_values_bonferroni(p_vals) np.allclose(adj_p, [0.03, 0.09, 0.15]) True

Source code in jsharpe/sharpe/corrections.py

def adjusted_p_values_bonferroni(ps: np.ndarray) -> np.ndarray:
    """Adjust p-values using Bonferroni correction for FWER control.

    Multiplies each p-value by the number of tests M, capping at 1.
    This is the most conservative multiple testing correction.

    Args:
        ps: Array of unadjusted p-values.

    Returns:
        Array of Bonferroni-adjusted p-values, each in [0, 1].

    Example:
        >>> p_vals = np.array([0.01, 0.03, 0.05])
        >>> adj_p = adjusted_p_values_bonferroni(p_vals)
        >>> np.allclose(adj_p, [0.03, 0.09, 0.15])
        True
    """
    M = len(ps)
    result: np.ndarray = np.minimum(1, M * ps)
    return result

`adjusted_p_values_holm(ps, *, variant='bonferroni')` ¶

Adjust p-values using Holm's step-down procedure for FWER control.

A step-down procedure that is uniformly more powerful than Bonferroni while still controlling the Family-Wise Error Rate.

Parameters:

Name	Type	Description	Default
`ps`	`ndarray`	Array of unadjusted p-values.	required
`variant`	`str`	Correction method for each step. Either "bonferroni" (default) or "sidak".	`'bonferroni'`

Returns:

Type	Description
`ndarray`	Array of Holm-adjusted p-values, each in [0, 1].

Raises:

Type	Description
`AssertionError`	If variant is not "bonferroni" or "sidak".

Example

p_vals = np.array([0.01, 0.04, 0.03]) adj_p = adjusted_p_values_holm(p_vals) float(adj_p[0]) # Smallest p-value adjusted most 0.03

Source code in jsharpe/sharpe/corrections.py

def adjusted_p_values_holm(ps: np.ndarray, *, variant: str = "bonferroni") -> np.ndarray:
    """Adjust p-values using Holm's step-down procedure for FWER control.

    A step-down procedure that is uniformly more powerful than Bonferroni
    while still controlling the Family-Wise Error Rate.

    Args:
        ps: Array of unadjusted p-values.
        variant: Correction method for each step. Either "bonferroni"
            (default) or "sidak".

    Returns:
        Array of Holm-adjusted p-values, each in [0, 1].

    Raises:
        AssertionError: If variant is not "bonferroni" or "sidak".

    Example:
        >>> p_vals = np.array([0.01, 0.04, 0.03])
        >>> adj_p = adjusted_p_values_holm(p_vals)
        >>> float(adj_p[0])  # Smallest p-value adjusted most
        0.03
    """
    assert variant in ["bonferroni", "sidak"]
    i = np.argsort(ps)
    M = len(ps)
    p_adjusted = np.zeros(M)
    previous = 0
    for j, idx in enumerate(i):
        candidate = min(1, ps[idx] * (M - j)) if variant == "bonferroni" else 1 - (1 - ps[idx]) ** (M - j)
        p_adjusted[idx] = max(previous, candidate)
        previous = p_adjusted[idx]
    return p_adjusted

`adjusted_p_values_sidak(ps)` ¶

Adjust p-values using Šidák correction for FWER control.

Uses the formula: 1 - (1 - p)^M, which is slightly less conservative than Bonferroni when tests are independent.

Parameters:

Name	Type	Description	Default
`ps`	`ndarray`	Array of unadjusted p-values.	required

Returns:

Type	Description
`ndarray`	Array of Šidák-adjusted p-values, each in [0, 1].

Example

p_vals = np.array([0.01, 0.03, 0.05]) adj_p = adjusted_p_values_sidak(p_vals) np.all(adj_p <= adjusted_p_values_bonferroni(p_vals)) np.True_

Source code in jsharpe/sharpe/corrections.py

def adjusted_p_values_sidak(ps: np.ndarray) -> np.ndarray:
    """Adjust p-values using Šidák correction for FWER control.

    Uses the formula: 1 - (1 - p)^M, which is slightly less conservative
    than Bonferroni when tests are independent.

    Args:
        ps: Array of unadjusted p-values.

    Returns:
        Array of Šidák-adjusted p-values, each in [0, 1].

    Example:
        >>> p_vals = np.array([0.01, 0.03, 0.05])
        >>> adj_p = adjusted_p_values_sidak(p_vals)
        >>> np.all(adj_p <= adjusted_p_values_bonferroni(p_vals))
        np.True_
    """
    M = len(ps)
    return 1 - (1 - ps) ** M

`autocorrelation(X)` ¶

Compute mean first-order autocorrelation across columns.

Calculates the lag-1 autocorrelation for each column of the input matrix and returns the mean across all columns.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Data matrix of shape (n_observations, n_series).	required

Returns:

Type	Description
`float`	Mean autocorrelation coefficient across all columns.

Example

np.random.seed(42)

Generate AR(1) process with rho=0.5¶

n = 1000 X = np.zeros((n, 1)) X[0] = np.random.normal() for i in range(1, n): ... X[i] = 0.5 * X[i-1] + np.sqrt(1-0.25) * np.random.normal() ac = autocorrelation(X) bool(0.4 < ac < 0.6) # Should be close to 0.5 True

Source code in jsharpe/sharpe/generators.py

def autocorrelation(X: np.ndarray) -> float:
    """Compute mean first-order autocorrelation across columns.

    Calculates the lag-1 autocorrelation for each column of the input
    matrix and returns the mean across all columns.

    Args:
        X: Data matrix of shape (n_observations, n_series).

    Returns:
        Mean autocorrelation coefficient across all columns.

    Example:
        >>> np.random.seed(42)
        >>> # Generate AR(1) process with rho=0.5
        >>> n = 1000
        >>> X = np.zeros((n, 1))
        >>> X[0] = np.random.normal()
        >>> for i in range(1, n):
        ...     X[i] = 0.5 * X[i-1] + np.sqrt(1-0.25) * np.random.normal()
        >>> ac = autocorrelation(X)
        >>> bool(0.4 < ac < 0.6)  # Should be close to 0.5
        True
    """
    _nr, nc = X.shape
    ac = np.zeros(nc)
    for i in range(nc):
        ac[i] = np.corrcoef(X[1:, i], X[:-1, i])[0, 1]
    return float(ac.mean())

`control_for_FDR(q, *, SR0=0, SR1=0.5, p_H1=0.05, T=24, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

Compute critical value to test multiple Sharpe ratios controlling FDR.

Determines the critical Sharpe ratio threshold and associated error rates to control the False Discovery Rate at level q when testing multiple strategies.

Parameters:

Name	Type	Description	Default
`q`	`float`	Desired False Discovery Rate level in (0, 1).	required
`SR0`	`float`	Sharpe ratio under null hypothesis H0. Default 0.	`0`
`SR1`	`float`	Sharpe ratio under alternative hypothesis H1. Default 0.5.	`0.5`
`p_H1`	`float`	Prior probability that H1 is true. Default 0.05.	`0.05`
`T`	`int`	Number of observations. Default 24.	`24`
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`K`	`int`	Number of strategies (K=1 for FDR; K>1 for FWER-FDR). Default 1.	`1`

Returns:

Type	Description
`tuple[float, float, float, float]`	Tuple of (alpha, beta, SR_c, q_hat): - alpha: Significance level P[SR > SR_c \| H0] - beta: Type II error P[SR <= SR_c \| H1]; power is 1 - beta - SR_c: Critical Sharpe ratio threshold - q_hat: Estimated FDR (should be close to q)

Example

alpha, beta, SR_c, q_hat = control_for_FDR(q=0.25, T=24) bool(0 < alpha < 1) True bool(SR_c > 0) # Critical value is positive True

Source code in jsharpe/sharpe/corrections.py

def control_for_FDR(
    q: float,
    *,
    SR0: float = 0,
    SR1: float = 0.5,
    p_H1: float = 0.05,
    T: int = 24,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    K: int = 1,
) -> tuple[float, float, float, float]:
    """Compute critical value to test multiple Sharpe ratios controlling FDR.

    Determines the critical Sharpe ratio threshold and associated error rates
    to control the False Discovery Rate at level q when testing multiple
    strategies.

    Args:
        q: Desired False Discovery Rate level in (0, 1).
        SR0: Sharpe ratio under null hypothesis H0. Default 0.
        SR1: Sharpe ratio under alternative hypothesis H1. Default 0.5.
        p_H1: Prior probability that H1 is true. Default 0.05.
        T: Number of observations. Default 24.
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        K: Number of strategies (K=1 for FDR; K>1 for FWER-FDR). Default 1.

    Returns:
        Tuple of (alpha, beta, SR_c, q_hat):
            - alpha: Significance level P[SR > SR_c | H0]
            - beta: Type II error P[SR <= SR_c | H1]; power is 1 - beta
            - SR_c: Critical Sharpe ratio threshold
            - q_hat: Estimated FDR (should be close to q)

    Example:
        >>> alpha, beta, SR_c, q_hat = control_for_FDR(q=0.25, T=24)
        >>> bool(0 < alpha < 1)
        True
        >>> bool(SR_c > 0)  # Critical value is positive
        True
    """
    Z = scipy.stats.norm.cdf

    s0 = math.sqrt(sharpe_ratio_variance(SR0, T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K))
    s1 = math.sqrt(sharpe_ratio_variance(SR1, T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K))
    SRc = FDR_critical_value(q, SR0, SR1, s0, s1, p_H1)

    beta = Z((SRc - SR1) / s1)
    alpha = q / (1 - q) * p_H1 / (1 - p_H1) * (1 - beta)
    q_hat = 1 / (1 + (1 - beta) / alpha * p_H1 / (1 - p_H1))

    return alpha, beta, SRc, q_hat

`critical_sharpe_ratio(SR0, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1)` ¶

Compute critical Sharpe ratio for hypothesis testing.

Determines the threshold SR_c for the one-sided test

H0: SR = SR0 vs H1: SR > SR0

at significance level alpha.

Parameters:

Name	Type	Description	Default
`SR0`	`float`	Sharpe ratio under null hypothesis.	required
`T`	`int`	Number of observations.	required
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`alpha`	`float`	Significance level. Default 0.05.	`0.05`
`K`	`int`	Number of strategies for variance adjustment. Default 1.	`1`

Returns:

Type	Description
`float`	Critical Sharpe ratio threshold. Reject H0 if observed SR > SR_c.

Example

SR_c = critical_sharpe_ratio(SR0=0, T=24, alpha=0.05) bool(SR_c > 0) # Need positive SR to reject H0: SR=0 True

Source code in jsharpe/sharpe/psr.py

def critical_sharpe_ratio(
    SR0: float,
    T: int,
    *,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    alpha: float = 0.05,
    K: int = 1,
) -> float:
    """Compute critical Sharpe ratio for hypothesis testing.

    Determines the threshold SR_c for the one-sided test:
        H0: SR = SR0  vs  H1: SR > SR0
    at significance level alpha.

    Args:
        SR0: Sharpe ratio under null hypothesis.
        T: Number of observations.
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        alpha: Significance level. Default 0.05.
        K: Number of strategies for variance adjustment. Default 1.

    Returns:
        Critical Sharpe ratio threshold. Reject H0 if observed SR > SR_c.

    Example:
        >>> SR_c = critical_sharpe_ratio(SR0=0, T=24, alpha=0.05)
        >>> bool(SR_c > 0)  # Need positive SR to reject H0: SR=0
        True
    """
    variance = sharpe_ratio_variance(SR0, T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K)
    return float(SR0 + scipy.stats.norm.ppf(1 - alpha) * math.sqrt(variance))

`effective_rank(C)` ¶

Compute the effective rank of a positive semi-definite matrix.

The effective rank measures the "effective dimensionality" of a matrix by computing the exponential of the entropy of its normalized eigenvalues. This provides a continuous measure between 1 (perfectly correlated) and n (perfectly uncorrelated/identity matrix).

Algorithm

Compute eigenvalues (non-negative for PSD matrices)
Discard zero eigenvalues
Normalize to form a probability distribution
Compute entropy H = -Σ p_i log(p_i)
Return exp(H)

Parameters:

Name	Type	Description	Default
`C`	`ndarray`	Positive semi-definite matrix (e.g., correlation matrix). Shape (n, n).	required

Returns:

Type	Description
`float`	Effective rank, a value in [1, n] where n is the matrix dimension.

References

Roy, O. and Vetterli, M. (2007). "The effective rank: a measure of effective dimensionality." EURASIP Journal on Advances in Signal Processing. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.2721

Example

import numpy as np

Identity matrix has effective rank equal to its dimension¶

abs(effective_rank(np.eye(3)) - 3.0) < 1e-10 True

Perfectly correlated matrix has effective rank 1¶

C = np.ones((3, 3)) abs(effective_rank(C) - 1.0) < 1e-10 True

Source code in jsharpe/sharpe/clustering.py

def effective_rank(C: np.ndarray) -> float:
    """Compute the effective rank of a positive semi-definite matrix.

    The effective rank measures the "effective dimensionality" of a matrix
    by computing the exponential of the entropy of its normalized eigenvalues.
    This provides a continuous measure between 1 (perfectly correlated) and
    n (perfectly uncorrelated/identity matrix).

    Algorithm:
        1. Compute eigenvalues (non-negative for PSD matrices)
        2. Discard zero eigenvalues
        3. Normalize to form a probability distribution
        4. Compute entropy H = -Σ p_i log(p_i)
        5. Return exp(H)

    Args:
        C: Positive semi-definite matrix (e.g., correlation matrix).
            Shape (n, n).

    Returns:
        Effective rank, a value in [1, n] where n is the matrix dimension.

    References:
        Roy, O. and Vetterli, M. (2007). "The effective rank: a measure of
        effective dimensionality." EURASIP Journal on Advances in Signal
        Processing. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.2721

    Example:
        >>> import numpy as np
        >>> # Identity matrix has effective rank equal to its dimension
        >>> abs(effective_rank(np.eye(3)) - 3.0) < 1e-10
        True
        >>> # Perfectly correlated matrix has effective rank 1
        >>> C = np.ones((3, 3))
        >>> abs(effective_rank(C) - 1.0) < 1e-10
        True
    """
    p = np.linalg.eigvalsh(C)
    p = p[p > 0]
    p = p / sum(p)
    H = np.sum(-p * np.log(p))
    return math.exp(H)

`expected_maximum_sharpe_ratio(number_of_trials, variance, SR0=0)` ¶

Compute expected maximum Sharpe ratio under multiple testing.

Estimates E[max(SR_1, ..., SR_K)] assuming K independent Sharpe ratio estimates, each with the same variance. Uses the Gumbel approximation for the expected maximum of normals.

Parameters:

Name	Type	Description	Default
`number_of_trials`	`int`	Number of strategies (K).	required
`variance`	`float`	Variance of individual Sharpe ratio estimates.	required
`SR0`	`float`	Baseline Sharpe ratio to add. Default 0.	`0`

Returns:

Type	Description
`float`	Expected value of the maximum Sharpe ratio across K strategies.

Example

Expected maximum increases with number of trials¶

e1 = expected_maximum_sharpe_ratio(1, 0.1) e10 = expected_maximum_sharpe_ratio(10, 0.1) bool(e10 > e1) True

Source code in jsharpe/sharpe/psr.py

def expected_maximum_sharpe_ratio(number_of_trials: int, variance: float, SR0: float = 0) -> float:
    """Compute expected maximum Sharpe ratio under multiple testing.

    Estimates E[max(SR_1, ..., SR_K)] assuming K independent Sharpe ratio
    estimates, each with the same variance. Uses the Gumbel approximation
    for the expected maximum of normals.

    Args:
        number_of_trials: Number of strategies (K).
        variance: Variance of individual Sharpe ratio estimates.
        SR0: Baseline Sharpe ratio to add. Default 0.

    Returns:
        Expected value of the maximum Sharpe ratio across K strategies.

    Example:
        >>> # Expected maximum increases with number of trials
        >>> e1 = expected_maximum_sharpe_ratio(1, 0.1)
        >>> e10 = expected_maximum_sharpe_ratio(10, 0.1)
        >>> bool(e10 > e1)
        True
    """
    return float(
        SR0
        + (
            np.sqrt(variance)
            * (
                (1 - np.euler_gamma) * scipy.stats.norm.ppf(1 - 1 / number_of_trials)
                + np.euler_gamma * scipy.stats.norm.ppf(1 - 1 / number_of_trials / np.exp(1))
            )
        )
    )

`generate_autocorrelated_non_gaussian_data(N, n, SR0=0, name='gaussian', rho=None, gaussian_autocorrelation=0)` ¶

Generate autocorrelated non-Gaussian return data for simulation.

Creates a matrix of simulated returns with specified autocorrelation and marginal distribution characteristics (skewness/kurtosis).

Uses a copula-like approach

Generate AR(1) Gaussian processes
Transform to uniform via Gaussian CDF
Transform to target marginals via inverse CDF

Parameters:

Name	Type	Description	Default
`N`	`int`	Number of time periods (rows).	required
`n`	`int`	Number of assets/strategies (columns).	required
`SR0`	`float`	Target Sharpe ratio. Default 0.	`0`
`name`	`str`	Distribution type. One of "gaussian", "mild", "moderate", "severe". Default "gaussian".	`'gaussian'`
`rho`	`float \| None`	Autocorrelation coefficient. If None, uses gaussian_autocorrelation.	`None`
`gaussian_autocorrelation`	`float`	Autocorrelation for Gaussian case. Default 0.	`0`

Returns:

Type	Description
`ndarray`	Array of shape (N, n) containing simulated returns.

Example

np.random.seed(42) X = generate_autocorrelated_non_gaussian_data(100, 2, SR0=0.1, name="mild") X.shape (100, 2)

Source code in jsharpe/sharpe/generators.py

def generate_autocorrelated_non_gaussian_data(
    N: int,
    n: int,
    SR0: float = 0,
    name: str = "gaussian",
    rho: float | None = None,
    gaussian_autocorrelation: float = 0,
) -> np.ndarray:
    """Generate autocorrelated non-Gaussian return data for simulation.

    Creates a matrix of simulated returns with specified autocorrelation
    and marginal distribution characteristics (skewness/kurtosis).

    Uses a copula-like approach:
        1. Generate AR(1) Gaussian processes
        2. Transform to uniform via Gaussian CDF
        3. Transform to target marginals via inverse CDF

    Args:
        N: Number of time periods (rows).
        n: Number of assets/strategies (columns).
        SR0: Target Sharpe ratio. Default 0.
        name: Distribution type. One of "gaussian", "mild", "moderate",
            "severe". Default "gaussian".
        rho: Autocorrelation coefficient. If None, uses gaussian_autocorrelation.
        gaussian_autocorrelation: Autocorrelation for Gaussian case. Default 0.

    Returns:
        Array of shape (N, n) containing simulated returns.

    Example:
        >>> np.random.seed(42)
        >>> X = generate_autocorrelated_non_gaussian_data(100, 2, SR0=0.1, name="mild")
        >>> X.shape
        (100, 2)
    """
    if rho is None:
        # With the distributions we consider the autocorrelation is almost the same.
        rho = gaussian_autocorrelation

    shape = (N, n)

    # Marginal distribution: ppf
    R = 10_000
    marginal = generate_non_gaussian_data(R, 1, SR0=SR0, name=name)[:, 0]
    ppf = scipy.interpolate.interp1d(ppoints(R), sorted(marginal), fill_value="extrapolate")

    # AR(1) processes
    X = np.random.normal(size=shape)
    for i in range(1, shape[0]):
        X[i, :] = rho * X[i - 1, :] + np.sqrt(1 - rho**2) * X[i, :]

    # Convert the margins to uniform, with the Gaussian cdf
    X = scipy.stats.norm.cdf(X)

    # Convert the uniforms to the target margins, using the ppf
    result: np.ndarray = ppf(X)

    return result

`generate_non_gaussian_data(nr, nc, *, SR0=0, name='severe')` ¶

Generate non-Gaussian return data with specified characteristics.

Creates a matrix of simulated returns from a mixture distribution that exhibits the specified skewness and kurtosis characteristics while maintaining the target Sharpe ratio.

Parameters:

Name	Type	Description	Default
`nr`	`int`	Number of rows (observations/time periods).	required
`nc`	`int`	Number of columns (assets/strategies).	required
`SR0`	`float`	Target Sharpe ratio. Default 0.	`0`
`name`	`str`	Distribution severity. One of: - "gaussian": No skewness or kurtosis - "mild": Slight negative skew and excess kurtosis - "moderate": Moderate negative skew and excess kurtosis - "severe": Strong negative skew and excess kurtosis Default "severe".	`'severe'`

Returns:

Type	Description
`ndarray`	Array of shape (nr, nc) containing simulated returns.

Raises:

Type	Description
`AssertionError`	If name is not a valid distribution type.

Example

np.random.seed(42) X = generate_non_gaussian_data(1000, 1, SR0=0.2, name="mild") X.shape (1000, 1)

Source code in jsharpe/sharpe/generators.py

def generate_non_gaussian_data(
    nr: int,
    nc: int,
    *,
    SR0: float = 0,
    name: str = "severe",
) -> np.ndarray:
    """Generate non-Gaussian return data with specified characteristics.

    Creates a matrix of simulated returns from a mixture distribution that
    exhibits the specified skewness and kurtosis characteristics while
    maintaining the target Sharpe ratio.

    Args:
        nr: Number of rows (observations/time periods).
        nc: Number of columns (assets/strategies).
        SR0: Target Sharpe ratio. Default 0.
        name: Distribution severity. One of:
            - "gaussian": No skewness or kurtosis
            - "mild": Slight negative skew and excess kurtosis
            - "moderate": Moderate negative skew and excess kurtosis
            - "severe": Strong negative skew and excess kurtosis
            Default "severe".

    Returns:
        Array of shape (nr, nc) containing simulated returns.

    Raises:
        AssertionError: If name is not a valid distribution type.

    Example:
        >>> np.random.seed(42)
        >>> X = generate_non_gaussian_data(1000, 1, SR0=0.2, name="mild")
        >>> X.shape
        (1000, 1)
    """
    configs = {
        "gaussian": (0, 0, 0.015, 0.010),
        "mild": (0.04, -0.03, 0.015, 0.010),
        "moderate": (0.03, -0.045, 0.020, 0.010),
        "severe": (0.02, -0.060, 0.025, 0.010),
    }
    assert name in configs

    def mixture_variance(
        p_tail: float,
        mu_tail: float,
        sigma_tail: float,
        mu_core: float,
        sigma_core: float,
    ) -> float:
        """Compute the variance of a two-component Gaussian mixture.

        Args:
            p_tail: Mixing weight of the tail component.
            mu_tail: Mean of the tail component.
            sigma_tail: Standard deviation of the tail component.
            mu_core: Mean of the core component.
            sigma_core: Standard deviation of the core component.

        Returns:
            Variance of the mixture distribution.
        """
        w = 1.0 - p_tail
        mu = w * mu_core + p_tail * mu_tail
        m2 = w * (sigma_core**2 + mu_core**2) + p_tail * (sigma_tail**2 + mu_tail**2)
        return float(m2 - mu**2)

    def gen_with_true_SR0(reps: int, T: int, cfg: tuple[float, float, float, float], SR0: float) -> np.ndarray:
        """Generate mixture returns scaled to a target population Sharpe ratio.

        Args:
            reps: Number of independent return series to generate.
            T: Length of each return series.
            cfg: Mixture config tuple (p_tail, mu_tail, sigma_tail, sigma_core).
            SR0: Target population Sharpe ratio.

        Returns:
            Array of shape (reps, T) with non-Gaussian returns at the given Sharpe ratio.
        """
        p, mu_tail, sig_tail, sig_core = cfg
        # Zero-mean baseline mixture (choose mu_core so mean=0)
        mu_core0 = -p * mu_tail / (1.0 - p)
        std0 = np.sqrt(mixture_variance(p, mu_tail, sig_tail, mu_core0, sig_core))
        mu_shift = SR0 * std0  # sets population Sharpe to SR0, preserves skew/kurt
        mask = np.random.uniform(size=(reps, T)) < p
        X = np.random.normal(mu_core0 + mu_shift, sig_core, size=(reps, T))
        X[mask] = np.random.normal(mu_tail + mu_shift, sig_tail, size=mask.sum())
        return X

    return gen_with_true_SR0(nr, nc, configs[name], SR0)

`get_random_correlation_matrix(number_of_trials=100, effective_number_of_trials=10, number_of_observations=200, noise=0.1)` ¶

Generate a random correlation matrix with block structure.

Creates a correlation matrix representing clustered strategies, where strategies within the same cluster are highly correlated and strategies across clusters have lower correlation.

Parameters:

Name	Type	Description	Default
`number_of_trials`	`int`	Number of time series (strategies). Default 100.	`100`
`effective_number_of_trials`	`int`	Number of clusters. Default 10.	`10`
`number_of_observations`	`int`	Number of time periods to simulate. Default 200.	`200`
`noise`	`float`	Noise level added to each series. Default 0.1.	`0.1`

Returns:

Type	Description
`tuple[ndarray, ndarray, ndarray]`	Tuple of (C, X, clusters): - C: Correlation matrix of shape (number_of_trials, number_of_trials) - X: Data matrix of shape (number_of_observations, number_of_trials) - clusters: Cluster assignment for each strategy

Example

np.random.seed(42) C, X, clusters = get_random_correlation_matrix( ... number_of_trials=20, effective_number_of_trials=4 ... ) C.shape (20, 20) np.allclose(np.diag(C), 1) # Diagonal is all ones True

Source code in jsharpe/sharpe/generators.py

def get_random_correlation_matrix(
    number_of_trials: int = 100,
    effective_number_of_trials: int = 10,
    number_of_observations: int = 200,
    noise: float = 0.1,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Generate a random correlation matrix with block structure.

    Creates a correlation matrix representing clustered strategies, where
    strategies within the same cluster are highly correlated and strategies
    across clusters have lower correlation.

    Args:
        number_of_trials: Number of time series (strategies). Default 100.
        effective_number_of_trials: Number of clusters. Default 10.
        number_of_observations: Number of time periods to simulate. Default 200.
        noise: Noise level added to each series. Default 0.1.

    Returns:
        Tuple of (C, X, clusters):
            - C: Correlation matrix of shape (number_of_trials, number_of_trials)
            - X: Data matrix of shape (number_of_observations, number_of_trials)
            - clusters: Cluster assignment for each strategy

    Example:
        >>> np.random.seed(42)
        >>> C, X, clusters = get_random_correlation_matrix(
        ...     number_of_trials=20, effective_number_of_trials=4
        ... )
        >>> C.shape
        (20, 20)
        >>> np.allclose(np.diag(C), 1)  # Diagonal is all ones
        True
    """
    while True:
        block_positions = [
            0,
            *sorted(np.random.choice(number_of_trials, effective_number_of_trials - 1, replace=True)),
            number_of_trials,
        ]
        block_sizes = np.diff(block_positions)
        if np.all(block_sizes > 0):
            break

    clusters = np.array([block_number for block_number, size in enumerate(block_sizes) for _ in range(size)])
    X0 = np.random.normal(size=(number_of_observations, effective_number_of_trials))
    X = np.zeros(shape=(number_of_observations, number_of_trials))
    for i, cluster in enumerate(clusters):
        X[:, i] = X0[:, cluster] + noise * np.random.normal(size=number_of_observations)
    C = np.asarray(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(C, 1)  # rounding errors
    C = np.clip(C, -1, 1)
    return C, X, clusters

`make_expectation_gh(n_nodes=200)` ¶

Create an expectation function using Gauss-Hermite quadrature.

Returns a function that computes E[g(Z)] where Z ~ N(0,1) using Gauss-Hermite quadrature with the specified number of nodes.

The approximation is: E[g(Z)] ≈ (1/√π) Σ w_i g(√2 t_i) where (t_i, w_i) are Gauss-Hermite nodes and weights.

Parameters:

Name	Type	Description	Default
`n_nodes`	`int`	Number of quadrature nodes. Higher values increase accuracy. Default 200.	`200`

Returns:

Type	Description
`Callable[[Callable[[ndarray], ndarray]], float]`	Function E(g) that computes E[g(Z)] for Z ~ N(0,1).

Example

E = make_expectation_gh(n_nodes=100)

E[Z^2] should be 1 for standard normal¶

bool(abs(E(lambda z: z**2) - 1.0) < 1e-6) True

E[Z] should be 0 for standard normal¶

bool(abs(E(lambda z: z)) < 1e-10) True

Source code in jsharpe/sharpe/quadrature.py

def make_expectation_gh(
    n_nodes: int = 200,
) -> Callable[[Callable[[np.ndarray], np.ndarray]], float]:
    """Create an expectation function using Gauss-Hermite quadrature.

    Returns a function that computes E[g(Z)] where Z ~ N(0,1) using
    Gauss-Hermite quadrature with the specified number of nodes.

    The approximation is: E[g(Z)] ≈ (1/√π) Σ w_i g(√2 t_i)
    where (t_i, w_i) are Gauss-Hermite nodes and weights.

    Args:
        n_nodes: Number of quadrature nodes. Higher values increase accuracy.
            Default 200.

    Returns:
        Function E(g) that computes E[g(Z)] for Z ~ N(0,1).

    Example:
        >>> E = make_expectation_gh(n_nodes=100)
        >>> # E[Z^2] should be 1 for standard normal
        >>> bool(abs(E(lambda z: z**2) - 1.0) < 1e-6)
        True
        >>> # E[Z] should be 0 for standard normal
        >>> bool(abs(E(lambda z: z)) < 1e-10)
        True
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    scale = np.sqrt(2.0)
    norm = 1.0 / np.sqrt(np.pi)
    x = scale * nodes

    def E(g: Callable[[np.ndarray], np.ndarray]) -> float:
        """Compute E[g(Z)] for Z ~ N(0,1) via Gauss-Hermite quadrature.

        Args:
            g: Function to integrate against the standard normal density.

        Returns:
            Approximation of E[g(Z)].
        """
        vals = g(x)
        return float(norm * np.dot(weights, vals))

    return E

`minimum_track_record_length(SR, SR0, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05)` ¶

Compute minimum track record length for statistical significance.

Determines the minimum number of observations T required for the observed Sharpe ratio SR to be significantly greater than SR0 at confidence level 1 - alpha.

Parameters:

Name	Type	Description	Default
`SR`	`float`	Observed Sharpe ratio.	required
`SR0`	`float`	Sharpe ratio under null hypothesis H0.	required
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`alpha`	`float`	Significance level. Default 0.05.	`0.05`

Returns:

Type	Description
`float`	Minimum track record length (number of periods) required.

Example

Higher Sharpe ratio needs shorter track record¶

mtrl_high = minimum_track_record_length(SR=1.0, SR0=0) mtrl_low = minimum_track_record_length(SR=0.5, SR0=0) bool(mtrl_high < mtrl_low) True

Source code in jsharpe/sharpe/psr.py

def minimum_track_record_length(
    SR: float,
    SR0: float,
    *,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    alpha: float = 0.05,
) -> float:
    """Compute minimum track record length for statistical significance.

    Determines the minimum number of observations T required for the observed
    Sharpe ratio SR to be significantly greater than SR0 at confidence level
    1 - alpha.

    Args:
        SR: Observed Sharpe ratio.
        SR0: Sharpe ratio under null hypothesis H0.
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        alpha: Significance level. Default 0.05.

    Returns:
        Minimum track record length (number of periods) required.

    Example:
        >>> # Higher Sharpe ratio needs shorter track record
        >>> mtrl_high = minimum_track_record_length(SR=1.0, SR0=0)
        >>> mtrl_low = minimum_track_record_length(SR=0.5, SR0=0)
        >>> bool(mtrl_high < mtrl_low)
        True
    """
    var = sharpe_ratio_variance(SR0, T=1, gamma3=gamma3, gamma4=gamma4, rho=rho, K=1)
    return float(var * (scipy.stats.norm.ppf(1 - alpha) / (SR - SR0)) ** 2)

`minimum_variance_weights_for_correlated_assets(V)` ¶

Compute weights of the minimum variance portfolio for correlated assets.

Computes the portfolio weights that minimize portfolio variance subject to the constraint that weights sum to 1. Assumes a constant-correlation covariance structure for efficient computation.

Parameters:

Name	Type	Description	Default
`V`	`ndarray`	Covariance matrix of asset returns. Shape (n, n).	required

Returns:

Type	Description
`ndarray`	Portfolio weights that minimize variance. Shape (n,).
`ndarray`	Weights sum to 1.

Example

import numpy as np

Create a simple covariance matrix¶

V = np.array([[0.04, 0.01], [0.01, 0.09]]) w = minimum_variance_weights_for_correlated_assets(V) np.isclose(w.sum(), 1.0) np.True_

Source code in jsharpe/sharpe/linalg.py

def minimum_variance_weights_for_correlated_assets(V: np.ndarray) -> np.ndarray:
    """Compute weights of the minimum variance portfolio for correlated assets.

    Computes the portfolio weights that minimize portfolio variance subject
    to the constraint that weights sum to 1. Assumes a constant-correlation
    covariance structure for efficient computation.

    Args:
        V: Covariance matrix of asset returns. Shape (n, n).

    Returns:
        Portfolio weights that minimize variance. Shape (n,).
        Weights sum to 1.

    Example:
        >>> import numpy as np
        >>> # Create a simple covariance matrix
        >>> V = np.array([[0.04, 0.01], [0.01, 0.09]])
        >>> w = minimum_variance_weights_for_correlated_assets(V)
        >>> np.isclose(w.sum(), 1.0)
        np.True_
    """
    ones = np.ones(shape=V.shape[0])
    S = robust_covariance_inverse(V)
    w = S @ ones
    w = w / np.sum(w)
    return w

`number_of_clusters(C, *, retries=10, max_clusters=100)` ¶

Compute the optimal number of clusters from a correlation matrix.

Implements the algorithm from section 8.1 of Lopez de Prado (2018): 1. Convert the correlation matrix into a distance matrix. 2. Using the columns of the distance matrix as features, run the k-means algorithm for each k and compute the quality of the clustering. 3. Return the clustering with the highest quality.

Quality is defined as the mean of the silhouette scores divided by their standard deviation.

Parameters:

Name	Type	Description	Default
`C`	`ndarray`	Correlation matrix. Must be square, symmetric, finite, with ones on the diagonal and all entries in [-1, 1].	required
`retries`	`int`	Number of times to run k-means for each k to reduce the impact of random initialisation. Default 10.	`10`
`max_clusters`	`int`	Maximum number of clusters to evaluate. Capped at `C.shape[0] - 1`. Default 100.	`100`

Returns:

Type	Description
`tuple[int, dict[int, float], ndarray]`	Tuple of (n_clusters, qualities, labels): - n_clusters: Optimal number of clusters. - qualities: Dict mapping k to its quality score. - labels: Cluster assignment for each observation (shape (n,)).

References

Lopez de Prado, M. (2018). "Detection of false investment strategies using unsupervised learning methods." SSRN 3167017. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3167017

Example

from jsharpe.sharpe.generators import get_random_correlation_matrix np.random.seed(42) C, _, _ = get_random_correlation_matrix( ... number_of_trials=20, effective_number_of_trials=4 ... ) n, qualities, labels = number_of_clusters(C, retries=3, max_clusters=8) 2 <= n <= 8 True labels.shape (20,)

Source code in jsharpe/sharpe/clustering.py

def number_of_clusters(
    C: np.ndarray,
    *,
    retries: int = 10,
    max_clusters: int = 100,
) -> tuple[int, dict[int, float], np.ndarray]:
    """Compute the optimal number of clusters from a correlation matrix.

    Implements the algorithm from section 8.1 of Lopez de Prado (2018):
        1. Convert the correlation matrix into a distance matrix.
        2. Using the columns of the distance matrix as features, run the
           k-means algorithm for each k and compute the quality of the
           clustering.
        3. Return the clustering with the highest quality.

    Quality is defined as the mean of the silhouette scores divided by
    their standard deviation.

    Args:
        C: Correlation matrix. Must be square, symmetric, finite, with ones
            on the diagonal and all entries in [-1, 1].
        retries: Number of times to run k-means for each k to reduce the
            impact of random initialisation. Default 10.
        max_clusters: Maximum number of clusters to evaluate. Capped at
            ``C.shape[0] - 1``. Default 100.

    Returns:
        Tuple of (n_clusters, qualities, labels):
            - n_clusters: Optimal number of clusters.
            - qualities: Dict mapping k to its quality score.
            - labels: Cluster assignment for each observation (shape (n,)).

    References:
        Lopez de Prado, M. (2018). "Detection of false investment strategies
        using unsupervised learning methods." SSRN 3167017.
        https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3167017

    Example:
        >>> from jsharpe.sharpe.generators import get_random_correlation_matrix
        >>> np.random.seed(42)
        >>> C, _, _ = get_random_correlation_matrix(
        ...     number_of_trials=20, effective_number_of_trials=4
        ... )
        >>> n, qualities, labels = number_of_clusters(C, retries=3, max_clusters=8)
        >>> 2 <= n <= 8
        True
        >>> labels.shape
        (20,)
    """
    assert isinstance(C, np.ndarray)
    assert np.all(C >= -1)
    assert np.all(C <= 1)
    assert np.all(np.diag(C) == 1)
    assert np.all(np.isfinite(C))

    max_clusters = min(max_clusters, C.shape[0] - 1)

    # Convert correlations to distances
    D = np.sqrt((1 - C) / 2)
    assert np.all(np.isfinite(D))

    qualities: dict[int, float] = {}
    best_labels: dict[int, np.ndarray] = {}
    for k in range(2, max_clusters + 1):
        qualities[k] = -np.inf
        for _ in range(retries):
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                _centroids, labels = scipy.cluster.vq.kmeans2(D, k, minit="points", iter=300)
            # Skip degenerate solutions with empty clusters
            if len(np.unique(labels)) < k:
                continue
            silhouette_vals = _silhouette_samples(D, labels)
            std = silhouette_vals.std()
            if std == 0:
                continue
            q = float(silhouette_vals.mean() / std)
            if q > qualities[k]:
                qualities[k] = q
                best_labels[k] = labels.copy()

    # Select the best k among those for which a valid solution was found
    valid_k = {k: q for k, q in qualities.items() if k in best_labels}
    if not valid_k:
        raise RuntimeError("No valid clustering solution found; try increasing retries or reducing max_clusters.")
    best_k = max(valid_k, key=lambda x: valid_k[x])
    return best_k, qualities, best_labels[best_k]

`oFDR(SR, SR0, SR1, T, p_H1, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

Compute observed FDR given the observed Sharpe ratio.

Calculates P[H0 | SR > SR_obs], the probability that the null hypothesis is true given the observed Sharpe ratio value. This is the "observed" FDR which conditions on the actual observation rather than just the critical value.

Parameters:

Name	Type	Description	Default
`SR`	`float`	Observed Sharpe ratio.	required
`SR0`	`float`	Sharpe ratio under null hypothesis.	required
`SR1`	`float`	Sharpe ratio under alternative hypothesis.	required
`T`	`int`	Number of observations.	required
`p_H1`	`float`	Prior probability that H1 is true.	required
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`K`	`int`	Number of strategies for variance adjustment. Default 1.	`1`

Returns:

Type	Description
`float`	Posterior probability of H0 given the observed SR.

Example

Higher observed SR should give lower probability of H0¶

fdr_low = oFDR(SR=0.3, SR0=0, SR1=0.5, T=24, p_H1=0.1) fdr_high = oFDR(SR=0.8, SR0=0, SR1=0.5, T=24, p_H1=0.1) bool(fdr_high < fdr_low) True

Source code in jsharpe/sharpe/corrections.py

def oFDR(
    SR: float,
    SR0: float,
    SR1: float,
    T: int,
    p_H1: float,
    *,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    K: int = 1,
) -> float:
    """Compute observed FDR given the observed Sharpe ratio.

    Calculates P[H0 | SR > SR_obs], the probability that the null hypothesis
    is true given the observed Sharpe ratio value. This is the "observed"
    FDR which conditions on the actual observation rather than just the
    critical value.

    Args:
        SR: Observed Sharpe ratio.
        SR0: Sharpe ratio under null hypothesis.
        SR1: Sharpe ratio under alternative hypothesis.
        T: Number of observations.
        p_H1: Prior probability that H1 is true.
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        K: Number of strategies for variance adjustment. Default 1.

    Returns:
        Posterior probability of H0 given the observed SR.

    Example:
        >>> # Higher observed SR should give lower probability of H0
        >>> fdr_low = oFDR(SR=0.3, SR0=0, SR1=0.5, T=24, p_H1=0.1)
        >>> fdr_high = oFDR(SR=0.8, SR0=0, SR1=0.5, T=24, p_H1=0.1)
        >>> bool(fdr_high < fdr_low)
        True
    """
    p0 = 1 - probabilistic_sharpe_ratio(SR, SR0, T=T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K)
    p1 = 1 - probabilistic_sharpe_ratio(SR, SR1, T=T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K)
    p_H0 = 1 - p_H1
    return p0 * p_H0 / (p0 * p_H0 + p1 * p_H1)

`pFDR(p_H1, alpha, beta)` ¶

Compute posterior FDR given test outcome exceeds critical value.

Calculates P[H0 | SR > SR_c], the probability that the null hypothesis is true given that the observed Sharpe ratio exceeds the critical value. This is the "predictive" FDR based on the critical value, not the observed value.

Parameters:

Name	Type	Description	Default
`p_H1`	`float`	Prior probability that H1 is true.	required
`alpha`	`float`	Significance level (Type I error rate).	required
`beta`	`float`	Type II error rate (1 - power).	required

Returns:

Type	Description
`float`	Posterior probability of H0 given rejection.

Example

With 5% prior on H1 and 5% significance¶

fdr = pFDR(p_H1=0.05, alpha=0.05, beta=0.3) 0 < fdr < 1 True

Source code in jsharpe/sharpe/corrections.py

def pFDR(
    p_H1: float,
    alpha: float,
    beta: float,
) -> float:
    """Compute posterior FDR given test outcome exceeds critical value.

    Calculates P[H0 | SR > SR_c], the probability that the null hypothesis
    is true given that the observed Sharpe ratio exceeds the critical value.
    This is the "predictive" FDR based on the critical value, not the
    observed value.

    Args:
        p_H1: Prior probability that H1 is true.
        alpha: Significance level (Type I error rate).
        beta: Type II error rate (1 - power).

    Returns:
        Posterior probability of H0 given rejection.

    Example:
        >>> # With 5% prior on H1 and 5% significance
        >>> fdr = pFDR(p_H1=0.05, alpha=0.05, beta=0.3)
        >>> 0 < fdr < 1
        True
    """
    p_H0 = 1 - p_H1
    return 1 / (1 + (1 - beta) * p_H1 / alpha / p_H0)

`ppoints(n, a=None)` ¶

Generate probability points for Q-Q plots and distribution fitting.

Creates n equidistant points in the interval (0, 1), suitable for probability plotting positions. The boundaries 0 and 1 are excluded. This function mirrors the behavior of R's ppoints() function.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of probability points to generate.	required
`a`	`float \| None`	Plotting position parameter in [0, 1]. Defaults to 0.5 if n > 10, otherwise ⅜ (following R's convention).	`None`

Returns:

Type	Description
`ndarray`	Array of n equidistant probability points in (0, 1).

Raises:

Type	Description
`ValueError`	If a is not in [0, 1].

Example

import numpy as np pts = ppoints(5) len(pts) 5 np.all((pts > 0) & (pts < 1)) np.True_ ppoints(5, a=0.5) array([0.1, 0.3, 0.5, 0.7, 0.9])

Source code in jsharpe/sharpe/linalg.py

def ppoints(n: int, a: float | None = None) -> np.ndarray:
    """Generate probability points for Q-Q plots and distribution fitting.

    Creates n equidistant points in the interval (0, 1), suitable for
    probability plotting positions. The boundaries 0 and 1 are excluded.
    This function mirrors the behavior of R's `ppoints()` function.

    Args:
        n: Number of probability points to generate.
        a: Plotting position parameter in [0, 1]. Defaults to 0.5 if n > 10,
            otherwise 3/8 (following R's convention).

    Returns:
        Array of n equidistant probability points in (0, 1).

    Raises:
        ValueError: If a is not in [0, 1].

    Example:
        >>> import numpy as np
        >>> pts = ppoints(5)
        >>> len(pts)
        5
        >>> np.all((pts > 0) & (pts < 1))
        np.True_
        >>> ppoints(5, a=0.5)
        array([0.1, 0.3, 0.5, 0.7, 0.9])
    """
    if a is None:
        a = 0.5 if n > 10 else 3 / 8
    if not (0 <= a <= 1):
        raise ValueError(f"the offset should be in [0,1], got {a}")
    return np.linspace(1 - a, n - a, n) / (n + 1 - 2 * a)

`probabilistic_sharpe_ratio(SR, SR0, *, variance=None, T=None, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

Compute the Probabilistic Sharpe Ratio (PSR).

The PSR is 1 - p, where p is the p-value of testing H0: SR = SR0 vs H1: SR > SR0. It can be interpreted as a "Sharpe ratio on a probability scale", i.e., mapping the SR to [0, 1].

Parameters:

Name	Type	Description	Default
`SR`	`float`	Observed Sharpe ratio.	required
`SR0`	`float`	Sharpe ratio under null hypothesis.	required
`variance`	`float \| None`	Variance of SR estimator. Provide this OR (T, gamma3, ...).	`None`
`T`	`int \| None`	Number of observations (if variance not provided).	`None`
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`K`	`int`	Number of strategies for variance adjustment. Default 1.	`1`

Returns:

Type	Description
`float`	Probabilistic Sharpe Ratio in [0, 1]. Values near 1 indicate
`float`	strong evidence that the true SR exceeds SR0.

Raises:

Type	Description
`AssertionError`	If both variance and T are provided.

Example

psr = probabilistic_sharpe_ratio(SR=0.5, SR0=0, T=24) bool(0 < psr < 1) True

Higher observed SR gives higher PSR¶

psr_high = probabilistic_sharpe_ratio(SR=1.0, SR0=0, T=24) bool(psr_high > psr) True

Source code in jsharpe/sharpe/psr.py

def probabilistic_sharpe_ratio(
    SR: float,
    SR0: float,
    *,
    variance: float | None = None,
    T: int | None = None,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    K: int = 1,
) -> float:
    """Compute the Probabilistic Sharpe Ratio (PSR).

    The PSR is 1 - p, where p is the p-value of testing H0: SR = SR0 vs
    H1: SR > SR0. It can be interpreted as a "Sharpe ratio on a probability
    scale", i.e., mapping the SR to [0, 1].

    Args:
        SR: Observed Sharpe ratio.
        SR0: Sharpe ratio under null hypothesis.
        variance: Variance of SR estimator. Provide this OR (T, gamma3, ...).
        T: Number of observations (if variance not provided).
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        K: Number of strategies for variance adjustment. Default 1.

    Returns:
        Probabilistic Sharpe Ratio in [0, 1]. Values near 1 indicate
        strong evidence that the true SR exceeds SR0.

    Raises:
        AssertionError: If both variance and T are provided.

    Example:
        >>> psr = probabilistic_sharpe_ratio(SR=0.5, SR0=0, T=24)
        >>> bool(0 < psr < 1)
        True
        >>> # Higher observed SR gives higher PSR
        >>> psr_high = probabilistic_sharpe_ratio(SR=1.0, SR0=0, T=24)
        >>> bool(psr_high > psr)
        True
    """
    if variance is None:
        assert T is not None, "T must be provided if variance is not provided"
        variance = sharpe_ratio_variance(SR0, T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K)
    else:
        assert T is None, "Provide either the variance or (T, gamma3, gamma4, rho)"
    return float(scipy.stats.norm.cdf((SR - SR0) / math.sqrt(variance)))

`robust_covariance_inverse(V)` ¶

Compute inverse of a constant-correlation covariance matrix.

Uses the Sherman-Morrison formula for efficient computation. Assumes the variance matrix has the form: $V = \rho \sigma \sigma' + (1-\rho) \text{diag}(\sigma^2)$ (constant correlations across all pairs).

Its inverse is computed as: $V^{-1} = A^{-1} - \dfrac{ A^{-1} \rho \sigma \sigma' A^{-1} } { 1 + \rho \sigma' A^{-1} \sigma }$

Parameters:

Name	Type	Description	Default
`V`	`ndarray`	Covariance matrix with constant off-diagonal correlations. Shape (n, n).	required

Returns:

Type	Description
`ndarray`	Inverse of the covariance matrix. Shape (n, n).

Example

import numpy as np

Create a constant-correlation covariance matrix¶

rho = 0.5 sigma = np.array([1.0, 2.0, 1.5]) C = rho * np.ones((3, 3)) np.fill_diagonal(C, 1) V = (C * sigma.reshape(-1, 1)).T * sigma.reshape(-1, 1) V_inv = robust_covariance_inverse(V) np.allclose(V @ V_inv, np.eye(3), atol=1e-10) True

Source code in jsharpe/sharpe/linalg.py

def robust_covariance_inverse(V: np.ndarray) -> np.ndarray:
    r"""Compute inverse of a constant-correlation covariance matrix.

    Uses the Sherman-Morrison formula for efficient computation.
    Assumes the variance matrix has the form:
    $V = \rho \sigma \sigma' + (1-\rho) \text{diag}(\sigma^2)$
    (constant correlations across all pairs).

    Its inverse is computed as:
    $V^{-1} = A^{-1} - \dfrac{ A^{-1} \rho \sigma \sigma' A^{-1} }
    { 1 + \rho \sigma' A^{-1} \sigma }$

    Args:
        V: Covariance matrix with constant off-diagonal correlations.
            Shape (n, n).

    Returns:
        Inverse of the covariance matrix. Shape (n, n).

    Example:
        >>> import numpy as np
        >>> # Create a constant-correlation covariance matrix
        >>> rho = 0.5
        >>> sigma = np.array([1.0, 2.0, 1.5])
        >>> C = rho * np.ones((3, 3))
        >>> np.fill_diagonal(C, 1)
        >>> V = (C * sigma.reshape(-1, 1)).T * sigma.reshape(-1, 1)
        >>> V_inv = robust_covariance_inverse(V)
        >>> np.allclose(V @ V_inv, np.eye(3), atol=1e-10)
        True
    """
    sigma = np.sqrt(np.diag(V))
    C = (V.T / sigma).T / sigma
    rho = np.mean(C[np.triu_indices_from(C, 1)])
    A = np.diag(1 / sigma**2) / (1 - rho)
    sigma = sigma.reshape(-1, 1)
    result: np.ndarray = A - (rho * A @ sigma @ sigma.T @ A) / (1 + rho * sigma.T @ A @ sigma)
    return result

`sharpe_ratio_power(SR0, SR1, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1)` ¶

Compute statistical power for Sharpe ratio hypothesis test.

Calculates the power (1 - β) of the test H0: SR = SR0 vs H1: SR = SR1, which is the probability of correctly rejecting H0 when the true Sharpe ratio is SR1.

Power is equivalent to recall in classification:

Power = P[reject H0 | H1] = TP / (TP + FN)

Parameters:

Name	Type	Description	Default
`SR0`	`float`	Sharpe ratio under null hypothesis.	required
`SR1`	`float`	Sharpe ratio under alternative hypothesis (should be > SR0).	required
`T`	`int`	Number of observations.	required
`gamma3`	`float`	Skewness of returns. Default 0.	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	Autocorrelation of returns. Default 0.	`0.0`
`alpha`	`float`	Significance level. Default 0.05.	`0.05`
`K`	`int`	Number of strategies for variance adjustment. Default 1.	`1`

Returns:

Type	Description
`float`	Statistical power in [0, 1].

Example

More observations increases power¶

power_short = sharpe_ratio_power(SR0=0, SR1=0.5, T=12) power_long = sharpe_ratio_power(SR0=0, SR1=0.5, T=48) bool(power_long > power_short) True

Source code in jsharpe/sharpe/psr.py

def sharpe_ratio_power(
    SR0: float,
    SR1: float,
    T: int,
    *,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    alpha: float = 0.05,
    K: int = 1,
) -> float:
    """Compute statistical power for Sharpe ratio hypothesis test.

    Calculates the power (1 - β) of the test H0: SR = SR0 vs H1: SR = SR1,
    which is the probability of correctly rejecting H0 when the true
    Sharpe ratio is SR1.

    Note: Power is equivalent to recall in classification:
        Power = P[reject H0 | H1] = TP / (TP + FN)

    Args:
        SR0: Sharpe ratio under null hypothesis.
        SR1: Sharpe ratio under alternative hypothesis (should be > SR0).
        T: Number of observations.
        gamma3: Skewness of returns. Default 0.
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: Autocorrelation of returns. Default 0.
        alpha: Significance level. Default 0.05.
        K: Number of strategies for variance adjustment. Default 1.

    Returns:
        Statistical power in [0, 1].

    Example:
        >>> # More observations increases power
        >>> power_short = sharpe_ratio_power(SR0=0, SR1=0.5, T=12)
        >>> power_long = sharpe_ratio_power(SR0=0, SR1=0.5, T=48)
        >>> bool(power_long > power_short)
        True
    """
    critical_SR = critical_sharpe_ratio(SR0, T, gamma3=gamma3, gamma4=gamma4, rho=rho, alpha=alpha)
    variance = sharpe_ratio_variance(SR1, T, gamma3=gamma3, gamma4=gamma4, rho=rho, K=K)
    beta = scipy.stats.norm.cdf((critical_SR - SR1) / math.sqrt(variance))
    return float(1 - beta)

`sharpe_ratio_variance(SR, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

Compute the asymptotic variance of the Sharpe ratio estimator.

Accounts for non-Gaussian returns (skewness, kurtosis) and autocorrelation. Under Gaussian iid returns, reduces to (1 + SR^2/2) / T.

Parameters:

Name	Type	Description	Default
`SR`	`float`	Sharpe ratio value (annualized or per-period).	required
`T`	`int`	Number of observations (time periods).	required
`gamma3`	`float`	Skewness of returns. Default 0 (symmetric).	`0.0`
`gamma4`	`float`	Kurtosis of returns (non-excess). Default 3 (Gaussian).	`3.0`
`rho`	`float`	First-order autocorrelation of returns. Default 0 (iid).	`0.0`
`K`	`int`	Number of strategies for multiple testing adjustment. Default 1.	`1`

Returns:

Type	Description
`float`	Variance of the Sharpe ratio estimator.

Example

Variance under Gaussian iid assumptions¶

var_gaussian = sharpe_ratio_variance(SR=0.5, T=24)

Variance with non-Gaussian returns (higher kurtosis)¶

var_nongauss = sharpe_ratio_variance(SR=0.5, T=24, gamma4=6.0) bool(var_nongauss > var_gaussian) # Higher kurtosis increases variance True

Source code in jsharpe/sharpe/psr.py

def sharpe_ratio_variance(
    SR: float,
    T: int,
    *,
    gamma3: float = 0.0,
    gamma4: float = 3.0,
    rho: float = 0.0,
    K: int = 1,
) -> float:
    """Compute the asymptotic variance of the Sharpe ratio estimator.

    Accounts for non-Gaussian returns (skewness, kurtosis) and autocorrelation.
    Under Gaussian iid returns, reduces to (1 + SR^2/2) / T.

    Args:
        SR: Sharpe ratio value (annualized or per-period).
        T: Number of observations (time periods).
        gamma3: Skewness of returns. Default 0 (symmetric).
        gamma4: Kurtosis of returns (non-excess). Default 3 (Gaussian).
        rho: First-order autocorrelation of returns. Default 0 (iid).
        K: Number of strategies for multiple testing adjustment. Default 1.

    Returns:
        Variance of the Sharpe ratio estimator.

    Example:
        >>> # Variance under Gaussian iid assumptions
        >>> var_gaussian = sharpe_ratio_variance(SR=0.5, T=24)
        >>> # Variance with non-Gaussian returns (higher kurtosis)
        >>> var_nongauss = sharpe_ratio_variance(SR=0.5, T=24, gamma4=6.0)
        >>> bool(var_nongauss > var_gaussian)  # Higher kurtosis increases variance
        True
    """
    A = 1
    B = rho / (1 - rho)
    C = rho**2 / (1 - rho**2)
    a = A + 2 * B
    b = A + B + C
    c = A + 2 * C
    V = (a * 1 - b * gamma3 * SR + c * (gamma4 - 1) / 4 * SR**2) / T
    return float(V * moments_Mk(K)[2])

`variance_of_the_maximum_of_k_Sharpe_ratios(number_of_trials, variance)` ¶

Compute the variance of the maximum Sharpe ratio across K strategies.

Selection across a larger pool increases the uncertainty of the selected (maximum) Sharpe ratio estimate. This function applies a logarithmic variance inflation factor that increases monotonically with K.

Parameters:

Name	Type	Description	Default
`number_of_trials`	`int`	Number of strategies (K >= 1).	required
`variance`	`float`	Base variance of individual Sharpe ratio estimates.	required

Returns:

Type	Description
`float`	Inflated variance accounting for selection from K strategies.

Example

More trials increases variance due to selection bias¶

v1 = variance_of_the_maximum_of_k_Sharpe_ratios(1, 0.1) v10 = variance_of_the_maximum_of_k_Sharpe_ratios(10, 0.1) bool(v10 > v1) True

Source code in jsharpe/sharpe/psr.py

def variance_of_the_maximum_of_k_Sharpe_ratios(number_of_trials: int, variance: float) -> float:
    """Compute the variance of the maximum Sharpe ratio across K strategies.

    Selection across a larger pool increases the uncertainty of the selected
    (maximum) Sharpe ratio estimate. This function applies a logarithmic
    variance inflation factor that increases monotonically with K.

    Args:
        number_of_trials: Number of strategies (K >= 1).
        variance: Base variance of individual Sharpe ratio estimates.

    Returns:
        Inflated variance accounting for selection from K strategies.

    Example:
        >>> # More trials increases variance due to selection bias
        >>> v1 = variance_of_the_maximum_of_k_Sharpe_ratios(1, 0.1)
        >>> v10 = variance_of_the_maximum_of_k_Sharpe_ratios(10, 0.1)
        >>> bool(v10 > v1)
        True
    """
    # Monotone increasing variance inflation with K (K>=1)
    inflation = 1.0 + np.log(max(1, int(number_of_trials)))
    return float(variance * inflation)

API Reference¶

jsharpe ¶

Compute variance of a Sharpe ratio estimate¶

Compute the Probabilistic Sharpe Ratio¶

FDR_critical_value(q, SR0, SR1, sigma0, sigma1, p_H1) ¶

adjusted_p_values_bonferroni(ps) ¶

adjusted_p_values_holm(ps, *, variant='bonferroni') ¶

adjusted_p_values_sidak(ps) ¶

autocorrelation(X) ¶

Generate AR(1) process with rho=0.5¶

control_for_FDR(q, *, SR0=0, SR1=0.5, p_H1=0.05, T=24, gamma3=0.0, gamma4=3.0, rho=0.0, K=1) ¶

critical_sharpe_ratio(SR0, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1) ¶

effective_rank(C) ¶

Identity matrix has effective rank equal to its dimension¶

Perfectly correlated matrix has effective rank 1¶

expected_maximum_sharpe_ratio(number_of_trials, variance, SR0=0) ¶

Expected maximum increases with number of trials¶

generate_autocorrelated_non_gaussian_data(N, n, SR0=0, name='gaussian', rho=None, gaussian_autocorrelation=0) ¶

generate_non_gaussian_data(nr, nc, *, SR0=0, name='severe') ¶

get_random_correlation_matrix(number_of_trials=100, effective_number_of_trials=10, number_of_observations=200, noise=0.1) ¶

make_expectation_gh(n_nodes=200) ¶

E[Z^2] should be 1 for standard normal¶

E[Z] should be 0 for standard normal¶

minimum_track_record_length(SR, SR0, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05) ¶

Higher Sharpe ratio needs shorter track record¶

minimum_variance_weights_for_correlated_assets(V) ¶

Create a simple covariance matrix¶

number_of_clusters(C, *, retries=10, max_clusters=100) ¶

oFDR(SR, SR0, SR1, T, p_H1, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1) ¶

Higher observed SR should give lower probability of H0¶

pFDR(p_H1, alpha, beta) ¶

With 5% prior on H1 and 5% significance¶

ppoints(n, a=None) ¶

probabilistic_sharpe_ratio(SR, SR0, *, variance=None, T=None, gamma3=0.0, gamma4=3.0, rho=0.0, K=1) ¶

Higher observed SR gives higher PSR¶

robust_covariance_inverse(V) ¶

Create a constant-correlation covariance matrix¶

sharpe_ratio_power(SR0, SR1, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1) ¶

More observations increases power¶

sharpe_ratio_variance(SR, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1) ¶

Variance under Gaussian iid assumptions¶

Variance with non-Gaussian returns (higher kurtosis)¶

variance_of_the_maximum_of_k_Sharpe_ratios(number_of_trials, variance) ¶

More trials increases variance due to selection bias¶

`jsharpe` ¶

`FDR_critical_value(q, SR0, SR1, sigma0, sigma1, p_H1)` ¶

`adjusted_p_values_bonferroni(ps)` ¶

`adjusted_p_values_holm(ps, *, variant='bonferroni')` ¶

`adjusted_p_values_sidak(ps)` ¶

`autocorrelation(X)` ¶

`control_for_FDR(q, *, SR0=0, SR1=0.5, p_H1=0.05, T=24, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

`critical_sharpe_ratio(SR0, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1)` ¶

`effective_rank(C)` ¶

`expected_maximum_sharpe_ratio(number_of_trials, variance, SR0=0)` ¶

`generate_autocorrelated_non_gaussian_data(N, n, SR0=0, name='gaussian', rho=None, gaussian_autocorrelation=0)` ¶

`generate_non_gaussian_data(nr, nc, *, SR0=0, name='severe')` ¶

`get_random_correlation_matrix(number_of_trials=100, effective_number_of_trials=10, number_of_observations=200, noise=0.1)` ¶

`make_expectation_gh(n_nodes=200)` ¶

`minimum_track_record_length(SR, SR0, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05)` ¶

`minimum_variance_weights_for_correlated_assets(V)` ¶

`number_of_clusters(C, *, retries=10, max_clusters=100)` ¶

`oFDR(SR, SR0, SR1, T, p_H1, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

`pFDR(p_H1, alpha, beta)` ¶

`ppoints(n, a=None)` ¶

`probabilistic_sharpe_ratio(SR, SR0, *, variance=None, T=None, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

`robust_covariance_inverse(V)` ¶

`sharpe_ratio_power(SR0, SR1, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, alpha=0.05, K=1)` ¶

`sharpe_ratio_variance(SR, T, *, gamma3=0.0, gamma4=3.0, rho=0.0, K=1)` ¶

`variance_of_the_maximum_of_k_Sharpe_ratios(number_of_trials, variance)` ¶