pythonsi.test_statistics

Test statistic definitions for selective inference.

Classes

class pythonsi.test_statistics.AD_DATestStatistic(xs: Data, xt: Data)[source]

class pythonsi.test_statistics.FSTestStatistic(x: ndarray[tuple[Any, ...], dtype[floating]], y: ndarray[tuple[Any, ...], dtype[floating]])[source]

Compute test statistic and other utilities for feature selection inference.

This class computes test statistics for testing individual features after feature selection, implementing the post-selection inference framework for validating selected features.

The test statistic is designed for testing:

\[H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0\]

for a specific feature \(j\) in the active set, where \(\beta_j\) is the coefficient of feature \(j\) in the linear model.

Parameters:
  • x (array-like, shape (n, p)) – Design matrix containing all features

  • y (array-like, shape (n, 1)) – Response vector

x_node

Node containing the design matrix

Type:

Data

y_node

Node containing the response vector

Type:

Data
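The hypothesis for FSTestStatistic concerns a single coefficient \(\beta_j\) of the model restricted to the active set. The sketch below illustrates one standard construction from the post-selection inference literature: a direction vector \(\eta_j = X_M (X_M^\top X_M)^{-1} e_j\), so that \(\eta_j^\top y\) equals the least-squares estimate of \(\beta_j\). This is an assumption made for illustration and is not confirmed to be what FSTestStatistic computes internally; the helper name fs_direction and the hard-coded active set are hypothetical.

```python
import numpy as np

def fs_direction(x, active_set, j):
    """Hypothetical helper: return eta such that eta.T @ y is the
    least-squares estimate of beta_j in the model restricted to active_set."""
    x_m = x[:, active_set]                    # (n, |M|) selected columns
    e_j = np.zeros((len(active_set), 1))
    e_j[active_set.index(j), 0] = 1.0         # selects feature j within the active set
    # eta = X_M (X_M^T X_M)^{-1} e_j, shape (n, 1)
    return x_m @ np.linalg.solve(x_m.T @ x_m, e_j)

rng = np.random.default_rng(0)
n, p = 50, 5
x = rng.normal(size=(n, p))                   # design matrix, shape (n, p)
beta = np.array([[1.0], [0.0], [0.0], [2.0], [0.0]])
y = x @ beta + rng.normal(size=(n, 1))        # response vector, shape (n, 1)

active_set = [0, 3]                           # assumed output of some selection step
eta = fs_direction(x, active_set, j=3)
stat = (eta.T @ y).item()                     # observed value of eta^T y
```

Under this construction, stat coincides with the coefficient of feature 3 from an ordinary least-squares fit of y on the two selected columns.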

class pythonsi.test_statistics.SFS_DATestStatistic(xs: ndarray[tuple[Any, ...], dtype[floating]], ys: ndarray[tuple[Any, ...], dtype[floating]], xt: ndarray[tuple[Any, ...], dtype[floating]], yt: ndarray[tuple[Any, ...], dtype[floating]])[source]

Test statistic for feature selection inference after domain adaptation.

This class computes test statistics for testing individual features after feature selection on domain-adapted data, implementing the post-selection inference framework for cross-domain feature validation.

The test statistic is designed for testing:

\[H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0\]

for a specific feature \(j\) in the active set, where \(\beta_j\) is the coefficient of feature \(j\) in the target domain after domain adaptation via optimal transport.

Parameters:
  • xs (array-like, shape (ns, p)) – Source domain design matrix

  • ys (array-like, shape (ns, 1)) – Source domain response vector

  • xt (array-like, shape (nt, p)) – Target domain design matrix

  • yt (array-like, shape (nt, 1)) – Target domain response vector

xs_node

Node containing the source domain design matrix

Type:

Data

ys_node

Node containing the source domain response vector

Type:

Data

xt_node

Node containing the target domain design matrix

Type:

Data

yt_node

Node containing the target domain response vector

Type:

Data

Notes

The test statistic accounts for the domain adaptation step by focusing the inference on the target domain data while using the source domain for adaptation. This allows for valid inference on features selected after optimal transport domain adaptation.
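One way to read "focusing the inference on the target domain" is a direction vector that is zero on the source entries of the stacked response and equal to the usual least-squares direction on the target entries. The NumPy sketch below illustrates that reading. It is an assumption for illustration, not the confirmed internals of SFS_DATestStatistic: the stacking order (source first, then target), the variable names, and the hard-coded active set are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ns, nt, p = 30, 10, 4

# Inputs with the documented shapes
xs = rng.normal(size=(ns, p))                 # source design matrix
ys = rng.normal(size=(ns, 1))                 # source response vector
xt = rng.normal(size=(nt, p))                 # target design matrix
yt = rng.normal(size=(nt, 1))                 # target response vector

active_set = [1, 2]                           # assumed result of feature selection
j = 2                                         # feature under test

# Least-squares direction built from target rows only
xt_m = xt[:, active_set]
e_j = np.zeros((len(active_set), 1))
e_j[active_set.index(j), 0] = 1.0
eta_t = xt_m @ np.linalg.solve(xt_m.T @ xt_m, e_j)   # shape (nt, 1)

# Zero weight on source responses, so only the target data enter the statistic
eta = np.vstack([np.zeros((ns, 1)), eta_t])   # shape (ns + nt, 1)
y_all = np.vstack([ys, yt])                   # stacked response, source first
stat = (eta.T @ y_all).item()
```

Because the source block of eta is zero, stat depends on ys only through whatever selection event is conditioned on, not through the statistic itself.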

class pythonsi.test_statistics.TLHDRTestStatistic(XS_list: ndarray[tuple[Any, ...], dtype[floating]], YS_list: ndarray[tuple[Any, ...], dtype[floating]], X0: ndarray[tuple[Any, ...], dtype[floating]], Y0: ndarray[tuple[Any, ...], dtype[floating]])[source]

Test statistic for selective inference in high-dimensional regression after transfer learning with multiple source domains.

This class computes test statistics for testing individual features after feature selection via a transfer learning procedure, implementing the post-selection inference framework for high-dimensional regression.

The test statistic is designed for testing:

\[H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0,\]

where \(\beta_j\) is the coefficient of feature \(j\) in the target domain after transfer learning and feature selection.

Parameters:
  • XS_list (array-like, shape (K, nS, p)) –

    A 3D NumPy array containing the source domain design matrices.

      - K: number of source domains
      - nS: sample size per source domain
      - p: number of features (shared across domains)

    The array is structured such that XS_list[k] corresponds to the design matrix of the \(k\)-th source domain, with shape (nS, p).

  • YS_list (array-like, shape (K * nS, 1)) –

    A 2D NumPy array containing the source domain response vectors stacked vertically across all K source domains. The first nS rows correspond to the first source domain, the next nS to the second, and so on.

  • X0 (array-like, shape (nT, p)) – Target domain design matrix.

      - nT: number of samples in the target domain
      - p: number of features (same as in the source domains)

  • Y0 (array-like, shape (nT, 1)) – Target domain response vector.
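The shape conventions above can be easy to get wrong, since the design matrices are stacked along a new leading axis while the responses are concatenated row-wise. The following sketch builds arrays with the documented shapes from per-domain pieces using plain NumPy. The per-domain lists source_x and source_y and the random data are hypothetical; only the resulting shapes come from the parameter descriptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, nS, nT, p = 3, 20, 15, 8

# Hypothetical per-domain data: one (nS, p) matrix and one (nS, 1) vector per source
source_x = [rng.normal(size=(nS, p)) for _ in range(K)]
source_y = [rng.normal(size=(nS, 1)) for _ in range(K)]

XS_list = np.stack(source_x, axis=0)          # shape (K, nS, p)
YS_list = np.vstack(source_y)                 # shape (K * nS, 1)

X0 = rng.normal(size=(nT, p))                 # target design matrix
Y0 = rng.normal(size=(nT, 1))                 # target response vector

# Rows k*nS .. (k+1)*nS - 1 of YS_list hold the responses of the source at index k
k = 1
block = YS_list[k * nS:(k + 1) * nS]
same_block = np.array_equal(block, source_y[k])
same_matrix = np.array_equal(XS_list[k], source_x[k])
```

Both same_block and same_matrix are True here, which confirms that index k of XS_list and the k-th block of nS rows in YS_list refer to the same source domain.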

XS_list_node

Node containing the collection of source domain design matrices.

Type:

Data

YS_list_node

Node containing the stacked source domain response vector.

Type:

Data

X0_node

Node containing the target domain design matrix.

Type:

Data

Y0_node

Node containing the target domain response vector.

Type:

Data

Notes

The test statistic accounts for the transfer learning step by focusing the inference on the target domain while leveraging information from multiple source domains. This allows for valid inference on features selected after the transfer learning process.