
Module featsel


Classes
  FeatureSelector
API for feature selection objects.
  OneAgainstRestSelect
Uses a two-class feature selection method for a multi-class problem by doing feature selection in a one-against-the-rest manner and returning the union of all the features selected.
  RFE
RFE (Recursive Feature Elimination) uses the vector w of an SVM for feature selection.
  MultiplicativeUpdate
Multiplicative update uses the vector w of an SVM to do feature selection.
  Random
A feature selection method that keeps a random set of features.
  Filter
A simple feature selection method that filters features according to a feature score.
  FeatureScorer
Base class for objects that have a 'score' function for scoring the features of a dataset.
  FeatureScore
A class for scoring the features of a dataset.
USAGE:
  construction: f = FeatureScore(scoreName, mode = modeValue)
  copy construction: f = FeatureScore(otherFeatureScore)
scoreName is the type of filter; the available filters are: "predictivity", "oddsRatio", "golub".
mode is one of: oneAgainstRest (default), oneAgainstOne.
(A usage sketch follows this class list.)
  BackwardSelector
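A minimal usage sketch of FeatureScore with Filter, based on the constructor documented above; the import path mirrors this page's package layout, and the select() call is an assumed FeatureSelector method, not one documented here:

    from PyML.feature_selection.featsel import FeatureScore, Filter

    score = FeatureScore('golub', mode='oneAgainstRest')  # documented constructor
    selector = Filter(score)    # filter features according to the score
    # selector.select(data)     # hypothetical: apply the selector to a PyML dataset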
Functions
 
parseArgs(data, targetClass, otherClass=None, **args)
parse arguments for a feature scoring function
 
singleFeatureSuccRate(data, targetClass, otherClass=None, **args)
 
predictivity(data, targetClass, otherClass=None, **args)
A feature score for discrete data; the score for feature i is s_i = P(Fi | C1) - P(Fi | C2), where P(Fi | C) is the estimated probability of feature i being nonzero given the class (see Function Details below for how this is estimated).
 
countDiff(data, targetClass, otherClass=None, **args)
A feature score for discrete data; the score for feature i is: s_i = (#(Fi | C) - #(Fi | not C)) / #(Fi | C)
 
sensitivity(data, targetClass, otherClass=None, **args)
A feature score for discrete data (alternatively, with a threshold it could be used for continuous data): s_i = #(Fi | C) / #(C)
 
ppv(data, targetClass, otherClass=None, **args)
A feature score for discrete data: s_i = #(Fi | C) / #(Fi)
 
ppvThreshold(data, targetClass, otherClass=None, **args)
A feature score for discrete data: s_i = #(Fi | C) / #(Fi) if #(Fi | C) > threshold, and 0 otherwise.
 
specificity(data, targetClass, otherClass=None, **args)
A feature score for discrete data: s_i = #(Fi | C) / #(Fi) (see Function Details below).
 
usefullness(data, targetClass, otherClass=None, **args)
A feature score for discrete data. Optional arguments: threshold, fraction.
 
abundance(data, targetClass, otherClass=None, **args)
Fraction of patterns in a class that have a feature: A(F, C) = #(F | C) / #(C)
 
oddsRatio(data, targetClass, otherClass=None, **args)
 
logOddsRatio(data, targetClass, otherClass=None, **args)
 
relief(data)
 
golub(data, targetClass, otherClass, **args)
The Golub feature score: s = (mu1 - mu2) / sqrt(sigma1^2 + sigma2^2)
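A minimal numpy sketch of this formula (not the PyML implementation; the function name and array layout are illustrative):

    import numpy as np

    def golub_score(X, y, target=1, other=0):
        # X: (n_patterns, n_features) array, y: label vector
        X1, X2 = X[y == target], X[y == other]
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        s1, s2 = X1.std(axis=0), X2.std(axis=0)
        return (mu1 - mu2) / np.sqrt(s1 ** 2 + s2 ** 2)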
 
succ(data, targetClass, otherClass, **args)
the score of feature j is the success rate of a classifier that classifies into the target class all points whose value of the feature is higher than some threshold (a linear 1-d classifier).
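A minimal sketch of this 1-d threshold classifier (not the PyML code); it scans the observed feature values as candidate thresholds:

    import numpy as np

    def single_feature_succ(x, y):
        # x: values of one feature, y: 0/1 labels; returns the best success
        # rate of the rule 'predict class 1 when x > threshold'
        thresholds = np.concatenate(([x.min() - 1], np.unique(x)))
        return max(np.mean((x > t) == (y == 1)) for t in thresholds)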
 
balancedSucc(data, targetClass, otherClass, **args)
the score of feature j is the balanced success rate of the same 1-d threshold classifier used by succ (the success rate averaged over the two classes).
 
roc(data, targetClass, otherClass, **args)
 
featureCount(data, *options, **args)
returns a vector where component i gives the number of patterns in which feature i is nonzero
INPUTS:
  data - a dataset
  targetClass - class for which to count (optional; the default is to look at all patterns)
  Y - alternative label vector (optional)
  feature - either a feature or a list of features; counts the number of patterns for which the feature (or list of features) is nonzero
  I - a list of indices on which to do the feature count
OPTIONS:
  "complement" - look at the complement of the target class
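A minimal numpy sketch of the basic behavior (not the PyML implementation; the feature, I, and "complement" arguments are omitted):

    import numpy as np

    def feature_count(X, Y=None, targetClass=None):
        # number of patterns in which each feature is nonzero,
        # optionally restricted to the patterns of targetClass
        if targetClass is not None and Y is not None:
            X = X[np.asarray(Y) == targetClass]
        return np.count_nonzero(X, axis=0)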
 
featureMean(data, targetClass=None, Y=None)
returns a vector where component i is the mean of feature i
INPUT:
  data - a dataset
  targetClass - class for which to take the mean (optional)
  Y - alternative label vector (optional)
 
featureStd(data, targetClass=None, Y=None)
returns a vector where component i is the standard deviation of feature i
INPUT:
  data - a dataset
  targetClass - class for which to take the standard deviation (optional)
  Y - alternative label vector (optional)
 
eliminateSparseFeatures(data, threshold)
removes from the data features whose feature count is below a threshold
INPUT:
  data - a dataset
  threshold - number of occurrences of a feature below which it is eliminated
 
nonredundantFeatures(data, w=None)
Compute a set of nonredundant features for a 0/1 sparse dataset; a feature is redundant if another feature is nonzero for exactly the same patterns and has a larger weight (see Function Details below).
 
linearlySeparable(data)
returns 1 if the data is linearly separable and 0 otherwise.
 
extractNumFeatures(resultsFileName)
 
weights2ranks(weights, data)
 
featureReport(data, score='roc', targetClass=1, otherClass=0)
Function Details

predictivity(data, targetClass, otherClass=None, **args)

A feature score for discrete data; the score for feature i is:

    s_i = P(Fi | C1) - P(Fi | C2),

where P(Fi | C) is the estimated probability of feature i being nonzero given the class variable. This is estimated as:

    s_i = #(Fi | C1) / #(C1) - #(Fi | C2) / #(C2)

i.e., the fraction of patterns in the target class that have feature i, minus the fraction of patterns in the other class that have feature i.
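A minimal numpy sketch of this estimate for 0/1 data (not the PyML implementation):

    import numpy as np

    def predictivity_score(X, y, target=1, other=0):
        # s_i = #(Fi | C1) / #(C1) - #(Fi | C2) / #(C2) for each feature i
        p1 = np.count_nonzero(X[y == target], axis=0) / np.sum(y == target)
        p2 = np.count_nonzero(X[y == other], axis=0) / np.sum(y == other)
        return p1 - p2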

specificity(data, targetClass, otherClass=None, **args)

A feature score for discrete data: s_i = #(Fi | C) / #(Fi)

or perhaps: s_i = 1 - #(Fi | not C) / #(not C)

nonredundantFeatures(data, w=None)

Compute a set of nonredundant features for a 0/1 sparse dataset. A feature is defined as redundant if there is another feature that is nonzero for exactly the same patterns and has a larger weight.
INPUT: a dataset, and optionally a list of weights, one for each feature in the data.
OUTPUT: a list of the nonredundant features.
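A minimal sketch of this definition (not the PyML code): features are grouped by the exact set of patterns on which they are nonzero, and only the largest-weight feature of each group is kept:

    import numpy as np

    def nonredundant_features(X, w=None):
        # X: 0/1 array (n_patterns, n_features); w: optional weight per feature
        if w is None:
            w = np.ones(X.shape[1])
        best = {}  # nonzero-pattern signature -> index of heaviest feature
        for j in range(X.shape[1]):
            key = tuple(np.flatnonzero(X[:, j]))
            if key not in best or w[j] > w[best[key]]:
                best[key] = j
        return sorted(best.values())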

linearlySeparable(data)

returns 1 if the data is linearly separable and 0 otherwise. More specifically, it trains a soft margin SVM and checks whether all training points are correctly classified.
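A minimal sketch of this check, substituting scikit-learn's linear SVM for PyML's trainer (an assumption); a very large C approximates a hard margin, so zero training error indicates linear separability:

    import numpy as np
    from sklearn.svm import LinearSVC

    def linearly_separable(X, y):
        clf = LinearSVC(C=1e6, max_iter=100000).fit(X, y)
        return int(np.all(clf.predict(X) == y))  # 1 iff all points correct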