Module assess (PyML.evaluators.assess)

Classes
  ResultsContainer
  Results
  ClassificationFunctions
  ClassificationResultsContainer - a class for holding the results of testing a classifier
  ClassificationResults
  ResultsList
  RegressionResultsContainer
  RegressionResults
Functions

  test(classifier, data, **args)
      test a classifier on a given dataset
  loo(classifier, data, **args)
      perform Leave One Out
  cvFromFolds(classifier, data, trainingPatterns, testingPatterns, **args)
      perform cross validation on a given set of folds
  cv(classifier, data, numFolds=5, **args)
      perform k-fold cross validation
  stratifiedCV(classifier, data, numFolds=5, **args)
      perform k-fold stratified cross-validation; in each fold the number of patterns from
      each class is proportional to the relative fraction of the class in the dataset
  nCV(classifier, data, **args)
      run CV n times, returning a ResultsList object
  makeFolds(data, numFolds, datasetName, directory='.')
      split a dataset into several folds and save the training and testing data of each
      fold as a separate dataset
  cvFromFile(classifier, trainingBase, testingBase, datasetClass, **args)
      perform CV when the training and testing data are in files whose names are of the
      form trainingBase + number + string and testingBase + number + string
  scatter(r1, r2, statistic='roc', x1Label='', x2Label='', fileName=None, **args)
      a scatter plot for comparing the performance of two classifiers
  plotROC2(decisionFunc, givenY, fileName=None, **args)
  plotROC(res, fileName=None, **args)
      plot the ROC curve from a given Results (or Results-like) object
  plotROCs(resList, descriptions=None, fileName=None, **args)
      plot multiple ROC curves
  significance(r1, r2, statistic='roc')
      report the statistical significance of the difference in error rates of two
      classifiers over a series of classification results, using the Wilcoxon signed
      rank test
  trainTest(classifierTemplate, data, trainingPatterns, testingPatterns, **args)
      train a classifier on the list of training patterns and test it on the test patterns
  confmat(L1, L2)
      compute the confusion matrix between two labelings
  superConfmat(Y1, Y2, numClasses=0)
      compute the confusion matrix between two labelings, where the matrix is assumed to
      be square; Y1 and Y2 are assumed to have integer components in the range 0,..,numClasses
  roc(Y, givenY, decisionFunc, n=None, targetClass=1, normalize=True)
      compute the ROC curve and the area under the curve for a two class problem
  convert(object, attributes)
  saveResultObjects(objects, fileName, *options)
      save a list or dictionary of Results objects
  loadResults(fileName, isNewFormat=True)
      load saved Results; isNewFormat -- whether the Results were saved under version
      0.6.1 or newer
  loadResults2(fileName)
      load a list of lists of Results objects or a dictionary of lists of Results objects
Function Details

test(classifier, data, **args)

test a classifier on a given dataset
Parameters:
  • classifier - a trained classifier
  • data - a dataset
  • stats - whether to compute the statistics of the match between the predicted labels and the given labels [True by default]
Returns:
a Results class instance
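
A minimal usage sketch, assuming the standard PyML imports (from PyML import *), which are taken here to provide the SparseDataSet container and the SVM classifier, and hypothetical data files:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    trainData = SparseDataSet('train.data')     # hypothetical training file
    testData = SparseDataSet('test.data')       # hypothetical test file
    classifier = SVM()
    classifier.train(trainData)                 # test() expects an already trained classifier
    results = assess.test(classifier, testData, stats=True)    # a Results instance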

loo(classifier, data, **args)


perform Leave One Out

USAGE: loo(classifier, data)

Returns:
a results object

cvFromFolds(classifier, data, trainingPatterns, testingPatterns, **args)

perform cross validation
Parameters:
  • classifier - a classifier template
  • data - a dataset
  • trainingPatterns - a list providing the training examples for each fold
  • testingPatterns - a list providing the testing examples for each fold
  • intermediateFile - a file name under which to save intermediate results; if this argument is not given, no intermediate results are saved
Returns:
a Results object. The ROC curve is computed using the resulting classification of each point in the dataset (in contrast to Provost, Fawcett and Kohavi who compute average ROC curves).
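
A minimal sketch of supplying your own folds, assuming the standard PyML imports, a hypothetical data file, and that each fold is given as a list of pattern indices:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    n = len(data)                               # number of patterns (assumed to support len)
    testingPatterns = [range(0, n, 2), range(1, n, 2)]     # fold 1: even indices, fold 2: odd
    trainingPatterns = [range(1, n, 2), range(0, n, 2)]    # the complement of each test fold
    results = assess.cvFromFolds(SVM(), data, trainingPatterns, testingPatterns)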

cv(classifier, data, numFolds=5, **args)

perform k-fold cross validation
Parameters:
  • classifier - a classifier template
  • data - a dataset
  • numFolds - number of cross validation folds (default = 5)
  • seed - random number generator seed
  • foldsToPerform - number of folds to actually perform (useful when you want to save time by running only some of the folds)
Returns:
a Results object.
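
A minimal sketch, assuming the standard PyML imports and a hypothetical data file:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    results = assess.cv(SVM(), data, numFolds=5, seed=1)    # a Results object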

stratifiedCV(classifier, data, numFolds=5, **args)

perform k-fold stratified cross-validation; in each fold the number of patterns from each class is proportional to the relative fraction of the class in the dataset
Parameters:
  • classifier - a classifier template
  • data - a dataset
  • numFolds - number of cross validation folds (default = 5)
  • seed - random number generator seed
  • trainingAllFolds - a list of patterns that are to be used as training examples in all CV folds
  • intermediateFile - a file name under which to save intermediate results; if this argument is not given, no intermediate results are saved
  • foldsToPerform - number of folds to actually perform (useful when you want to save time by running only some of the folds)
Returns:
a Results object.
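
A minimal sketch, assuming the standard PyML imports, a hypothetical data file, and that trainingAllFolds is given as a list of pattern indices:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    # the first ten patterns (by index, assumed) are used for training in every fold:
    results = assess.stratifiedCV(SVM(), data, numFolds=5, seed=1,
                                  trainingAllFolds=range(10))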

nCV(classifier, data, **args)

runs CV n times, returning a 'ResultsList' object.
Parameters:
  • classifier - classifier template
  • data - dataset
  • cvType - which CV function to apply (default: stratifiedCV)
  • seed - random number generator seed (default: 1) This is used as the seed for the first CV run. Subsequent runs use seed + 1, seed + 2...
  • iterations - number of times to run CV (default: 10)
  • numFolds - number of folds to use with CV (default: 5)
  • intermediateFile - a file name under which to save intermediate results; if this argument is not given, no intermediate results are saved
Returns:
a ResultsList object containing the results of each CV run
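
A minimal sketch, assuming the standard PyML imports and a hypothetical data file:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    # ten stratified CV runs with seeds 1, 2, ..., 10; returns a ResultsList
    resultsList = assess.nCV(SVM(), data, iterations=10, numFolds=5, seed=1)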

makeFolds(data, numFolds, datasetName, directory='.')


split a dataset into several folds and save the training and testing data of each fold as a separate dataset

Parameters:
  • data - a dataset instance
  • numFolds - number of folds into which to split the data
  • datasetName - string to use for the file names
  • directory - the directory into which to deposit the files
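
A minimal sketch, assuming the standard PyML imports and a hypothetical data file (the exact names of the files written are determined by datasetName):

    from PyML import *                          # assumed to provide SparseDataSet
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    # write the training/testing data of each of 3 folds to the current directory
    assess.makeFolds(data, 3, 'heart', directory='.')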

cvFromFile(classifier, trainingBase, testingBase, datasetClass, **args)

perform CV when the training and testing data are in files whose names are of the form trainingBase + number + string and testingBase + number + string. For example: training0.data, training1.data, training2.data and testing0.data, testing1.data, testing2.data for 3-fold CV. Training and testing files are matched by the number appearing after the strings trainingBase and testingBase; both trainingBase and testingBase can be paths.
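
A minimal sketch, assuming the standard PyML imports, that fold files named training0.data ... training2.data and testing0.data ... testing2.data exist in the current directory, and that datasetClass is the dataset class used to read them:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    results = assess.cvFromFile(SVM(), 'training', 'testing', SparseDataSet)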

scatter(r1, r2, statistic='roc', x1Label='', x2Label='', fileName=None, **args)

a scatter plot for comparing the performance of two classifiers
Parameters:
  • r1, r2 - both are either a list of Result classes, or a list of success rates / ROC scores
  • statistic - which measure of classifier success to plot; values: 'roc', 'successRate', 'balancedSuccessRate'. To specify part of the ROC curve use something like 'roc50' or 'roc0.1'
  • title - the title of the plot
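
A minimal sketch comparing two classifiers, assuming the standard PyML imports, a hypothetical data file, and that the SVM constructor accepts a C keyword:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    r1 = assess.nCV(SVM(), data, iterations=10)         # results for the first classifier
    r2 = assess.nCV(SVM(C=0.1), data, iterations=10)    # results for the second classifier
    assess.scatter(r1, r2, statistic='roc',
                   x1Label='SVM (default C)', x2Label='SVM (C=0.1)',
                   fileName='scatter.png', title='ROC comparison')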

plotROC(res, fileName=None, **args)

plot the ROC curve from a given Results (or Results-like) object
Parameters:
  • res - a Results object (or a container object that was made by saving a Results object). Note that if you have a Results object you can call this function as a method, so there is no need to supply this argument.
  • fileName - optional argument; if given, the ROC curve is saved under the given file name. The format is determined by the extension. Supported extensions: .eps, .png, .svg
  • rocN - what type of ROC curve to plot (roc50, roc10 etc.) default is full ROC curve
  • normalize - whether to normalize the ROC curve (default: True)
  • plotStr - which string to pass to matplotlib's plot function (default: 'ob')
  • axis - redefine the figure axes; takes a list of the form [xmin,xmax,ymin,ymax]
  • show - whether to display the ROC curve (default: True); set this to False when you just want to save the curve to a file. Some file formats (e.g. svg) automatically set this to False, owing to quirks of matplotlib.
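
A minimal sketch, assuming the standard PyML imports and a hypothetical data file:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    results = assess.cv(SVM(), data, numFolds=5)
    assess.plotROC(results, fileName='roc.png', show=False)    # as a module function
    results.plotROC(fileName='roc.eps')                        # or as a Results method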

plotROCs(resList, descriptions=None, fileName=None, **args)

plot multiple ROC curves.
Parameters:
  • resList - a list or dictionary of Result or Result-like objects
  • descriptions - text for the legend (a list the size of resList); a legend is not shown if this parameter is not given. In the case of a dictionary input the descriptions for the legend are taken from the dictionary keys.
  • fileName - if given, a file to save the figure in
  • legendLoc - the position of the legend -- an integer between 0 and 9; see the matplotlib documentation for details
  • plotStrings - a list of matlab-style plotting strings to send to the plotROC function (instead of the plotStr keyword of plotROC)
  • other keywords - additional keywords are passed on to the plotROC function
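
A minimal sketch, assuming the standard PyML imports, a hypothetical data file, and that the SVM constructor accepts a C keyword; with a dictionary input the legend text is taken from the keys:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    results = {'C=1'  : assess.cv(SVM(), data),
               'C=10' : assess.cv(SVM(C=10), data)}
    assess.plotROCs(results, fileName='rocs.png', legendLoc=4)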

significance(r1, r2, statistic='roc')


report the statistical significance of the difference in error rates of a series of classification results of two classifiers using the Wilcoxon signed rank test.

Returns: pvalue, (median1, median2), where pvalue is the p-value of the two-sided Wilcoxon signed rank test (to get the p-value of a one-sided test, divide it by two), and (median1, median2) are the medians of the statistics of the inputs r1 and r2.

Parameters:
  • r1, r2 - both are either a list of Result classes, or a list of success rates
  • statistic - which measure of classifier success to plot; values: 'roc', 'successRate', 'balancedSuccessRate'. To specify part of the ROC curve use something like 'roc50' or 'roc0.1'
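
A minimal sketch, assuming the standard PyML imports, a hypothetical data file, and that the SVM constructor accepts a C keyword:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    r1 = assess.nCV(SVM(), data, iterations=10)
    r2 = assess.nCV(SVM(C=0.1), data, iterations=10)
    pvalue, (median1, median2) = assess.significance(r1, r2, statistic='roc')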

roc(Y, givenY, decisionFunc, n=None, targetClass=1, normalize=True)


Compute the ROC curve and area under the curve for a two class problem

Parameters:
  • Y - the predicted labels (None may be passed instead)
  • givenY - the true labels
  • decisionFunc - the values of the decision function
  • n - the number of false positives to take into account (roc_n)
  • targetClass - the "positive" class
  • normalize - whether to normalize the ROC curve (default: True); when this is set to False, TP/FP counts are output rather than TP/FP rates
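
A minimal sketch of computing an ROC curve directly from decision values (the labels and decision values below are illustrative):

    from PyML.evaluators import assess

    givenY = [1, 1, 0, 1, 0, 0]                      # true labels; targetClass defaults to 1
    decisionValues = [2.3, 0.7, 0.4, -0.2, 0.1, -1.2]
    # the predicted labels are not needed, so None is passed for Y;
    # the return value holds the ROC curve and the area under it (see above)
    rocResult = assess.roc(None, givenY, decisionValues)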

saveResultObjects(objects, fileName, *options)

save a list or dictionary of Results objects; it is o.k. if the list or dictionary is itself composed of lists or dictionaries of Results objects. OPTIONS: long - save the long attribute list
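
A minimal persistence sketch, assuming the standard PyML imports and a hypothetical data file; whether loadResults reads files written by saveResultObjects (rather than only files saved from a Results object) is not stated above, so that pairing is illustrative:

    from PyML import *                          # assumed to provide SparseDataSet and SVM
    from PyML.evaluators import assess

    data = SparseDataSet('heart.data')          # hypothetical file
    resultsList = assess.nCV(SVM(), data, iterations=3)
    # pass 'long' to save the long attribute list
    assess.saveResultObjects(resultsList, 'cvResults.saved', 'long')
    reloaded = assess.loadResults('cvResults.saved')    # isNewFormat=True by default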