Package PyML :: Package datagen :: Module toydata

Module toydata

Generates toy datasets. Based on code by Mark Rogers.

Functions
 
sineClass(xlim=[0,1], ylim=[0,1], n=20, sigma=0.04)
Generates a 2-D noisy sine wave...
 
multivariate_normal(mu, sigma=0.1, n=20)
A wrapper around numpy's random.multivariate_normal function.
 
gaussianData(mu, sigma, n)
 
noisyData()
Creates two populations, usually linearly separable, but with vastly different variances.
 
sineData(n=30)
Uses sine-wave populations to create two class populations that meander close to each other.
 
separableData()
Creates two linearly separable populations, one centered at (-0.5, 0) and the other at (0.5, 0).
Variables
  USAGE = ...
Function Details

sineClass(xlim=[0,1], ylim=[0,1], n=20, sigma=0.04)

Generates a 2-D noisy sine wave
Parameters:
  xlim     - list of length 2 that delimits the x value range 
  ylim     - list of length 2 that delimits the y value range
  n        - number of data points
  sigma    - amount of noise added to the wave (presumably the
             standard deviation of the noise)
Note: for use with PyML demo2d, only use x and y values
      between -1 and 1
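
A minimal sketch of how such a noisy sine wave could be generated, using only the Python standard library. The name sine_class_sketch, the seed argument, the single-period sine shape, and the reading of sigma as the standard deviation of Gaussian noise added to y are all assumptions made for illustration; the actual PyML implementation may differ.

```python
import math
import random

def sine_class_sketch(xlim=(0, 1), ylim=(0, 1), n=20, sigma=0.04, seed=None):
    """Hypothetical stand-in for sineClass: n noisy points along one sine period."""
    rng = random.Random(seed)
    x_lo, x_hi = xlim
    y_lo, y_hi = ylim
    y_mid = (y_lo + y_hi) / 2.0   # vertical center of the wave
    y_amp = (y_hi - y_lo) / 2.0   # amplitude that fills the y range
    points = []
    for _ in range(n):
        x = rng.uniform(x_lo, x_hi)
        # Map x onto one full period of the sine wave (assumed shape).
        phase = 2 * math.pi * (x - x_lo) / (x_hi - x_lo)
        # Gaussian noise with standard deviation sigma (assumed meaning).
        y = y_mid + y_amp * math.sin(phase) + rng.gauss(0, sigma)
        points.append((x, y))
    return points

# Keep x within [-1, 1], as the note above recommends for demo2d.
points = sine_class_sketch(xlim=(-1, 1), ylim=(-0.5, 0.5), n=20, seed=0)
```

Because noise is added after scaling, individual y values can fall slightly outside ylim; choosing a ylim narrower than [-1, 1] leaves room for that.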

multivariate_normal(mu, sigma=0.1, n=20)

A wrapper around numpy's random.multivariate_normal function.
Generates data from a Gaussian distribution with mean mu
and variance (covariance) sigma.
Parameters:
  mu      - mean
  sigma   - variance (either a float, list or square matrix)
  n       - number of points to generate

Note: for use with PyML demo2d, only use mu values
      that keep the populations between -1 and 1
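
The following sketch shows one way a wrapper could turn the three accepted forms of sigma into the covariance matrix that numpy expects. The name multivariate_normal_sketch and the seed argument are hypothetical, and treating a float as a shared per-dimension variance and a list as the diagonal of the covariance matrix is an assumption, not a statement about PyML's code.

```python
import numpy as np

def multivariate_normal_sketch(mu, sigma=0.1, n=20, seed=None):
    """Hypothetical wrapper: accept sigma as a float, a list, or a square matrix."""
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    sigma = np.asarray(sigma, dtype=float)
    if sigma.ndim == 0:
        cov = np.eye(d) * sigma   # float: same variance in every dimension
    elif sigma.ndim == 1:
        cov = np.diag(sigma)      # list: one variance per dimension
    else:
        cov = sigma               # square matrix: used as the covariance
    rng = np.random.default_rng(seed)
    # Returns an array of shape (n, d), one sample per row.
    return rng.multivariate_normal(mu, cov, size=n)

data = multivariate_normal_sketch([-0.5, 0.0], sigma=[0.01, 0.04], n=50, seed=0)
```

With a mean of (-0.5, 0) and variances this small, nearly all samples stay inside the [-1, 1] range that the note above asks for.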

noisyData()

Creates two populations, usually linearly separable, but with vastly different variances. Simulates a problem where one population has significantly more noise than the other. Data are output in a CSV format suitable for creating a PyML VectorDataSet (labelsColumn=1).
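
As an illustration of the output described above, this sketch writes two 2-D Gaussian populations with very different spreads as CSV text. The function name, centers, standard deviations, and label values are hypothetical, and the row layout (an ID in column 0, the label in column 1, then the features) is an assumption about what labelsColumn=1 implies.

```python
import csv
import io
import random

def noisy_data_csv(n_per_class=20, seed=None):
    """Hypothetical sketch: two populations, one much noisier than the other."""
    rng = random.Random(seed)
    # (label, center, standard deviation); all values are illustrative.
    populations = [("1", (-0.5, 0.0), 0.05), ("-1", (0.5, 0.0), 0.30)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    row_id = 0
    for label, (cx, cy), std in populations:
        for _ in range(n_per_class):
            x = rng.gauss(cx, std)
            y = rng.gauss(cy, std)
            # Column 0 is an ID, column 1 the label (assumed layout).
            writer.writerow([row_id, label, f"{x:.4f}", f"{y:.4f}"])
            row_id += 1
    return out.getvalue()

text = noisy_data_csv(n_per_class=5, seed=1)
print(text)
```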

sineData(n=30)

Uses sine-wave populations to create two class populations that meander close to each other. Data are output in a CSV format suitable for creating a PyML VectorDataSet (labelsColumn=1).
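
One way to picture two classes that "meander close to each other" is to draw both from the same sine curve, shifted vertically by a small gap. In this sketch the function name, the gap and sigma parameters, the curve itself, and the choice to treat n as the number of points per class are all assumptions.

```python
import math
import random

def sine_data_sketch(n=30, gap=0.2, sigma=0.04, seed=None):
    """Hypothetical sketch: two classes following the same sine curve, offset by gap."""
    rng = random.Random(seed)
    rows = []
    for label, offset in (("1", gap / 2), ("-1", -gap / 2)):
        for _ in range(n):
            x = rng.uniform(-1, 1)
            # Same wave for both classes; only the vertical offset differs.
            y = 0.5 * math.sin(math.pi * x) + offset + rng.gauss(0, sigma)
            rows.append((label, x, y))
    return rows

rows = sine_data_sketch(n=30, seed=0)
```

No straight line separates the two classes well in this arrangement, which is what makes such data a useful test for nonlinear classifiers.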

separableData()

Creates two linearly separable populations, one centered at (-0.5, 0) and the other at (0.5, 0). Data are output in a CSV format suitable for creating a PyML VectorDataSet (labelsColumn=1).
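
To make "linearly separable" concrete, the sketch below draws two tight populations around the two centers and checks that the vertical line x = 0 splits them. The function names, the label values, and the default sigma of 0.05 are hypothetical choices, small enough that a point crossing the line is practically impossible.

```python
import random

def separable_sketch(n_per_class=20, sigma=0.05, seed=None):
    """Hypothetical sketch: two tight Gaussian populations at (-0.5, 0) and (0.5, 0)."""
    rng = random.Random(seed)
    centers = {1: (-0.5, 0.0), -1: (0.5, 0.0)}
    points = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            points.append((label, rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return points

def separated_by_y_axis(points):
    """True when every label-1 point has x < 0 and every other point has x >= 0."""
    return all((x < 0) == (label == 1) for label, x, _ in points)

points = separable_sketch(seed=0)
print(separated_by_y_axis(points))
```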


Variables Details

USAGE

Value:
"""
Usage: python generate.py type
Where 'type' is one of:
    l  - two similar, linearly-separable populations
    n  - two linearly-separable populations, one with more
         noise than the other
    s  - two populations generated by sine waves (with some noise)
"""
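
The usage text suggests a small command-line dispatch on a one-letter type. The sketch below shows one plausible shape for it; the GENERATORS table and the select_generator helper are hypothetical, and the pairing of each letter with a function name is inferred from the descriptions on this page rather than taken from the source.

```python
import sys

# Assumed mapping from the one-letter type to a generator function name.
GENERATORS = {"l": "separableData", "n": "noisyData", "s": "sineData"}

def select_generator(argv):
    """Return the generator name for argv, or None when usage should be shown."""
    if len(argv) != 2 or argv[1] not in GENERATORS:
        return None
    return GENERATORS[argv[1]]

if __name__ == "__main__":
    name = select_generator(sys.argv)
    print(name if name else "Usage: python generate.py type")
```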