Trait

org.apache.spark.mllib.tree.impurity

Impurity

Related Doc: package impurity

trait Impurity extends Serializable

Trait for calculating information gain. It is used to (a) set the impurity parameter in org.apache.spark.mllib.tree.configuration.Strategy and (b) calculate impurity values from sufficient statistics.
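
The two uses can be sketched as follows. This is a minimal illustration, not taken from this page: it assumes spark-mllib is on the classpath and uses `Gini`, one of the impurity implementations shipped in the same package. The object name `ImpurityUsageSketch` and the parameter values are hypothetical.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

// Hypothetical wrapper object, only here to make the sketch runnable.
object ImpurityUsageSketch {
  def main(args: Array[String]): Unit = {
    // (a) Pass an Impurity implementation as the impurity parameter of Strategy.
    // maxDepth and numClasses are arbitrary illustrative values.
    val strategy = new Strategy(
      algo = Algo.Classification,
      impurity = Gini,
      maxDepth = 5,
      numClasses = 2)

    // (b) Compute an impurity value directly from sufficient statistics:
    // per-label counts plus their total.
    val counts = Array(2.0, 2.0)
    val value = strategy.impurity.calculate(counts, counts.sum)
    println(s"impurity = $value")
  }
}
```

Because `Strategy` only sees the `Impurity` trait, any implementation of the two `calculate` overloads can be passed in the same way.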

Annotations
@Since( "1.0.0" )
Source
Impurity.scala
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Abstract Value Members

  1. Developer API abstract def calculate(count: Double, sum: Double, sumSquares: Double): Double

    information calculation for regression

    count

    number of instances

    sum

    sum of labels

    sumSquares

    sum of squares of the labels

    returns

    information value, or 0 if count = 0

    Annotations
    @Since( "1.0.0" ) @DeveloperApi()
  2. Developer API abstract def calculate(counts: Array[Double], totalCount: Double): Double

    information calculation for multiclass classification

    counts

    Array[Double] with counts for each label

    totalCount

    sum of counts for all labels

    returns

    information value, or 0 if totalCount = 0

    Annotations
    @Since( "1.1.0" ) @DeveloperApi()
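
As an illustration of the two abstract members, the sketch below implements each overload once. It is not Spark's own code: `ImpuritySketch` is a hypothetical stand-in that copies only the two signatures so the example runs without Spark, and `VarianceSketch` and `GiniSketch` are hypothetical implementations that assume variance for regression and the Gini index for classification. Both follow the documented contract of returning 0 when the count is 0.

```scala
// Hypothetical stand-in for the Impurity trait, so the sketch is self-contained.
trait ImpuritySketch extends Serializable {
  def calculate(counts: Array[Double], totalCount: Double): Double
  def calculate(count: Double, sum: Double, sumSquares: Double): Double
}

// Regression: variance of the labels from count, sum and sum of squares,
// i.e. (sumSquares - sum^2 / count) / count.
object VarianceSketch extends ImpuritySketch {
  def calculate(counts: Array[Double], totalCount: Double): Double =
    throw new UnsupportedOperationException("VarianceSketch is regression only")

  def calculate(count: Double, sum: Double, sumSquares: Double): Double =
    if (count == 0) 0.0
    else (sumSquares - sum * sum / count) / count
}

// Multiclass classification: Gini index, 1 - sum over labels of p_i^2,
// where p_i = counts(i) / totalCount.
object GiniSketch extends ImpuritySketch {
  def calculate(counts: Array[Double], totalCount: Double): Double =
    if (totalCount == 0) 0.0
    else 1.0 - counts.map { c =>
      val p = c / totalCount
      p * p
    }.sum

  def calculate(count: Double, sum: Double, sumSquares: Double): Double =
    throw new UnsupportedOperationException("GiniSketch is classification only")
}
```

For labels 1, 2, 3 the sufficient statistics are count = 3, sum = 6, sumSquares = 14, so `VarianceSketch.calculate(3.0, 6.0, 14.0)` evaluates (14 - 36/3) / 3 = 2/3. For two labels with two instances each, `GiniSketch.calculate(Array(2.0, 2.0), 4.0)` evaluates 1 - (0.25 + 0.25) = 0.5.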