#11143 [FIX] Normalize node risk with sample weight sum

For regression trees, node risk is computed as the sum of squared errors. To obtain a meaningful value for comparison, it needs to be normalized by the number of samples in the node (or, more generally, by the sum of the sample weights in the node). Otherwise the sum of squared errors depends heavily on the number of samples in the node, and comparing it with the `regressionAccuracy` parameter is not very meaningful. After normalization, `node_risk` is in fact the variance of the responses over all samples in the node. This makes much more sense and seems to be what the code originally intended, given that node risk is later used as a split termination criterion via `sqrt(node.node_risk) < params.getRegressionAccuracy()`.
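The relationship between the existing expression and the normalized value can be written out as follows. This is a sketch that assumes `sum`, `sum2` and `sumw` hold the weighted accumulations $S=\sum_i w_i y_i$, $Q=\sum_i w_i y_i^2$ and $W=\sum_i w_i$ over the samples in the node; the accumulation loop is not part of the hunk shown in this commit, and the symbols $S$, $Q$, $W$, $\bar y$ are introduced here only for illustration.

```latex
\begin{aligned}
\bar y &= \frac{S}{W}, \\
\sum_i w_i\,(y_i-\bar y)^2
  &= Q - 2\bar y\,S + \bar y^{2}\,W
   = Q - \bar y\,S
   \quad\text{(since } \bar y\,W = S\text{)}, \\
\texttt{node\_risk}
  &= \frac{Q - \bar y\,S}{W}
   = \frac{1}{W}\sum_i w_i\,(y_i-\bar y)^2 .
\end{aligned}
```

Under that assumption, `sum2 - (sum/sumw)*sum` is the weighted sum of squared errors, and dividing by `sumw` turns it into the weighted variance, whose square root is on the same scale as the responses.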

Commit 24e2e0d3 authored Mar 27, 2018 by codingforfun

Showing with 1 addition and 0 deletions

tree.cpp modules/ml/src/tree.cpp +1 -0

--- a/modules/ml/src/tree.cpp
+++ b/modules/ml/src/tree.cpp
@@ -632,6 +632,7 @@ void DTreesImpl::calcValue( int nidx, const vector<int>& _sidx )
        }

        node->node_risk = sum2 - (sum/sumw)*sum;
+        node->node_risk /= sumw;
        node->value = sum/sumw;
    }
 }
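
The effect of the added line can be illustrated outside of OpenCV with a small standalone sketch. The names `NodeStats`, `weightedNodeStats` and `belowAccuracy` are hypothetical helpers invented for this example, and the accumulation loop is an assumption about how `sum`, `sum2` and `sumw` are built, since that part of `calcValue` is not shown in the diff.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two per-node quantities touched by the patch.
struct NodeStats
{
    double value; // weighted mean of the responses
    double risk;  // weighted variance of the responses (normalized risk)
};

// Hypothetical helper mirroring the patched computation.
// Assumes responses and weights have the same length.
NodeStats weightedNodeStats(const std::vector<double>& responses,
                            const std::vector<double>& weights)
{
    double sum = 0.0, sum2 = 0.0, sumw = 0.0;
    for (std::size_t i = 0; i < responses.size(); ++i)
    {
        const double w = weights[i];
        const double t = responses[i];
        sum  += t * w;     // weighted sum of responses
        sum2 += t * t * w; // weighted sum of squared responses
        sumw += w;         // sum of sample weights
    }

    NodeStats stats = {0.0, 0.0};
    if (sumw <= 0.0)
        return stats; // empty node: nothing to normalize

    const double sse = sum2 - (sum / sumw) * sum; // sum of squared errors
    stats.risk  = sse / sumw;                     // the normalization added here
    stats.value = sum / sumw;
    return stats;
}

// Termination test in the form quoted in the commit message.
bool belowAccuracy(const NodeStats& stats, double regressionAccuracy)
{
    return std::sqrt(stats.risk) < regressionAccuracy;
}
```

With unit weights, the responses {1, 2, 3, 4} and the same responses repeated twice both give a risk of 1.25, whereas the unnormalized sums of squared errors are 5 and 10. This is the dependence on node size that the normalization removes.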