Analysis of the CART machine learning algorithm

    Machine learning tasks fall into three main categories: 1) classification; 2) regression; 3) clustering. Today we will focus on the CART algorithm.

    Decision trees occupy two positions among the top ten machine learning algorithms, namely the C4.5 algorithm and the CART algorithm, which shows the importance of CART. The following focuses on the CART algorithm.

    Unlike ID3 and C4.5, CART builds a binary decision tree, that is, a full binary tree. The CART algorithm was proposed by Breiman et al. in 1984. It constructs prediction criteria in a way completely different from traditional statistics, presenting the result as a binary tree that is easy to understand, use, and interpret. The prediction tree constructed by the CART model is in many cases more accurate than the algebraic prediction criteria built by common statistical methods, and the more complex the data and the more variables involved, the more pronounced this advantage becomes.

    The CART algorithm can be used for both classification and regression, and it is regarded as a landmark algorithm in the field of data mining.

    CART algorithm concept:

    CART (Classification and Regression Tree) is a decision tree construction algorithm. CART is a learning method that outputs the conditional probability distribution of a random variable Y given an input random variable X. CART assumes the decision tree is binary: each internal node tests a feature, the left branch corresponds to the answer "yes" and the right branch to the answer "no". Such a decision tree recursively partitions the input space, that is, the feature space, into a finite number of units and determines a predicted probability distribution on each unit, namely the conditional distribution of the output given the input.

    The CART algorithm can handle both discrete and continuous variables. For a continuous variable it uses binary segmentation: if the feature value is no greater than a given threshold, the sample goes to the left subtree; otherwise it goes to the right subtree.
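    As a minimal sketch, this binary segmentation of a continuous feature might look like the following (the sample representation, feature name, and threshold are illustrative choices, not part of the original description):

```python
# Binary segmentation: samples with feature value <= s go to the left
# subtree, the rest go to the right subtree.

def binary_split(samples, feature, s):
    """Split a list of dict-like samples on feature <= s."""
    left = [x for x in samples if x[feature] <= s]
    right = [x for x in samples if x[feature] > s]
    return left, right
```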

    CART algorithm composition:

    The CART algorithm consists of two steps:

    1) Decision tree generation: generate a decision tree from the training data set, making the tree as large as possible. Nodes are built top-down from the root; at each node the best attribute (different algorithms use different indicators to define "best") is chosen to split the training data so that the child nodes are as pure as possible.

    2) Decision tree pruning: prune the generated tree with a validation data set and select the optimal subtree, using minimization of the loss function as the pruning criterion. CART uses CCP (Cost-Complexity Pruning) here.
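    One step of cost-complexity pruning can be sketched as follows: for each internal node t, compute alpha(t) = (C(t) − C(T_t)) / (|leaves(T_t)| − 1), where C(t) is the node's error if collapsed to a leaf and C(T_t) is the total error of the subtree's leaves; the node with the smallest alpha is the "weakest link" pruned first. The nested-dict tree layout and the 'error' field are illustrative assumptions, not the original paper's data structure:

```python
# Sketch of the weakest-link computation used by CCP.
# A tree is a nested dict; 'error' is the node's training error if it
# were collapsed to a single leaf.

def leaves(tree):
    """Count the leaves of a subtree."""
    if 'left' not in tree:          # a leaf has no children
        return 1
    return leaves(tree['left']) + leaves(tree['right'])

def subtree_error(tree):
    """Total training error of the leaves under this node."""
    if 'left' not in tree:
        return tree['error']
    return subtree_error(tree['left']) + subtree_error(tree['right'])

def weakest_alpha(tree):
    """Smallest alpha = (C(t) - C(T_t)) / (|leaves(T_t)| - 1) over internal nodes."""
    if 'left' not in tree:
        return float('inf')         # leaves cannot be pruned further
    alpha = (tree['error'] - subtree_error(tree)) / (leaves(tree) - 1)
    return min(alpha, weakest_alpha(tree['left']), weakest_alpha(tree['right']))
```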

    Decision tree generation is the process of recursively constructing a binary tree, using the squared-error minimization criterion for regression trees and the Gini-index minimization criterion for classification trees to select features and generate the binary tree.


    CART decision tree generation:

    1) Regression tree generation

    The regression tree uses the squared error as its loss function. During generation, the space is recursively split according to the optimal feature and the optimal value of that feature until a stopping condition is met. The stopping condition can be set manually; for example, when the reduction in loss after a split is less than a given threshold ε, splitting stops and a leaf node is created. In the resulting regression tree, the prediction at each leaf node is the mean of the labels of the samples that fall in that leaf.

    A regression tree is a binary tree: each split is made on some value of a feature, and each internal node is a test on the corresponding feature; a sample descends the tree until it reaches a leaf, which gives its prediction. The difficulty in constructing such a tree lies in selecting the optimal splitting feature and the split value for that feature.

    Regression trees and model trees can handle both continuous and discrete features.

    The regression tree generation algorithm is as follows:

    Input: training data set D = {(x1, y1), (x2, y2),..., (xN, yN)}

    Output: regression tree T

    1) Select a splitting feature j and a split value s; (j, s) divides the training set D into two parts, R1 and R2, as follows (xi(j) denotes the value of feature j for sample xi):

    R1(j, s) = { xi | xi(j) ≤ s },  R2(j, s) = { xi | xi(j) > s }

    c1 = (1/N1) Σ_{xi∈R1} yi,  c2 = (1/N2) Σ_{xi∈R2} yi   (N1 and N2 are the numbers of samples in R1 and R2)

    2) Traverse all possible pairs (j, s) to find the optimal (j*, s*), the pair that minimizes the corresponding loss; the data can then be divided according to the optimal split (j*, s*):

    min_{j,s} [ Σ_{xi∈R1(j,s)} (yi − c1)² + Σ_{xi∈R2(j,s)} (yi − c2)² ]

    3) Recursively apply steps 1) and 2) to each resulting region until the stopping condition is met.

    4) Return the regression tree T.
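    The algorithm above can be sketched for a single continuous feature in pure Python, using exhaustive search over candidate thresholds; the function names and the minimum-size stopping rule are illustrative choices, not part of the original description:

```python
# Sketch of CART regression-tree generation on 1-D data.

def mean(ys):
    return sum(ys) / len(ys)

def sq_loss(ys):
    """Sum of squared deviations from the mean, i.e. sum (yi - c)^2."""
    c = mean(ys)
    return sum((y - c) ** 2 for y in ys)

def best_split(xs, ys):
    """Find the split value s minimizing the two-region squared loss."""
    best = (float('inf'), None)
    for s in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        loss = sq_loss(left) + sq_loss(right)
        if loss < best[0]:
            best = (loss, s)
    return best[1]

def grow(xs, ys, min_size=2):
    """Recursively build the tree; a leaf stores the mean label."""
    if len(set(xs)) < min_size:             # stopping condition
        return {'leaf': mean(ys)}
    s = best_split(xs, ys)
    li = [i for i, x in enumerate(xs) if x <= s]
    ri = [i for i, x in enumerate(xs) if x > s]
    return {'split': s,
            'left': grow([xs[i] for i in li], [ys[i] for i in li], min_size),
            'right': grow([xs[i] for i in ri], [ys[i] for i in ri], min_size)}
```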

    The regression tree adopts a divide-and-conquer strategy: targets that cannot be fitted by a single global linear regression are divided and modeled piecewise, yielding more accurate results. Taking the mean within each segment, however, is not always the wisest choice; one can instead set each leaf node to a linear function, giving the so-called piecewise-linear model tree. Experiments show that model trees generally perform better than regression trees. A model tree needs only a slight modification of the regression tree: for the data assigned to a leaf node, the least-squares loss of a linear regression is used as the node's loss.
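    A model-tree leaf replaces the mean with a least-squares line; for a single feature the fit has a closed form, sketched below (the function name and the flat-line fallback are illustrative assumptions):

```python
# Sketch of a model-tree leaf: fit y = a*x + b by ordinary least squares
# on the samples that reach the leaf, and report the residual loss.

def fit_leaf(xs, ys):
    """Return (a, b) of the least-squares line and its squared-error loss."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0          # fall back to a flat line
    b = my - a * mx
    loss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, loss
```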


    2) Classification tree generation

    In CART, the classification tree is used for classification. Unlike ID3 and C4.5, the CART classification tree uses the Gini index to select the optimal splitting feature, and every split is binary.

    The Gini index is a concept similar to entropy. For a random variable X with K states with corresponding probabilities p1, p2, ..., pK, the Gini index is defined as follows:

    Gini(X) = Σk pk(1 − pk) = 1 − Σk pk²

    The Gini index of a set D given a feature A that splits D into D1 and D2:

    Gini(D,A)=(|D1|/|D|)*Gini(D1)+(|D2|/|D|)*Gini(D2)

    The larger Gini(D, A) is, the greater the uncertainty of the sample, similar to entropy; therefore the criterion for selecting feature A is that the smaller Gini(D, A), the better.
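    The two formulas above translate directly into code; this is a small sketch with illustrative function names:

```python
# Gini index of a label set and of a binary split on a feature.
from collections import Counter

def gini(labels):
    """Gini(X) = 1 - sum(pk^2) over the label frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """Gini(D, A) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2)."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
```

    Selecting a feature then amounts to computing gini_split for each candidate binary partition and keeping the smallest value.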
