# MPEI-based regression tree

3.2.1.2. MPEI-based regression tree preselection. Bootstrap sampling helps to produce randomized regression trees. To guarantee the performance of the base learner pool, a preselection of base learners is performed. For classification, base learners whose accuracy is higher than 50 percent may make a positive contribution to the accuracy of the ensemble. In the context of regression, however, base learners with a large MSE may lower the performance of the ensemble model, and it is difficult to predefine a threshold that identifies good base learners. At present, most performance indicators for regression models measure the approximation error, which does not explicitly indicate how good the model is. In this work, we propose a dimensionless calculation method to evaluate the accuracy of regression models that predict continuous output values.

Suppose there is a regression problem whose target value ranges from Ymin to Ymax, and a regression model is used to predict the output Ypre, with Ypre ∈ [Ymin, Ymax]. Let e (0 ≤ e ≤ Ymax − Ymin) denote the average absolute error between the predicted and real values. The error interval (EI) of a prediction can then be defined as [Ypre − e, Ypre + e]. When Ypre − e is less than Ymin, the lower bound becomes Ymin. When Ypre + e is greater than Ymax, the upper bound becomes Ymax. Then a minimum

Table 2

Algorithm for semi-random regression tree set generation and refinement.

Construction process of semi-random regression tree set

4. choose the feature subset with the highest importance summation, Fmax;
5. train a regression tree RT based on S and Fmax;

7. if the size of RTS is less than T then

11. compute MPEI of the tree;
12. if MPEI of the tree is higher than β then

Output: A set of semi-random regression trees.

obeying a discrete uniform distribution. Let τ = Ỹmax − Ỹmin + 1. The mean proportion of the error interval (MPEI) in the whole target value interval can be obtained by the following equations:

[MPEI equations lost in extraction; surviving terms: τ, 4ẽ]

When e equals 0, MPEI is 0. When e equals Ỹmax − Ỹmin, MPEI is 1. MPEI increases as e increases. Our experiments have demonstrated that MPEI can effectively preselect high-quality base learners to be stored in the base learner pool. In our ensemble learning method, a set of random regression trees is constructed and refined by the algorithm listed in Table 2.
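The clipped error interval and the averaging idea behind MPEI can be illustrated numerically. The sketch below is not the closed-form expression of this section. It assumes integer-valued targets, a prediction drawn uniformly from the integers in [Ymin, Ymax], and normalization by Ymax − Ymin, a choice made only so that the stated boundary behaviour (0 at e = 0, 1 at e = Ymax − Ymin) holds. The function names `error_interval` and `mpei_numeric` are hypothetical.

```python
def error_interval(y_pre, e, y_min, y_max):
    """Return [y_pre - e, y_pre + e] clipped to [y_min, y_max]."""
    lower = max(y_pre - e, y_min)
    upper = min(y_pre + e, y_max)
    return lower, upper


def mpei_numeric(e, y_min, y_max):
    """Average proportion of the target range covered by the error interval.

    Assumptions (not taken from the source equations):
    - y_min and y_max are integers with y_max > y_min;
    - y_pre is uniform over the tau = y_max - y_min + 1 integer values;
    - the interval length is normalized by y_max - y_min.
    """
    span = y_max - y_min
    if span <= 0:
        raise ValueError("y_max must be greater than y_min")
    if not 0 <= e <= span:
        raise ValueError("e must lie in [0, y_max - y_min]")

    tau = span + 1
    total_length = 0.0
    for y_pre in range(y_min, y_max + 1):
        lower, upper = error_interval(y_pre, e, y_min, y_max)
        total_length += upper - lower
    return total_length / (tau * span)
```

Under these assumptions, `mpei_numeric(0, 0, 10)` gives 0.0 and `mpei_numeric(10, 0, 10)` gives 1.0, and the value does not decrease as `e` grows, which matches the properties stated above.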

3.2.2. Diversity and MSE-based model selection

After generating a set of regression trees, the next step of our method is to select a subset of them for constructing the ensemble. In this work, model selection is performed by optimizing the diversity and mean squared error (MSE) of the base learners by means of an evolutionary multi-objective algorithm. Suppose that $y_{ij}$ is the value predicted for the $j$th sample by the $i$th regressor, $\hat{y}_j$ is the true value of the $j$th sample, $\bar{y}_j$ is the value predicted for the $j$th sample by the ensemble, $n_r$ is the size of the ensemble, and $n_s$ is the number of samples. The selection method aims to find the optimal regression tree combination by minimizing $\frac{1}{n_r}\sum_{i=1}^{n_r} E(MSE_i)$, where $E(MSE_i)$ is the mean squared error of the $i$th regressor and can be computed as $\frac{1}{n_s}\sum_{j=1}^{n_s}(y_{ij}-\hat{y}_j)^2$, and by maximizing the average diversity of the $n_r$ regressors, where the diversity of the $i$th regressor can be computed as $\frac{1}{n_s}\sum_{j=1}^{n_s}(y_{ij}-\bar{y}_j)^2$.
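The two selection objectives described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation. The ensemble prediction is assumed to be the simple average of the member predictions, and the diversity of a regressor is assumed to be its mean squared deviation from that ensemble prediction. All function names are hypothetical.

```python
def mse_per_regressor(preds, y_true):
    """preds[i][j] is the prediction of regressor i for sample j."""
    ns = len(y_true)
    return [
        sum((p - t) ** 2 for p, t in zip(row, y_true)) / ns
        for row in preds
    ]


def ensemble_prediction(preds):
    """Assumption: the ensemble output is the plain mean of its members."""
    nr = len(preds)
    return [sum(column) / nr for column in zip(*preds)]


def diversity_per_regressor(preds):
    """Assumption: diversity is the mean squared deviation from the ensemble."""
    y_bar = ensemble_prediction(preds)
    ns = len(y_bar)
    return [
        sum((p - b) ** 2 for p, b in zip(row, y_bar)) / ns
        for row in preds
    ]


def objectives(preds, y_true):
    """Return (average MSE to minimize, average diversity to maximize)."""
    nr = len(preds)
    avg_mse = sum(mse_per_regressor(preds, y_true)) / nr
    avg_div = sum(diversity_per_regressor(preds)) / nr
    return avg_mse, avg_div
```

A multi-objective optimizer would evaluate `objectives` on each candidate subset of trees, preferring subsets with a lower first value and a higher second value.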

Let t denote the number of iterations. The overall procedure of model selection can be described as follows:

Step 1: Randomly construct R ensembles E1, E2, ..., ER using the generated semi-random regression trees;