Applications of nature-inspired metaheuristic algorithms for tackling optimization problems across disciplines (2024)

Metaheuristics have been used to find estimates for model parameters, and previous work has shown that they can outperform estimates from statistical packages or find them when the latter fail to do so. For example, ref.42 showed that PSO can find better L1-estimates for some models than those from statistical packages. In what follows, we demonstrate that CSO-MA can find better maximum likelihood estimates and can also find them when some statistical packages cannot. Our applications include finding maximum likelihood estimates for models in bioinformatics and educational research, and M-estimates for a Cox regression in a Markov renewal model.

Single-cell generalized trend model (scGTM)

Cui et al. (2022)43 proposed a model called scGTM to study the relationship between pseudotime44 and gene expression data. The model assumes that the gene expression has a ‘hill’ trend along the pseudotime and can be modeled using a set of interpretable parameters. Below is a brief description of the model, and we show that CSO-MA outperforms the PSO algorithm for all but one gene in terms of finding the optimal value of the negative log-likelihood function; details are given in43.

For a hill-shaped gene, the scGTM parameters are \(\Theta =(\mu _{\text {mag}}, k_1,k_2,t_0,\phi ,\alpha ,\beta )^T\) and they are estimated from the observed expression counts \(\varvec{y} = (y_1, \ldots , y_C)^T\) and cell pseudotimes \(\varvec{t} = (t_1, \ldots , t_C)^T\) using the constrained maximum likelihood method. Here C is the number of cells and the interpretations of the parameters in the model are given in Section 2.1 of43. If \(\log L(\Theta \mid \varvec{y}, \varvec{t})\) is the log likelihood function, the optimization problem is:

$$\begin{aligned} \max _{\Theta } \log L(\Theta \mid \varvec{y}, \varvec{t}) \end{aligned}$$

(3)

such that

$$\begin{aligned} &\min _{c \in \{1,\ldots ,C\}}\log (y_c+1)\le \mu _{\text {mag}}\le \max _{c \in \{1,\ldots ,C\}}\log (y_c+1)\,,\nonumber \\ &k_1,k_2\ge 0\,,\; \min _{c \in \{1,\ldots ,C\}} t_c\le t_0\le \max _{c \in \{1,\ldots ,C\}} t_c\,,\; \phi \in \mathbb {Z}_+\,, \end{aligned}$$

(4)

where

$$\begin{aligned} \log L(\Theta \mid \varvec{y}, \varvec{t})&=\log \left[ \prod _{c=1}^C\textbf{P}(Y_{c}=y_{c} \mid t_c)\right] \nonumber \\&=\sum _{c=1}^C \log \Big [(1-p_c)f(y_c\mid t_c) + p_c \; \mathbb {I}(y_c=0)\Big ] \end{aligned}$$

(5)

and

$$\begin{aligned} &f(y_c\mid t_c)=\frac{\tau _c^{y_c}}{y_c!} \frac{\Gamma (\phi +y_c)}{\Gamma (\phi )(\phi +\tau _c)^{y_c}} \frac{1}{\left( 1+\frac{\tau _c}{\phi }\right) ^\phi }\,,\\ &\log (\tau _{c}+1)={\left\{ \begin{array}{ll} b + \mu _{\text {mag}}\exp {(-k_1(t_c-t_0)^2)} &\text { if }t_c \le t_0\\ b + \mu _{\text {mag}}\exp {(-k_2(t_c-t_0)^2)} &\text { if }t_c > t_0 \end{array}\right. }\,,\\ &\log \left( \frac{p_c}{1-p_c}\right) =\alpha \log (\tau _c+1)+\beta \,, \end{aligned}$$

which are all functions of \(\Theta \). There are two difficulties in the optimization problem (3). First, the likelihood function (5) is neither convex nor concave. Second, the constraints are linear in \(\mu _{\text {mag}}\), \(k_1\), \(k_2\), and \(t_0\), but \(\phi \) is a positive integer-valued variable. Hence, conventional optimization algorithms, like P-IRLS in GAM45,46 and L-BFGS in switchDE47, are unlikely to work well. The authors proposed PSO to solve for the constrained MLEs and a Python package is available online. We now apply CSO-MA to the same problem and compare results with those from the Python package. In addition, we compared CSO-MA's performance with results from two recently proposed metaheuristic algorithms: the prairie dog optimization (PDO) algorithm proposed by48 and the Runge-Kutta optimization (RUN) algorithm proposed by49. Table 3 displays the negative log-likelihood values found by CSO-MA and PSO for the 20 exemplary genes in50 after 1000 evaluations of Eq. (5) for each algorithm, and it shows that CSO-MA outperformed PSO and PDO in all but three of the 20 genes. The Wilcoxon test of CSO-MA against the other two algorithms produced p-values less than 0.001 (0.00077 for PSO and 0.00026 for PDO), suggesting that CSO-MA indeed outperformed PSO and PDO in this example.
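To make the objective concrete, the sketch below evaluates the negative of the log-likelihood in Eq. (5) for a hill-shaped gene. It is not the authors' Python package; the baseline b (which appears in the trend equation but not in \(\Theta \)) and the function and variable names are illustrative assumptions. Any metaheuristic, such as CSO-MA or PSO, can then minimize this function over \(\Theta \) subject to the box constraints in (4), with \(\phi \) rounded to a positive integer.

```python
# A minimal sketch (not the scGTM package) of the negative log-likelihood in Eq. (5).
import numpy as np
from scipy.special import gammaln, expit

def scgtm_nll(theta, y, t, b=0.0):
    """theta = (mu_mag, k1, k2, t0, phi, alpha, beta); y, t are length-C arrays."""
    mu_mag, k1, k2, t0, phi, alpha, beta = theta
    # hill-shaped trend along pseudotime: log(tau_c + 1)
    k = np.where(t <= t0, k1, k2)
    log_tau1 = b + mu_mag * np.exp(-k * (t - t0) ** 2)
    tau = np.maximum(np.expm1(log_tau1), 1e-12)      # tau_c, guarded for numerics
    # negative binomial log pmf f(y_c | t_c) with mean tau_c and dispersion phi
    log_f = (gammaln(phi + y) - gammaln(phi) - gammaln(y + 1)
             + y * (np.log(tau) - np.log(phi + tau))
             + phi * (np.log(phi) - np.log(phi + tau)))
    # zero-inflation probability p_c from logit(p_c) = alpha*log(tau_c+1) + beta
    p = expit(alpha * log_tau1 + beta)
    # Eq. (5): log[(1 - p_c) f(y_c | t_c) + p_c * I(y_c = 0)]
    loglik = np.log((1 - p) * np.exp(log_f) + p * (y == 0))
    return -np.sum(loglik)
```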


Figure 2 displays the fitted PAEP gene given by CSO-MA, PSO and PDO. We observe that CSO-MA captures the “fast decreasing trend” when \(t\ge 0.8\) better than PSO does, and it reaches a higher peak than PDO does. Figures for other genes show a consistent pattern.

Comparison of CSO-MA, PDO and PSO results for the fitted scGTM with gene PAEP.


Parameter estimation for a Rasch model

The Rasch model is one of the most widely used item response models in education and psychology research51. Estimating the parameters in the Rasch and other item response models can be challenging, and there is continuing interest in estimating them using different methods and studying the various computational issues. For example, refs.52,53 reported that there are at least 27 R packages indexed with the word “Rasch” and 11 packages capable of estimating parameters of, and performing analyses for, the Rasch model.

The expectation-maximization (EM) algorithm is a common method for parameter estimation in statistics54,55,56. The Bock-Aitkin algorithm is a variant of the EM algorithm and is one of the most popular algorithms for estimating parameters in Rasch models57. Because the Rasch model also has many extensions with applications in agriculture, health care studies and marketing research58,59,60, this subsection compares, for the first time, how metaheuristic algorithms perform relative to the Bock-Aitkin method.

We give a brief review of the Rasch model before we compare the estimation results given by CSO-MA, Bock-Aitkin (in the R package ltm) and two other metaheuristic algorithms, CA and PSO, in terms of the likelihood values. In a Rasch model, we work with \(N \times I\) binary item response data where 1 indicates a correct and 0 an incorrect response. The data come from a cognitive assessment (e.g., math or reading) that includes I test items. A group of N students gave their responses to the I items, and their binary answers to each of the I items were scored and analyzed51. The Rasch model is given by:

$$\begin{aligned} \text{ logit } \big ( \textbf{P} \big (Y_{ji} =1 \mid \theta _j \big ) \big ) = \theta _j - \beta _i , \quad \theta _j \sim N(0, \sigma ^2). \end{aligned}$$

(6)

The item parameter \(\beta _i\) represents the difficulty of item i and parameter \(\theta _j\) represents the ability of person j. We assume that \(\theta _j \sim N(0, \sigma ^2)\). This model is called the one-parameter model because it considers one type of item characteristic (difficulty). Let \(p_{ji}=\textbf{P} \big (Y_{ji} =1 | \theta _j\big )\) and write the marginal likelihood function for model (6) as

$$\begin{aligned} L({\Theta }) = \prod _{j=1}^N \int \prod _{i=1}^I p_{ji}^{Y_{ji}} (1-p_{ji})^{1-Y_{ji}} \pi (\theta ) \, d\theta , \end{aligned}$$

(7)

where \(\Theta = \big (\beta _1, \cdots , \beta _I, \sigma ^2 \big )^T \) and \(\pi (\theta )\) is the prior of \(\theta \).
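To illustrate how the marginal likelihood (7) can be evaluated for a candidate \(\Theta \), the sketch below approximates the integral over \(\theta \sim N(0,\sigma ^2)\) with Gauss-Hermite quadrature. This is not the ltm implementation; the function name, the number of quadrature nodes and the vectorized layout are our own assumptions. A metaheuristic then minimizes this negative log-likelihood over \((\beta _1,\ldots ,\beta _I,\sigma ^2)\).

```python
# A minimal sketch of the marginal negative log-likelihood in Eq. (7).
import numpy as np
from scipy.special import expit

def rasch_marginal_nll(params, Y, n_quad=21):
    """params = (beta_1, ..., beta_I, sigma^2); Y is an N x I binary response matrix."""
    beta, sigma2 = params[:-1], params[-1]
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)  # weight exp(-x^2/2)
    weights = weights / np.sqrt(2 * np.pi)                       # normalize to the N(0,1) density
    theta = np.sqrt(max(sigma2, 1e-12)) * nodes                  # quadrature points for N(0, sigma^2)
    # p[q, i] = P(Y_ji = 1 | theta_q) = logistic(theta_q - beta_i)
    p = expit(theta[:, None] - beta[None, :])                    # n_quad x I
    # log-likelihood of each response pattern at each quadrature node
    loglik_nodes = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T   # N x n_quad
    marginal = np.exp(loglik_nodes) @ weights                    # integrate over theta
    return -np.sum(np.log(marginal))
```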

Metaheuristics have been shown to provide superior performance over conventional statistical methods. For instance, ref.61 tackled the challenge of deriving the maximum likelihood estimates for parameters in a mixture of two Weibull distributions with complete and multiply censored data. Their simulation outcomes indicated that particle swarm optimization (PSO) frequently outperformed the quasi-Newton method and the EM algorithm in terms of bias and root mean square errors.

In this study, we present similar results and show that the nature-inspired metaheuristic algorithm CSO-MA can also give more precise maximum likelihood estimates than three of its competitors: PSO, the Bock-Aitkin method, and the Cat Swarm Algorithm (CA). PSO is an exemplary and widely used nature-inspired swarm-based algorithm; CA was introduced by62, and its effectiveness as an optimizer for a single objective function was demonstrated in63, which showed its competitive edge against several contemporary top-performing algorithms.

We employed the “Verbal Aggression” data set from the R Archive64 and let NLL denote the minimized value of the negative log-likelihood function. Table 4 displays the NLLs from the four algorithms, where a swarm size of 30 was used for the three metaheuristic algorithms. The hyper-parameter for CSO-MA was set to \(\phi =0.3\), and the hyper-parameters for PSO and CA were set to the default values in the R package metaheuristicOpt65. Evidently, CSO-MA has the smallest NLL value and is the winner. The estimated NLL values from CSO-MA, PSO, and Bock-Aitkin are similar, but that from CA is not, suggesting that CA appears less reliable since its estimated NLLs (gold points and lines in the left panel of Figure 3) do not come close to the others.


Figure 3 presents a two-panel visualization. The left panel illustrates the estimated parameters derived from the four algorithms: CSO-MA, Bock-Aitkin, PSO, and CA. Here, the x-axis represents all 24 parameters (23 items plus the variance parameter) in the model, while the y-axis depicts their estimated values. The right panel delineates the progression trajectories of the negative log-likelihood functions of the four algorithms, spanning about 100 function evaluations. The left panel shows that, except for the CA algorithm, Bock-Aitkin, PSO and CSO-MA give similar parameter estimates; the right panel shows that Bock-Aitkin converges fastest in terms of the number of function evaluations while PSO is the slowest. However, CSO-MA has the smallest negative log-likelihood value, or equivalently, the largest log-likelihood value.

The left panel shows estimated parameters from the four algorithms: CSO-MA, Bock-Aitkin, PSO and CA. The x-axis refers to all 24 parameters (23 items plus the variance parameter) in the model and the y-axis refers to the estimated parameter values. The right panel shows the trajectories of the negative log-likelihood functions of the four algorithms as they evaluate the negative log-likelihood function about 100 times.


M-estimation for Cox regression in a Markov renewal model

In this subsection, we show that CSO-MA can solve estimating equations and produce M-estimates for model parameters that are sometimes more efficient than those from statistical packages. Askin et al. (2017)66 correctly noted that metaheuristics are rarely used to solve estimating equations in the statistical community.

In a survival study, the experience of a patient may be modelled as a process with finitely many states67, and modelling is based on transition probabilities among the different states. We take bone marrow transplantation (BMT) as an example. BMT is a primary treatment for leukemia but has major complications, notably Graft-Versus-Host Disease (GVHD), where the transplanted marrow's immune cells react against the recipient's cells in two forms: acute (AGVHD) and chronic (CGVHD). The main treatment failure is death in remission, often seen in patients with AGVHD or both GVHD types, occurring unpredictably before relapse. The term “death in remission” in the context of leukemia refers to the death of a patient who is in remission from leukemia: the patient has achieved remission, with no detectable leukemia cells in the body, but died from causes that are not directly related to the active progression of leukemia. However, both AGVHD and CGVHD reduce leukemia relapse risks. Hence, a five-state model is used: transplant (TX), AGVHD, and CGVHD are transient states, while relapse and death in remission are absorbing states68. Figure 4 shows the possible transitions among the different states (i.e., TX, AGVHD, CGVHD, Relapse and Death in Remission).

A five-state Markov renewal model for BMT failure. Reproduced from68. TX = Transplant, AGVHD = Acute Graft-Versus-Host Disease, CGVHD = Chronic Graft-Versus-Host Disease, Relapse = Relapse of leukemia, Death in Remission = Death of a patient who is in remission from leukemia.

Full size image

To model such a process in a mathematically rigorous way, we assume observations on each individual form a Markov renewal process with a finite state space, say \(\{1,2,\cdots , r\}\)69. That is, we observe a process \((X, T)=\{(X_n, T_n):n\ge 0\}\), where \(0=T_0<T_1<T_2<\cdots \) are the calendar times of entrances into the states \(X_0, X_1, \cdots , X_n\in \{1,2,\cdots ,r\}\) (for simplicity, we do not consider censoring in this subsection). In the BMT example, \(r=5\), \(X_n\) takes values in \(\{\)TX, AGVHD, CGVHD, Relapse, Death in Remission\(\}\), and \(W_n=T_n-T_{n-1}\) represents the sojourn time spent in state \(X_{n-1}\). We also observe a covariate matrix \(\textbf{Z}=\{\textbf{Z}_{ij}:i,j=1,2,\cdots ,r\}\) where each \(\textbf{Z}_{ij}\) is itself a vector. In practice, we assume that the sojourn time \(W_n\), given \(X_{n-1}=i\) and \(\textbf{Z}\), has survival probability70

$$\begin{aligned} \textbf{P}(W_n>x|X_{n-1}=i, \textbf{Z})=\exp \left( -\sum _{k=1,k\not =i}^rA_{0,ik}(x)e^{\beta ^TZ_{ik}}\right) \end{aligned}$$

and the transition probability is (\(i\not =j\))

$$\begin{aligned} \textbf{P}(X_{n}=j|X_{n-1}=i, W_n)=\frac{\alpha _{0,ij}(W_n)e^{\beta ^TZ_{ij}}}{\sum _{k\not =i}\alpha _{0,ik}(W_n)e^{\beta ^TZ_{ik}}}, \end{aligned}$$

where \(\beta \) is the parameter of interest, \(A_{0,ik}(x)=\int _0^x\alpha _{0,ik}(s)ds\) is the baseline cumulative hazard from state i to state k and \(\alpha _{0, ik}(x)\) is the hazard function from state i to state k71. Suppose we observe M iid individuals and that the risk process for an individual is given by \(Y_{i}(x)=\sum _{n\ge 1}\mathbb {I}(W_n\ge x, X_{n-1}=i)\). For a fixed x, \(Y_i(x)\) counts the number of visits to state i with a sojourn time of more than x for a particular individual. In the five-state model in Figure 4, since we cannot revisit states that we have already exited, \(Y_i(x)\) is a binary variable. Then, following refs.68,72,73, the estimating equation for \(\beta \) is

$$\begin{aligned} \textbf{U}(\beta )=\sum _{m=1}^M\sum _{i\not =j}^r\int _0^\infty \left[ \textbf{Z}_{ijm}-\frac{S_{ij}^{(1)}(x,\beta )}{S_{ij}^{(0)}(x,\beta )}\right] dN_{ijm}(x). \end{aligned}$$

(8)

Here \(N_{ijm}(x)=\sum _{n\ge 1}\mathbb {I}(T_n\le x, X_n=j, X_{n-1}=i)\), \(S_{ij}^{(0)}(x,\beta )=\frac{1}{M}\sum _{m=1}^MY_{im}(x)e^{\beta ^TZ_{ijm}}\) and \(S_{ij}^{(1)}(x,\beta )\) is the first partial derivative of \(S_{ij}^{(0)}\) with respect to \(\beta \). The M-estimates of \(\beta \) are obtained by solving \(\textbf{U}(\beta )=0\). To apply CSO-MA to obtain the estimates, we turn the problem of solving \(\textbf{U}(\beta )=0\) into a minimization problem as follows:

$$\begin{aligned} \widehat{\beta }=\arg \min _\beta \Vert \textbf{U}(\beta )\Vert _p \end{aligned}$$

(9)

where \(p\in [1,\infty ]\) is a user-selected constant. If a solution to \(\textbf{U}(\beta )=0\) exists, then \(\min _\beta \Vert \textbf{U}(\beta )\Vert _p=0\) for any \(p\ge 1\). Metaheuristics have been used to creatively solve systems of nonlinear equations74,75, and results from our simulation study suggest that the choice of p affects neither the convergence speed of CSO-MA nor the estimated parameters.
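A minimal sketch of this strategy is given below: it assembles \(\textbf{U}(\beta )\) from Eq. (8) for a set of observed transitions and then minimizes \(\Vert \textbf{U}(\beta )\Vert _p\) as in (9). The data containers (`transitions`, `Z`, `at_risk`) and the use of a generic optimizer as a stand-in for CSO-MA are illustrative assumptions, not the original implementation.

```python
# A minimal sketch of turning the estimating equation (8) into the minimization (9).
# `transitions` lists the observed jumps (m, i, j, x) with sojourn time x, Z[m][i][j]
# is the covariate vector Z_ijm, and at_risk(m, i, x) plays the role of Y_im(x).
import numpy as np
from scipy.optimize import minimize

def U(beta, transitions, Z, at_risk, M):
    total = np.zeros_like(beta)
    for (m, i, j, x) in transitions:                 # one term per counted jump dN_ijm(x)
        s0, s1 = 0.0, np.zeros_like(beta)            # S^(0)_ij(x, beta) and S^(1)_ij(x, beta)
        for mm in range(M):
            w = at_risk(mm, i, x) * np.exp(beta @ np.asarray(Z[mm][i][j]))
            s0 += w / M
            s1 += w * np.asarray(Z[mm][i][j]) / M
        total += np.asarray(Z[m][i][j]) - s1 / s0
    return total

def m_estimate(beta0, transitions, Z, at_risk, M, p=2):
    # Solve U(beta) = 0 by minimizing ||U(beta)||_p; any optimizer (e.g. CSO-MA) applies.
    objective = lambda b: np.linalg.norm(U(b, transitions, Z, at_risk, M), ord=p)
    return minimize(objective, beta0, method="Nelder-Mead").x
```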

For the simulation, we set \(p=2\) and assume \(r=3\), \(A_{0, ij}(x)=0.5x\) for all \(i\not =j\), the true parameter vector \(\beta =(0.901, 0.759, 0.348)^T\), and elements of the covariate matrix \(\textbf{Z}\) are random uniform variates from \([-1, 1]\). In total, we generated \(M=100\) individuals and the left panel of Figure 5 shows one of the realizations. The swarm size for CSO-MA was 20 and we ran it for 100 function evaluations; the right panel of Figure 5 shows the convergence of CSO-MA. The estimated parameter is \(\widehat{\beta }=(0.908, 0.753, 0.329)^T\), which is close to the true value. The observed vector of biases \((0.007,0.006,0.017)^T\) is likely due to both the optimization algorithm and the method of partial likelihood itself. The first issue can be reduced by trying different initial values for CSO-MA and the second issue may be addressed by a larger sample size so that consistency of the estimators is guaranteed theoretically. For space considerations, we omit additional simulation results that support the effectiveness of CSO-MA for estimating the true parameters correctly.

Application of CSO-MA to find M-estimates for a Cox regression in a Markov renewal model. The left panel is one of the realizations of 100 individuals; the red dots represent the jump times and the transitions for the pair \((X_n, T_n)\). The right panel shows the convergence trajectory of CSO-MA.


To further investigate the scalability of CSO-MA and compare it with other algorithms, we perform another simulation study in which the state space of \(X_i\) consists of two states, \(\{1,2\}\), with 2 an absorbing state. Consequently, the Markov renewal model is equivalent to a two-state Markov model, or a Cox proportional hazards model71. The sample size is \(M=10,000\) and the \(\beta \) parameter is the \(100\times 1\) vector with all entries equal to 1. The elements of the covariate matrix \(\textbf{Z}\) are again generated uniformly from \([-1,1]\) to mimic the high-dimensional scenario in statistical applications76. The simulation is performed on the Matlab 2023a platform. Instead of minimizing the norm of \(\textbf{U}(\beta )\), we minimize the negative partial log-likelihood (NLL) value68. We compare CSO-MA with PDO and Runge-Kutta optimization (RUN, another recently proposed metaheuristic49) in terms of their optimum values, stability and running time. The results are given in Table 5. We ran each algorithm 30 times to obtain reasonable statistical results; the number of function evaluations was set to 1000 and the swarm size for each algorithm to 30. The results suggest that RUN performs best in terms of NLL and its stability; CSO-MA has the best performance in terms of average elapsed time and PDO is the slowest among the three algorithms.
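For the two-state special case, the negative partial log-likelihood being minimized reduces to the familiar Cox form. A minimal sketch is given below, assuming no ties and no censoring; the argument names are hypothetical.

```python
# A minimal sketch of the negative partial log-likelihood for the two-state
# (Cox proportional hazards) scalability study.
import numpy as np

def cox_npll(beta, times, Z):
    """times: (M,) sojourn times until absorption; Z: (M, d) covariate matrix."""
    order = np.argsort(times)                 # process event times in increasing order
    eta = (Z @ beta)[order]
    # log of the risk-set sum: individuals whose time is >= the current event time
    log_risk = np.logaddexp.accumulate(eta[::-1])[::-1]
    return -np.sum(eta - log_risk)
```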


Matrix completion (missing data imputation) in a two-compartment model

In real studies, such as clinical trials, missing or incomplete data are omnipresent. They occur in computer vision, clinical trials and genomics, just to name a few areas77. Missing data also appear frequently in recommendation (or recommender) systems, which are defined as decision-making strategies for users under complex information environments78; see79 for an overview of this emerging area of research aimed at alleviating the problem of information overload. The best strategy for dealing with missing data is to avoid having them in the first place. This would require constant monitoring of the data and filling in the missing data as soon as they are discovered. Despite best efforts, missing data abound and pose problems in data analysis. Matrix completion is the task of filling in the missing entries of a partially observed matrix that represents the data structure. In many instances, the task is equivalent to performing data imputation in statistics. This leads to matrix completion problems, which occur across disciplines. Ensemble models based on matrix completion have also been built for computational drug repurposing to fight the virus SARS-CoV-280.

In this subsection, we apply CSO-MA to a missing data imputation problem in a non-linear Gaussian regression model using simulated data. Missing data are ubiquitous in all research fields. Imputation is one of the most common ways to fill in and analyze missing data81 and the Expectation-Maximization (EM) method54 is a popular choice for imputing multivariate normal data. We briefly describe the problem and the EM algorithm below.

Suppose that \((Y_1,Y_2)\in \mathbb {R}^2\) has a bivariate normal distribution with mean \(\mu (\theta )=(\mu _1(x,\theta ),\mu _2(x,\theta ))^T\) and a known covariance matrix \(\Sigma =\left( \begin{matrix} \sigma _1^2 & \rho \sigma _1\sigma _2\\ \rho \sigma _1\sigma _2 & \sigma _2^2 \end{matrix}\right) \), where \(\theta \) is a vector of parameters characterizing \(\mu \) and x is (possibly) a vector of covariates. We observe n realizations \(y_i=(y_{i1},y_{i2})^T, i=1,2,\cdots ,n\), where \(y_{ij}\) contains missing values for some i and j. Let \(Y_{(0)}\) and \(Y_{(1)}\) denote the observed and missing parts, respectively. Following pages 250-251 of Little and Rubin (2019)81, at the \((t+1)^{th}\) iteration, the E-step of the algorithm calculates

$$\begin{aligned} \textbf{E}\left( \sum _{i=1}^n y_{ij}\Big |Y_{(0)},\theta ^{(t)}\right) =\sum _{i=1}^ny_{ij}^{(t+1)} \end{aligned}$$

and

$$\begin{aligned} \textbf{E}\left( \sum _{i=1}^n y_{ij}y_{ik}\Big |Y_{(0)},\theta ^{(t)}\right) =\sum _{i=1}^n\left( y_{ij}^{(t+1)}y_{ik}^{(t+1)}+c_{jki}^{(t+1)}\right) \end{aligned}$$

for \(j,k=1,2,\cdots ,K\) where

$$\begin{aligned} y_{ij}^{(t+1)}={\left\{ \begin{array}{ll} y_{ij} & \text { if }y_{ij}\in Y_{(0)}\\ \textbf{E}\left( y_{ij}\Big |Y_{(0)},\theta ^{(t)}\right) & \text { if }y_{ij}\in Y_{(1)} \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} c_{jki}^{(t+1)}={\left\{ \begin{array}{ll} 0 & \text { if }y_{ij}\text { or }y_{ik}\text { is observed,}\\ Cov\left( y_{ij},y_{ik}\Big |Y_{(0)},\theta ^{(t)}\right) & \text { if }y_{ij},y_{ik}\in Y_{(1)}. \end{array}\right. } \end{aligned}$$

After the E-step, missing values are replaced by the conditional expectation derived above. Next, for the M-step, we maximize the following conditional log-likelihood with respect to \(\theta \) using CSO-MA:

$$\begin{aligned} \textbf{E}\left( l(\theta |Y_{(0)},Y_{(1)})\Big |Y_{(0)},\theta ^{(t)}\right) = - \frac{1}{2}\sum _{i=1}^n\left( \textbf{y}_i^{(t+1)}-\mu (x_i,\theta )\right) ^T\Sigma ^{-1}\left( \textbf{y}_i^{(t+1)}-\mu (x_i,\theta )\right) + C \end{aligned}$$

(10)

where \(\textbf{y}_i^{(t+1)}=(y_{i1}^{(t+1)},\ y_{i2}^{(t+1)})\) and C is a constant independent of \(\theta \). Section 8.6 in Wild and Seber (1989)82 (page 414) illustrates a two-compartment model with (see also chapter 7 in83)

$$\begin{aligned} y_{ij}=\mu _j(x_i,\theta )+\epsilon _{ij},i=1,2,\cdots ,n,\ j=1,2, \end{aligned}$$

where x refers to time, \((\epsilon _{i1},\epsilon _{i2})^T\) are independently drawn from \(N_2\left( 0,\Sigma \right) \), and the two means are

$$\begin{aligned} \mu _1(x,\theta )=\theta _1e^{-\theta _2 x}+(1-\theta _1)e^{-\theta _3x},\quad \mu _2(x,\theta )=1-(\theta _1+\theta _4)e^{-\theta _2x}+(\theta _1+\theta _4-1)e^{-\theta _3x}, \end{aligned}$$

where

$$\begin{aligned} \theta _4=\frac{(\theta _3-\theta _2)\theta _1(1-\theta _1)}{(\theta _3-\theta _2)\theta _1+\theta _2}. \end{aligned}$$

Suppose that at some time point x, the operator failed to record either \(y_{i1}\) or \(y_{i2}\), or both, so we observe \(Y_{(0)}\) and \(Y_{(1)}\) (n observations in total). To make inference about \(\theta \), we still want to make use of the partially observed data. In this case, we apply the EM algorithm described above to maximize the conditional likelihood (10).
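The sketch below outlines one EM iteration under this setup: the E-step replaces a missing coordinate by its conditional normal mean given the observed one (or the unconditional mean if both are missing), and the M-step minimizes the quadratic form in (10) over \(\theta \); the trace terms involving \(c^{(t+1)}_{jki}\) do not depend on \(\theta \) because \(\Sigma \) is known, so they can be dropped from the M-step. A generic optimizer stands in for CSO-MA, and the function names are our own.

```python
# A minimal sketch of one EM iteration for the two-compartment model around Eq. (10).
import numpy as np
from scipy.optimize import minimize

def mu(x, th):
    """Two-compartment mean functions mu_1 and mu_2, stacked as an n x 2 array."""
    th1, th2, th3 = th
    th4 = (th3 - th2) * th1 * (1 - th1) / ((th3 - th2) * th1 + th2)
    m1 = th1 * np.exp(-th2 * x) + (1 - th1) * np.exp(-th3 * x)
    m2 = 1 - (th1 + th4) * np.exp(-th2 * x) + (th1 + th4 - 1) * np.exp(-th3 * x)
    return np.stack([m1, m2], axis=-1)

def em_step(Y, x, theta, Sigma):
    """Y: n x 2 data with np.nan for missing entries; Sigma: known 2 x 2 covariance."""
    # E-step: conditional mean imputation for missing coordinates
    m = mu(x, theta)
    Yf = Y.copy()
    for i in range(len(Y)):
        for j in (0, 1):
            if np.isnan(Y[i, j]) and not np.isnan(Y[i, 1 - j]):
                Yf[i, j] = m[i, j] + Sigma[j, 1 - j] / Sigma[1 - j, 1 - j] * (Y[i, 1 - j] - m[i, 1 - j])
            elif np.isnan(Y[i, j]):
                Yf[i, j] = m[i, j]            # both coordinates missing: unconditional mean
    # M-step: minimize the quadratic form in (10) over theta (CSO-MA could be used here)
    Sinv = np.linalg.inv(Sigma)
    def negQ(th):
        r = Yf - mu(x, th)
        return np.einsum('ij,jk,ik->', r, Sinv, r)
    theta_new = minimize(negQ, theta, method="Nelder-Mead").x
    return Yf, theta_new
```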

We analyze a real data set to illustrate this idea. The data set comes from Beauchamp and Cornell (1966)84; see also Section 11.2 in Wild and Seber (1989)82. We randomly mask some of the values in Table 6 to be missing and denote them by NA.

Full size table

Using the complete observations, we estimated the covariance \(\Sigma \) to be \(\left( \begin{matrix} 0.075 & - 0.06\\ - 0.06 & 0.06 \end{matrix}\right) \); in the original paper, using the full data, the authors estimated the parameters to be \(\widehat{\theta }=(0.060, \ 0.007,\ 0.093)^T.\) For the EM algorithm, we set the initial \(\theta \) to be \((0.381,\ 0.021,\ 0.197)\) and ran CSO-MA for 200 iterations with \(\phi =0.3\). The whole algorithm alternates between computing expression (10) and applying CSO-MA to maximize (10). We ran 10 iterations in total and the imputed results are given in Table 7. We further performed a simulation study (not reported here) with sample size \(n=80\) and 40 missing values in total. The true parameter \(\theta \) is \((0.4, 0.05, 0.3)^T\) and the initial value for the EM algorithm is \((0.1, 0.1, 0.1)^T\). The algorithm terminates after 5 iterations, with the estimated parameter value \(\widehat{\theta }=(0.392, 0.056, 0.275)^T\). This shows that CSO-MA performs well in its role as an optimizer.

A variable selection problem in ecology

In addition to numerous applications of metaheuristics in engineering and computer science, metaheuristics have also found applications ranging from addressing sustainability issues85 to land use19 and agriculture58. See also86, who used metaheuristic algorithms to design placements of the groundwater wells in the Los Angeles Basin.

In this subsection, we apply CSO-MA to a penalized linear regression problem in ecology. Model selection is essential in much of ecology because ecological systems are often too large and slow-moving for our hypotheses to be tested through manipulative experiments at the relevant temporal and spatial scales87.

The data come from a plateau lake in Yunnan, China, and were collected by a group of researchers at the Department of Environmental Engineering, Tsinghua University in 2019. They took water samples in March (Spring), June (Summer), September (Autumn) and December (Winter). At each time, 30 sites were sampled from different parts of the waterway. Due to weather issues at the plateau lake in June, data from 6 sites were not recorded. Therefore, the total number of samples is 114 (\(=30\times 4 - 6\) records). The outcome variable is CRAP and the goal is to determine if and how 17 key variables affect the outcome. Table 8 lists all the regression variables; for space considerations, we only display in the same table the first two sets of measurements from the \(114\times 18\) data matrix.


Cyanobacteria can form dense blooms and sometimes produce algal toxins. In extreme cases, a cyanobacterial bloom, with high cyanobacterial density or a high proportion of cyanobacteria in the phytoplankton, can threaten the aquatic ecosystem, fisheries and the safety of drinking water. Over the years, cyanobacterial blooms have increased in frequency, magnitude and duration globally88. Cyanobacterial blooms are influenced by the surrounding environment. To effectively control and prevent them, one of the most important scientific questions is how other factors affect CRAP (cyanobacteria relative abundance in phytoplankton). High values of CRAP often indicate a cyanobacterial bloom. Therefore, if we can control the key factors that are associated with CRAP, we can improve environmental health dramatically.

Linear regression analysis is a default choice for detecting associations and outliers. We expect that many covariates are correlated; for example, NH4-N and NO3-N are highly correlated with TN. Thus, in reality, some measurements are more important than others to ecologists. In statistics, variable selection and penalized regression methods have been proposed to address this issue. In what follows, we use CSO-MA and a penalized regression method known as smoothly clipped absolute deviation (SCAD)89 to select variables into the model.

Let y be the vector of CRAP responses in the linear model and let X be the covariate matrix containing the variables Depth to F, where each column of X is standardized by subtracting its mean and dividing by its standard deviation, so that each column of X has mean 0 and standard deviation 1. This standardization step is crucial because we want to analyze the relative influence of the variables on CRAP, and having different scales can cause confusion. All these variables are listed in Table 8. Let \(\beta \) be the vector of unknown parameters to be estimated by solving the following optimization problem:

$$\begin{aligned} \min _\beta \ \Vert y-X\beta \Vert _2^2 + \rho \left( \sum _{j=1}^pP(\beta _j|\lambda ,a)\right) , \end{aligned}$$

(11)

where \(\rho \) is the regularization parameter, \(a,\lambda \) are tuning parameters and

$$\begin{aligned} P(\beta _j|\lambda ,a)= {\left\{ \begin{array}{ll} \lambda |\beta _j| &\text { if }|\beta _j|\le \lambda \\ \frac{2a\lambda |\beta _j|-\beta _j^2-\lambda ^2}{2(a-1)} &\text { if }\lambda <|\beta _j|\le a\lambda \\ \frac{\lambda ^2(a+1)}{2} &\text { if }|\beta _j|>a\lambda \end{array}\right. } \end{aligned}$$

is a continuous, non-convex function called the SCAD penalty. The parameter \(\rho \) controls the degree of shrinkage applied to the coefficients: a larger \(\rho \) increases the penalty on the coefficients, driving them toward zero, and thus helps prevent overfitting by enforcing sparsity in the model. We set \(a=2.5\) and \(\lambda =1\), apply SCAD regression to the data (X, y) for different choices of \(\rho \) (see formula (11)) and optimize it using the CSO-MA algorithm. We consider the following values of \(\rho \): \(10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\), 0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1, 10, 100. For each \(\rho \), we record the best particle position found by CSO-MA as our estimate of \(\beta \). The CSO-MA algorithm is initialized with 25 particles and iterates 100 times (i.e., 100 function evaluations). We run the algorithm 50 times for each \(\rho \) to analyze the stability of CSO-MA. For illustration purposes, we report the average and standard deviation of the 50 runs when \(\rho =0.025\) in Table 9; further, the average minimum of (11) when \(\rho =0.025\) is 0.315 with a standard deviation of 0.0009 (the other \(\rho \)'s have similar standard deviations and minimum values), suggesting the stability of the CSO-MA algorithm.
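A minimal sketch of the objective in (11), with the SCAD penalty written as above, is given below; the function names are illustrative, and any optimizer, including CSO-MA, can be applied to it for each value of \(\rho \) on the grid, recording the best particle position as the estimate of \(\beta \).

```python
# A minimal sketch of the SCAD-penalized least squares objective in Eq. (11).
import numpy as np

def scad_penalty(beta, lam=1.0, a=2.5):
    """SCAD penalty P(beta_j | lambda, a), summed over the coefficients."""
    b = np.abs(beta)
    small = lam * b
    mid = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    large = np.full_like(b, lam ** 2 * (a + 1) / 2)
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large)).sum()

def scad_objective(beta, X, y, rho, lam=1.0, a=2.5):
    """Eq. (11): residual sum of squares plus rho times the SCAD penalty;
    X is assumed to have standardized columns, as described in the text."""
    resid = y - X @ beta
    return resid @ resid + rho * scad_penalty(beta, lam, a)
```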


Figure 6 illustrates the solution path of SCAD using the CSO-MA algorithm. The x-axis represents the scaled \(\rho \) values. When \(\rho \) decreases from 100, the estimate for turbidity deviates from 0 first. This suggests that turbidity is one of the most important measurements associated with the level of CRAP. One possible reason for this phenomenon is that turbid water prevents light from penetrating, which in turn indicates a lower amount of algae carrying out photosynthesis. Further, temperature (T) is another variable that deviates from 0 early. The reason is that the optimum temperature for algae growth is above \(20\,^{\circ }\)C, and the lower the temperature, the less active the metabolism of algae is. In addition, when \(\rho \) decreases from 0.05 to 0.01 (x from 7 to 5), the parameter estimates for chemical elements, such as K, Mg and Na, all deviate from 0, suggesting that the concentrations of chemical elements have slightly different associations with CRAP.

This subsection shows that CSO-MA can be usefully applied along with SCAD penalized regression to explore associations among different components of water quality and how they affect the outcome CRAP. The interpretation of the solution path is in line with scientific common sense.

Solution path of SCAD using CSO-MA. Each line represents the trajectory of an estimated coefficient for a predictor variable across the ordered values of the regularization parameter \( \rho \). The y-axis denotes the estimated coefficient values. The x-axis corresponds to the ordinal position of each \( \rho \) value in the set \( 10^{-6}, 10^{-5}, \ldots , 100 \), which has been rescaled to 1, 2, ... for clarity of presentation.
