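Application of an Interpretable XGBoost Model Optimized by an Improved Black-Winged Kite Algorithm to Terahertz Spectral Identification of Genetically Modified Cottonseed Oil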


CLC number: O439  Document code: A
doi: 10.37188/OPE.20253320.3192  CSTR: 32169.14.OPE.20253320.3192

Abstract: To achieve accurate classification and identification of genetically modified and non-genetically modified cottonseed oil, this study proposes an explainable classification model based on extreme gradient boosting (XGBoost) optimized by an improved black-winged kite algorithm. First, a terahertz time-domain spectroscopy (THz-TDS) system was used to collect terahertz absorption spectra of genetically modified and non-genetically modified cottonseed oil samples in the 0.3-1.8 THz frequency range. Then, the traditional black-winged kite algorithm (BKA) was improved by introducing a dual-objective fitness function optimization strategy, a reverse learning strategy for initializing the population, and a Lévy flight strategy controlled by a Rayleigh distribution function. The improved black-winged kite algorithm (DLBKA) was used to perform dual-objective hyperparameter optimization of the tree depth, learning rate, and maximum iteration count of the XGBoost model, thereby constructing the DLBKA-XGBoost classification model. Finally, the model was applied to identify genetically modified cottonseed oil, and its identification results were analyzed for interpretability using the SHAP method. The results showed that the DLBKA-optimized XGBoost interpretable classification model improved the accuracy of identifying genetically modified and non-genetically modified cottonseed oil, with a test set accuracy as high as 97.78%, an improvement of 4.45% over the model optimized by the traditional BKA and of 14.45% over the model optimized by the traditional whale optimization algorithm (WOA). It also provided explanations for the model, clarifying the positive influence mechanism of key feature frequencies on the identification results and thereby enhancing the model's transparency and credibility. Therefore, this study provides a fast and accurate analytical method for the identification of genetically modified cottonseed oil and offers a valuable reference for the identification of other genetically modified substances.

Key words: terahertz spectroscopy; genetically modified cottonseed oil; extreme gradient boosting; improved black-winged kite algorithm; explainability analysis
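
The reverse learning initialization named in the abstract is commonly known as opposition-based learning. The sketch below illustrates that general idea only and is not the authors' implementation: each random candidate x in [lb, ub] is paired with its opposite lb + ub - x, and the better half of the combined pool is kept. The function name `opposition_based_init`, the hyperparameter bounds, and the stand-in objective `toy_fitness` are all assumptions made for illustration; the actual dual-objective fitness function is not given in the abstract.

```python
import random


def opposition_based_init(fitness, lower, upper, pop_size, seed=0):
    """Pick pop_size candidates from random points and their opposites.

    Lower fitness is treated as better (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dim = len(lower)
    # Random candidates drawn uniformly inside the search bounds.
    candidates = [
        [rng.uniform(lower[j], upper[j]) for j in range(dim)]
        for _ in range(pop_size)
    ]
    # The opposite of x in [lb, ub] is lb + ub - x.
    opposites = [
        [lower[j] + upper[j] - x[j] for j in range(dim)]
        for x in candidates
    ]
    # Keep the pop_size best individuals of the combined pool.
    pool = candidates + opposites
    pool.sort(key=fitness)
    return pool[:pop_size]


# Hypothetical bounds for (tree depth, learning rate, boosting rounds).
LOWER = [2.0, 0.01, 50.0]
UPPER = [10.0, 0.30, 500.0]


def toy_fitness(x):
    # Stand-in objective, NOT the paper's dual-objective fitness:
    # normalized squared distance to an arbitrary target point.
    target = [6.0, 0.1, 200.0]
    return sum(
        ((xi - ti) / (u - l)) ** 2
        for xi, ti, l, u in zip(x, target, LOWER, UPPER)
    )


population = opposition_based_init(toy_fitness, LOWER, UPPER, pop_size=10)
```

In a real pipeline the integer-valued hyperparameters (tree depth, number of rounds) would be rounded before training, and `toy_fitness` would be replaced by a score obtained from cross-validated XGBoost runs.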

1 Introduction

Cotton, one of the world's major fiber crops, is widely grown in more than 80 countries, with an annual output of about 25 million tons.
