industrial enterprises generate a large amount of data that is worth exploring to serve their business. with the development of artificial intelligence (ai) and data mining technologies, a large number of open-source and commercial data modeling solutions have emerged in the market. however, these solutions are complex, which makes it difficult for industrial enterprises to build high-quality models with them and apply the models to their business. as a result, data is often wasted and returns fall short of the investment. to address this issue, nanjing tianfu software co., ltd. studied the data modeling requirements of industrial enterprises and developed dtempower, a data modeling software product.
dtempower provides a large number of algorithms to power every data modeling step, such as data cleaning, feature generation, sensitivity analysis, and model training. based on in-depth algorithm r&d for specific scenarios, dtempower uses the intelligent scheduling engine and hyperparameter optimization technologies to improve model quality and reduce the skill requirements for data modeling. in addition, dtempower provides a graphical modeling and development environment. users can drag algorithms to call them and draw lines to pass data. this dramatically lowers the barrier to using dtempower. the powerful algorithms and simple operations of dtempower enable anyone to quickly build excellent models.
models are reusable. to support reuse, dtempower defines a model exchange format called dt model, and a generated model can be saved as a single dt model. dt models can be called in dtempower run (dtrun, the model running module of dtempower), in aipod, and in other software to support scenarios such as real-time warning and design optimization.
to achieve the goals of "lowering the barrier to data modeling and improving the utilization efficiency of models", dtempower provides an all-in-one solution that covers the entire process from modeling to model management and application in typical industrial scenarios. this way, industrial enterprises can focus on their business and realize the value of their data without spending excessive effort on data analysis.
(1) advanced and comprehensive algorithm toolbox
the core features of dtempower are based on an advanced and comprehensive algorithm toolbox. dtempower provides a large number of algorithm controls to power every data modeling step, such as data cleaning, feature generation, sensitivity analysis, and model training. in addition, dtempower combines self-developed algorithms, hyperparameter optimization, and intelligent scheduling of combined basic algorithms, each tuned for specific scenarios. compared with similar algorithms in the market, the algorithms of dtempower deliver better model training results with higher precision and stability.
figure 1 advanced and comprehensive algorithm toolbox of dtempower
(2) no-code modeling
dtempower provides a graphical method for setting up data modeling workflows. users can use modules in the toolbox to operate data and models. even users who do not have advanced knowledge in coding and algorithms can set up a complex data modeling workflow by simply dragging and connecting nodes.
figure 2 no-code setup of data modeling workflows
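conceptually, a workflow of connected nodes is a directed acyclic graph that runs in dependency order. the following python sketch illustrates only that general idea. the node names and the `run_workflow` helper are hypothetical and are not part of dtempower.

```python
from graphlib import TopologicalSorter

def run_workflow(nodes, edges, source_data):
    """Run each node after all of its upstream nodes.

    nodes: dict mapping node name -> function taking a list of upstream outputs
    edges: list of (upstream, downstream) pairs, i.e. the drawn lines
    """
    # Map every node to the set of nodes it depends on.
    deps = {name: set() for name in nodes}
    for upstream, downstream in edges:
        deps[downstream].add(upstream)

    results = {}
    for name in TopologicalSorter(deps).static_order():
        # Nodes without upstream nodes read the source data directly.
        inputs = [results[d] for d in sorted(deps[name])] or [source_data]
        results[name] = nodes[name](inputs)
    return results

# Hypothetical three-node workflow: clean -> scale -> summarize.
nodes = {
    "clean": lambda ins: [v for v in ins[0] if v is not None],
    "scale": lambda ins: [v / max(ins[0]) for v in ins[0]],
    "summarize": lambda ins: sum(ins[0]) / len(ins[0]),
}
edges = [("clean", "scale"), ("scale", "summarize")]
results = run_workflow(nodes, edges, [2.0, None, 4.0, 8.0])
print(results["summarize"])
```

here each drawn line becomes one `(upstream, downstream)` pair, so adding a node to the workflow only requires one more entry in `nodes` and one more pair in `edges`.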
(3) table-based data pre-processing
dtempower allows users to pre-process data interactively in a spreadsheet-like table view. in addition to common table operations and shortcut keys, dtempower provides additional features for data analysis, such as data visualization, data checks, batch processing, and data file merging. data sets generated through pre-processing can be imported to a modeling workflow for subsequent model training and other operations.
figure 3 table-based interactive data pre-processing
(4) intelligent data cleaning algorithm
abnormal data greatly affects model quality and must therefore be cleaned. tianfu developed the intelligent data cleaning algorithm aiod based on the characteristics of data sets for industrial design. relying on a self-developed intelligent scheduling engine, aiod coordinates dozens of data cleaning algorithms to help users accurately identify abnormal data in data sets with one click.
figure 4 intelligent detection of abnormal data based on the self-developed aiod algorithm
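aiod is proprietary, and its internals are not described here. as a rough illustration of the general idea of combining several detectors, the sketch below lets three simple univariate rules (z-score, interquartile range, and median absolute deviation) vote on each sample. the function names, thresholds, and sample values are assumptions chosen for this example.

```python
import statistics

def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [std > 0 and abs(v - mean) / std > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values outside [q1 - k*iqr, q3 + k*iqr]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

def mad_flags(values, threshold=3.5):
    """Flag values with a large modified z-score based on the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [mad > 0 and 0.6745 * abs(v - med) / mad > threshold for v in values]

def vote_outliers(values, min_votes=2):
    """Return indices flagged by at least `min_votes` of the three rules."""
    votes = zip(zscore_flags(values), iqr_flags(values), mad_flags(values))
    return [i for i, flags in enumerate(votes) if sum(flags) >= min_votes]

# Hypothetical measurements with one obvious spike at index 6.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0]
print(vote_outliers(data))
```

voting makes the result less dependent on any single rule. in this small sample, the spike inflates the standard deviation enough that the z-score rule alone does not flag it, while the other two rules do.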
(5) aiagent-based training for small-scale data sets
industrial design data sets are typically small and unevenly distributed. to accommodate these characteristics, tianfu developed the intelligent training algorithm aiagent. powered by ensemble learning algorithms, intelligent stratification and classification, hyperparameter optimization, and other technologies, aiagent allows dtempower users to obtain the optimal model with one click, without manual intervention in training.
figure 5 effect comparison between aiagent and other algorithms in training a wave-making resistance data set
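one of the ingredients named above, hyperparameter optimization, can be illustrated on a small data set with leave-one-out cross-validation, which uses every sample for validation exactly once. the sketch below is not aiagent. it tunes the single hyperparameter `k` of a plain nearest-neighbor regressor, and all function names and data are assumptions for this example.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_error(samples, k):
    """Mean squared leave-one-out error for a given k."""
    total = 0.0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        total += (knn_predict(rest, x, k) - y) ** 2
    return total / len(samples)

def select_k(samples, candidates):
    """Return the candidate k with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(samples, k))

# Twelve synthetic samples of y = x^2 on a regular grid.
samples = [(x / 2, (x / 2) ** 2) for x in range(12)]
best_k = select_k(samples, candidates=[1, 2, 3, 5])
print(best_k)
```

leave-one-out validation costs one model fit per sample, which is affordable precisely because the data set is small.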
(6) mechanism model integration
dtempower allows users to embed a mechanism model into a training workflow to improve the precision and interpretability of modeling. in addition, the model aggregation feature allows users to combine a formula model that they enter with a model trained from data and save the result as a single dt model. this is how dtempower integrates data mining models with mechanism models.
figure 6 integration of data mining models and mechanism models
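one common way to combine a formula with a trained model is to let the data-driven part learn only the residual that the formula leaves unexplained. the sketch below shows this pattern under stated assumptions: the `mechanism` formula, the linear residual model, and the synthetic measurements are all hypothetical, and the source does not say that dtempower aggregates models this way.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b for 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mechanism(x):
    """Hypothetical formula model, for example an idealized physical law."""
    return 2.0 * x ** 2

def build_hybrid(xs, ys):
    """Fit a linear correction to the residual left by the mechanism model."""
    residuals = [y - mechanism(x) for x, y in zip(xs, ys)]
    a, b = fit_line(xs, residuals)
    return lambda x: mechanism(x) + a * x + b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x ** 2 + 0.5 * x + 1.0 for x in xs]  # synthetic measurements
hybrid = build_hybrid(xs, ys)
print(round(hybrid(5.0), 6))
```

because the formula carries the known physics, the trained part only has to capture a simple correction, which tends to need less data and remains easier to interpret.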
(7) intelligent optimization oriented to industrial design
a typical application scenario of dt models is optimization, such as product design optimization and equipment operation optimization. tianfu aipod allows users to import dt models to computational workflows with one click and solves optimization problems by using the silverbullet algorithm.
figure 7 one-click import process of dt models in aipod for optimization
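the general pattern here is that an optimizer repeatedly queries a trained model instead of running an expensive simulation or experiment. the silverbullet algorithm is proprietary, so the sketch below swaps in plain random search purely for illustration. the `surrogate` function stands in for an imported model and is hypothetical.

```python
import random

def random_search(model, bounds, n_trials=2000, seed=0):
    """Minimize `model` over a box by uniform random sampling."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = model(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

def surrogate(x):
    """Hypothetical stand-in for a trained model; minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_x, best_y = random_search(surrogate, bounds)
print(best_x, best_y)
```

random search is only a baseline. a dedicated optimizer would normally reach the same quality with far fewer model evaluations.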
(8) intelligent monitoring for industrial o&m
based on the powerful data modeling capabilities of dtempower, the extended intelligent monitoring toolbox provides an intelligent warning algorithm for time series data. with this algorithm, users can easily define fixed determination logic and evaluate the health status of parameters in terms of fluctuation, change trend, and deviation between measured parameter values and model-predicted values. the algorithm also generates warnings when exceptions occur.
figure 8 identifying exceptions in time series and providing information about the possible causes to help users quickly handle exceptions
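the three health criteria named above (fluctuation, change trend, and deviation from model-predicted values) can each be reduced to a simple statistic over a window of data. the sketch below is a minimal illustration and not the toolbox's algorithm. the `health_check` function, its limits, and the sample values are assumptions.

```python
import statistics

def health_check(measured, predicted, fluct_limit, trend_limit, dev_limit):
    """Return a list of warning labels for one window of a time series."""
    n = len(measured)
    warnings = []

    # Fluctuation: standard deviation of the window.
    if statistics.pstdev(measured) > fluct_limit:
        warnings.append("fluctuation")

    # Change trend: least-squares slope against the sample index.
    t_mean = (n - 1) / 2
    m_mean = statistics.fmean(measured)
    numerator = sum((t - t_mean) * (m - m_mean) for t, m in enumerate(measured))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    if abs(numerator / denominator) > trend_limit:
        warnings.append("trend")

    # Deviation: largest gap between measurement and model prediction.
    if max(abs(m - p) for m, p in zip(measured, predicted)) > dev_limit:
        warnings.append("deviation")
    return warnings

# Hypothetical window: the measurement drifts upward while the model stays flat.
measured = [50.0, 50.4, 51.1, 51.9, 53.0, 54.2]
predicted = [50.0, 50.1, 50.0, 49.9, 50.0, 50.1]
print(health_check(measured, predicted,
                   fluct_limit=2.0, trend_limit=0.5, dev_limit=3.0))
```

in this window the drift triggers the trend and deviation checks, while the spread of the values stays below the fluctuation limit.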
(1) data cleaning and aiagent-based data training
this case uses a simulation data set based on the styblinski-tang function to demonstrate the effect of aiagent in dtempower. in its standard form with five input variables, the styblinski-tang function is y = (1/2) * sum over i = 1 to 5 of (xi^4 - 16*xi^2 + 5*xi). the objective of data modeling is to obtain a prediction model that maps x (x1 to x5) to y.
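for reference, the standard styblinski-tang function can be evaluated and sampled in a few lines of python. the sampling range of -5 to 5 is the domain commonly used for this benchmark. the sample count, the seed, and the `make_dataset` helper are assumptions, because the source does not state how its data set was generated.

```python
import random

def styblinski_tang(x):
    """Standard Styblinski-Tang function: 0.5 * sum(xi^4 - 16*xi^2 + 5*xi)."""
    return 0.5 * sum(xi ** 4 - 16 * xi ** 2 + 5 * xi for xi in x)

def make_dataset(n_samples, n_dims=5, low=-5.0, high=5.0, seed=0):
    """Draw uniform random inputs and evaluate the function on each."""
    rng = random.Random(seed)
    xs = [[rng.uniform(low, high) for _ in range(n_dims)]
          for _ in range(n_samples)]
    return xs, [styblinski_tang(x) for x in xs]

xs, ys = make_dataset(200)
print(len(xs), len(xs[0]), styblinski_tang([0.0] * 5))
```

the function has several local minima in every dimension, which is why it is a demanding test for regression models that have to reproduce its response surface.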
in this case, a comparison project is created to verify that aiagent outperforms the ensemble learning algorithm adaboost under the same configurations. figure 9 compares the response surfaces of the models trained by aiagent and adaboost. the response surface of the model trained by aiagent matches the theoretical response surface and is more precise than the one obtained with adaboost.
figure 9 comparison of the response surfaces of the models trained by aiagent and adaboost
(2) data pre-processing and visualization
suitable data sets are necessary for modeling. an efficient and easy-to-use data pre-processing and visualization tool can yield improved data analysis and data modeling results with minimum effort.
in this case, samples from different data sources are drawn on the same scatter plot and displayed in different colors. most sample points lie on the same curve, which represents the normal running mode. some sample points from faulty data sets deviate from the curve, which indicates possible exceptions.
figure 10 response surface of an aggregation model
(3) time series forecasting for parameters
time series forecasting involves forecasting changes based on historical data. this case forecasts the sewage treatment system parameters to demonstrate how dtempower builds a data-driven model based on large amounts of measurement data in nonlinear, complex, and dynamic biochemistry scenarios with strong external interference, time variation, and coupling.
by properly selecting external features and applying feature engineering techniques such as mean decrease in impurity (mdi) for feature importance and principal component analysis (pca) for dimensionality reduction, dtempower enriches the input information, which increases the precision of model prediction. it also avoids the problems caused by an excessive number of features, such as dimension explosion and difficult model training. the r2 metric (coefficient of determination) of the forecasting model increased from 0.68 to 0.94.
figure 11 progressive level improvement of data modeling based on dtempower
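the r2 metric quoted above is the coefficient of determination, defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. the sketch below computes it and also shows how a time series can be turned into lagged input features for forecasting. the helper names and sample values are assumptions for this example.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def lag_features(series, n_lags):
    """Turn a series into (previous n_lags values, next value) pairs."""
    return [(series[i - n_lags:i], series[i])
            for i in range(n_lags, len(series))]

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
print(r2_score(y_true, y_pred))
print(lag_features([1, 2, 3, 4, 5], n_lags=2))
```

an r2 of 1 means that the predictions reproduce the measurements exactly, and an r2 of 0 means that the model does no better than always predicting the mean.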