This article explains what's involved in integrating ontology data from digital twin builder (preview) with machine learning (ML) for use in predictive modeling. Analysts and data scientists can use automated ML (AutoML) or custom ML models to extract value from semantic relationships in operational data.
Important
This feature is in preview.
Why use a digital twin builder ontology in ML?
Ontology data represented in digital twin builder (preview) enriches ML models by:
- Providing context to raw sensor data (like linking sensors to equipment)
- Improving feature engineering with structured knowledge
- Enhancing model accuracy by incorporating domain relationships
Steps to using ontology data with ML
The following sections describe the typical steps involved in using ontology data from digital twin builder (preview) in ML workflows.
Step 1: Ingesting ontology data
First, you need to pull ontology data into a queryable format. Ontology data usually includes:
- Entity relationships (like which sensor belongs to which equipment)
- Hierarchical mappings (like production line > equipment > sensors)
- Static metadata (like equipment type, location, and operational limits)
The following table shows an example of ontology data.

| Sensor ID | Equipment ID | Equipment type | Production line | Site |
|---|---|---|---|---|
| S001 | D101 | Distiller | Line A | Site A |
| S002 | C201 | Condenser | Line B | Site B |
Here are some ways you can enrich and use this data:
- Join it with time series data, which adds context to raw sensor readings
- Aggregate relationships (like collecting the count of sensors for each piece of equipment)
- Filter by hierarchy (allows you to isolate data like failures in a specific plant location)
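The enrichment patterns above can be sketched with pandas. This is a minimal illustration using hypothetical data shaped like the example table; the column names and sample readings are assumptions, not part of the digital twin builder API.

```python
import pandas as pd

# Hypothetical ontology table (mirrors the example above)
ontology = pd.DataFrame({
    "sensor_id": ["S001", "S002"],
    "equipment_id": ["D101", "C201"],
    "equipment_type": ["Distiller", "Condenser"],
    "production_line": ["Line A", "Line B"],
    "site": ["Site A", "Site B"],
})

# Hypothetical raw time series readings, keyed by sensor
readings = pd.DataFrame({
    "sensor_id": ["S001", "S001", "S002"],
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 00:00"]
    ),
    "value": [1.4, 1.6, 2.3],
})

# Join with time series data: each reading gains equipment, line, and site context
enriched = readings.merge(ontology, on="sensor_id", how="left")

# Aggregate relationships: count of sensors for each piece of equipment
sensor_counts = ontology.groupby("equipment_id")["sensor_id"].nunique()

# Filter by hierarchy: isolate readings from one site
site_a = enriched[enriched["site"] == "Site A"]
```

A left join keeps every reading even if a sensor is missing from the ontology, which makes gaps in the mapping easy to spot.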
Step 2: Transforming ontology data for ML
Once ontology data is available, you can transform it for use in ML models.
This process might involve:
- Joining with time series data (like time series sensor readings)
- Deriving new features (like average equipment temperature or pressure trends)
- Creating categorical features (like equipment types or cooling medium)
The following table shows an example of ontology data after feature engineering.

| Process ID | Equipment ID | Failure | Mean pressure | Cooling type | Site |
|---|---|---|---|---|---|
| DP001 | D101 | No | 1.5 bar | Air-cooled | Site A |
| DP002 | C201 | Yes | 2.3 bar | Water-cooled | Site B |
At this stage, ontology data is ready to be used in an ML model.
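The transformations in this step can be sketched with pandas. The input frame below is a hypothetical, already-joined table of readings plus ontology context (the column names are assumptions for illustration).

```python
import pandas as pd

# Hypothetical enriched readings (sensor values already joined with ontology context)
enriched = pd.DataFrame({
    "equipment_id": ["D101", "D101", "C201", "C201"],
    "pressure_bar": [1.4, 1.6, 2.2, 2.4],
    "cooling_type": ["Air-cooled", "Air-cooled", "Water-cooled", "Water-cooled"],
    "site": ["Site A", "Site A", "Site B", "Site B"],
})

# Derive a numeric feature: mean pressure per piece of equipment
features = (
    enriched.groupby(["equipment_id", "cooling_type", "site"], as_index=False)
    .agg(mean_pressure=("pressure_bar", "mean"))
)

# Encode categorical features (cooling medium, site) so a model can consume them
ml_ready = pd.get_dummies(features, columns=["cooling_type", "site"])
```

One-hot encoding is used here for simplicity; ordinal or target encoding may suit high-cardinality ontology attributes better.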
Step 3: Extending with AutoML or custom models
After preparing the dataset, you can choose one of these ML strategies based on your project needs:
- AutoML: Best for quick experimentation and optimization
- Custom ML: Best for full control and fine-tuned performance
AutoML
AutoML simplifies ML by automatically selecting the best model and hyperparameters. The expected outcome is that AutoML returns the best-performing model without manual tuning.
To use AutoML, follow these steps:
- Feed transformed data into an AutoML tool (like Azure AutoML or FLAML)
- Define the prediction target (like failure probability)
- Let AutoML optimize the model (like choosing between XGBoost and Random Forest)
Here are some examples of AutoML tools:
- Azure AutoML (Cloud-based, full automation)
- FLAML (Python-based, lightweight)
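Under the hood, AutoML tools automate a search like the one sketched below: train several candidate models, score each by cross-validation, and keep the best performer. This simplified stand-in uses scikit-learn with synthetic data; a real AutoML tool such as FLAML also searches hyperparameters and time-boxes the search.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the failure-prediction dataset prepared in step 2
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Candidate models; an AutoML tool would generate and prune these automatically
candidates = {
    "rf_small": RandomForestClassifier(n_estimators=10, random_state=0),
    "rf_large": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score each candidate by cross-validated accuracy and keep the best
scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

The value of AutoML is running this loop over many model families and hyperparameter settings within a time budget, so you get a strong baseline without manual tuning.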
Custom ML
If you need full control over your models, you can use a custom model approach.
To work with a custom model, follow these steps:
- Select a model (like XGBoost, Random Forest, or Neural Networks)
- Manually engineer features (like rolling averages or anomaly detection)
- Train and evaluate your model using standard ML libraries (like scikit-learn or PyTorch)
Say you want to predict equipment failures using historical data and ontology relationships. An example ML pipeline might include these steps:
- Merge ontology and time series data
- One-hot encode categorical features
- Train a custom model using RandomForestClassifier or XGBoost
- Evaluate model performance using `accuracy_score` or `f1_score`
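The pipeline steps above can be sketched end to end with scikit-learn. The feature table below is hypothetical toy data shaped like the step 2 example, expanded to a few rows so the model has something to learn from.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table from step 2 (toy data for illustration)
data = pd.DataFrame({
    "mean_pressure": [1.5, 2.3, 1.4, 2.5, 1.6, 2.2, 1.5, 2.4],
    "cooling_type": ["Air-cooled", "Water-cooled"] * 4,
    "failure": [0, 1, 0, 1, 0, 1, 0, 1],
})

# One-hot encode categorical features
X = pd.get_dummies(data.drop(columns="failure"), columns=["cooling_type"])
y = data["failure"]

# Hold out a test set, keeping the failure ratio balanced
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a custom model
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Evaluate model performance
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
f1 = f1_score(y_test, preds)
```

For real failure data, which is usually heavily imbalanced, `f1_score` (or precision/recall) is typically more informative than accuracy alone.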