Introduction
In large-scale infrastructure projects, accurate forecasting of diaphragm wall installation duration is critical for sequencing, resource planning, and risk mitigation. Leveraging Azure Machine Learning Designer, I developed a regression model to predict installation time using geotechnical and structural features. This article outlines the step-by-step methodology, from data preparation to deployment.
📊 Step 1: Data Preparation
The dataset comprised 1,095 records, each representing a diaphragm wall panel installation event. Key features included:
- Soil classification (e.g., SM, CL, ML)
- Panel thickness
- Substructure length (SubLength)
- Tunnel section identifiers
🧼 Cleaning & Normalization
- Numerical features such as panel thickness and SubLength were normalized using MinMaxScaler and MaxAbsScaler to ensure consistent model behavior
- Categorical features like soil type and tunnel section were encoded using LabelEncoder and OneHotEncoder
- Panel ID was excluded from encoding, as it served only as a nominal identifier without predictive value
This preprocessing ensured that all features were scaled appropriately and semantically meaningful for model training.
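As a rough illustration of what these transforms compute, the following dependency-free Python sketch reproduces min-max scaling, max-abs scaling, and one-hot encoding by hand. The sample values and the function names are hypothetical and are not taken from the project dataset; in practice the scikit-learn classes named above, or the equivalent Designer components, would do this work.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1], as MinMaxScaler does per column."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, so results lie within [-1, 1]."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def one_hot(labels):
    """Return (sorted categories, one indicator row per input label)."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical sample values (assumed units: metres).
thickness_m = [0.8, 1.0, 1.2, 1.5]
sub_length_m = [12.0, 18.0, 24.0, 30.0]
soil_type = ["SM", "CL", "ML", "CL"]

thickness_scaled = min_max_scale(thickness_m)
sub_length_scaled = max_abs_scale(sub_length_m)
soil_categories, soil_encoded = one_hot(soil_type)
```

Min-max scaling pins the smallest and largest observed values to 0 and 1, while max-abs scaling keeps zero at zero, which matters when zero is itself meaningful.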
🔀 Step 2: Train-Test Split with Stratification
To ensure fair evaluation:
- Data was split into 90% training and 10% testing
- Stratification was applied based on soil type, preserving class balance across both sets
This approach ensured the model could generalize across varied geotechnical conditions.
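A stratified 90/10 split can be sketched in plain Python as follows. The soil-type counts, the record layout, and the random seed are invented for illustration; only the total of 1,095 records and the 10% test fraction come from the description above.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.1, seed=42):
    """Split records so each stratum contributes about test_fraction to the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    train, test = [], []
    for stratum in sorted(groups):
        members = groups[stratum][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical records: 1,095 panels with an invented soil-type mix.
soils = ["SM"] * 500 + ["CL"] * 400 + ["ML"] * 195
panels = [{"panel": i, "soil": s} for i, s in enumerate(soils)]

train_set, test_set = stratified_split(panels, key="soil")
```

Because the split is done per soil type, a rare class cannot end up absent from the test set by chance, which a plain random split does not guarantee.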
🌲 Step 3: Model Selection – Boosted Decision Tree Regression
Given the moderate dataset size, I selected Boosted Decision Tree Regression for its:
- Efficiency on small-to-medium datasets
- Ability to model non-linear relationships
- Robustness to outliers and missing values
- Interpretability via feature importance metrics
This model type is well-suited for structured construction data with mixed feature types.
⚙️ Step 4: Hyperparameter Tuning
Using Azure ML’s Tune Model Hyperparameters component:
- Sweeping mode was set to Entire Grid for exhaustive search
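- Performance metric was Accuracy, interpreted as prediction closeness within acceptable tolerances. Note that Accuracy is a classification metric; for a numeric label such as Duration, the component's separate regression metric setting (for example, mean absolute error or coefficient of determination) is the one that drives the sweep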
- Label column was defined as Duration
This ensured the model was optimized across all relevant parameter combinations.
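To show what an exhaustive grid implies in practice, the sketch below enumerates every combination of a small search space. The parameter names and ranges are assumptions chosen for illustration; the article does not list the values that were actually swept.

```python
from itertools import product

# Hypothetical search ranges for a boosted tree regressor.
param_grid = {
    "max_leaves_per_tree": [8, 32, 128],
    "min_samples_per_leaf": [1, 10, 50],
    "learning_rate": [0.025, 0.1, 0.4],
    "num_trees": [20, 100, 500],
}


def expand_grid(grid):
    """Yield one dict per parameter combination, as a full grid sweep would."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(expand_grid(param_grid))
print(f"{len(candidates)} candidate configurations")
```

The number of runs is the product of the list lengths, so adding one more value to each of four parameters grows the sweep from 81 to 256 trainings. That cost is the usual reason to consider a random sweep on larger search spaces.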
✅ Step 5: Model Evaluation
The trained model was validated using the 10% holdout set. Key metrics:
| Metric | Value | Interpretation |
| --- | --- | --- |
| R² Score | 0.145 | Modest variance explained |
| MAE | 1.83 days | Average prediction error |
| RMSE | 2.74 days | Penalizes larger errors |
| Relative Absolute Error | 0.867 | Error relative to naive mean predictor |
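An R² of 0.145 means the model explains roughly 15% of the variance in duration, and a relative absolute error of 0.867 means it improves on a mean-only predictor by about 13%. While these figures are modest, the MAE of 1.83 days is practical for early-stage planning and sequencing.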
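For readers who want to check how these four metrics relate to each other, here is a minimal sketch that computes them from paired actual and predicted values. The durations used are hypothetical and do not reproduce the results reported above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R², and relative absolute error for paired values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Errors of a naive predictor that always outputs the mean of y_true.
    baseline_abs = sum(abs(t - mean_true) for t in y_true)
    baseline_sq = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "r2": 1 - sum(sq_err) / baseline_sq,
        "rae": sum(abs_err) / baseline_abs,
    }


# Hypothetical durations in days.
actual = [4.0, 6.0, 8.0, 10.0]
predicted = [5.0, 5.0, 9.0, 9.0]
metrics = regression_metrics(actual, predicted)
```

R² and relative absolute error are both measured against the same mean-only baseline: the first uses squared errors and the second absolute errors, so they tend to move together without being identical.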
🧠 Step 6: Stacking Ensemble (Optional Extension)
To improve performance, I am exploring stacking:
- Train multiple base models (Boosted Tree, Linear Regression, Decision Forest)
- Extract predictions using Convert to CSV
- Combine outputs and train a meta-model (e.g., LightGBM) on the stacked predictions
This ensemble approach is intended to improve generalization and reduce overfitting, especially across soil types.
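The step of combining base-model outputs can be sketched as below. The CSV contents, the PanelID and Duration column names, and the assumption that predictions arrive in a column called "Scored Labels" are all illustrative; the sketch only assembles the table a meta-model would be trained on and does not train one.

```python
import csv
import io

# Hypothetical exports, one per base model, keyed by an assumed model name.
EXPORTS = {
    "boosted_tree": "PanelID,Duration,Scored Labels\nP1,5.0,4.6\nP2,7.0,7.4\nP3,3.0,3.5\n",
    "linear": "PanelID,Duration,Scored Labels\nP1,5.0,5.5\nP2,7.0,6.1\nP3,3.0,4.0\n",
    "forest": "PanelID,Duration,Scored Labels\nP1,5.0,4.9\nP2,7.0,6.8\nP3,3.0,2.7\n",
}


def read_column(csv_text, column, id_col="PanelID"):
    """Map each panel ID to the float value found in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: float(row[column]) for row in reader}


def build_meta_table(exports, pred_col="Scored Labels", target_col="Duration"):
    """Return (model names, rows); each row is (panel ID, base predictions, target)."""
    names = sorted(exports)
    preds = {name: read_column(exports[name], pred_col) for name in names}
    targets = read_column(exports[names[0]], target_col)
    rows = []
    for panel_id in sorted(targets):
        features = [preds[name][panel_id] for name in names]
        rows.append((panel_id, features, targets[panel_id]))
    return names, rows


model_names, meta_rows = build_meta_table(EXPORTS)
```

One caution applies to any stacking setup: the meta-model should be fitted on predictions the base models made for rows they were not trained on (for example, out-of-fold predictions). Otherwise it learns from overly optimistic inputs.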

Training Pipeline for predicting the duration of Dwall construction
🚀 Step 7: Deployment & Validation
Using Designer’s Real-Time Inference Pipeline:
- Added Web Service Input and Enter Data Manually components, with the latter replacing the Dwall dataset

Inference Pipeline to enable the deployment
- Deployed to an Azure Container Instance (some errors occurred while retrieving the endpoints, as shown in the endpoint status below)

- Validated predictions using manual input and REST API calls
This enabled real-time duration forecasting for new panel configurations, supporting proactive planning and risk control.
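A REST call to such an endpoint might look like the sketch below, which builds the request without sending it. The scoring URI, the key, the feature names, and the "WebServiceInput0" input name are placeholders or assumptions; the real values should be taken from the deployed endpoint's own consume details.

```python
import json
import urllib.request

# Placeholders: neither value comes from the article.
SCORING_URI = "http://example.invalid/score"
API_KEY = "<your-endpoint-key>"


def build_scoring_request(panel, uri=SCORING_URI, api_key=API_KEY):
    """Build (but do not send) a POST request for one panel configuration."""
    payload = {
        "Inputs": {"WebServiceInput0": [panel]},  # assumed input name
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        uri,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical feature names and values for a new panel.
panel = {
    "SoilType": "CL",
    "PanelThickness": 1.2,
    "SubLength": 24.0,
    "TunnelSection": "T1",
}
request = build_scoring_request(panel)

# To call a live endpoint, uncomment the lines below:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from the network call makes the payload easy to inspect, which helps when an endpoint reports errors during provisioning.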
📌 Conclusion
Azure ML Designer provided a robust, visual framework for building, tuning, and deploying a predictive model tailored to construction workflows. The methodology balances technical rigor with operational practicality, which suits engineering teams seeking reproducible, scalable ML solutions.