Search results - BazTech

1

Key Factors to Consider when Predicting the Costs of Forwarding Contracts

Vu Quang Hieu, Cen Ling, Ruta Dymitr, Liu Ming

Annals of Computer Science and Information Systems | 2022 | Vol. 30 | 447--450 | EN

Predicting the cost of forwarding contracts is a typical problem that logistics companies need to solve in order to optimize their business for better profit. This is the challenge defined in the FedCSIS 2022 Competition, where a five-year history of contract data and delivery routes from a large Polish logistics company is provided to train a Machine Learning model. In addition to the contract data, historical wholesale fuel prices and euro exchange rates at the contract time are also provided. To address this challenge, we first designed a basic solution in which we focused on feature engineering to find high-impact features for the model. The same set of features was then used to train two different models: one using XGBoost and the other using LightGBM. The average of the two boosting models' predictions was passed to a post-processing step. In that step, we designed and trained a simple linear regression model to capture the average monthly changes in contract cost, given the changes in fuel prices and euro exchange rates. These captured changes were used to adjust the predictions from the previous step, addressing the issue that tree-based models cannot predict values outside the range they saw during training. While the basic solution with careful feature selection gave us a place in the top 5, our post-processing strategy helped us win the 3rd prize in the competition.
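The blend-then-adjust pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of relative monthly changes as inputs, and the multiplicative form of the adjustment are all assumptions made for illustration, and the two boosting models are represented only by their prediction lists.

```python
from statistics import mean


def blend(pred_a, pred_b):
    """Average the predictions of two models element by element."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


def fit_two_feature_ols(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2.

    Solves the centered 2x2 normal equations directly.
    Hypothetical inputs: x1 = monthly fuel price change,
    x2 = monthly euro rate change, y = monthly cost change.
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def adjust(preds, fuel_change, euro_change, coefs):
    """Scale predictions by the cost change the linear model expects.

    Assumption: the adjustment is multiplicative; the abstract does
    not state the exact form.
    """
    b0, b1, b2 = coefs
    shift = b0 + b1 * fuel_change + b2 * euro_change
    return [p * (1 + shift) for p in preds]


# Toy usage with made-up numbers.
fuel = [0.0, 0.1, 0.2, 0.05]
euro = [0.02, 0.0, 0.01, 0.03]
cost = [0.01 + 0.5 * f + 0.2 * e for f, e in zip(fuel, euro)]
coefs = fit_two_feature_ols(fuel, euro, cost)
blended = blend([100.0, 200.0], [110.0, 190.0])
adjusted = adjust(blended, fuel_change=0.1, euro_change=0.0, coefs=coefs)
```

The point of the adjustment is that the linear model can extrapolate to fuel prices or exchange rates outside the training range, which the tree-based models cannot do on their own.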

2

Diversified gradient boosting ensembles for prediction of the cost of forwarding contracts

Ruta Dymitr, Liu Ming, Cen Ling, Vu Quang Hieu

Annals of Computer Science and Information Systems | 2022 | Vol. 30 | 431--436 | EN

A common business practice for transportation forwarders is to bid for shipping contracts at transport or freight exchanges. Based on the detailed contract requirements, they try to estimate the total expected cost of executing the contract and bid accordingly, with a price fixed in advance, for delivering the shipping service at the prescribed specification and schedule. The capability to accurately predict the cost of contract execution is the critical factor determining the profitability of the offered shipping services as well as the amount of business drawn from freight exchanges. However, given the highly volatile nature of the transport services ecosystem, it is difficult to simultaneously account for countless dynamically changing factors such as fuel prices, currency exchange rates, the temporal and spatial multitude of routing and implied traffic risks, and the properties of cargo and shipping vehicles. This leads to large cost under- or over-estimation, resulting in loss-making contracts or equally painful missed revenue opportunities. In the context of the FedCSIS 2022 data mining competition, we propose an accurate and robust predictor of the cost of forwarding contracts, built upon the detailed contract data using an ensemble of state-of-the-art gradient boosting-based regression models. Our established feature engineering framework, combined with deep parametric optimization of the individual models and multi-faceted diversification techniques guiding hybrid final model ensembles, was instrumental in outperforming all competing predictors and winning the FedCSIS 2022 contest.

3

Deep Bi-Directional LSTM Networks for Device Workload Forecasting

Ruta Dymitr, Cen Ling, Vu Quang Hieu

Annals of Computer Science and Information Systems | 2020 | Vol. 21 | 115--118 | EN

Deep convolutional neural networks revolutionized the area of automated object detection from images. Can the same be achieved in the domain of time series forecasting? Can one build a universal deep network that, once trained on the past, would be able to deliver accurate predictions reaching deep into the future for any time series, even the most diverse? This work is a first step towards addressing this challenge in the context of the FEDCSIS'2020 Competition dedicated to network device workload prediction based on historical time series data. We have developed and pre-trained a universal 3-layer bi-directional Long Short-Term Memory (LSTM) regression network that reported the most accurate hourly predictions of the weekly workload time series from thousands of different network devices with diverse shape and seasonality profiles. We also show how intuitive human-led post-processing of the raw LSTM predictions could easily destroy the generalization abilities of such a prediction model.

4

Greedy incremental support vector regression

Ruta Dymitr, Cen Ling, Vu Quang Hieu

Annals of Computer Science and Information Systems | 2019 | Vol. 18 | 7--9 | EN

Support Vector Regression (SVR) is a powerful supervised machine learning model, especially well suited to normalized or binarized data. However, its quadratic complexity in the number of training examples rules out training on large datasets, especially high-dimensional ones that require frequent retraining. We propose a simple two-stage greedy selection of training data for SVR to maximize its validation set accuracy with the minimum number of training examples, and illustrate the performance of this strategy in the context of the Clash Royale Challenge 2019, concerned with efficient prediction of decks' win rates. Hundreds of thousands of labelled examples were reduced to hundreds, on which an optimized SVR was trained to maximize the validation R2 score. The proposed model took first place in the Clash Royale 2019 challenge, outperforming over a hundred competing teams from around the world.
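The idea of greedily growing a small training subset against a validation R2 score can be sketched as below. This is one plausible forward-selection loop, not the paper's two-stage procedure, which the abstract does not detail. The regressor is passed in as a hypothetical `fit_predict` callable, so an SVR could be plugged in; to stay dependency-free, the demo uses a 1-nearest-neighbour stand-in on one-dimensional inputs instead of an SVR.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination (R2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def greedy_select(X, y, X_val, y_val, fit_predict, seed, budget):
    """Grow a training subset one example at a time.

    At each step, add the candidate that gives the highest validation
    R2; stop when the budget is reached or no candidate improves it.
    """
    selected = list(seed)
    remaining = [i for i in range(len(X)) if i not in selected]

    def score(idx):
        preds = fit_predict([X[i] for i in idx], [y[i] for i in idx], X_val)
        return r2_score(y_val, preds)

    best = score(selected)
    while len(selected) < budget and remaining:
        top_score, top_i = max((score(selected + [i]), i) for i in remaining)
        if top_score <= best:
            break  # no remaining example improves the validation score
        selected.append(top_i)
        remaining.remove(top_i)
        best = top_score
    return selected, best


def nearest_neighbour(X_train, y_train, X_query):
    """Stand-in regressor (NOT an SVR): 1-nearest neighbour on scalars."""
    out = []
    for q in X_query:
        j = min(range(len(X_train)), key=lambda k: abs(X_train[k] - q))
        out.append(y_train[j])
    return out


# Toy usage with made-up data.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
subset, val_r2 = greedy_select(
    X, y, X_val=[0.1, 3.9], y_val=[0.0, 4.0],
    fit_predict=nearest_neighbour, seed=[2], budget=5,
)
```

Each step refits the model once per remaining candidate, so this naive loop is only practical when the candidate pool is small or pre-filtered.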

5

Efficient support vector regression with reduced training data

Cen Ling, Vu Quang Hieu, Ruta Dymitr

Annals of Computer Science and Information Systems | 2019 | Vol. 18 | 15--18 | EN

Support Vector Regression (SVR), a supervised machine learning algorithm, has gained popularity in various fields. However, the quadratic complexity of SVR in the number of training examples rules it out for many practical applications with large training datasets. This paper explores efficient ways to maximize the prediction accuracy of SVR with the minimum number of training examples. For this purpose, a clustered greedy strategy and a Genetic Algorithm (GA) based approach are proposed for optimal subset selection. The performance of the developed methods is illustrated in the context of the Clash Royale Challenge 2019, concerned with decks' win rate prediction. The training dataset of 100,000 examples was reduced to hundreds, which were used to train the SVR to maximize prediction performance measured by the validation R2 score. Our approach achieved the second highest score among over a hundred participating teams in this challenge.