Search results
Searched for keyword: reinforcement learning
Results found: 48
EN
Feature Selection (FS) is an essential research topic in the area of machine learning. FS, the process of identifying the relevant features and removing the irrelevant and redundant ones, is meant to deal with the high-dimensionality problem by selecting the best-performing feature subset. In the literature, many feature selection techniques approach the task as a search problem, where each state in the search space is a possible feature subset. In this paper, we introduce a new feature selection method based on reinforcement learning. First, decision tree branches are used to traverse the search space. Second, a transition similarity measure is proposed to ensure the explore-exploit trade-off. Finally, the informative features are those most involved in constructing the best branches. The performance of the proposed approach is evaluated on nine standard benchmark datasets. The results, using the AUC score, show the effectiveness of the proposed system.
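As a rough illustration of this family of methods, not the authors' exact algorithm, the sketch below performs an explore/exploit search over feature subsets, scoring each candidate subset by the AUC of a decision tree; the function names, epsilon value, and neighborhood moves are illustrative assumptions.

```python
# Hedged sketch, not the paper's exact algorithm: an explore/exploit
# search over feature subsets, each candidate scored by the AUC of a
# decision tree (binary classification assumed).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def auc_of_subset(X, y, subset):
    """Score one feature subset by the AUC of a decision tree."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, subset], y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])

def rl_feature_search(X, y, episodes=100, epsilon=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    subset = list(range(n))                      # start from all features
    best_subset, best_auc = subset, auc_of_subset(X, y, subset)
    for _ in range(episodes):
        if rng.random() < epsilon or len(subset) <= 1:
            f = int(rng.integers(n))             # explore: toggle a random feature
            cand = sorted(set(subset) ^ {f}) or [f]
        else:                                    # exploit: drop the least useful feature
            cand = max(([g for g in subset if g != f] for f in subset),
                       key=lambda s: auc_of_subset(X, y, s))
        auc = auc_of_subset(X, y, cand)
        subset = cand
        if auc > best_auc:
            best_subset, best_auc = cand, auc
    return best_subset, best_auc
```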
EN
Energy saving has always been a concern in production scheduling, especially in distributed hybrid flow shop scheduling problems. This study proposes a shuffled frog leaping algorithm with Q-learning (QSFLA) to solve distributed hybrid flow shop scheduling problems with energy saving (DEHFSP), minimizing the maximum completion time and the total energy consumption simultaneously. The mathematical model is provided, and the lower bounds of the two optimization objectives are given and proved. A Q-learning process is embedded in the memeplex search of QSFLA. The state of the population is calculated based on the lower bound. Sixteen search strategy combinations are designed from four kinds of global search and four kinds of neighborhood structure, and one combination is selected for use in the memeplex search according to the population state. An energy-saving operator is presented to reduce total energy consumption without increasing the processing time. One hundred forty instances of different scales are tested, and the computational results show that QSFLA is a very competitive algorithm for solving DEHFSP.
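To make the Q-learning layer concrete, here is a minimal sketch of a Q-table over population states and the 16 strategy combinations (4 global searches x 4 neighborhood structures); the state binning, reward, and constants are illustrative assumptions, not the paper's exact definitions.

```python
# Hedged sketch of the Q-learning layer only: states are coarse bins of
# the gap between the best objective and its lower bound; actions are
# the 16 search-strategy combinations.
import numpy as np

N_STATES, N_ACTIONS = 4, 16          # 4 gap bins, 4x4 strategy combinations
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

def population_state(best_obj, lower_bound):
    """Bin the relative gap between the best objective and the lower bound."""
    gap = (best_obj - lower_bound) / max(lower_bound, 1e-9)
    return min(int(gap * N_STATES / 0.5), N_STATES - 1)

def choose_combination(state):
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))  # explore a random combination
    return int(np.argmax(Q[state]))          # exploit the best-known one

def update(state, action, reward, next_state):
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```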
EN
In promoting the construction of prefabricated residential buildings in Yunnan villages and towns, the use of precast concrete elements is inevitable. Due to the dense arrangement of steel bars at the joints of precast concrete elements, collisions are prone to occur, which can affect the stress of the components and even pose safety hazards for the entire construction project. Because the commonly used steel bar obstacle avoidance method based on building information modeling has a low adaptation rate and cannot change the trajectory of a steel bar to avoid a collision, a multi-agent reinforcement learning model integrating building information modeling is proposed to resolve steel bar collisions in reinforced concrete frames. The experimental results show that the obstacle avoidance probability of the proposed model in three typical beam-column joints is 98.45%, 98.62% and 98.39% respectively, which is 5.16%, 12.81% and 17.50% higher than that of the building information modeling method. In collision-free path design for the same object, designing paths for different types of precast concrete elements takes about 3–4 minutes, far less than the time spent by experienced structural engineers on collision-free path modeling. The results indicate that the proposed model performs well and has reference value.
EN
Reinforcement learning is of increasing importance in the field of robot control, and simulation plays a key role in this process. In the field of unmanned aerial vehicles (UAVs, including drones), there is also an increase in the number of published scientific papers involving this approach. In this work, an autonomous drone control system was prepared to fly forward (according to its coordinate system) and to pass the trees encountered in a forest based on the data from a rotating LiDAR sensor. The Proximal Policy Optimization (PPO) algorithm, an example of reinforcement learning (RL), was used to prepare it. A custom simulator in the Python language was developed for this purpose. The Gazebo environment, integrated with the Robot Operating System (ROS), was also used to test the resulting control algorithm. Finally, the prepared solution was implemented on an Nvidia Jetson Nano eGPU and verified in real test scenarios. During these tests, the drone successfully completed the set tasks and was able to repeatably avoid trees while flying through the forest.
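The paper trained PPO in a custom Python simulator; as a stand-in sketch of such a training setup, the snippet below uses the Stable-Baselines3 PPO API on a hypothetical Gym-style environment. "ForestLidarEnv-v0" and the hyperparameter values are assumptions, not the authors' setup.

```python
# Hedged sketch: training a PPO policy on a Gym-style wrapper around a
# LiDAR forest simulator. ForestLidarEnv-v0 is a hypothetical stand-in
# for the authors' custom environment; Stable-Baselines3 is used here
# for brevity and is not necessarily what the paper used.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("ForestLidarEnv-v0")   # obs: rotating-LiDAR scan; act: velocity setpoints
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=64, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_forest_drone")        # e.g., for deployment on a Jetson Nano
```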
EN
The aim of this study is to use reinforcement learning to generate a complementary signal for enhancing the performance of the system stabilizer. Reinforcement learning is one of the important branches of machine learning in the area of artificial intelligence and a general approach for solving Markov Decision Process (MDP) problems. In this paper, a reinforcement learning-based control method, namely Q-learning, is presented and used to improve the performance of a 3-Band Power System Stabilizer (PSS3B) in a single-machine power system. To this end, we first set the parameters of the 3-band power system stabilizer by optimizing an eigenvalue-based objective function using the KH optimization algorithm, and then its efficiency is improved in real time using the proposed reinforcement learning algorithm based on the Q-learning method. One of the fundamental features of the proposed reinforcement learning-based stabilizer is its simplicity and its independence from the system model and from changes in the operating points. To evaluate the efficiency of the proposed reinforcement learning-based 3-band power system stabilizer, its results are compared with the conventional power system stabilizer and with the 3-band power system stabilizer designed using the KH algorithm under different operating points. The simulation results based on the performance indicators show that the power system stabilizer proposed in this study outperforms the two other methods in terms of reduced settling time and damping of low-frequency oscillations.
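For reference, the tabular Q-learning update that this abstract relies on is the standard rule (a textbook identity, not specific to this paper), where alpha is the learning rate and gamma the discount factor:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```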
EN
This paper presents a complete simulation and reinforcement learning solution for training mobile agents' strategies of route tracking and mutual collision avoidance. The aim was to achieve such functionality with limited resources, with respect to both the model input and the model size itself. The designed models prove able to keep agents safely on the track. The collision avoidance skills developed in the course of model training are primitive but rational. The small size of the model allows fast training with limited computational resources.
EN
This paper describes how to adapt data transmission rates to the varying network conditions using machine learning. The proposed algorithm is based on an earlier state-of-the-art solution and extends its operation for the case when the receiver is outside the range of the transmitter. The throughput values obtained with the use of the proposed algorithm are comparable to the results obtained with the traditional Minstrel and CARA algorithms.
EN
Dynamic Point Blanking (DPB) is one of the Coordinated MultiPoint (CoMP) techniques, in which some Base Stations (BSs) can be temporarily muted, e.g., to improve the throughput of cell-edge users. In this paper, it is proposed to obtain the muting pattern that improves cell-edge users' throughput with the use of deep reinforcement learning. The proposed algorithm utilizes a deep neural network to select the muting pattern on the basis of user locations. Simulation studies have shown that cell-edge user throughput can be improved by a factor of about 14.14 when the proposed algorithm is used.
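A minimal sketch of the kind of network this describes: a small value network scoring every muting pattern from flattened user positions. The sizes (7 BSs, up to 8 users, 2-D positions) and the architecture are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: a network scoring muting patterns from user locations.
import torch
import torch.nn as nn

N_BS, MAX_USERS = 7, 8
N_PATTERNS = 2 ** N_BS                # each pattern mutes a subset of BSs

policy = nn.Sequential(
    nn.Linear(MAX_USERS * 2, 128),    # flattened (x, y) per user, zero-padded
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, N_PATTERNS),       # one score (Q-value) per muting pattern
)

user_xy = torch.zeros(1, MAX_USERS * 2)   # placeholder user locations
pattern = policy(user_xy).argmax(dim=1)   # greedy muting-pattern selection
```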
EN
The purpose of this work is to propose an algorithm for automatic antenna tilt selection in cellular networks and to evaluate its performance. A method of adjusting the antenna inclination angle was developed, taking into account the state of the environment, coverage, system capacity, and inter-cell interference. The collected results allowed assessing how important the proper setting of the antenna inclination angle is for ensuring an appropriate quality of service.
EN
This paper presents an overview of four selected solutions addressing the problem of bidding in the card game of contract bridge. At the beginning, the basic rules are presented along with an estimation of the problem size. Brief descriptions of the collected works are then presented in chronological order, tracking the evolution of approaches to the problem. For each solution, a short description of its mathematical basis is included. Finally, the solutions are compared, followed by an attempt to estimate the future development of these techniques.
Towards automatic facility layout design using reinforcement learning
EN
The accuracy and quality of a layout design depend significantly on the designer's ability, and quick, near-optimal designs are very difficult to create. In this study, we propose an automatic design mechanism that can more easily design layouts for various unit groups and sites using reinforcement learning. Accordingly, we devised a mechanism that deploys units so as to fill the largest rectangular space in the current site. We aim to successfully deploy the given units within a given site by filling a part of the site. We apply the mechanism to three sets of units from benchmark problems. The performance was evaluated while changing the learning parameters and the iteration count. Consequently, it was possible to produce a layout that successfully deployed the units within a given one-floor site.
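The geometric core of such a mechanism, finding the largest empty rectangle into which the next unit is placed, can be sketched on a grid model of the site. The grid encoding is an illustrative assumption, and the RL part (which unit to place next) is omitted here.

```python
# Hedged sketch: largest empty axis-aligned rectangle in a grid model of
# the site, via the classic row-by-row histogram + monotonic-stack method.
def largest_empty_rectangle(grid):
    """grid[r][c] == 0 means free; returns (area, top, left, height, width)."""
    rows, cols = len(grid), len(grid[0])
    heights = [0] * cols
    best = (0, 0, 0, 0, 0)
    for r in range(rows):
        for c in range(cols):                       # column of free cells ending at row r
            heights[c] = heights[c] + 1 if grid[r][c] == 0 else 0
        stack = []                                  # (start_col, height), heights increasing
        for c, h in enumerate(heights + [0]):       # sentinel 0 flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                start, sh = stack.pop()
                area = sh * (c - start)
                if area > best[0]:
                    best = (area, r - sh + 1, start, sh, c - start)
            stack.append((start, h))
    return best
```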
EN
In this paper, a new reinforcement learning intrusion detection system is developed for IoT networks incorporating WSNs. The proposed RL-IDS model is presented, and its detection rate is shown to be improved. The outcome shows a decrease in false alarm rates compared with current methodologies. A computational analysis is performed for a distributed denial of service (DDoS) attack scenario, and the results are compared with current methodologies. The performance of the network is estimated based on security and other metrics.
EN
Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the presence of tools to evaluate them. This paper concerns the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization and Soft Actor-Critic. The possibility of improving the learning results by combining these algorithms with Generative Adversarial Imitation Learning was also verified. The results of the study showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.
EN
Beamforming training (BT) is considered an essential process for accomplishing communications in the millimeter wave (mmWave) band, i.e., 30-300 GHz. This process aims to find the best transmit/receive antenna beams to compensate for the impairments of the mmWave channel and successfully establish the mmWave link. Typically, the mmWave BT process is highly time-consuming, affecting the overall throughput and energy consumption of the mmWave link establishment. In this paper, a machine learning (ML) approach, specifically reinforcement learning (RL), is utilized to enable the mmWave BT process by modeling it as a multi-armed bandit (MAB) problem, with the aim of maximizing the long-term throughput of the constructed mmWave link. Based on this formulation, MAB algorithms such as upper confidence bound (UCB), Thompson sampling (TS), and epsilon-greedy (e-greedy) are utilized to address the problem and accomplish the mmWave BT process. Numerical simulations confirm the superior performance of the proposed MAB approach over existing mmWave BT techniques.
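As a concrete instance of the MAB formulation, here is a minimal UCB1 sketch over a codebook of beam pairs; the codebook size and the reward stand-in (a normalized throughput measurement) are illustrative assumptions.

```python
# Hedged sketch: UCB1 over beam pairs, reward = measured throughput
# normalized to [0, 1]. The beta draw is only a stand-in for a real
# measurement from the mmWave link.
import math
import random

N_BEAMS = 32
counts = [0] * N_BEAMS
means = [0.0] * N_BEAMS

def select_beam(t):
    for b in range(N_BEAMS):              # try every beam once first
        if counts[b] == 0:
            return b
    return max(range(N_BEAMS),
               key=lambda b: means[b] + math.sqrt(2 * math.log(t) / counts[b]))

def update_beam(b, reward):
    counts[b] += 1
    means[b] += (reward - means[b]) / counts[b]

for t in range(1, 1001):
    b = select_beam(t)
    r = random.betavariate(2 + b % 5, 5)  # stand-in for a throughput measurement
    update_beam(b, r)
```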
EN
This paper presents how the Q-learning algorithm can be applied as a general-purpose self-improving controller for use in industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit applications in practical control loops, including a new definition of the goal state via the closed-loop reference trajectory and a discretization of the state space and of the accessible actions (manipulated variables). The properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm replacing it. A general approach for the design of the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation in application to two example processes, exhibiting first-order dynamics and oscillatory second-order dynamics. The results show that online learning through interaction with the controlled process is possible, and that it ensures a significant improvement in control performance compared to an arbitrarily tuned PI controller.
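A minimal sketch of the core idea follows: tabular Q-learning over a discretized control error, with the Q-matrix initialized so that the greedy action initially reproduces a PI-like incremental move (the bumpless-switching idea). The grids, PI gains, and the head-start value are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: Q-matrix initialized from preliminary PI tunings so the
# initial greedy policy mimics the PI controller.
import numpy as np

errors = np.linspace(-1.0, 1.0, 21)    # discretized control error e
actions = np.linspace(-0.1, 0.1, 11)   # increments du of the manipulated variable
Kp, Ki, Ts = 2.0, 0.5, 0.1             # preliminary PI tunings, sampling time

Q = np.zeros((len(errors), len(actions)))
for i, e in enumerate(errors):
    # Incremental PI: du = Kp*(e - e_prev) + Ki*Ts*e; with an error-only
    # state, only the integral term can be precomputed here.
    j = int(np.abs(actions - Ki * Ts * e).argmin())
    Q[i, j] = 1.0                      # head start for the PI-like action

rng = np.random.default_rng(0)

def control_step(e, epsilon=0.05):
    """Return state index, action index, and du for the current error."""
    i = int(np.abs(errors - e).argmin())
    j = int(rng.integers(len(actions))) if rng.random() < epsilon else int(Q[i].argmax())
    return i, j, actions[j]
```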
Mobile robots interacting with obstacles control based on artificial intelligence
EN
In this paper, the application of artificial intelligence, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, to a Gazebo model and to a real mobile robot is studied. The goal of the experimental studies is for the mobile robot to learn the best possible actions for moving in real-world environments with fixed and moving obstacles. When the robot moves in an environment with obstacles, it automatically steers to avoid them; the longer it can stay within a specific limit, the more reward it accumulates, and therefore the better the achieved results. The authors performed various tests with many parameter settings and showed that the DDPG algorithm is more efficient than algorithms such as Q-learning and deep Q-networks. SLAM is then executed to recognize the robot's position, and virtual maps are precisely built and displayed in Rviz. The research results form a basis for the design and construction of control algorithms for mobile and industrial robots in industrial factory automation.
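One DDPG-specific mechanism worth making explicit, since it distinguishes DDPG from tabular Q-learning and DQN, is the soft update of the target networks; the sketch below shows the standard form, with tau an illustrative value.

```python
# Hedged sketch of DDPG's soft target-network update:
# theta_target <- tau * theta_online + (1 - tau) * theta_target
import torch

def soft_update(target_net, online_net, tau=0.005):
    with torch.no_grad():
        for tp, op in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * op)
```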
EN
5G networks increase spectral efficiency by using, e.g., a heterogeneous structure and large antenna arrays. These technologies require more hardware, increasing energy consumption. This paper proposes a reinforcement learning-based algorithm utilizing radio service maps to optimize the set of active base stations and thereby increase energy efficiency (EE). The proposed algorithm and a conventional solution are evaluated in a 5G network simulator, with 3D ray tracing used to generate the radio channel coefficients.
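The objective such an agent optimizes is typically the standard energy-efficiency ratio; a tiny sketch of a reward of this form follows (units and the power model are illustrative assumptions, not the paper's exact reward).

```python
# Hedged sketch: energy efficiency in bits per joule as an RL reward for
# choosing the active base-station set.
def energy_efficiency(user_throughputs_bps, active_bs_powers_w):
    """Total network throughput divided by total consumed power."""
    return sum(user_throughputs_bps) / max(sum(active_bs_powers_w), 1e-9)
```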
EN
The paper summarizes our efforts to develop a spike timing neural network model of dynamic visual information processing and decision making, inspired by the available knowledge about how the human brain performs this complicated task. It consists of multiple layers whose functionality corresponds to the main visual information processing structures, from the early levels of the visual system up to the areas responsible for decision making based on accumulated sensory evidence, as well as basal ganglia modulation due to feedback from the environment. In the present work, we investigated age-related changes in the spike-timing-dependent plastic synapses of the model as a result of reinforcement learning.
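A common way to combine spike-timing-dependent plasticity with an environmental reward signal is reward-modulated STDP, in which an eligibility trace e_ij accumulates the STDP contribution and the weight change is gated by the reward R; the paper's exact rule may differ, so the following is only the standard textbook form:

```latex
\frac{de_{ij}}{dt} = -\frac{e_{ij}}{\tau_e} + \mathrm{STDP}(\Delta t_{ij}),
\qquad
\Delta w_{ij} = \eta \, R \, e_{ij}
```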
EN
Reinforcement learning (RL) constitutes an effective method of controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is the improvement of data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic systems. However, it only focuses on optimizing cumulative rewards and does not consider the accuracy of the dynamic model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose its active exploration version (AEPILCO), which utilizes information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through the informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Using these policy parameters in the actual execution produces an informative sample set, which is helpful for learning an accurate dynamic model. Thus, the AEPILCO algorithm improves data efficiency by learning an accurate dynamic model through the active selection of informative samples based on the information entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm on several challenging controller problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. The AEPILCO algorithm can learn a controller using fewer trials than PILCO, which is verified through theoretical analysis and experimental results.
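Since PILCO's Gaussian-process predictions are Gaussian, an information-entropy criterion on a d-dimensional predicted state with covariance Sigma typically reduces to the differential entropy of a Gaussian. This is a standard identity; the paper's exact criterion may differ:

```latex
H\big(\mathcal{N}(\mu, \Sigma)\big)
  = \frac{1}{2} \ln\!\left( (2\pi e)^{d} \, \lvert \Sigma \rvert \right)
```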
EN
Compared with robots, humans can learn to perform various contact tasks in unstructured environments by modulating arm impedance characteristics. In this article, we consider endowing industrial robots with this compliant ability so that they can effectively learn to perform repetitive force-sensitive tasks. Current learning-based impedance control methods usually suffer from inefficiency. This paper establishes an efficient variable impedance control method. To improve learning efficiency, we employ a probabilistic Gaussian process model as the transition dynamics of the system for internal simulation, permitting long-term inference and planning in a Bayesian manner. The optimal impedance regulation strategy is then searched for using a model-based reinforcement learning algorithm. The effectiveness and efficiency of the proposed method are verified through force control tasks on a 6-DoF Reinovo industrial manipulator.