Search results for the keyword "web scraping": 5 results found
EN
In the paper, the authors present the results of work on web scraping software that allows for the automated classification of threats and the detection of crisis events. In order to improve the safety and comfort of human life, an analysis was carried out of how to quickly detect threats using a modern information channel such as social media. For this purpose, the social media services popular in the examined region were reviewed and the appropriate ones were selected using the criteria of accessibility and popularity. Approximately 300 unique posts from local groups of cities and other administrative centers were collected and analyzed. The decision on whether an entry was classified as a threat was made using the ChatGPT tool and by a human expert. Both variants were tested using machine learning (ML) methods. The paper tested whether the ChatGPT tool would be effective at detecting the presumed events and compared this approach to the classic ML approach.
PL
In the article, the authors present the results of work on web scraping software that allows for the automated classification of threats and the detection of crisis events. To improve the safety and comfort of human life, an analysis of rapid threat detection was carried out using a modern information channel, namely social media. For this purpose, the social media services popular in the examined region were reviewed and the appropriate ones were selected based on the criteria of accessibility and popularity. Approximately 300 unique posts from local groups of cities and other administrative centers were collected and analyzed. The decision as to which entry was classified as a threat was made using the ChatGPT tool and with the involvement of a human expert. Both variants were tested using machine learning (ML) methods. In addition, the article examined whether the ChatGPT tool would be effective in detecting the presumed events and compared this solution with the classic ML approach, in which the training data was labeled with the involvement of an expert.
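Neither abstract spells out the classification pipeline, so the following is a minimal sketch, assuming a TF-IDF representation and a scikit-learn logistic-regression classifier, of the kind of "classic ML" variant described: posts labeled as threat or non-threat (by ChatGPT or by a human expert) are used to train and evaluate a text classifier. The example posts and labels below are invented for illustration.

```python
# Minimal sketch (not the authors' code) of the "classic ML" variant: posts
# labeled as threat / non-threat are vectorised with TF-IDF and classified.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical labeled posts; the study used ~300 real posts from local groups.
posts = [
    "Road closed after a serious accident near the bridge",
    "Flood warning issued for the river district",
    "Local bakery opens a second shop downtown",
    "Concert in the city park this Saturday",
]
labels = [1, 1, 0, 0]  # 1 = threat / crisis event, 0 = other

X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.5, random_state=42, stratify=labels
)

vectorizer = TfidfVectorizer(lowercase=True)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

pred = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, pred, zero_division=0))
```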
EN
The main aim of this paper is to evaluate crawlers collecting job offers from websites. In particular, the research is focused on checking the effectiveness of ensemble machine learning methods in assessing the validity of the position extracted from job ads. Moreover, in order to reduce the training time of the algorithms (Random Forests and XGBoost), granularity methods were also tested to significantly reduce the size of the input training dataset. Both methods achieved satisfactory results in the accuracy and F1 measures, which exceeded 96%. In addition, granulation reduced the input dataset by more than 99%, and the results obtained were only slightly worse (accuracy lower by between 1% and 5%, F1 by between 3% and 8%). Thus, it can be concluded that the considered methods can be used in the evaluation of job web crawlers.
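As a concrete illustration of the evaluation described above, here is a minimal sketch, not the paper's code, that trains the two ensemble methods named in the abstract (Random Forest and XGBoost) on character n-gram TF-IDF features of a hypothetical "extracted position" field and reports the same metrics (accuracy and F1). The granularity-based reduction of the training set is specific to the paper and is not reproduced here; the example texts and labels are invented.

```python
# Minimal sketch (assumed setup): is an extracted "position" string a valid job
# title? Evaluated with the two ensemble methods named in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical extracted positions with 0/1 validity labels.
positions = ["Senior Java Developer", "click here to apply", "Data Analyst",
             "cookies policy", "QA Engineer", "read more",
             "DevOps Engineer", "terms of use"]
valid = [1, 0, 1, 0, 1, 0, 1, 0]

# Character n-grams tolerate the noisy strings crawlers tend to extract.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(positions)
X_train, X_test, y_train, y_test = train_test_split(
    X, valid, test_size=0.25, random_state=0, stratify=valid
)

for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              XGBClassifier(n_estimators=200, eval_metric="logloss")):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(type(model).__name__,
          "accuracy:", accuracy_score(y_test, pred),
          "F1:", f1_score(y_test, pred, zero_division=0))
```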
EN
Purpose: The first objective of this article was an attempt at identifying the major differences between such terms as public relations (PR), digital public relations (DPR) and digital marketing (DM). The second objective was to employ selected web data scraping techniques to analyse the DPR of service providers installing photovoltaic systems.
Design/methodology/approach: The first objective of this article was achieved by analysing reference works. To achieve the second objective, the author used MS Excel, web scraping and proprietary computer scripts in R and Python. In this way, selected details were obtained from the company catalogue at panoramafirm.pl and from the Google search engine, and the results received were then compared and analysed. What is more, the results from the Google search engine were obtained and analysed for 964 towns and cities entered in the engine together with the “photovoltaics” phrase.
Findings: 50 thousand URLs were obtained and 1,755 unique website domain addresses were extracted. By analysing the content of the websites at the obtained Internet domains, 6 major categories of websites were identified which appeared in the first 10 search results for the photovoltaic-related queries. These are: Company Websites (CW), Blog Websites (BW), Announcement Services (AS), SEO Landing Pages (SLP), Public Announcement Pages (PAP) and Social Media Pages (SMP). Each of these categories is characterised briefly and a few examples are provided for each of them.
Research limitations/implications: The limitations of this article include the focus on one company catalogue, i.e. panoramafirm.pl, and on results from the Google search engine solely for the Polish language. Moreover, only the first 10 links returned by the Google engine for the single “photovoltaics” phrase combined with a town/city name were taken into consideration.
Originality/value: This article has theoretical and practical value. The analysis made it possible to identify six categories of websites which may be analysed with respect to digital public relations in the area of photovoltaic system installation. The most important of them are the websites belonging to the Company Website (CW) and Social Media Page (SMP) types. This article is addressed to anyone interested in obtaining data from the Internet using the web scraping technique and in data analysis in the area of digital public relations (DPR).
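The abstract mentions proprietary R and Python scripts without showing them. Purely as an illustration of one step described above, reducing the roughly 50 thousand collected URLs to unique website domains, here is a minimal Python sketch; the URLs are made up.

```python
# Minimal sketch (assumed data, not the author's scripts): collapse a list of
# collected search-result URLs into the set of unique website domains.
from urllib.parse import urlparse

# Hypothetical URLs standing in for the ~50 thousand links collected from Google.
urls = [
    "https://www.example-pv.pl/oferta/fotowoltaika",
    "https://example-pv.pl/kontakt",
    "https://blog.energia.pl/fotowoltaika-w-domu",
    "https://www.facebook.com/ExamplePVCompany",
]

def domain(url: str) -> str:
    """Return the host part of a URL with a leading 'www.' stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

unique_domains = sorted({domain(u) for u in urls})
print(len(unique_domains), unique_domains)
```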
EN
The objective of this research is to describe a system for aligning the hard and soft skills of an applicant with the current labor market. To this end, a system was implemented which uses web scraping to obtain a general profile of an area, while the applicant's soft skills are evaluated with a Cleaver test and a fuzzy inference system is implemented for the hard skills. The data is then entered into an Analytic Hierarchy Process, with which the applicant is able to see which area is best to improve according to their hard and soft skills.
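The abstract names an Analytic Hierarchy Process (AHP) but gives no matrices, so the following is a minimal sketch of a generic AHP weighting step with invented skill areas and pairwise judgments; the Cleaver test and the fuzzy inference system are not reproduced here.

```python
# Minimal sketch (illustrative numbers only) of an AHP step: derive priority
# weights for improvement areas from a pairwise comparison matrix using the
# geometric-mean approximation.
import numpy as np

# Hypothetical pairwise comparisons of three areas on Saaty's 1-9 scale;
# entry [i, j] says how much more important area i is than area j.
comparisons = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Geometric mean of each row, normalised to sum to 1, approximates the
# principal eigenvector and hence the priority weights.
geometric_means = comparisons.prod(axis=1) ** (1.0 / comparisons.shape[1])
weights = geometric_means / geometric_means.sum()

for area, w in zip(["programming", "communication", "leadership"], weights):
    print(f"{area}: {w:.3f}")
```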
Algorytm wykrywania treści na stronach portali internetowych
PL
The article presents the approach used in designing and implementing an algorithm for the automatic detection of content on web portal pages, based on an analysis of the HTML structure of the web page. The page content is understood to be the text of the article together with its headline, excluding other texts appearing on the page (menus, advertisements, comments, image captions, etc.).
EN
The paper shows the steps taken during the design and implementation of an automatic web page content recognition algorithm based on HTML structure analysis. The web page content is the article text with its headline, without any other text such as menus, advertisements, users' comments, image captions, etc.
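The abstracts describe the algorithm only at a high level, so the following is a minimal sketch of a content-extraction heuristic in the same spirit (not the authors' implementation): typical non-content containers such as menus and captions are discarded, and the element holding the most paragraph text is treated as the article and returned together with its headline. BeautifulSoup and the sample HTML are assumptions of this sketch.

```python
# Minimal sketch of HTML-structure-based content extraction: drop non-content
# containers, then pick the element whose paragraphs carry the most text.
from bs4 import BeautifulSoup

def extract_article(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements that normally hold menus, ads, comments and captions.
    for tag in soup(["nav", "aside", "footer", "header", "script", "style", "figcaption"]):
        tag.decompose()

    # Score candidate containers by the amount of direct paragraph text they hold.
    best, best_len = None, 0
    for candidate in soup.find_all(["article", "div", "section"]):
        text = " ".join(p.get_text(" ", strip=True)
                        for p in candidate.find_all("p", recursive=False))
        if len(text) > best_len:
            best, best_len = candidate, len(text)

    if best is None:
        return ""
    headline = best.find(["h1", "h2"])
    body = " ".join(p.get_text(" ", strip=True) for p in best.find_all("p"))
    return ((headline.get_text(strip=True) + "\n") if headline else "") + body

html = ("<div><nav>menu</nav><article><h1>Title</h1>"
        "<p>First paragraph of the article.</p><p>Second paragraph.</p></article></div>")
print(extract_article(html))
```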