Article title

Cluo: web-scale text mining system for open source intelligence purposes

Publication languages
EN
Abstracts
EN
The amount of textual information published on the Internet is estimated at billions of web pages, blog posts, comments, social media updates and other items. Analyzing such quantities of data requires a high degree of distribution of both data and computation. This is especially true for the complex algorithms often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system that extracts and analyzes significant quantities of openly available information.
Publisher
Journal
Year
Pages
45--62
Physical description
Bibliography: 21 items, figures, tables.
Authors
author
  • Luminis Research Sp. z o.o., Rzeszów, Poland
  • AGH University of Science and Technology, Krakow, Poland
  • AGH University of Science and Technology, Krakow, Poland
Bibliography
  • [1] NATO Open Source Intelligence Handbook. NATO, 2001.
  • [2] NATO Intelligence Exploitation of the Internet. NATO, 2002.
  • [3] National Defense Authorization Act for Fiscal Year 2006. 2006.
  • [4] Berger A. L., Pietra V. J. D., Pietra S. A. D.: A maximum entropy approach to natural language processing. Comput. Linguist., 22(1):39–71, March 1996.
  • [5] Cover T., Thomas J.: Elements of Information Theory. Wiley, 1991.
  • [6] Damianos L. E., Ponte J. M., Wohlever S., Reeder F., Wilson D. G., Hirschman L.: MiTAP, text and audio processing for bio-security: A case study. In National Conference on Artificial Intelligence, pp. 807–814, 2002.
  • [7] Dean J., Ghemawat S.: MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.
  • [8] Dean J., Ghemawat S.: MapReduce: simplified data processing on large clusters. Commun. ACM, 51:107–113, January 2008.
  • [9] Fellbaum C.: WordNet – An Electronic Lexical Database. The MIT Press, 1998.
  • [10] Fielding R. T.: Architectural styles and the design of network-based software architectures. PhD thesis, 2000.
  • [11] Jurafsky D., Martin J. H.: Speech and Language Processing. Prentice Hall, 2 ed., 2008.
  • [12] Leskovec J., Backstrom L., Kleinberg J.: Meme-tracking and the dynamics of the news cycle. In Proc. of the 15th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, KDD ’09, pp. 497–506, New York, NY, USA, 2009. ACM.
  • [13] Lubaszewski W., Gajęcki M.: Automatic extraction of semantic association from Polish text. Computer Science, 4:119–130, 2002.
  • [14] Maciolek P., Dobrowolski G.: Is shallow semantic analysis really that shallow? A study on improving text classification performance. In IMCSIT, pp. 455–460, 2010.
  • [15] Manning C., Raghavan P., Schütze H.: Introduction to Information Retrieval. Cambridge University Press, 1 ed., 2008.
  • [16] Maziarz M., Piasecki M., Szpakowicz S.: Approaching plWordNet 2.0. In Proc. of the 6th Global Wordnet Conference, Matsue, Japan, January 2012.
  • [17] Piasecki M., Szpakowicz S., Broda B.: A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, 2009.
  • [18] Porter M. F.: An algorithm for suffix stripping. Program, 1980.
  • [19] Przepiórkowski A., Bańko M., Górski R. L., Lewandowska-Tomaszczyk B., eds.: Narodowy Korpus Języka Polskiego [Eng.: National Corpus of Polish]. Wydawnictwo Naukowe PWN, Warsaw, 2012.
  • [20] Toutanova K., Klein D., Manning C., Singer Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In Proc. of HLT-NAACL 2003, 2003.
  • [21] Toutanova K., Manning C. D.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proc. of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2f1552a4-d79d-47c6-8ef4-96e85296406e