
dc.contributor.author: Philip, Jijo John [en_US]
dc.date.accessioned: 2013-03-20T19:12:05Z
dc.date.available: 2013-03-20T19:12:05Z
dc.date.issued: 2013-03-20
dc.date.submitted: January 2012 [en_US]
dc.identifier.other: DISS-11964 [en_US]
dc.identifier.uri: http://hdl.handle.net/10106/11580
dc.description.abstract: We are surrounded by data in various forms, such as instant messages, Twitter tweets, Facebook status updates, news, media, blogs and much more. Extracting meaning from such a massive collection of unstructured data can lead to interesting stories. Examples of such stories are "Who was the most popular actor in a particular month?" or "Which diseases were people most concerned about in the year 2008?". In this thesis, we propose to discover popular entities mentioned in blog articles based on the concept of prominent streaks. Given a sequence of values for a named entity (e.g., a person or a place), where each value is the occurrence frequency of the entity in blog articles during a corresponding period of time, a prominent streak is a long consecutive subsequence of only large (or only small) values. Whether a streak is prominent also depends on how it fares against streaks for comparable entities. Using the distributed data processing framework MapReduce, in particular Hadoop, one of its open-source implementations, we find entity occurrences in a set of blog articles with a trie-based data structure. Prominent streak discovery algorithms are then applied to the detected sequences of entity occurrences to derive interesting stories. Our experiments and evaluation are done over the ICWSM'09 Spinn3r blog dataset, which contains over 44 million blog articles for the months of August and September 2008. [en_US]
dc.description.sponsorship: Li, Chengkai [en_US]
dc.language.iso: en [en_US]
dc.publisher: Computer Science & Engineering [en_US]
dc.title: Prominent Streaks Discovery On Blog Articles [en_US]
dc.type: M.S. [en_US]
dc.contributor.committeeChair: Li, Chengkai [en_US]
dc.degree.department: Computer Science & Engineering [en_US]
dc.degree.discipline: Computer Science & Engineering [en_US]
dc.degree.grantor: University of Texas at Arlington [en_US]
dc.degree.level: masters [en_US]
dc.degree.name: M.S. [en_US]

