Fake News Detection on Social Media: A Data Mining Perspective
The paper looks at the fake news detection problem from two perspectives
- Characterization
- Detection
Contribution
- Discuss narrow and broad definitions of fake news that cover most existing definitions in the literature, and further present the unique characteristics of fake news on social media and its implications compared with traditional media
- Give an overview of existing fake news detection methods with a principled way to group representative methods into different categories
- Discuss several open issues and provide future directions of fake news detection in social media
Fake News Characterization
- Introduce the basic social and psychological theories
- Discuss more advanced patterns introduced by social media
Definitions of Fake News
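In the narrow sense, fake news is news that
- is written with intent,
- is verifiably false, and
- can mislead its readers.
In the broad sense, definitions of fake news focus on only one of the two: the authenticity or the intent of the news content. Example: satire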
Definition
The definition used in this paper is as follows
Fake news is a news article that is intentionally and verifiably false
Under this definition, the following five are not fake news
- satire news with proper context
- rumors that did not originate from news events
- conspiracy theories
- misinformation that is created unintentionally
- hoaxes that are only motivated by fun or to scam targeted individuals
Psychological Foundations of Fake News
Humans are naturally not very good at differentiating between real and fake news
There are two major factors which make consumers naturally vulnerable to fake news
- Naive Realism: consumers tend to believe that their own perception of reality is the only accurate view, and that those who disagree are uninformed or biased
- Confirmation Bias: consumers prefer information that confirms their existing views
Fake News on Social Media
Unique characteristics of fake news on social media
- Malicious Accounts on Social Media for Propaganda
social bots, cyborg users, and trolls
- Echo Chamber Effect
Consumers are selectively exposed to certain kinds of news because of the way news feeds appear on their homepage in social media
Fake News Detection
- problem definition
- propose approaches for fake news detection
Problem Definition
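Notation:
- $a$: News Article. It consists of a Publisher $\vec{p}_a$ and Content $\vec{c}_a$
Publisher $\vec{p}_a$ includes a set of profile features to describe the original author (name, domain, age, among other attributes)
Content $\vec{c}_a$ consists of a set of attributes that represent the news article (headline, text, image, etc.)
- Define Social News Engagements as a set of tuples $\mathcal{E} = \{e_{it}\}$ to represent the process of how news spreads over time among $n$ users $\mathcal{U} = \{u_1, u_2, ..., u_n\}$ and their corresponding posts $\mathcal{P} = \{p_1, p_2, ..., p_n\}$ on social media regarding news article $a$
$e_{it} = \{u_i, p_i, t\}$ represents that a user $u_i$ spreads news article $a$ using $p_i$ at time $t$
If article $a$ has no engagement yet, $t = Null$ and $u_i$ is the publisher
Given the social news engagements $\mathcal{E}$ among $n$ users for news article $a$, the task of fake news detection is to predict whether the news article $a$ is a fake news piece or not, i.e., $\mathcal{F}: \mathcal{E} \to \{0, 1\}$, with $\mathcal{F}(a) = 1$ if $a$ is a piece of fake news and $0$ otherwise
This is a binary classification problem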
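To make the notation concrete, here is a minimal sketch of how an article and its engagement tuples (user, post, time) could be represented. The class and field names (`NewsArticle`, `Engagement`, `sort_engagements`) are my own assumptions for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NewsArticle:
    publisher: dict  # profile features of the original author (name, domain, age, ...)
    content: dict    # attributes of the article (headline, text, image, ...)


@dataclass
class Engagement:
    user: str              # the user who spreads the article
    post: Optional[str]    # the post used to spread it
    time: Optional[float]  # timestamp; None when there is no engagement yet


def sort_engagements(engagements: List[Engagement]) -> List[Engagement]:
    """Order engagements by time, placing the publisher entry (time=None) first."""
    # The first key element is False for time=None, so those entries sort first.
    return sorted(engagements, key=lambda e: (e.time is not None, e.time or 0.0))


if __name__ == "__main__":
    timeline = sort_engagements([
        Engagement("u2", "p2", 5.0),
        Engagement("publisher", None, None),
        Engagement("u1", "p1", 2.0),
    ])
    print([e.user for e in timeline])
```

A detector in this setting is then any function that maps such a list of engagements to 0 (real) or 1 (fake).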
General data mining framework for fake news detection which includes two phases
- feature extraction
- model construction
Feature Extraction
- Traditional news media detection focused on news content
- On social media, additional information can be used as well
News Content Features
source, headline, body text, image/video
- lexical features, including character level and word-level features such as total words, characters per word, frequency of large words, and unique words
- syntactic features, such as frequency of function words and phrases (bag-of-words, BOW) or POS tagging
- sensational or even fake images
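The word-level lexical features listed above can be computed with a few lines of Python. This is only a sketch: the tokenizer is a simple regular expression, and the cutoff for a "large" word (`large_word_len=7`) is an assumption of mine, not a value given in the paper.

```python
import re
from typing import Dict


def lexical_features(text: str, large_word_len: int = 7) -> Dict[str, float]:
    """Compute simple word-level lexical features for a piece of news text."""
    # Naive tokenizer: runs of ASCII letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"total_words": 0, "chars_per_word": 0.0,
                "large_word_freq": 0.0, "unique_words": 0}
    return {
        "total_words": total,
        "chars_per_word": sum(len(w) for w in words) / total,
        # Share of words with at least `large_word_len` characters.
        "large_word_freq": sum(len(w) >= large_word_len for w in words) / total,
        "unique_words": len(set(words)),
    }


if __name__ == "__main__":
    print(lexical_features("Shocking news shocking everyone"))
```

These values would form part of the feature vector that is passed to the model construction phase.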
Social Context Features
Social context features can also be derived from the user-driven social engagements of news consumption on social media platforms.
- User-based
Fake news pieces are likely to be created and spread by non-human accounts, such as social bots or cyborgs.
Individual level: registration age, number of followers/followees, number of tweets, etc.
Group level: the assumption is that the spreaders of fake news and real news may form different communities with unique characteristics. Group level features are obtained by averaging and weighting individual level features.
- Post-based
Post level features: stance (supporting/denying), topic (LDA), and credibility
Group level features: aggregate the feature values for all relevant posts for specific news articles
Temporal level features: consider the temporal variations of post level features (RNN)
- Network-based
Features are extracted via constructing specific networks among the users who published related social media posts
Stance network: nodes indicate all the tweets relevant to the news and edges indicate the weights of similarity of stances
Co-occurrence network: built based on the user engagements by counting whether those users write posts relevant to the same news articles
Friendship network: indicates the following/followee structure of users who post related tweets
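Two of the social context features above are easy to sketch in code: a group level user feature obtained by averaging individual level features, and a co-occurrence network that counts how many news articles two users both posted about. The function names and input formats below are assumptions for illustration, and the average is unweighted for simplicity.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def group_average(user_features: List[Dict[str, float]]) -> Dict[str, float]:
    """Group level feature: plain (unweighted) mean of individual level features."""
    if not user_features:
        return {}
    keys = user_features[0].keys()
    return {k: sum(f[k] for f in user_features) / len(user_features) for k in keys}


def co_occurrence_network(
    engagements: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], int]:
    """Build edge weights from (user, article_id) pairs.

    The weight of edge (u, v) is the number of articles both users posted about.
    """
    users_by_article = defaultdict(set)
    for user, article in engagements:
        users_by_article[article].add(user)
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for users in users_by_article.values():
        # Sorting gives every undirected edge one canonical (u, v) key.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


if __name__ == "__main__":
    pairs = [("a", "n1"), ("b", "n1"), ("a", "n2"), ("b", "n2"), ("c", "n2")]
    print(co_occurrence_network(pairs))
    print(group_average([{"followers": 10, "tweets": 100},
                         {"followers": 30, "tweets": 300}]))
```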
Model Construction
Discuss the details of the model construction process for several existing approaches, categorizing existing methods based on their main input sources as News Content Models and Social Context Models
News Content Models
- Knowledge-based: Use external sources to fact-check
- Style-based: Detect fake news by capturing the manipulators in the writing style of news content
Social Context Models
- Stance-based: Utilize users’ viewpoints from relevant post contents to infer the veracity of original news articles
- Propagation-based: The basic assumption is that the credibility of a news event is highly related to the credibilities of relevant social media posts
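As a rough illustration of the stance-based idea, the stances of relevant posts can be combined into a single score by a weighted vote. This is a deliberately simple sketch and not a method from any of the surveyed papers: the +1/-1 encoding, the optional per-post weights (for example, user credibility), and the zero threshold are all my own assumptions.

```python
from typing import Optional, Sequence


def stance_vote(stances: Sequence[int],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of post stances (+1 = supporting, -1 = denying), in [-1, 1]."""
    if not stances:
        return 0.0
    if weights is None:
        weights = [1.0] * len(stances)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(stances, weights)) / total


def predict_fake(stances: Sequence[int],
                 weights: Optional[Sequence[float]] = None,
                 threshold: float = 0.0) -> int:
    """Return 1 (fake) when denying stances outweigh supporting ones, else 0."""
    return 1 if stance_vote(stances, weights) < threshold else 0


if __name__ == "__main__":
    print(stance_vote([1, -1, -1, -1]))
    print(predict_fake([1, -1, -1, -1]))
```

In practice the stances themselves would have to be predicted from post text first, which is where most of the difficulty lies.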
Datasets
- BuzzFeedNews
Sample of news published on Facebook from 9 news agencies over a week close to the 2016 U.S. election, from September 19 to 23 and September 26 and 27
- LIAR
No link. Data collected using the PolitiFact API
- BS Detector
Collected from a browser extension developed for checking news veracity (GitHub)
- CREDBANK
This is a large-scale crowdsourced dataset of approximately 60 million tweets that cover 96 days. All the tweets are broken down to be related to over 1,000 news events, with each event assessed for credibility by 30 annotators from Amazon Mechanical Turk
CREDBANK was originally collected for tweet credibility assessment, so the tweets in this dataset are not really the social engagements for specific news articles
- FakeNewsNet
This dataset includes all mentioned news content and social context features with reliable ground truth fake news labels (presented in this paper)
A fairly good dataset. It might be worth trying out.
The paper simply ends without any experimental results…