Entity-centric implicit network data set extracted from a set of English
news articles from the outlets CNN, LA Times, NY Times, USA Today, CBS News,
The Washington Post, IBTimes, BBC, The Independent, Reuters, SkyNews,
The Telegraph, The Guardian, and Sydney Morning Herald with a focus on political
news between June 1, 2016 and November 30, 2016.

The network is constructed from 127,485 articles and contains 27.7k locations,
72.0k actors, 19.6k organizations, and 329k terms, which are connected by 10.6M
distinct edges (i.e., if parallel edges are aggregated).

This data was used for the extraction of entity-centric network topics
as described in the publication (for details, see the provided link):

Andreas Spitz and Michael Gertz.
"Entity-centric Topic Extraction and Exploration: A Network-based Approach"
ECIR, Grenoble, France, March 26-29, 2018
https://dbs.ifi.uni-heidelberg.de/resources/nwtopics/


FILE CONTENTS:

The data is distributed over three files:
news_articles.tsv
news_edgelist.tsv
news_nodelist.tsv

All files are formatted as tab-separated plain text files.
Entity types are abbreviated as L (location), A (actor), O (organization),
and T (term).


news_articles.tsv
<id> <outlet> <date> <url> <title>

<id>     integer ID of the article in the data set
<outlet> label of the news outlet
<date>   publication date of the article
<url>    URL of the article
<title>  plain text title of the article
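
A minimal sketch of reading this file in Python is given here. It assumes
that the file has no header row and that every line carries exactly the five
fields listed above; neither is stated in this description. The names Article
and read_articles are hypothetical helpers, and the sample line is made up
for illustration.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Article(NamedTuple):
    id: int
    outlet: str
    date: str   # kept as a string, since the date format is not specified
    url: str
    title: str


def read_articles(handle: TextIO) -> Iterator[Article]:
    """Yield one Article per tab-separated line (assumes no header row)."""
    # QUOTE_NONE keeps quotation marks inside titles as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        art_id, outlet, date, url, title = row[:5]
        yield Article(int(art_id), outlet, date, url, title)


# Made-up sample line for illustration only; not taken from the data set.
sample = io.StringIO(
    "1\tExampleOutlet\t2016-06-01\thttp://example.org/a\tAn example title\n"
)
articles = list(read_articles(sample))
```

To read the real file, pass an open file handle instead of the sample, for
example open("news_articles.tsv", encoding="utf-8", newline=""). The UTF-8
encoding is an assumption.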


news_edgelist.tsv
<id1> <id2> <type1> <type2> <date> <articleID> <outlet> <senDist>

<id1>       integer ID of the first entity of the edge
<id2>       integer ID of the second entity of the edge
<type1>     entity type of the first entity {L, A, O, T}
<type2>     entity type of the second entity {L, A, O, T}
<date>      date of publication of the article in which this edge occurs
<articleID> integer ID of the article in which this edge occurs
<outlet>    news outlet of the article in which this edge occurs
<senDist>   distance (in sentences) between the two entities

Note that edges are undirected and stored in only one direction
(that is, the lower of the two IDs is always the first component).
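
Since each line of the edge list is one occurrence of an edge, the same pair
of entities can appear on many lines. The following sketch shows one possible
way to aggregate such parallel edges by counting them. It assumes no header
row and treats the pair (ID, type) as the identifier of an entity, which is a
cautious choice rather than something stated in this description. The name
count_parallel_edges is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_parallel_edges(handle: TextIO) -> Counter:
    """Count how many lines (parallel edges) share the same entity pair."""
    counts: Counter = Counter()
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        id1, id2, type1, type2 = int(row[0]), int(row[1]), row[2], row[3]
        # Edges are stored with the lower ID first, so no reordering is done.
        counts[(id1, type1, id2, type2)] += 1
    return counts


# Made-up sample rows for illustration only; not taken from the data set.
sample_edges = io.StringIO(
    "1\t2\tA\tL\t2016-06-01\t10\tExampleOutlet\t0\n"
    "1\t2\tA\tL\t2016-06-02\t11\tExampleOutlet\t1\n"
    "2\t3\tL\tT\t2016-06-02\t11\tExampleOutlet\t2\n"
)
edge_counts = count_parallel_edges(sample_edges)
```

In this sample, the three lines collapse into two distinct edges. Counting is
only one way to aggregate; the remaining columns (date, outlet, sentence
distance) could be used for filtering or weighting instead.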


news_nodelist.tsv
<id> <type> <wiki> <label>

<id>    integer ID of the entity
<type>  entity type of the entity {L, A, O, T}
<wiki>  Wikidata ID of the entity (empty if type = T)
<label> label of the entity
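
As a rough sketch, the node list can be loaded into a lookup table that
resolves the entity IDs of the edge list to labels. The code assumes no
header row and keys entities by the pair (ID, type). The names Node and
read_nodes are hypothetical helpers; the sample rows, including the
placeholder ID "Q0", are made up.

```python
import csv
import io
from typing import Dict, NamedTuple, Optional, TextIO, Tuple


class Node(NamedTuple):
    id: int
    type: str             # one of "L", "A", "O", "T"
    wiki: Optional[str]   # Wikidata ID, or None if the field is empty
    label: str


def read_nodes(handle: TextIO) -> Dict[Tuple[int, str], Node]:
    """Map (id, type) to Node for every tab-separated line."""
    nodes: Dict[Tuple[int, str], Node] = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        node_id, node_type, wiki, label = row[:4]
        # An empty <wiki> field (as for terms) is stored as None.
        nodes[(int(node_id), node_type)] = Node(
            int(node_id), node_type, wiki or None, label
        )
    return nodes


# Made-up sample rows for illustration only; "Q0" is a placeholder ID.
sample_nodes = io.StringIO(
    "1\tA\tQ0\tExample Person\n"
    "2\tT\t\texample\n"
)
nodes = read_nodes(sample_nodes)
```

A label can then be looked up with, for example, nodes[(1, "A")].label.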



