Product detail

WTF-LOD Extractor

OTRUSINA, L. SMRŽ, P.

Product type

software

Abstract

This software creates the Web TextFull linkage to Linked Open Data (WTF-LOD) dataset, intended for large-scale evaluation of named entity recognition (NER) systems. The dataset is built from the largest publicly available text corpora, including Wikipedia dumps, monthly CommonCrawl runs, and ClueWeb09/12. The software also de-duplicates the data and applies advanced cleaning procedures.

Keywords

named entity evaluation, linked open data, CommonCrawl, ClueWeb, Wikipedia

Create date

31.12.2015

Location

http://www.fit.vutbr.cz/research/prod/index.php?id=480
