Product detail

BUT Recognizer of GeoNames

OTRUSINA, L.; SMRŽ, P.; SZNAPKA, J.; ŠAFÁŘ, M.

Product type

software

Abstract

We had several objectives in mind when designing the new named entity recognizer (NER). First, it should avoid the performance bottlenecks common to web-based APIs such as OpenCalais or AlchemyAPI. Second, it needs to achieve excellent precision and recall for geographical features, especially for places in Europe. Finally, the tool should perform disambiguation and normalization alongside recognition. To meet the second objective, we used data exported from the GeoNames.org database, which contains over 10 million geographical features. Efficiency comes from finite state automaton (FSA) technology, which can handle huge lists of names and searches input texts very quickly. We employed the efficient algorithm for constructing a minimal FSA described in Daciuk et al. (1998); a freely available package from the first author of that paper builds a minimal FSA from a list of predefined keywords. The resulting representation of all relevant GeoNames data (originally over 1.1 GB) takes only 71 MB, and processing is extremely fast.
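
To illustrate why a minimal FSA suits a gazetteer of this size, below is a minimal Python sketch, assuming the sorted-input variant of the incremental construction algorithm by Daciuk et al.: words are added in lexicographic order, and whenever a new word diverges from the previous one, the finished suffix path is merged into a register of equivalent states, so shared suffixes are stored only once. It is not the BURGeoN implementation or Daciuk's package. The names build_minimal_fsa, find_names, contains and count_states are hypothetical, and the word-boundary rule and greedy longest-match policy in find_names are illustrative assumptions. Disambiguation and normalization are not covered.

```python
from typing import Dict, Iterable, List, Tuple


class State:
    """One automaton state: labelled outgoing edges plus an accepting flag."""
    __slots__ = ("edges", "final")

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False

    def signature(self) -> Tuple:
        # Equivalent states agree on finality and lead to the same canonical
        # children (children are always registered before their parent).
        return (self.final, tuple((c, id(t)) for c, t in sorted(self.edges.items())))


def build_minimal_fsa(words: Iterable[str]) -> State:
    """Hypothetical helper: incremental minimal acyclic FSA for sorted input."""
    root = State()
    register: Dict[Tuple, State] = {}

    def replace_or_register(state: State) -> None:
        label = max(state.edges)             # last-added edge, given sorted input
        child = state.edges[label]
        if child.edges:
            replace_or_register(child)       # minimise the deepest states first
        sig = child.signature()
        if sig in register:
            state.edges[label] = register[sig]  # reuse an equivalent state
        else:
            register[sig] = child

    for word in sorted({w for w in words if w}):
        # Follow the longest prefix of `word` already present in the automaton.
        state, i = root, 0
        while i < len(word) and word[i] in state.edges:
            state = state.edges[word[i]]
            i += 1
        if state.edges:
            replace_or_register(state)       # the previous word's suffix is final
        for ch in word[i:]:                  # append the new suffix as a chain
            nxt = State()
            state.edges[ch] = nxt
            state = nxt
        state.final = True
    if root.edges:
        replace_or_register(root)
    return root


def contains(root: State, word: str) -> bool:
    """Exact membership test: follow the word's characters from the root."""
    state = root
    for ch in word:
        state = state.edges.get(ch)
        if state is None:
            return False
    return state.final


def count_states(root: State) -> int:
    """Count distinct reachable states (shared states are counted once)."""
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)


def find_names(root: State, text: str) -> List[Tuple[int, int, str]]:
    """Greedy longest-match scan returning (start, end, surface) triples."""
    matches, pos = [], 0
    while pos < len(text):
        # Assumption: a name may only start and end at a word boundary.
        if pos > 0 and text[pos - 1].isalnum():
            pos += 1
            continue
        state, best_end = root, -1
        for j in range(pos, len(text)):
            state = state.edges.get(text[j])
            if state is None:
                break
            if state.final and (j + 1 == len(text) or not text[j + 1].isalnum()):
                best_end = j + 1             # remember the longest match so far
        if best_end != -1:
            matches.append((pos, best_end, text[pos:best_end]))
            pos = best_end
        else:
            pos += 1
    return matches
```

For example, building the automaton from "tap", "taps", "top" and "tops" yields only 5 states, because the "p", "ps" endings are shared; scanning "New York City" with an automaton containing both "New" and "New York" reports the longer name. A production gazetteer would also attach feature identifiers to accepting states so that matches can be disambiguated and normalized.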

Keywords

named entity recognition, GeoNames.org, finite state automaton

Create date

15 December 2011

Location

www.fit.vutbr.cz/~iotrusina/BURGeoN-0.1.tar.gz

Possibilities of use

Use of the result by another party always requires obtaining a licence.

Licence fee

The licensor does not charge a licence fee for the result.
