Exploiting Corpora for Extracting and Describing Specialized Lexicon

Towards a Solid and Sustained Methodology

Authors

Chiara Barbero, Raquel Amaro

DOI:

https://doi.org/10.11606/issn.2236-4242.v33i1p69-104

Keywords:

Specialized Lexicon Extraction, Methodology, Corpora, Concordances, Collocations

Abstract

The use of corpora for specialized lexicon extraction is a common and widely accepted method for building lexical resources. However, the methodologies involved are not openly discussed, which makes it difficult to compare approaches and to identify robust ones. To fill this gap, in this paper we present and discuss a detailed methodology for extracting specialized lexicon from corpora, combining linguistic and statistical approaches. The proposed method uses specialized and monitor corpora and comprises i) frequency information analyses; ii) concordance and collocation extraction; and iii) textual organization information, accounting for the extraction of core single and multiword expressions and of salient semantic relations. Our goal is to determine a solid and accurate list of potential specialized lexical units that allows for a swifter final validation and maximizes the informational value of the interaction with experts.
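As a rough illustration of steps i) and ii), the sketch below compares word frequencies in a specialized corpus against a monitor corpus and then lists concordance lines for a candidate. It is not the authors' procedure: the function names (`tokenize`, `relative_freq_ratio`, `concordance`), the add-one smoothing, the simple frequency ratio used as a salience score, and the toy sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a deliberately naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_freq_ratio(specialized_tokens, monitor_tokens, smoothing=1.0):
    """Rank words of the specialized corpus by how much more frequent they are
    there than in the monitor corpus (hypothetical salience score)."""
    if not specialized_tokens or not monitor_tokens:
        raise ValueError("both corpora must contain at least one token")
    spec_counts = Counter(specialized_tokens)
    mon_counts = Counter(monitor_tokens)
    n_spec, n_mon = len(specialized_tokens), len(monitor_tokens)
    scores = {}
    for word, count in spec_counts.items():
        # Smoothing keeps the ratio finite for words absent from the monitor corpus.
        spec_rate = (count + smoothing) / n_spec
        mon_rate = (mon_counts[word] + smoothing) / n_mon
        scores[word] = spec_rate / mon_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def concordance(tokens, node, window=3):
    """Return (left context, node, right context) triples for each occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Toy data, invented for illustration only.
specialized = tokenize("The mitral valve and the aortic valve regulate blood flow")
monitor = tokenize("The weather and the news shape the day and the mood")

ranked = relative_freq_ratio(specialized, monitor)
kwic = concordance(specialized, "valve", window=2)
```

With these toy inputs, domain words rank above function words such as "the", and the concordance lines show the left and right neighbours of the candidate, which is the kind of context a linguist would inspect for collocations and multiword expressions. A real study would use far larger corpora and a statistically grounded measure.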

Published

2020-03-25

How to Cite

BARBERO, Chiara; AMARO, Raquel. Exploiting Corpora for Extracting and Describing Specialized Lexicon: Towards a Solid and Sustained Methodology. Linha D’Água, São Paulo, v. 33, n. 1, p. 69–104, 2020. DOI: 10.11606/issn.2236-4242.v33i1p69-104. Available at: https://periodicos.usp.br/linhadagua/article/view/165214. Accessed: 10 Jun. 2024.

Funding data