Assessing the Ability of Self-Attention Networks to Learn Word Order

Yang, Baosong; Wang, Longyue; Wong, Derek F.; Chao, Lidia S.; Tu, Zhaopeng

arXiv:1906.00592 (cs)

[Submitted on 3 Jun 2019]

Abstract: Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Because it lacks a recurrence structure such as that of recurrent neural networks (RNN), SAN is presumed to be weak at learning the positional information of words for sequence modeling. However, this speculation has not been empirically confirmed, nor have explanations been explored for its strong performance on machine translation tasks when "lacking positional information". To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed has difficulty learning positional information, even with position embedding; and 2) SAN trained on machine translation learns better positional information than its RNN counterpart, in which position embedding plays a critical role. Although a recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation.
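
The data-construction step of the word reordering detection task described in the abstract (move one randomly chosen word to another position, then ask a model to recover both positions) might be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reorder_one_word`, the index convention for the two labels, and the use of Python's `random` module are all assumptions made here.

```python
import random


def reorder_one_word(tokens, rng=None):
    """Move one randomly chosen token to a different position.

    Hypothetical helper (not from the paper). Returns a tuple
    (new_tokens, original_pos, inserted_pos), where original_pos indexes the
    moved word in the input and inserted_pos indexes it in the output.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to reorder")
    if rng is None:
        rng = random.Random()

    # Choose which word to move.
    original_pos = rng.randrange(len(tokens))

    # Choose a destination index different from the source index, so the
    # moved word always ends up at a new position.
    candidates = [i for i in range(len(tokens)) if i != original_pos]
    inserted_pos = rng.choice(candidates)

    # Remove the word, then re-insert it at the destination index.
    new_tokens = list(tokens)
    word = new_tokens.pop(original_pos)
    new_tokens.insert(inserted_pos, word)
    return new_tokens, original_pos, inserted_pos


if __name__ == "__main__":
    sentence = "the quick brown fox jumps".split()
    shuffled, src, dst = reorder_one_word(sentence, random.Random(0))
    print(shuffled, "moved from", src, "to", dst)
```

Under these assumptions, a detector would be trained to predict the pair (`original_pos`, `inserted_pos`) from `new_tokens` alone. Note that when a sentence contains repeated words, the reordered sequence can coincide with the input, so a real pipeline would likely need to filter such cases.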

Comments:	ACL 2019
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1906.00592 [cs.CL]
	(or arXiv:1906.00592v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.00592

Submission history

From: Baosong Yang
[v1] Mon, 3 Jun 2019 06:32:29 UTC (3,749 KB)
