A multi-task learning approach to adapting bilingual word embeddings for cross-lingual named entity recognition

Dingquan Wang, Nanyun Peng, and Kevin Duh, in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2017.

Bib Entry

@inproceedings{wang2017multi,
  title = {A multi-task learning approach to adapting bilingual word embeddings for cross-lingual named entity recognition},
  author = {Wang, Dingquan and Peng, Nanyun and Duh, Kevin},
  booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  pages = {383--388},
  year = {2017}
}

Related Publications

  1. What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis

    Xiaolei Huang, Jonathan May, and Nanyun Peng, in 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), short, 2019.
    @inproceedings{huang2019matters,
      title = {What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis},
      author = {Huang, Xiaolei and May, Jonathan and Peng, Nanyun},
      booktitle = {2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), short},
      year = {2019}
    }
    
  2. Learning A Unified Named Entity Tagger From Multiple Partially Annotated Corpora For Efficient Adaptation

    Xiao Huang, Li Dong, Elizabeth Boschee, and Nanyun Peng, in The 2019 SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2019.
    Named entity recognition (NER) identifies typed entity mentions in raw text. While the task is well-established, there is no universally used tagset: often, datasets are annotated for use in downstream applications and accordingly only cover a small set of entity types relevant to a particular task. For instance, in the biomedical domain, one corpus might annotate genes, another chemicals, and another diseases, despite the texts in each corpus containing references to all three types of entities. In this paper, we propose a deep structured model to integrate these “partially annotated” datasets to jointly identify all entity types appearing in the training corpora. By leveraging multiple datasets, the model can learn robust input representations; by building a joint structured model, it avoids potential conflicts caused by combining several models’ predictions at test time. Experiments show that the proposed model significantly outperforms strong multi-task learning baselines when training on multiple, partially annotated datasets and testing on datasets that contain tags from more than one of the training corpora.
    @inproceedings{huang2019learning,
      title = {Learning A Unified Named Entity Tagger From Multiple Partially Annotated Corpora For Efficient Adaptation},
      author = {Huang, Xiao and Dong, Li and Boschee, Elizabeth and Peng, Nanyun},
      booktitle = {The 2019 SIGNLL Conference on Computational Natural Language Learning (CoNLL)},
      year = {2019}
    }
    
  3. Multi-task multi-domain representation learning for sequence tagging

    Nanyun Peng and Mark Dredze, in Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017.
    @inproceedings{peng2017multi,
      title = {Multi-task multi-domain representation learning for sequence tagging},
      author = {Peng, Nanyun and Dredze, Mark},
      booktitle = {Proceedings of the 2nd Workshop on Representation Learning for NLP},
      year = {2017}
    }
    
  4. A multi-task learning approach to adapting bilingual word embeddings for cross-lingual named entity recognition

    Dingquan Wang, Nanyun Peng, and Kevin Duh, in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2017.
    @inproceedings{wang2017multi,
      title = {A multi-task learning approach to adapting bilingual word embeddings for cross-lingual named entity recognition},
      author = {Wang, Dingquan and Peng, Nanyun and Duh, Kevin},
      booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
      pages = {383--388},
      year = {2017}
    }
    
  5. Improving named entity recognition for Chinese social media with word segmentation representation learning

    Nanyun Peng and Mark Dredze, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
    @inproceedings{peng2016improving,
      title = {Improving named entity recognition for {Chinese} social media with word segmentation representation learning},
      author = {Peng, Nanyun and Dredze, Mark},
      booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics},
      year = {2016}
    }
    
  6. An Empirical Study of Chinese Name Matching and Applications

    Nanyun Peng, Mo Yu, and Mark Dredze, in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), 2015.
    @inproceedings{peng2015empirical,
      title = {An Empirical Study of Chinese Name Matching and Applications},
      author = {Peng, Nanyun and Yu, Mo and Dredze, Mark},
      booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL)},
      year = {2015}
    }
    
  7. Named entity recognition for Chinese social media with jointly trained embeddings

    Nanyun Peng and Mark Dredze, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
    @inproceedings{peng2015named,
      title = {Named entity recognition for {Chinese} social media with jointly trained embeddings},
      author = {Peng, Nanyun and Dredze, Mark},
      booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
      pages = {548--554},
      year = {2015}
    }
    