
Nepali Dialogue Corpus
This research aims to survey existing Nepali dialogue corpora (if any), identify their limitations, and create a new Nepali dialogue benchmark.
We will also explore methods to collect dialogues in a weakly supervised manner, e.g., from conversations on Twitter or similar platforms. Finally, the project will apply baseline machine learning methods to validate how well the curated dataset generalizes.
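As an illustration of what weakly supervised collection could look like, the sketch below rebuilds multi-turn dialogues from reply chains in a set of already-downloaded posts. It is only a minimal sketch under stated assumptions: the record fields (`id`, `reply_to`, `user`, `text`), the function name `build_dialogues`, and the `min_turns` threshold are hypothetical and do not correspond to any particular platform's API or to the project's actual pipeline.

```python
def build_dialogues(posts, min_turns=2):
    """Group posts into dialogues by following reply links.

    `posts` is a list of dicts with the hypothetical keys
    `id`, `reply_to` (parent id or None), `user`, and `text`.
    Each leaf post (one that nobody replied to) yields one dialogue:
    the chain from the thread root down to that leaf.
    """
    by_id = {p["id"]: p for p in posts}
    # Ids that received at least one reply, so they are not leaves.
    replied_to = {p["reply_to"] for p in posts if p["reply_to"] is not None}

    dialogues = []
    for post in posts:
        if post["id"] in replied_to:
            continue  # only start from leaf posts
        chain, seen, node = [], set(), post
        # Walk upward to the root; `seen` guards against malformed cycles.
        while node is not None and node["id"] not in seen:
            seen.add(node["id"])
            chain.append((node["user"], node["text"]))
            node = by_id.get(node["reply_to"])
        chain.reverse()  # root first, leaf last
        if len(chain) >= min_turns:
            dialogues.append(chain)
    return dialogues


# Toy input (made-up posts, not real data).
posts = [
    {"id": 1, "reply_to": None, "user": "a", "text": "नमस्ते"},
    {"id": 2, "reply_to": 1, "user": "b", "text": "नमस्ते, के छ?"},
    {"id": 3, "reply_to": 2, "user": "a", "text": "ठिक छ"},
    {"id": 4, "reply_to": None, "user": "c", "text": "शुभ प्रभात"},
]
dialogues = build_dialogues(posts)
```

Here the three linked posts form one dialogue, while the post with no replies is dropped by the `min_turns` filter. The supervision is weak in the sense that reply links stand in for manual annotation of turn structure, so threads collected this way would still need filtering for language and quality.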
References:
Lowe, Ryan, et al. “The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems.” arXiv preprint arXiv:1506.08909 (2015).
Research Themes:
B Bhattarai MultiModal Learning Lab (MMLL)
Project Category:
NLP