To: marcus.hutter@anu.edu.au Aug 27 at 11:00 AM
I would like to propose a research task, Blind Language Translation (BLT), and get your opinion. The task would consist of automatically translating a large corpus (100 terabytes or more) of Spanish texts into English without using any correspondence information (such as aligned texts). My understanding is that the consensus among linguists is that this should be impossible; however, from a Big Data perspective, I disagree. I'm sure you're aware of many papers on unsupervised learning in NLP. I think the BLT task is a tangible path toward developing the features and frame representations needed for AI. I also think BLT offers a framework for understanding intelligence, both human and otherwise, as in Searle's Chinese Room. It is tantalizing to consider that a corpus with sufficient semantic breadth, depth, and redundancy may "objectively" encode information. However, even if a BLT system succeeded, it might be considered to have implicitly cheated. I hope this raises questions about "objectivity", humanity, and even (Alien) Civilization Models.

From: Marcus Hutter

> I would like to propose a research task of Blind Language Translation, and get your opinion. The task would consist of automatically translating a large (100 terabyte or more) corpus of Spanish texts to English without utilizing any correspondence information (like aligned texts).

Good idea, but it has already been done 20 years ago
http://arxiv.org/abs/cmp-lg/9505037
http://dx.doi.org/10.3115/981658.981709
with lots of follow-up work
http://scholar.google.com.au/scholar?cites=5467964337065257077
I think Google Translate uses this too. And you're right to disagree :-)
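The first paper linked in the reply rests on the assumption that patterns of word co-occurrence are correlated across languages: if two Spanish words tend to appear together, their English translations should too. A word mapping can then be sought, with no aligned texts, by looking for the assignment that makes the two co-occurrence matrices look most alike. What follows is a minimal Python sketch of that idea only, not the cited paper's actual procedure. Everything in it is a hypothetical illustration: the two tiny corpora, the four-word vocabularies, sentence-level co-occurrence counting, and the L1 distance between normalized matrices. The search is brute force over all permutations, so it is only feasible for a handful of words.

```python
from itertools import permutations

def cooccurrence(sentences, vocab):
    """Count, for each pair of distinct vocab words, the sentences containing both."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    m = [[0] * n for _ in range(n)]
    for sent in sentences:
        present = sorted({index[w] for w in sent.split() if w in index})
        for a in present:
            for b in present:
                if a != b:
                    m[a][b] += 1
    return m

def normalize(m):
    """Scale counts to relative frequencies so corpora of different sizes are comparable."""
    total = sum(map(sum, m)) or 1
    return [[x / total for x in row] for row in m]

def best_mapping(src, tgt):
    """Brute-force the permutation p minimizing sum |src[i][j] - tgt[p[i]][p[j]]|."""
    n = len(src)
    best, best_cost = None, float("inf")
    for p in permutations(range(n)):
        cost = sum(abs(src[i][j] - tgt[p[i]][p[j]]) for i in range(n) for j in range(n))
        if cost < best_cost:
            best, best_cost = p, cost
    return best, best_cost

# Hypothetical toy corpora. They are never aligned sentence by sentence:
# the English sentences are in a different order and the vocabularies are
# listed in a different order, so only per-language statistics are compared.
spanish = [
    "el perro come", "el perro come pan", "un perro come", "el perro come mucho",
    "el gato duerme", "un gato duerme", "el gato duerme mucho",
    "el perro y el gato", "un perro ve un gato", "el gato come",
]
english = [
    "the cat sleeps", "a dog and a cat", "the dog eats", "the cat eats",
    "a dog eats bread", "a cat sleeps", "the dog eats a lot",
    "the dog sees the cat", "a dog eats", "the cat sleeps a lot",
]
es_vocab = ["perro", "gato", "come", "duerme"]
en_vocab = ["sleeps", "eats", "cat", "dog"]

es = normalize(cooccurrence(spanish, es_vocab))
en = normalize(cooccurrence(english, en_vocab))
perm, cost = best_mapping(es, en)
mapping = {es_vocab[i]: en_vocab[perm[i]] for i in range(len(es_vocab))}
print(mapping)  # {'perro': 'dog', 'gato': 'cat', 'come': 'eats', 'duerme': 'sleeps'}
```

The toy corpora were deliberately built so their co-occurrence counts match exactly and every word pair has a distinct count, which makes the correct mapping the unique zero-cost permutation. Real non-parallel corpora only match approximately, and exhaustive search over n! permutations does not scale beyond a few words.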