Sunday, August 30, 2015

Hutter says BLT good idea done 20 years ago, cites baby steps

To: marcus.hutter@anu.edu.au Aug 27 at 11:00 AM

I would like to propose a research task of Blind Language Translation, and get your opinion.  The task would consist of automatically translating a large (100 terabyte or more) corpus of Spanish texts to English without utilizing any correspondence information (like aligned texts).  My understanding is that the consensus among linguists is that this should be impossible; however, from a Big Data perspective, I disagree.  I'm sure you're aware of many unsupervised-learning NLP papers.

I think the Blind Language Translation (BLT) task is a tangible path toward developing the features and frame representations needed for AI.  I also think BLT offers a framework for understanding intelligence, human and otherwise, as in Searle's Chinese Room.

It is tantalizing to consider that a corpus with sufficient semantic breadth, depth, and redundancy may "objectively" encode information.  However, even if a BLT system succeeded, it might be considered to have implicitly cheated.  I hope this raises questions about "objectivity", humanity, and even (Alien) Civilization Models.



From: Marcus Hutter

> I would like to propose a research task of Blind Language Translation, and get your opinion.  The task would consist of automatically translating a large (100 terabyte or more) corpus of Spanish texts to English without utilizing any correspondence information (like aligned texts).

Good idea, but it has already been done 20 years ago
http://arxiv.org/abs/cmp-lg/9505037
http://dx.doi.org/10.3115/981658.981709
with lots of follow-up work
http://scholar.google.com.au/scholar?cites=5467964337065257077
I think Google Translate uses this too.

And you're right to disagree :-)

Thursday, June 27, 2013

Ars Technica Begins Censoring Forum Searches

Ars Technica began censoring forum searches around 5 P.M. PDT today after publishing an "exclusive" revealing that Edward Snowden had used the username TheTrueHOOHA on the site's forums.

Would-be web sleuths' searches were denied with the message, "Sorry but you are not permitted to use the search system."

However, we had already completed several searches.  Enjoy!

Edward Snowden in May 2003 on "Why XaiaX is clueless in matters concerning the English language".

"I've lost track of how many times you've been wrong now. I'll take the time to educate you and show you how you are wrong, though. The dictionary is actually the definitive source of word meaning. Think about it: the root word of dictionary is diction, which is the choice and use of words used in writing or speech. Pretty hard to argue with that. More literally, the dictionary is a definitive source: it supplies us with definitions. Can you redeem your degree for a cash prize? Maybe a stuffed animal? That's about all it appears it will be used for, Captain Semantics. I think you should try being a bit more open-minded."