Carl Sable

Associate Professor of Electrical Engineering

Natural Language Processing

Natural Language Processing (NLP) refers to applications of computer science that involve the automatic processing of written or spoken human languages (as opposed to formal languages such as programming languages). The field relies on many techniques from artificial intelligence and machine learning as well as tools from linguistics. Major successes of the field include advanced web search, spam filtering, and speech recognition; other tasks being researched by NLP practitioners include automatic summarization, grammar checking, and machine translation.

My background in NLP stems from my days as a computer science Ph.D. student in the NLP Group at Columbia University under the guidance of Prof. Kathleen McKeown. My thesis involved the automatic categorization of images based on associated text (e.g., the classification of photographs from news articles into topical categories based on their captions). This involved a combination of machine learning techniques to train the system as well as the use of advanced computational linguistic approaches to improve the performance for some categories. I was also one of the original creators of Columbia Newsblaster. Newsblaster starts by spidering the web looking for news articles. It then clusters the articles into groups such that each group contains articles discussing a single news event. Newsblaster then automatically classifies and summarizes each cluster. Newsblaster pre-dated Google News, and at one point it received tens of thousands of hits every day.

At Cooper Union, I have continued to pursue my interest in NLP in a variety of ways. I created a new Master's-level elective, ECE 467: Natural Language Processing, which I have already taught multiple times. The course covers both statistical approaches and computational linguistic approaches to NLP as well as some linguistic theory; students in the course implement their own text categorization systems and work on open-ended final projects. I have also advised, or am advising, multiple Master's students whose theses directly relate to NLP, plus several other Master's students whose theses involve the related field of machine learning. Additionally, I have advised multiple independent study projects related to NLP. I look forward to pursuing various applications of the field further with interested students.