Carl Sable

Professor of Computer Engineering

Natural Language Processing

Natural Language Processing (NLP) refers to applications of computer science that involve the automatic processing of written or spoken human languages (as opposed to formal languages such as programming languages). The field relies on many techniques from artificial intelligence and machine learning, and occasionally on tools from linguistics. Conventional successes of the field include web search, spam filtering, text categorization, and speech recognition. More recently, the field has been dominated by deep learning, which has led to huge improvements for tasks such as machine translation and natural language generation.

My background in NLP stems from my days as a computer science Ph.D. student in the NLP Group at Columbia University under the guidance of Prof. Kathleen McKeown. My thesis involved the automatic categorization of images based on associated text (e.g., the classification of photographs from news articles into topical categories based on their captions). This involved a combination of machine learning techniques to train the system as well as the use of advanced computational linguistic approaches to improve the performance for some categories. I was also one of the original creators of Columbia Newsblaster. Newsblaster spidered the web looking for news articles. It then clustered the articles into groups such that each group contained articles discussing a single news event. Newsblaster then automatically classified and summarized each cluster. Newsblaster pre-dated Google News, and at one point it received tens of thousands of hits every day.

At Cooper Union, I have continued to pursue my interest in NLP in a variety of ways. I created a new Master's level elective, ECE 467: Natural Language Processing, which I have already taught multiple times. The course covers both statistical approaches and computational linguistic approaches to NLP as well as some linguistic theory; students in the course implement their own text categorization systems and work on open-ended final projects. I have also advised, or am advising, multiple Master's students whose theses directly relate to NLP, plus several other Master's students whose theses involve the related field of machine learning. Additionally, I have advised many independent study courses related to NLP. I look forward to pursuing various applications of the field further with interested students.