CP7018 LANGUAGE TECHNOLOGIES Syllabus - Anna University ME CSE 2nd Semester Syllabus Regulation 2013 - www.annauniv.edu

OBJECTIVES:

• To understand the mathematical foundations needed for language processing
• To understand the representation and processing of morphology and part-of-speech taggers
• To understand different aspects of natural language syntax and the various methods used for processing syntax
• To understand different methods of disambiguating word senses
• To know about various applications of natural language processing
• To learn the indexing and searching processes of a typical information retrieval system and to study NLP-based retrieval systems
• To gain knowledge about typical text categorization and clustering techniques

UNIT I INTRODUCTION

Natural Language Processing – Mathematical Foundations – Elementary Probability Theory – Essential Information Theory – Linguistics Essentials – Parts of Speech and Morphology – Phrase Structure – Semantics – Corpus-Based Work.

UNIT II WORDS

Collocations – Statistical Inference – n-gram Models – Word Sense Disambiguation – Lexical Acquisition.

UNIT III GRAMMAR

Markov Models – Part-of-Speech Tagging – Probabilistic Context-Free Grammars – Parsing.

UNIT IV INFORMATION RETRIEVAL

Information Retrieval Architecture – Indexing – Storage – Compression Techniques – Retrieval Approaches – Evaluation – Search Engines – Commercial Search Engine Features – Comparison – Performance Measures – Document Processing – NLP-Based Information Retrieval – Information Extraction.

UNIT V TEXT MINING

Categorization – Extraction-Based Categorization – Clustering – Hierarchical Clustering – Document Classification and Routing – Finding and Organizing Answers from Text Search – Text Categorization and Efficient Summarization using Lexical Chains – Machine Translation – Transfer Metaphor – Interlingual and Statistical Approaches.

OUTCOMES:

Upon completion of the course, the students will be able to:
• Identify the different linguistic components of given sentences
• Design a morphological analyser for a language of their choice using finite state automata concepts
• Implement a parser by providing a suitable grammar and lexicon
• Discuss algorithms for word sense disambiguation
• Build a tagger to semantically tag words using WordNet
• Design an application that uses different aspects of language processing

REFERENCES:

1. Christopher D. Manning and Hinrich Schütze, “Foundations of Statistical Natural Language Processing”, MIT Press, 1999.
2. Daniel Jurafsky and James H. Martin, “Speech and Language Processing”, Pearson, 2008.
3. Ron Cole, J. Mariani, et al., “Survey of the State of the Art in Human Language Technology”, Cambridge University Press, 1997.
4. Michael W. Berry, “Survey of Text Mining: Clustering, Classification and Retrieval”, Springer Verlag, 2003.