6 Degree Plans - Computational Linguistics

Coordinated by: Institute of Formal and Applied Linguistics
Study branch coordinator: Doc. RNDr. Markéta Lopatková, Ph.D.

Specializations:

Computational and formal linguistics
Statistical methods and machine learning in computational linguistics

The aim of the study branch Computational Linguistics is to prepare students for research in natural language processing and for the development of applications dealing with both written and spoken language. Examples of such applications include information retrieval, machine translation, grammar checking, text summarization, information extraction, automatic speech recognition, voice control, spoken dialogue systems, and speech synthesis. The emphasis is on a deep understanding of the formal foundations and their practical applicability. The study branch Computational Linguistics can be studied in two specializations: (i) computational and formal linguistics, and (ii) statistical methods and machine learning in computational linguistics.

Graduates are familiar with the theoretical foundations of the formal description of natural languages, the mathematical and algorithmic foundations of automatic natural language processing, and state-of-the-art machine learning techniques. They are able to apply the knowledge acquired during their studies to the design and development of systems that automatically process natural language and large quantities of both structured and unstructured data, such as information retrieval, question answering, summarization and information extraction, machine translation, and speech processing systems. They are equipped with reasonable knowledge, skills, and experience in software development and teamwork, applicable in all areas involving the development of applications for human-computer interaction and/or machine learning.

6.1 Obligatory courses

Code     Subject                                                     Credits  Hours/week and assessment
NTIN090  Introduction to Complexity and Computability                5        2/1 C+Ex
NTIN066  Data Structures I                                           5        2/1 C+Ex
NPFL063  Introduction to General Linguistics                         5        2/1 C+Ex
NPFL067  Statistical Methods in Natural Language Processing I        6        2/2 C+Ex
NPFL092  NLP Technology                                              5        1/2 MC
NSZZ023  Diploma Thesis I                                            6        0/4 C (winter), 0/4 C (summer)
NSZZ024  Diploma Thesis II                                           9        0/6 C (winter), 0/6 C (summer)
NSZZ025  Diploma Thesis III                                          15       0/10 C (winter), 0/10 C (summer)

6.2 Elective courses

Students must obtain at least 42 credits from courses in the following set:

Code     Subject                                                     Credits  Hours/week and assessment
NPFL006  Introduction to Formal Linguistics                          3        2/0 Ex
NPFL038  Fundamentals of Speech Recognition and Generation           6        2/2 C+Ex
NPFL068  Statistical Methods in Natural Language Processing II       6        2/2 C+Ex
NPFL070  Language Data Resources                                     5        1/2 MC
NPFL075  Prague Dependency Treebank                                  6        2/2 C+Ex
NPFL079  Algorithms in Speech Recognition                            6        2/2 C+Ex
NPFL082  Information Structure of Sentences and Discourse Structure  3        0/2 C
NPFL083  Linguistic Theory and Grammar Formalisms                    6        2/2 C+Ex
NPFL087  Statistical Machine Translation                             6        2/2 C+Ex
NPFL093  NLP Applications                                            5        2/1 MC
NPFL094  Morphological and Syntactic Analysis I                      3        2/0 MC
NPFL095  Modern Methods in Computational Linguistics                 3        0/2 C
NPFL096  Computational Morphology                                    4        2/1 Ex
NPFL099  Statistical Dialogue Systems                                5        2/1 C+Ex
NPFL103  Information Retrieval                                       6        2/2 C+Ex
NPFL104  Machine Learning Methods                                    5        1/2 C+Ex
NPRG027  Credit for Project                                          6        0/4 C (winter), 0/4 C (summer)
NPRG023  Software Project                                            9        0/6 C (winter), 0/6 C (summer)
NPFL114  Deep Learning                                               7        3/2 C+Ex
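As a worked example of the 42-credit rule, the following Python sketch sums the credits of the elective courses a student has passed, using the credit values listed in the elective course table above. The function names and the input format (a list of course codes) are hypothetical, and the sketch assumes that each passed course counts once with its full credit value; it does not model any other conditions the study regulations may impose.

```python
# Credits of the elective courses, as listed in the elective course table (section 6.2).
ELECTIVE_CREDITS = {
    "NPFL006": 3, "NPFL038": 6, "NPFL068": 6, "NPFL070": 5,
    "NPFL075": 6, "NPFL079": 6, "NPFL082": 3, "NPFL083": 6,
    "NPFL087": 6, "NPFL093": 5, "NPFL094": 3, "NPFL095": 3,
    "NPFL096": 4, "NPFL099": 5, "NPFL103": 6, "NPFL104": 5,
    "NPRG027": 6, "NPRG023": 9, "NPFL114": 7,
}

REQUIRED_ELECTIVE_CREDITS = 42


def elective_credits(passed_codes):
    """Sum the credits of passed courses that belong to the elective set.

    Hypothetical helper: codes outside the elective set (e.g. obligatory
    courses) are ignored, and a code listed twice is counted only once.
    """
    return sum(ELECTIVE_CREDITS.get(code, 0) for code in set(passed_codes))


def meets_elective_requirement(passed_codes):
    """True if the passed elective courses give at least 42 credits."""
    return elective_credits(passed_codes) >= REQUIRED_ELECTIVE_CREDITS
```

For example, NPFL068, NPFL087, NPFL103, NPFL104, NPFL114, NPFL093 and NPFL099 together give 40 credits, which is not enough; adding NPFL082 (3 credits) raises the total to 43.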

6.3 State Final Exam

In addition to the two examination areas that are obligatory for all study branches, the student is examined in one area that is obligatory for this study branch, one obligatory area that depends on the specialization, and one elective area. As the elective area, the student may also select the specialization-dependent obligatory area of the other specialization of the study branch Computational Linguistics, any area from the specialization Intelligent agents or the specialization Machine learning of the study branch Artificial Intelligence, or any area from the specialization Computer graphics of the study branch Computer Graphics and Game Development. In total, each student receives five questions, one from each of the five examination areas.

Examination areas

1. Fundamentals of natural language processing (obligatory for both specializations)
2. Linguistic theories and formalisms (obligatory for the specialization Computational and formal linguistics)
3. Statistical methods and machine learning in computational linguistics (obligatory for the specialization Statistical methods and machine learning in computational linguistics)
4. Multimodal technologies and data (elective)
5. Applications in natural language processing (elective)

Knowledge requirements

1. Fundamentals of natural language processing
Fundamentals of general linguistics. System of layers in language description. Dependency syntax, formal definition of dependency trees and their properties. The Chomsky hierarchy of languages, context-free languages; phrase-structure grammars, unification-based grammars and categorial grammars for natural languages. Design and evaluation of linguistic experiments, evaluation metrics. Basic stochastic methods. Language modeling, basic methods for training stochastic models. Basic algorithms.
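To illustrate the formal definition of dependency trees listed above, here is a minimal Python sketch. It assumes a common encoding that is not prescribed by this plan: words are numbered 1 to n, `heads[i - 1]` is the index of the head of word i, and 0 stands for an artificial root. A well-formed tree has every head index in range and no cycles, so every word reaches the root. Projectivity, one of the standard properties of dependency trees, requires every word lying between a head and its dependent to be dominated by that head. The sketch allows several words to depend directly on the root.

```python
def is_tree(heads):
    """Check that `heads` encodes a dependency tree over words 1..n.

    heads[i - 1] is the head of word i; 0 denotes the artificial root.
    """
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False  # head index out of range
    for word in range(1, n + 1):
        seen = set()
        node = word
        # Walk up the chain of heads; in a tree it must end at the root (0).
        while node != 0:
            if node in seen:
                return False  # cycle (a self-loop is a cycle of length 1)
            seen.add(node)
            node = heads[node - 1]
    return True


def dominates(heads, h, d):
    """True if node h is d itself or an ancestor of d (assumes a valid tree)."""
    node = d
    while node != 0:
        if node == h:
            return True
        node = heads[node - 1]
    return h == 0  # the root dominates every word


def is_projective(heads):
    """Each word strictly between a head and its dependent must be dominated by the head."""
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        left, right = min(head, dep), max(head, dep)
        for k in range(left + 1, right):
            if not dominates(heads, head, k):
                return False
    return True
```

For "John saw Mary" with "saw" as the root, `heads = [2, 0, 2]` is a projective tree. The sequence `[3, 0, 2, 2]` is a tree but not projective, because the arc from word 3 to word 1 spans word 2, which word 3 does not dominate.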

Recommended courses

Code     Subject                                                     Credits  Hours/week and assessment
NPFL067  Statistical Methods in Natural Language Processing I        6        2/2 C+Ex
NPFL063  Introduction to General Linguistics                         5        2/1 C+Ex

2. Linguistic theories and formalisms
Functional Generative Description. Prague Dependency Treebank. Other basic grammar formalisms (Government and Binding, unification-based grammars, feature structures, HPSG, LFG, categorial grammars, (L)TAG). Phonetics, phonology. Computational morphology. Syntax. Computational lexicography. Topic-focus articulation; information structure, discourse. Coreference. Linguistic typology. Formal grammars and their application in rule-based morphology and parsing.

Recommended courses

Code     Subject                                                     Credits  Hours/week and assessment
NPFL063  Introduction to General Linguistics                         5        2/1 C+Ex
NPFL083  Linguistic Theory and Grammar Formalisms                    6        2/2 C+Ex
NPFL075  Prague Dependency Treebank                                  6        2/2 C+Ex
NPFL094  Morphological and Syntactic Analysis I                      3        2/0 MC
NPFL006  Introduction to Formal Linguistics                          3        2/0 Ex

3. Statistical methods and machine learning in computational linguistics
Generative and discriminative models. Supervised machine learning for classification and regression (linear models; other methods: Naive Bayes, decision trees, example-based learning). Support vector machines and kernel functions. Logistic regression. Unsupervised machine learning methods. Bayesian networks. Bias-variance tradeoff. Language models and noisy channel models. Smoothing, model combination. Hidden Markov models (HMM): trellis, the Viterbi and Baum-Welch algorithms. Algorithms for statistical tagging. Algorithms for phrase-based and dependency-based statistical parsing.
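As an illustration of the HMM topics listed above (the trellis and Viterbi decoding, as used in statistical tagging), the following Python sketch finds the most probable state (tag) sequence for a sentence. It is a textbook-style sketch rather than a reference implementation: it works in log space to avoid numerical underflow, assumes the model is given as plain dictionaries of probabilities, and leaves out smoothing, so a word never seen with a tag simply gets probability 0 for that tag.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable HMM state sequence for `words` and its log-probability.

    start_p[s], trans_p[s][t] and emit_p[s][w] are probabilities; a word missing
    from emit_p[s] gets probability 0 (no smoothing in this sketch).
    """
    if not words:
        return [], 0.0

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # trellis[i][s]: best log-probability of a path that ends in state s at position i
    trellis = [{s: log(start_p[s]) + log(emit_p[s].get(words[0], 0.0)) for s in states}]
    backptr = [{}]
    for i in range(1, len(words)):
        column, pointers = {}, {}
        for t in states:
            # Best predecessor state for reaching state t at position i
            prev = max(states, key=lambda s: trellis[i - 1][s] + log(trans_p[s][t]))
            column[t] = (trellis[i - 1][prev] + log(trans_p[prev][t])
                         + log(emit_p[t].get(words[i], 0.0)))
            pointers[t] = prev
        trellis.append(column)
        backptr.append(pointers)

    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    path.reverse()
    return path, trellis[-1][last]
```

Keeping one back-pointer per trellis cell is what makes decoding linear in sentence length (and quadratic in the number of states) instead of enumerating all tag sequences.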

Recommended courses

Code     Subject                                                     Credits  Hours/week and assessment
NPFL067  Statistical Methods in Natural Language Processing I        6        2/2 C+Ex
NPFL068  Statistical Methods in Natural Language Processing II       6        2/2 C+Ex
NPFL104  Machine Learning Methods                                    5        1/2 C+Ex
NPFL087  Statistical Machine Translation                             6        2/2 C+Ex

4. Multimodal technologies and data
Fundamentals of speech production and perception. Methods of speech signal processing. HMM acoustic modeling of phonemes. Implementation of the Baum-Welch and Viterbi algorithms in speech recognition systems. Continuous speech recognition with large vocabularies. Adaptation techniques. Speech summarization. Topic and keyword spotting in speech corpora. Speaker recognition. Methods of speech synthesis. Text processing for speech synthesis. Prosody modeling. Basic components of a dialogue system. Spoken language understanding. Dialogue control: MDP and POMDP systems. Reinforcement learning. Dialogue state tracking in MDP and POMDP systems. User simulation. Speech generation. Quality evaluation of dialogue systems. Search and indexing in audio-visual archives.

Recommended courses

Code     Subject                                                     Credits  Hours/week and assessment
NPFL038  Fundamentals of Speech Recognition and Generation           6        2/2 C+Ex
NPFL079  Algorithms in Speech Recognition                            6        2/2 C+Ex
NPFL099  Statistical Dialogue Systems                                5        2/1 C+Ex

5. Applications in natural language processing
Spell-checking and grammar-checking. Input methods. Machine translation. Machine-aided translation. Statistical methods in machine translation. Quality evaluation of machine translation. Information retrieval, models for information retrieval. Query expansion and relevance feedback. Document clustering. Web search. Duplicate detection and plagiarism detection. Information retrieval evaluation. Sentiment analysis, social network analysis. Search systems (Lucene, SOLR, Terrier). NLP toolkits (GATE, NLTK, NLPTools).
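Information retrieval evaluation is among the topics listed above. As a small illustration, this Python sketch computes set-based precision and recall and the non-interpolated average precision of a ranked result list for a single query. The function names are illustrative, and the ranking is assumed to contain each document at most once; averaging the per-query average precision over a set of queries gives mean average precision (MAP).

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of the retrieved documents for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Non-interpolated average precision of a ranked list (distinct documents assumed).

    Precision@k is summed at every rank k that holds a relevant document and
    divided by the total number of relevant documents, so relevant documents
    that are never retrieved contribute 0.
    """
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```

For the ranking d3, d1, d7, d2 with relevant documents {d1, d2, d5}, precision is 0.5, recall is 2/3, and average precision is (1/2 + 2/4) / 3 = 1/3.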

Recommended courses

Code     Subject                                                     Credits  Hours/week and assessment
NPFL087  Statistical Machine Translation                             6        2/2 C+Ex
NPFL103  Information Retrieval                                       6        2/2 C+Ex
NPFL093  NLP Applications                                            5        2/1 MC

© 2013–2018 Charles University, Faculty of Mathematics and Physics. Design noBrother.
Content responsibility: Student Affairs Department.