Computational Linguistics

Coordinated by: Institute of Formal and Applied Linguistics
Study branch coordinator: Doc. RNDr. Markéta Lopatková, Ph.D.

Specializations:

 Computational and formal linguistics
 Statistical methods and machine learning in computational linguistics

The study branch Computational Linguistics prepares students for research in natural language processing and for the development of applications dealing with both written and spoken language. Examples of such applications are information retrieval systems, machine translation, grammar checking, text summarization and information extraction, automatic speech recognition, voice control, spoken dialogue systems, and speech synthesis. The emphasis is on a deep understanding of the formal foundations and their practical applicability. The study branch can be studied in two specializations: (i) computational and formal linguistics, and (ii) statistical methods and machine learning in computational linguistics.

Graduates are familiar with the theoretical foundations of the formal description of natural languages, the mathematical and algorithmic foundations of automatic natural language processing, and state-of-the-art machine learning techniques. They are able to apply the knowledge acquired during their studies to the design and development of systems that automatically process natural language and large quantities of both structured and unstructured data, such as systems for information retrieval, question answering, summarization and information extraction, machine translation, and speech processing. They are equipped with knowledge, skills, and experience in software development and teamwork, applicable in all areas involving the development of applications for human-computer interaction and/or machine learning.

6.1 Obligatory courses

Code Subject Credits Winter Summer
NTIN090 Introduction to Complexity and Computability   5 2/1 C+Ex
NTIN066 Data Structures I   5 2/1 C+Ex
NPFL063 Introduction to General Linguistics   5 2/1 C+Ex
NPFL067 Statistical Methods in Natural Language Processing I   6 2/2 C+Ex
NPFL092 NLP Technology   5 1/2 MC
NSZZ023 Diploma Thesis I   6 0/4 C 0/4 C
NSZZ024 Diploma Thesis II   9 0/6 C 0/6 C
NSZZ025 Diploma Thesis III   15 0/10 C 0/10 C

6.2 Elective courses

The student needs to obtain at least 42 credits for the courses from the following set:

Code Subject Credits Winter Summer
NPFL006 Introduction to Formal Linguistics   3 2/0 Ex
NPFL038 Fundamentals of Speech Recognition and Generation   6 2/2 C+Ex
NPFL068 Statistical Methods in Natural Language Processing II   6 2/2 C+Ex
NPFL070 Language Data Resources   5 1/2 MC
NPFL075 Prague Dependency Treebank   6 2/2 C+Ex
NPFL079 Algorithms in Speech Recognition   6 2/2 C+Ex
NPFL082 Information Structure of Sentences and Discourse Structure   3 0/2 C
NPFL083 Linguistic Theory and Grammar Formalisms   6 2/2 C+Ex
NPFL087 Statistical Machine Translation   6 2/2 C+Ex
NPFL093 NLP Applications   5 2/1 MC
NPFL094 Morphological and Syntactic Analysis I   3 2/0 MC
NPFL095 Modern Methods in Computational Linguistics   3 0/2 C
NPFL096 Computational Morphology   4 2/1 Ex
NPFL099 Statistical Dialogue Systems   5 2/1 C+Ex
NPFL103 Information Retrieval   6 2/2 C+Ex
NPFL104 Machine Learning Methods   5 1/2 C+Ex
NPRG027 Credit for Project   6 0/4 C 0/4 C
NPRG023 Software Project   9 0/6 C 0/6 C
NPFL114 Deep Learning   7 3/2 C+Ex

6.3 State Final Exam

In addition to the two examination areas that are obligatory for all study branches, there is one obligatory area for this study branch, one obligatory area dependent on the specialization, and one elective examination area. As the elective examination area, the student may also select the obligatory area of the other specialization of the study branch Computational Linguistics, any area from the specialization Intelligent agents or the specialization Machine learning of the study branch Artificial Intelligence, or any area from the specialization Computer graphics of the study branch Computer Graphics and Game Development. In total, each student will get five questions, one from each of the five examination areas.

Examination areas

1. Fundamentals of natural language processing (obligatory for both specializations)
2. Linguistic theories and formalisms (obligatory for the specialization Computational and formal linguistics)
3. Statistical methods and machine learning in computational linguistics (obligatory for the specialization Statistical methods and machine learning in computational linguistics)
4. Multimodal technologies and data (elective)
5. Applications in natural language processing (elective)

Knowledge requirements

1. Fundamentals of natural language processing
Fundamentals of general linguistics. System of layers in language description. Dependency syntax, formal definition of dependency trees and their characteristics. The Chomsky hierarchy of languages, context-free languages, phrase grammars, unification-based grammars and categorial grammars for a natural language. Design and evaluation of linguistic experiments, evaluation metrics. Basic stochastic methods. Language modeling, basic methods for training stochastic models. Basic algorithms.

Recommended courses

Code Subject Credits Winter Summer
NPFL067 Statistical Methods in Natural Language Processing I   6 2/2 C+Ex
NPFL063 Introduction to General Linguistics   5 2/1 C+Ex

2. Linguistic theories and formalisms
Functional Generative Description. Prague Dependency Treebank. Other basic grammar formalisms (Government and Binding, unification-based grammars, feature structures, HPSG, LFG, categorial grammars, (L)TAG). Phonetics, phonology. Computational Morphology. Syntax. Computational lexicography. Topic-focus articulation; information structure, discourse. Coreference. Linguistic typology. Formal grammars and their application in rule-based morphology and parsing.

Recommended courses

Code Subject Credits Winter Summer
NPFL063 Introduction to General Linguistics   5 2/1 C+Ex
NPFL083 Linguistic Theory and Grammar Formalisms   6 2/2 C+Ex
NPFL075 Prague Dependency Treebank   6 2/2 C+Ex
NPFL094 Morphological and Syntactic Analysis I   3 2/0 MC
NPFL006 Introduction to Formal Linguistics   3 2/0 Ex

3. Statistical methods and machine learning in computational linguistics
Generative and discriminative models. Supervised machine learning for classification and regression (linear models, other methods: Naive Bayes, decision trees, example-based learning). Support Vector Machines and Kernel functions. Logistic regression. Unsupervised machine learning methods. Bayesian Networks. Bias-variance tradeoff. Language models and noisy channel models. Smoothing, model combination. HMM, trellis, Viterbi, Baum-Welch. Algorithms for statistical tagging. Algorithms for phrase-based and dependency-based statistical parsing.

Recommended courses

Code Subject Credits Winter Summer
NPFL067 Statistical Methods in Natural Language Processing I   6 2/2 C+Ex
NPFL068 Statistical Methods in Natural Language Processing II   6 2/2 C+Ex
NPFL104 Machine Learning Methods   5 1/2 C+Ex
NPFL087 Statistical Machine Translation   6 2/2 C+Ex

4. Multimodal technologies and data
Fundamentals of speech production and perception. Methods of speech signal processing. HMM acoustic modeling of phonemes. The implementation of the Baum-Welch and Viterbi algorithms in speech recognition systems. Continuous speech recognition using large dictionaries. Adaptation techniques. Speech summarization. Topic and keyword spotting in speech corpora. Speaker recognition. Methods of speech synthesis. Text processing for speech synthesis. Prosody modeling. Basic components of a dialogue system. Spoken language understanding. Dialogue control – MDP and POMDP systems. Reinforcement learning. Dialogue state tracking in MDP and POMDP systems. User simulation. Speech generation. Dialogue systems quality evaluation. Search and indexing in audio-visual archives.

Recommended courses

Code Subject Credits Winter Summer
NPFL038 Fundamentals of Speech Recognition and Generation   6 2/2 C+Ex
NPFL079 Algorithms in Speech Recognition   6 2/2 C+Ex
NPFL099 Statistical Dialogue Systems   5 2/1 C+Ex

5. Applications in natural language processing
Spell-checking and grammar-checking. Input methods. Machine translation. Machine-aided translation. Statistical methods in machine translation. Quality evaluation of machine translation. Information retrieval, models for information retrieval. Query expansion and relevance feedback. Document clustering. Web search. Duplicate detection and plagiarism detection. Information retrieval evaluation. Sentiment analysis, social network analysis. Search systems (Lucene, Solr, Terrier). NLP toolkits (GATE, NLTK, NLPTools).

Recommended courses

Code Subject Credits Winter Summer
NPFL087 Statistical Machine Translation   6 2/2 C+Ex
NPFL103 Information Retrieval   6 2/2 C+Ex
NPFL093 NLP Applications   5 2/1 MC
 

Charles University, Faculty of Mathematics and Physics
Ke Karlovu 3, 121 16 Praha 2, Czech Republic