Language Technologies and Computational Linguistics

Coordinated by: Institute of Formal and Applied Linguistics
Study branch coordinator: Doc. RNDr. Markéta Lopatková, Ph.D.

Specializations:

Computational and formal linguistics
Statistical and machine learning methods in Natural Language Processing

The graduate is familiar with the mathematical and algorithmic foundations of automatic natural language processing, with the theoretical foundations of the formal description of natural languages, and with state-of-the-art machine learning techniques. The student acquires skills in designing and developing systems that automatically process large quantities of language data, written and spoken, structured and unstructured alike, and that solve language-related tasks such as information retrieval, question answering, summarization, information extraction, machine translation, and speech processing.

The graduate is well prepared for doctoral studies in computational linguistics and language technologies, as well as for a professional career in the public or private sector. Given the general applicability of machine learning and data-driven methods, the graduate is well equipped to use these methods not only in natural language processing tasks but also in other domains where large quantities of structured and unstructured data are analyzed (finance, economics, biology, medicine, and others). The student acquires programming experience and the soft skills required for teamwork on applications that involve machine learning or human-computer interaction.

5.1 Obligatory Courses

Code     Subject                                               Credits  Winter/Summer
NTIN066  Data Structures 1                                     6        2/2 C+Ex
NTIN090  Introduction to Complexity and Computability          4        2/1 C+Ex
NPFL063  Introduction to General Linguistics                   4        2/1 C+Ex
NPFL067  Statistical Methods in Natural Language Processing I  5        2/2 C+Ex
NPFL114  Deep Learning                                         7        3/2 C+Ex
NSZZ023  Diploma Thesis I                                      6        0/4 C
NSZZ024  Diploma Thesis II                                     9        0/6 C
NSZZ025  Diploma Thesis III                                    15       0/10 C

5.2 Elective Courses - Set 1

The student needs to obtain at least 40 credits in total for the elective courses. Of these 40 required credits, at most 6 credits can be obtained from project courses (set 2 below) and at most 10 credits from the additional set of elective courses (set 3 below).
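
The counting rule above can be illustrated with a short sketch. It assumes one reading of the rule: credits from set 1 count in full, credits from set 2 count up to 6, and credits from set 3 count up to 10. The function names and the sample credit totals are hypothetical and serve only as an illustration; they are not part of the official study plan.

```python
# Minimal sketch of the elective-credit rule, under the assumptions stated above.
# All names here are illustrative, not an official tool of the faculty.

REQUIRED_ELECTIVE_CREDITS = 40
PROJECT_CAP = 6      # set 2: team project courses
ADDITIONAL_CAP = 10  # set 3: additional elective courses


def counted_elective_credits(set1: int, set2: int, set3: int) -> int:
    """Return the credits that count toward the elective requirement."""
    return set1 + min(set2, PROJECT_CAP) + min(set3, ADDITIONAL_CAP)


def meets_elective_requirement(set1: int, set2: int, set3: int) -> bool:
    """Check whether the counted credits reach the required minimum."""
    return counted_elective_credits(set1, set2, set3) >= REQUIRED_ELECTIVE_CREDITS


# 27 credits from set 1, a 9-credit project, 12 credits from set 3:
# 27 + 6 + 10 = 43 counted credits.
print(counted_elective_credits(27, 9, 12))
print(meets_elective_requirement(27, 9, 12))
```

In this example the 3 project credits and 2 set 3 credits above the caps would count as credits for free courses rather than toward the 40 elective credits.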

Code     Subject                                                     Credits  Winter/Summer
NPFL006  Introduction to Formal Linguistics                          3        2/0 Ex
NPFL038  Fundamentals of Speech Recognition and Generation           5        2/2 C+Ex
NPFL068  Statistical Methods in Natural Language Processing II       5        2/2 C+Ex
NPFL070  Language Data Resources                                     4        1/2 MC
NPFL075  Dependency Grammars and Treebanks                           5        2/2 C+Ex
NPFL079  Algorithms in Speech Recognition                            5        2/2 C+Ex
NPFL082  Information Structure of Sentences and Discourse Structure  2        0/2 C
NPFL083  Linguistic Theories and Grammar Formalisms                  5        2/2 C+Ex
NPFL087  Statistical Machine Translation                             5        2/2 C+Ex
NPFL093  NLP Applications                                            4        2/1 MC
NPFL094  Morphological and Syntactic Analysis                        3        2/0 MC
NPFL095  Modern Methods in Computational Linguistics                 3        0/2 C
NPFL097  Unsupervised Machine Learning in NLP                        3        1/1 C
NPFL099  Statistical Dialogue Systems                                4        2/1 C+Ex
NPFL100  Variability of Languages in Time and Space                  2        1/1 C
NPFL103  Information Retrieval                                       5        2/2 C+Ex
NPFL104  Machine Learning Methods                                    4        1/2 C+Ex
NPFL122  Deep Reinforcement Learning                                 5        2/2 C+Ex
NPFL128  Language Technologies in Practice                           4        2/1 MC

5.3 Elective Courses - Set 2 - Team Project Courses

The student can select at most one of the project courses as an elective course; at most 6 credits count as credits for elective courses. (Other potential credits for courses from this set count as credits for free courses.)

Code     Subject           Credits  Winter  Summer
NPRG069  Software Project  12       0/8 C   0/8 C
NPRG070  Research Project  9        0/6 C   0/6 C
NPRG071  Company Project   6        0/4 C   0/4 C

5.4 Elective Courses - Set 3

The student can select any course from the following set of additional courses; at most 10 credits count as credits for elective courses. (Other potential credits for courses from this set count as credits for free courses.)

Code     Subject                         Credits  Winter/Summer
NAIL025  Evolutionary Algorithms 1       5        2/2 C+Ex
NAIL069  Artificial Intelligence 1       4        2/1 C+Ex
NAIL070  Artificial Intelligence 2       3        2/0 Ex
NAIL104  Probabilistic graphical models  3        2/0 Ex
NPGR036  Computer Vision                 5        2/2 C+Ex

5.5 State Final Exam

The state final exam for the program Language Technologies and Computational Linguistics consists of one examination area that is obligatory for both specializations (examination area 1), one obligatory area determined by the selected specialization (examination area 2 or examination area 3), and one elective examination area (examination area 4 or 5). As the elective area, the student may instead select the obligatory area of the other specialization of this study program. In total, each student gets questions from three examination areas.

Examination areas

1. Fundamentals of natural language processing (obligatory for both specializations)
2. Linguistic theories and formalisms (obligatory for the specialization Computational and formal linguistics)
3. Statistical methods and machine learning in computational linguistics (obligatory for the specialization Statistical and machine learning methods in Natural Language Processing)
4. Speech, dialogue and multimodal systems (elective)
5. Applications in natural language processing (elective)

Knowledge requirements

1. Fundamentals of natural language processing
Phonetics, phonology, morphology, syntax, semantics, pragmatics. Ambiguity, arbitrariness. Description and prescription. Diachronic and synchronic language description. Fundamentals of information theory. Markov models. Language modeling and smoothing. Word classes. Annotated corpora. Design and evaluation of linguistic experiments, evaluation metrics. Morphological disambiguation and syntactic analysis. Basic classification and regression algorithms.

Recommended courses

Code     Subject                                               Credits  Winter/Summer
NPFL063  Introduction to General Linguistics                   4        2/1 C+Ex
NPFL067  Statistical Methods in Natural Language Processing I  5        2/2 C+Ex

2. Linguistic theories and formalisms
Functional Generative Description. Prague Dependency Treebank. Universal Dependencies. Other grammar formalisms (overview and basic characteristics). Phonetics, phonology. Computational morphology. Surface and deep syntactic structure; valency. Computational lexicography. Topic-focus articulation; information structure, discourse. Coreference. Linguistic typology. Formal grammars and their application in rule-based morphology. Parsing.

Recommended courses

Code     Subject                                     Credits  Winter/Summer
NPFL063  Introduction to General Linguistics         4        2/1 C+Ex
NPFL006  Introduction to Formal Linguistics          3        2/0 Ex
NPFL075  Dependency Grammars and Treebanks           5        2/2 C+Ex
NPFL083  Linguistic Theories and Grammar Formalisms  5        2/2 C+Ex
NPFL094  Morphological and Syntactic Analysis        3        2/0 MC

3. Statistical methods and machine learning in computational linguistics
Generative and discriminative models. Supervised machine learning methods for classification and regression (linear models, other methods: naive Bayes, decision trees, instance-based learning, SVM and kernels, logistic regression). Unsupervised machine learning methods. Language models, noisy channel model. Model smoothing, model combination. HMM, trellis, Viterbi, Baum-Welch. Algorithms for statistical tagging. Algorithms for constituency and dependency statistical parsing. Neural networks in machine learning. Convolutional and recurrent networks. Word embeddings.

Recommended courses

Code     Subject                                                Credits  Winter/Summer
NPFL067  Statistical Methods in Natural Language Processing I   5        2/2 C+Ex
NPFL114  Deep Learning                                          7        3/2 C+Ex
NPFL068  Statistical Methods in Natural Language Processing II  5        2/2 C+Ex

4. Speech, dialogue and multimodal systems
Fundamentals of speech production and perception. Methods of speech signal processing. HMM acoustic modeling of phonemes. The implementation of the Baum-Welch and Viterbi algorithms in speech recognition systems. Neural models for speech. Methods of speech synthesis. Speech applications. Basic components of a dialogue system. Natural language understanding in dialogue systems. Dialogue state tracking. Methods for dialogue management. User simulation. End-to-end neural dialogue systems. Open-domain dialogue system architectures. Natural language generation. Dialogue systems evaluation. Visual dialogue and multimodal systems.

Recommended courses

Code     Subject                                            Credits  Winter/Summer
NPFL038  Fundamentals of Speech Recognition and Generation  5        2/2 C+Ex
NPFL079  Algorithms in Speech Recognition                   5        2/2 C+Ex
NPFL099  Statistical Dialogue Systems                       4        2/1 C+Ex

5. Applications in natural language processing
Spell-checking and grammar-checking. Machine translation. Machine-aided translation. Statistical methods in machine translation. Quality evaluation of machine translation. Speech translation. Information retrieval, models for information retrieval. Query expansion and relevance feedback. Document clustering. Duplicate detection and plagiarism detection. Information retrieval evaluation. Sentiment analysis. Toolkits (GATE, NLTK, NLPTools, Lucene, Terrier).

Recommended courses

Code     Subject                            Credits  Winter/Summer
NPFL087  Statistical Machine Translation    5        2/2 C+Ex
NPFL093  NLP Applications                   4        2/1 MC
NPFL103  Information Retrieval              5        2/2 C+Ex
NPFL128  Language Technologies in Practice  4        2/1 MC
 

Charles University, Faculty of Mathematics and Physics
Ke Karlovu 3, 121 16 Praha 2, Czech Republic