Logic and
Theory of
Discrete Systems

Informatik 7

String Processing and Compression

Lecture during Winter Term 2015 / 2016

Content

String data is ubiquitous in many areas of computer science. For example, web applications use encodings as strings to exchange data, and most data in bioinformatics, like DNA sequences, are strings. The lecture presents algorithms and data structures tailored to process string-encoded data. This includes the problem of finding patterns in strings and aligning sequences as well as sorting strings and searching for strings.

Compressing data is important if we want to save resources like memory and bandwidth. The lecture starts to introduce the information-theoretic background of compression techniques. Based on this introduction, the lecture presents lossless compression techniques for strings.

Topics

String-Processing Algorithms

  • Radix Trees
  • Radix Sorts
  • Pattern Matching
  • Approximate Pattern Matching
  • Probabilistic Pattern Matching
  • Suffix Trees
  • Suffix Arrays
  • String Assembly
  • Hard String Problems

Data-Compression Techniques

  • Coding Theory
  • Information Theory
  • Dictionary-Based Compression
  • Context-Based Compression
  • Grammar-Based Compression

Organizational