BARC talk by Teresa Anna Steiner

Friday, March 20, 2026, 11-12 AM, Teresa Anna Steiner, Assistant Professor at SDU, will be giving a BARC talk on "A Differentially Private Data Structure for Substring and Document Counting".

Abstract

For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database.

In this talk, we study the problem of substring and document counting under differential privacy. In our setting, entire documents are kept private, that is, neighboring datasets differ in one document. We design the first differentially private data structure for this problem, which means that the construction algorithm is differentially private, and the data structure can be queried ad libitum without further privacy loss. The additive error of our data structure is optimal for ϵ-differential privacy up to polylogarithmic factors. Our data structure immediately leads to improved algorithms for related problems, such as privately mining frequent substrings and q-grams. Our techniques combine ideas from differential privacy under continual observation, string algorithms, and tree data structures.

Based on joint work with Giulia Bernardini, Philip Bille, and Inge Li Gørtz.

Host

Rasmus Pagh