Anonymizer for Polish Language
Date
2023
Publisher
Wydawnictwo Politechniki Łódzkiej
Lodz University of Technology Press
Abstract
Researchers and enterprises need to anonymize unstructured text. This need
stems not only from the GDPR but also from the increasing use of large
language models (LLMs) such as GPT-3, which raise growing concerns about
privacy and security risks. Texts to be processed by such models must be
anonymized beforehand, and very often this has to happen at the data
provider's premises rather than on the machine learning team's side. In this
paper, we present an effective anonymization pipeline for Polish. It is a
modular and configurable solution that supports different modes, including
pseudo-anonymization, which is challenging in languages with complex
inflectional systems. The pipeline can be easily integrated with existing
systems and deployed in different environments thanks to a microservices
architecture with a REST interface.
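
To illustrate what "different modes" can mean in practice, here is a minimal sketch in Python. It is not the authors' implementation: the mode names (`tag`, `star`, `pseudo`), the regular expressions, and the surrogate values are all hypothetical, and a real pipeline would most likely rely on a trained named-entity recognizer rather than patterns. The sketch covers only e-mail addresses and phone numbers, which do not inflect, and so it sidesteps the inflection problem that the abstract identifies as the hard part of pseudo-anonymization.

```python
import re

# Hypothetical patterns for two entity types that do not inflect in Polish.
# A real system would more likely use a named-entity recognition model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),
}

# Hypothetical surrogate values used in pseudo-anonymization mode.
SURROGATES = {
    "EMAIL": "jan.kowalski@example.com",
    "PHONE": "600 100 200",
}


def anonymize(text: str, mode: str = "tag") -> str:
    """Replace detected entities according to the chosen (hypothetical) mode.

    tag    -> replace the entity with its category label, e.g. [EMAIL]
    star   -> replace every character of the entity with '*'
    pseudo -> replace the entity with a fixed surrogate of the same type
    """
    if mode not in ("tag", "star", "pseudo"):
        raise ValueError(f"unknown mode: {mode}")

    for label, pattern in PATTERNS.items():
        if mode == "tag":
            def replace(match, label=label):
                return f"[{label}]"
        elif mode == "star":
            def replace(match):
                return "*" * len(match.group())
        else:  # pseudo
            def replace(match, label=label):
                return SURROGATES[label]
        text = pattern.sub(replace, text)
    return text


sample = "Kontakt: anna.nowak@firma.pl, tel. 501 234 567"
print(anonymize(sample, "tag"))
print(anonymize(sample, "star"))
print(anonymize(sample, "pseudo"))
```

For entities that do inflect, such as personal names, a surrogate would presumably have to be put into the same grammatical case as the original, which a fixed lookup table like the one above cannot do.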
Keywords
natural language processing, anonymization, Polish language, Kubernetes
Citation
Walkowiak T., Gniewkowski M., Pogoda M., Ropiak N., Anonymizer for Polish Language. In: Progress in Polish Artificial Intelligence Research 4, Wojciechowski A. (Ed.), Lipiński P. (Ed.), Series: Monografie Politechniki Łódzkiej No. 2437, Wydawnictwo Politechniki Łódzkiej, Łódź 2023, pp. 281-284, ISBN 978-83-66741-92-8, doi: 10.34658/9788366741928.44.