Anonymizer for Polish Language

dc.contributor.author: Walkowiak, Tomasz
dc.contributor.author: Gniewkowski, Mateusz
dc.contributor.author: Pogoda, Michał
dc.contributor.author: Ropiak, Norbert
dc.date.accessioned: 2023-09-22T11:01:56Z
dc.date.available: 2023-09-22T11:01:56Z
dc.date.issued: 2023
dc.description.abstract: Researchers and enterprises require anonymization of unstructured text. This is driven not only by the GDPR but also by the increasing use of large language models (LLMs) such as GPT-3, which raise growing concerns about privacy and security risks. Texts to be processed by such models need to be anonymized beforehand, and very often this must happen on the data providers’ premises rather than on those of the machine learning teams. In this paper, we present an effective anonymization pipeline for Polish. It provides a modular and configurable solution that supports different modes, including pseudo-anonymization, which is challenging in languages with complex inflectional systems. The system can be easily integrated with existing systems and deployed in different environments thanks to its microservices architecture with a REST interface. [en_EN]
dc.identifier.citation: Walkowiak T., Gniewkowski M., Pogoda M., Ropiak N., Anonymizer for Polish Language. W: Progress in Polish Artificial Intelligence Research 4, Wojciechowski A. (Ed.), Lipiński P. (Ed.)., Seria: Monografie Politechniki Łódzkiej Nr. 2437, Wydawnictwo Politechniki Łódzkiej, Łódź 2023, s. 281-284, ISBN 978-83-66741-92-8, doi: 10.34658/9788366741928.44.
dc.identifier.doi: 10.34658/9788366741928.44
dc.identifier.isbn: 978-83-66741-92-8
dc.identifier.uri: http://hdl.handle.net/11652/4820
dc.identifier.uri: https://doi.org/10.34658/9788366741928.44
dc.language.iso: en [en_EN]
dc.page.number: s. 281-284
dc.publisher: Wydawnictwo Politechniki Łódzkiej [pl_PL]
dc.publisher: Lodz University of Technology Press [en_EN]
dc.relation.ispartof: Wojciechowski A. (Ed.), Lipiński P. (Ed.)., Progress in Polish Artificial Intelligence Research 4, Seria: Monografie Politechniki Łódzkiej Nr. 2437, Wydawnictwo Politechniki Łódzkiej, Łódź 2023, ISBN 978-83-66741-92-8, doi: 10.34658/9788366741928.
dc.rights: Dla wszystkich w zakresie dozwolonego użytku [pl_PL]
dc.rights: Fair use condition [en_EN]
dc.rights.license: Licencja PŁ [pl_PL]
dc.rights.license: LUT License [en_EN]
dc.subject: natural language processing [en_EN]
dc.subject: anonymization [en_EN]
dc.subject: Polish language [en_EN]
dc.subject: Kubernetes [en_EN]
dc.subject: przetwarzanie języka naturalnego [pl_PL]
dc.subject: anonimizacja [pl_PL]
dc.subject: język polski [pl_PL]
dc.subject: Kubernetes [pl_PL]
dc.title: Anonymizer for Polish Language [en_EN]
dc.type: Rozdział - monografia [pl_PL]
dc.type: Chapter - monograph [en_EN]

Files

Original files
Name:
44. Anonymizer_polish_language_Walkowiak_Gniewkowski_2023.pdf
Size:
216.23 KB
Format:
Adobe Portable Document Format
License
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission

Collections