Engineering the counting filter for string matching algorithms
MetadataShow full item record
We consider a new approach to the popular counting filter technique for approximate pattern matching. Our solution is based on q-grams combined with alphabet reduction by bin packing and using Streaming SIMD Extensions (SSE). We present a few variants that use the mentioned techniques and discuss pros and cons of them considering two approximate pattern matching problems. The first one is the well- known matching with k-differences and the second one is a biological problem of DNA sequence mutation called matching with inversions and translocations. The experimental results show the effectiveness of our ideas that speed up the counting filter and reduce the number of verifications by orders of magnitude.