Dovecot ICU normalization does not work
Hi,
I expect Dovecot's 'normalizer-icu' filter to apply Unicode equivalence and fold non-ASCII characters to ASCII in the indexes (for example 'é' to 'e' in French or 'ß' to 'ss' in German), which should reduce index size and improve search results.
It currently does not, and I am not sure why.
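To make the expectation concrete, here is a minimal Python sketch of the kind of folding I have in mind: lowercase, decompose with NFKD, then drop the combining marks. It uses only Python's standard unicodedata module and is not what Dovecot or ICU run internally; the function name fold is made up for illustration.

```python
import unicodedata


def fold(text):
    """Lowercase, decompose (NFKD), then drop nonspacing combining marks.

    Illustrative only: an approximation of accent folding, not Dovecot's code.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Category "Mn" (Mark, nonspacing) covers the accents split off by NFKD.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(fold("èéêçîàœôûù"))  # eeeciaœouu
```

Note that decomposition plus mark removal leaves 'œ' and 'ß' untouched, because neither has a Unicode decomposition. Getting 'oe' and 'ss' would need an extra transliteration step on top of this.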
$ apk list|grep -E 'icu|dovecot'
dovecot-2.3.21-r17 aarch64 {dovecot} (MIT AND LGPL-2.1-or-later) [installed]
dovecot-fts-flatcurve-1.0.1-r0 aarch64 {dovecot-fts-flatcurve} (LGPL-2.1-or-later) [installed]
dovecot-lmtpd-2.3.21-r17 aarch64 {dovecot} (MIT AND LGPL-2.1-or-later) [installed]
dovecot-pigeonhole-plugin-2.3.21-r17 aarch64 {dovecot} (MIT AND LGPL-2.1-or-later) [installed]
dovecot-pop3d-2.3.21-r17 aarch64 {dovecot} (MIT AND LGPL-2.1-or-later) [installed]
dovecot-submissiond-2.3.21-r17 aarch64 {dovecot} (MIT AND LGPL-2.1-or-later) [installed]
icu-data-full-74.2-r0 aarch64 {icu} (ICU) [installed]
icu-libs-74.2-r0 aarch64 {icu} (ICU) [installed]
$ ldd /usr/lib/dovecot/libdovecot-fts.so|grep icu
libicui18n.so.74 => /usr/lib/libicui18n.so.74 (0xffffad3f5000)
libicuuc.so.74 => /usr/lib/libicuuc.so.74 (0xffffad202000)
libicudata.so.74 => /usr/lib/libicudata.so.74 (0xffffacd58000)
$ grep -iE 'filter|fts_(filter|tokenizers)' /etc/dovecot/dovecot.conf
fts_tokenizers = generic email-address
fts_filters = normalizer-icu snowball stopwords
fts_filters_en = lowercase normalizer-icu snowball english-possessive stopwords
fts_filters_fr = lowercase normalizer-icu snowball contractions stopwords
$ doveadm fts tokenize --lang fr "èéêçîàœôûù"
èéêçîàœôûù
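One caveat when reading this output: a decomposed string (base letter plus combining accent) renders the same as the precomposed one in most terminals, so "looks unchanged" does not prove the bytes are unchanged. A small Python sketch for checking a pasted token by code point follows; the helper names describe and normalization_form are hypothetical, and it assumes the token is copied from the terminal without being re-normalized on the way.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def normalization_form(text):
    """Report whether text is precomposed (NFC), decomposed (NFD), both, or mixed."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    if text == nfc and text == nfd:
        return "both"  # e.g. pure ASCII, where the two forms coincide
    if text == nfc:
        return "NFC"
    if text == nfd:
        return "NFD"
    return "mixed"


if __name__ == "__main__":
    token = "èéêçîàœôûù"  # paste the token printed by doveadm here
    print(normalization_form(token))
    for line in describe(token):
        print(line)
```

If the token comes back as NFD, the filter did decompose the text and only the mark-removal step is missing. If it is still NFC, the normalizer most likely did not run at all.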