Text normalization
January 14, 2023
Check whether a string contains non-ASCII characters
import (
	"unicode"
)

func isASCII(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// isASCII("홍길동") -> false
// isASCII("Hong Gildong") -> true
Use case: when two records share the same email address but have different names, one in its native script and one in ASCII, this check makes it easy to decide which name to keep. Here, the native spelling is preferred:
- González vs Gonzalez - pick González
- Yamada vs 山田 - pick 山田
- 홍길동 vs Hong Gildong - pick 홍길동
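The selection rule in the list above can be sketched as a small helper. This is only an illustration: `preferNative` is a hypothetical name, and the fallback to the first argument when both names are ASCII (or both are not) is an assumption, not something the use case specifies.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether s contains only ASCII characters.
func isASCII(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// preferNative is a hypothetical helper. Given two spellings of the same
// name, it returns the non-ASCII (native) one when exactly one of them is
// non-ASCII. In every other case it falls back to the first argument
// (an assumption made for this sketch).
func preferNative(a, b string) string {
	if isASCII(a) && !isASCII(b) {
		return b
	}
	return a
}

func main() {
	fmt.Println(preferNative("Gonzalez", "González"))
	fmt.Println(preferNative("Yamada", "山田"))
	fmt.Println(preferNative("홍길동", "Hong Gildong"))
}
```

Argument order does not matter when exactly one of the two names is non-ASCII; it only decides the tie when both fall in the same category.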
Normalize alphanumeric characters written in non-ASCII forms
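import (
	"unicode"

	"golang.org/x/text/unicode/norm"
)

// isASCII reports whether s contains only ASCII characters.
func isASCII(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// Normalize applies NFKD (compatibility decomposition) to strings that
// contain non-ASCII characters and returns ASCII strings unchanged.
func Normalize(s string) string {
	if !isASCII(s) {
		return norm.NFKD.String(s)
	}
	return s
}

// Normalize("James") -> James
// Normalize("Ｊａｍｅｓ") -> James (fullwidth input)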
Use case: when a string is more commonly written with plain ASCII characters but arrives in a compatibility form (for example, fullwidth letters or digits), prefer the ASCII version.
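For the narrow case of fullwidth letters, digits, and punctuation, the same folding can be sketched with the standard library alone. This is a swapped-in technique, not the NFKD approach above: `foldFullwidth` is a hypothetical helper that relies on the fullwidth block U+FF01..U+FF5E sitting at a fixed offset of 0xFEE0 from ASCII U+0021..U+007E, and it also maps the ideographic space U+3000 to a regular space.

```go
package main

import (
	"fmt"
	"strings"
)

// foldFullwidth is a hypothetical helper that maps fullwidth ASCII
// variants (U+FF01..U+FF5E) to their ASCII counterparts and the
// ideographic space (U+3000) to a regular space. All other characters
// are returned unchanged.
func foldFullwidth(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0xFF01 && r <= 0xFF5E:
			return r - 0xFEE0 // fixed offset between the two blocks
		case r == 0x3000:
			return ' '
		}
		return r
	}, s)
}

func main() {
	fmt.Println(foldFullwidth("Ｊａｍｅｓ　１２３")) // fullwidth letters, space, digits
	fmt.Println(foldFullwidth("홍길동"))          // left unchanged
}
```

Unlike `norm.NFKD`, this sketch leaves other compatibility characters, such as ligatures or circled digits, untouched, so it is only a fit when fullwidth input is the sole concern.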