Fixing misencoded file names
Recently I pulled some files from an archive to republish and index them. But I noticed that some file names had weird characters like %9C in them. At first I thought they were UTF-8 names read as Latin-1, or the other way around. But the percent sign made me suspicious. It turned out the file names were URL-encoded (percent-encoded).
The solution was to URL-decode them.
In Python 2: urllib.unquote(string).decode('utf-8')
In Python 3: urllib.parse.unquote(string), which decodes the escapes as UTF-8 by default.
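
To fix a whole directory at once, something like the following could work. It is only a minimal sketch for Python 3: the function name decode_file_names and the dry_run flag are my own inventions for illustration, and it assumes the names were percent-encoded UTF-8 and that only the files directly inside one directory need fixing.

```python
import os
from urllib.parse import unquote


def decode_file_names(directory, dry_run=True):
    """URL-decode the names of all entries directly inside `directory`.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing is renamed, the pairs are only reported.
    (Hypothetical helper, not part of any library.)
    """
    renamed = []
    for old_name in sorted(os.listdir(directory)):
        # unquote() turns %XX escapes into characters, decoding as UTF-8
        new_name = unquote(old_name)
        if new_name == old_name:
            continue  # nothing to decode
        # A decoded name may contain a path separator (%2F) or a NUL
        # byte (%00); such names are not valid file names, so skip them.
        if "/" in new_name or os.sep in new_name or "\0" in new_name:
            continue
        old_path = os.path.join(directory, old_name)
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            continue  # never overwrite an existing file
        if not dry_run:
            os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

Calling decode_file_names("some/dir") first only lists what would change; passing dry_run=False then performs the renames. Names that would collide with an existing file are left alone, so it is worth checking the returned list against the directory afterwards.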












