How can I make sure that strings containing special characters (like é or è) are correctly encoded as UTF-8 in my Python scripts?
I am building a large string in Python, containing directory names and filenames among other data, and saving it to a file. I want to keep everything in UTF-8 because I will store it in MySQL later. My MySQL database is also set to UTF-8, but some characters (like é or è) come out wrong, even though the string is displayed correctly in the file.
Here’s my script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
def createIndex():
    import codecs
    toUtf8 = codecs.getencoder('UTF8')
    # lot of operations & building indexSTR the string that matters
    findex = open('config/index/music_vibration_' + date + '.index', 'a')
    findex.write(codecs.BOM_UTF8)
    findex.write(toUtf8(indexSTR))  # This throws an error!
When I run this script, I encounter the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2171: ordinal not in range(128)
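One plausible reading of this error, assuming the script runs under Python 2 and indexSTR is already a byte string (str) holding UTF-8 data: passing it to a UTF-8 encoder makes Python first decode the bytes implicitly with the ascii codec, and that fails on 0xc3, the lead byte of é in UTF-8. Note also that the function returned by codecs.getencoder returns a (bytes, length) tuple, not a string, so its result cannot be passed to write() directly. The sketch below shows one way around both problems: keep the text as unicode and let io.open do the encoding. The helper name append_index and the demo path are hypothetical, not part of the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def append_index(path, index_text):
    """Append text to path, encoded as UTF-8 (hypothetical helper)."""
    # Assumption: if the data arrives as UTF-8 bytes, decode it once here
    # so that everything downstream handles text rather than bytes.
    if isinstance(index_text, bytes):
        index_text = index_text.decode('utf-8')
    # io.open encodes on write, so no manual encoder call is needed.
    with io.open(path, 'a', encoding='utf-8') as findex:
        findex.write(index_text)


if __name__ == '__main__':
    # Illustrative data only; the real indexSTR is built elsewhere.
    demo_path = os.path.join(tempfile.mkdtemp(), 'demo.index')
    append_index(demo_path, u'#artiste[:::]Céline\n')
    with io.open(demo_path, 'r', encoding='utf-8') as f:
        print(f.read())
```

The BOM is left out in this sketch. UTF-8 does not require one, and a reader that opens the file as plain utf-8 would see it as part of the first line.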
After creating this file, I can read it back and write its contents into MySQL, but the encoding comes out wrong. My MySQL database is set to utf8 (confirmed by the SQL query SHOW variables LIKE 'char%', which returns only utf8 or binary).
Here’s the MySQL code I’m using:
#!/usr/bin/python
# -*- coding: utf-8 -*-
def saveIndex(index, date):
    import MySQLdb as mdb
    import codecs
    sql = mdb.connect('localhost', 'admin', '*******', 'music_vibration')
    sql.charset = "utf8"
    findex = open('config/index/' + index, 'r')
    lines = findex.readlines()
    for line in lines:
        if line.find('#artiste') != -1:
            artiste = line.split('[:::]')
            artiste = artiste[1].replace('\n', '')
            c = sql.cursor()
            c.execute('SELECT COUNT(id) AS nbr FROM artistes WHERE nom="' + artiste + '"')
            nbr = c.fetchone()
            if nbr[0] == 0:
                c = sql.cursor()
                iArt += 1
                c.execute('INSERT INTO artistes(nom, status, path) VALUES("' + artiste + '", 99, "' + artiste + '/")'.encode('utf8'))
Even though the artiste string is correctly displayed in the file, it is being written incorrectly into the MySQL database. What might be causing this issue?
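Two things in the script above look like possible causes. First, in the INSERT line the .encode('utf8') call binds only to the last string literal, not to the whole concatenated query, because a method call takes precedence over +. Second, assigning sql.charset after connecting probably does not change the connection's encoding. A minimal sketch of an alternative, assuming the connection is opened with MySQLdb's charset and use_unicode keyword arguments, for example mdb.connect('localhost', 'admin', '*******', 'music_vibration', charset='utf8', use_unicode=True), is to read the file as text and pass the values as query parameters. The helper names read_artists and save_artist are hypothetical.

```python
# -*- coding: utf-8 -*-
import io


def read_artists(index_path):
    """Return the artist names found in a UTF-8 index file, as text."""
    artists = []
    with io.open(index_path, 'r', encoding='utf-8') as findex:
        for line in findex:
            if '#artiste' in line:
                artists.append(line.split('[:::]')[1].rstrip('\n'))
    return artists


def save_artist(conn, artiste):
    """Insert artiste unless it already exists; return True if inserted.

    conn is assumed to be a DB-API connection whose driver uses %s
    placeholders, as MySQLdb does.
    """
    c = conn.cursor()
    # Placeholders let the driver quote and encode the value itself,
    # which also avoids building SQL by string concatenation.
    c.execute('SELECT COUNT(id) AS nbr FROM artistes WHERE nom = %s',
              (artiste,))
    if c.fetchone()[0] != 0:
        return False
    c.execute('INSERT INTO artistes(nom, status, path) VALUES(%s, 99, %s)',
              (artiste, artiste + u'/'))
    conn.commit()
    return True
```

With these helpers, the loop reduces to calling save_artist(sql, artiste) for each name returned by read_artists('config/index/' + index).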