
EncodingConverter
EncodingConverter is a plugin for Amarok that converts MP3 ID3v1 tags
containing non-Latin-1 data to Unicode ID3v2 tags.
Important: This script is in alpha status. As it makes changes to ID3 tag
information please make a backup of relevant files.
Furthermore, only a limited number of languages are supported so far, though
more can easily be added. Please submit encodings and their general language or
language family name (e.g. ISO 8859-1 = West European) to the author.
ID3v1 doesn't specify the encoding of its metadata. Many people believe that
only Latin-1 (ISO 8859-1) should be used, yet a lot of music is tagged in the
encoding used locally, and many music players interpret ID3v1 metadata either
as Latin-1 or in a single encoding configured for all files on the system.
When music in different languages from different sources is mixed, the ID3v1
tags end up in different encodings, so no single default encoding can work for
all files.
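The effect can be reproduced in a few lines of Python 3. This is only an illustration with a made-up title, not code from the script: the same bytes read correctly as cp1251 and as garbage when a player assumes Latin-1.

```python
# Python 3 sketch: one ID3v1 byte string, read under two different encodings.
title = "Привет"                    # illustrative Russian title, not real tag data
raw = title.encode("cp1251")        # bytes as a tagger using the local encoding stores them

misread = raw.decode("iso8859_1")   # what a player assuming Latin-1 would display
# Latin-1 maps every byte to a character, so the wrong guess can be undone losslessly.
repaired = misread.encode("iso8859_1").decode("cp1251")

print(misread)    # Ïðèâåò
print(repaired)   # Привет
```

Because decoding as Latin-1 never fails, a player cannot detect the mistake on its own; the correct encoding has to be supplied or guessed.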
ID3v2 does allow Unicode, so an easy solution is to convert ID3v1 tags to
ID3v2. This script lets the user select a supported encoding and also guesses
the encoding, so that the most likely one is offered as the first match.
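According to the changelog, the script's own guesser builds on ngram.py and Textcat. The sketch below is a much simpler stand-in, not that method: it only filters out encodings that cannot decode the bytes at all. The function name and the sample data are assumptions made for illustration.

```python
def decodable_candidates(raw, encodings):
    """Return (encoding, decoded_text) pairs for each encoding that can decode raw."""
    results = []
    for name in encodings:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding, so drop the candidate.
            continue
    return results


raw = "日本語".encode("shift_jis")   # illustrative tag bytes
candidates = decodable_candidates(raw, ["ascii", "shift_jis", "iso8859_1"])
for name, text in candidates:
    print(name, repr(text))
```

Note that iso8859_1 survives the filter even though it is the wrong answer here, which is why a statistical guesser and a manual chooser are still needed.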
Usage:
Right-clicking on a file and choosing "EncodingConverter" and then "Convert to
Unicode" opens a window that shows all relevant tag fields and an encoding
chooser. The encoding guesser tries to find the best match for the given
content, but if the tag fields show wrong content and weird characters, the
current encoding is wrong and another one needs to be chosen.
The fields to be converted can be chosen by ticking the checkboxes. By default,
a field is selected for conversion only if no ID3v2 information exists for it
or the ID3v1 and ID3v2 entries are the same. On conversion the ID3v2 tag is
overwritten; the ID3v1 tag stays untouched.
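The default selection rule can be written down as a small predicate. This is a sketch of the rule as described above, with hypothetical names and sample values; the script's actual implementation may differ.

```python
def selected_by_default(v1_value, v2_value):
    """True if a field should be pre-ticked: no ID3v2 value, or both values equal."""
    return not v2_value or v1_value == v2_value


# Hypothetical tag contents, for illustration only.
id3v1 = {"title": "Song", "artist": "Band", "album": "Old"}
id3v2 = {"title": "", "artist": "Band", "album": "New"}

defaults = {
    field: selected_by_default(id3v1[field], id3v2.get(field, ""))
    for field in id3v1
}
print(defaults)   # {'title': True, 'artist': True, 'album': False}
```

The album field stays unticked because overwriting it would replace an ID3v2 value that differs from the ID3v1 one.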
See README for more information.
13 years ago
2007.11.26 (0.1):
initial release, alpha
2007.12.10 (0.2):
new version of ngram.py, includes language/encoding pairs even if Textcat not available
updated documentation
fenixk19
13 years ago
Quote:
437a438,439
>
> os.system("touch '"+os.path.dirname(file)+"'");
448a451,452
> os.popen("dcop amarok collection scanCollectionChanges")
>
It should be applied to the encodingconverter.py file. Sorry if something is wrong; I'm a newbie in Python and in patch making.
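For reference, the two added lines could also be written without building a shell command from the file name. This is a sketch only, not part of the submitted patch: the helper names are made up, and the dcop call is left commented out because it only works where dcop and Amarok are running.

```python
import os
import subprocess  # only needed for the commented-out call below


def touch_parent_dir(file_path):
    """Set the directory's access and modification time to now, like `touch`."""
    os.utime(os.path.dirname(file_path), None)


def rescan_command():
    """Argument list for the dcop call quoted in the patch."""
    return ["dcop", "amarok", "collection", "scanCollectionChanges"]


# subprocess.call(rescan_command())  # requires dcop and a running Amarok
```

Passing a list to subprocess avoids quoting problems with file names that contain apostrophes, which would break the `touch '...'` string in the patch.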
gecko
13 years ago
How can I convert ID3v2 tags that are not in UTF-8 to UTF-8?
chrisKA
13 years ago
I know about this problem, but as it shouldn't come up too often I am just too lazy to change this script, which for me isn't more than a quick hack (though I wish to keep the encoding guesser clean).
What you can do is copy the ID3v2 tag to ID3v1 and then use this script. This isn't the best solution, as ID3v1 tags have less space, so it only works for short tags. Make a backup of your MP3 first, though.
I'm sorry that there isn't a more elegant solution available here. Well, unless you don't mind changing some stuff in the script, which I would warmly welcome.
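Whether a value is short enough for this workaround can be checked beforehand. The sketch below assumes the fixed ID3v1 field sizes (30 bytes each for title, artist and album, 4 for the year); the function name is made up for illustration.

```python
# Fixed field widths of an ID3v1 tag, in bytes.
ID3V1_FIELD_BYTES = {"title": 30, "artist": 30, "album": 30, "year": 4}


def fits_in_id3v1(field, text, encoding):
    """True if text, encoded with the given codec, fits the ID3v1 field."""
    try:
        encoded = text.encode(encoding)
    except UnicodeEncodeError:
        # The text cannot be represented in this encoding at all.
        return False
    return len(encoded) <= ID3V1_FIELD_BYTES[field]


print(fits_in_id3v1("title", "Short title", "iso8859_1"))   # True
print(fits_in_id3v1("title", "x" * 31, "iso8859_1"))        # False
```

The limit applies to bytes, not characters, so multi-byte encodings such as shift_jis or gbk fit fewer characters into the same field.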
4k1r4
13 years ago
What about Russian CP-1251?
Thanks
emnik
13 years ago
The script 'EncodingConverter' exited with error code: 1
Traceback (most recent call last):
File "/home/manos/.kde/share/apps/amarok/scripts/encodingconverter/encodingconverter.py", line 69, in <module>
id3v2.FrameFactory.instance().setDefaultTextEncoding(tagpy.StringType.UTF8)
AttributeError: 'id3v2_FrameFactory' object has no attribute 'setDefaultTextEncoding'
The same happened with 0.1.
If you need any more information, let me know.
I'm using Ubuntu 7.10 (with the KDE libraries, not the whole of KDE installed).
drchaos
13 years ago
I've tried commenting out this line in the script (python-tagpy 0.91 from Ubuntu Gutsy), with the same result.
In the console I get this:
kdecore (KAction): WARNING: KAction::insertKAccel( kaccel = 0x80ef60 ): KAccel object already contains an action name "play_pause"
QLayout "unnamed" added to QVBox "unnamed", which already has a layout
kdecore (KAction): WARNING: KAction::insertKAccel( kaccel = 0x80ef60 ): KAccel object already contains an action name "play_pause"
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
QObject::connect: Incompatible sender/receiver arguments
StarManager::ratingsColorsChanged() --> ContextBrowser::ratingOrScoreOrLabelsChanged(const QString&)
PS Kubuntu 7.10 amd64
chrisKA
13 years ago
Please try to give more detail. For example you can check the debug output in /tmp/encodingconverter.debug if you have debugging turned on (default).
Furthermore you can see the script output in Amarok if you go to the script settings and right click on the script's entry.
drchaos
13 years ago
[encodingconverter] Started.
[encodingconverter] reading settings
[encodingconverter] No config file found, using defaults.
[encodingconverter] config read settings: textcat_LM_path='/usr/share/libtextcat/LM', language_encoding_pref='', keep log file: True
the script output
Traceback (most recent call last):
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/encodingconverter.py", line 528, in <module>
main( sys.argv )
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/encodingconverter.py", line 524, in main
app = EncodingConverter(sys.argv)
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/encodingconverter.py", line 269, in __init__
self.readSettings()
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/encodingconverter.py", line 295, in readSettings
language_order=language_order)
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/encoding.py", line 80, in __init__
language_order=language_order)
File "/home/drchaos/.kde/share/apps/amarok/scripts/encodingconverter/ngram.py", line 135, in __init__
raise ValueError("no language files found")
ValueError: no language files found
[encodingconverter] Started.
[encodingconverter] reading settings
[encodingconverter] No config file found, using defaults.
[encodingconverter] config read settings: textcat_LM_path='/usr/share/libtextcat/LM', language_encoding_pref='', keep log file: True
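The ValueError suggests that no language model files were found under textcat_LM_path. A quick way to check is sketched below; it assumes that libtextcat language models are files ending in ".lm", and the helper name is made up.

```python
import glob
import os


def find_language_models(lm_path):
    """Return the sorted list of *.lm files directly inside lm_path (assumed extension)."""
    return sorted(glob.glob(os.path.join(lm_path, "*.lm")))


# Path taken from the log output above; adjust it to your installation.
models = find_language_models("/usr/share/libtextcat/LM")
print(len(models), "language model files found")
```

If this prints 0, the directory is missing or empty, which would be consistent with the "no language files found" error in the traceback.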
chrisKA
13 years ago
Once running, you can see a new entry in the context menu when clicking on a playlist entry.
If you want quicker support, please write me an email.
chrisKA
13 years ago
I'm running 0.93-1 here, and the function that's missing in your version was added only recently.
Alternatively, you can try disabling the one line
id3v2.FrameFactory.instance().setDefaultTextEncoding(tagpy.StringType.UTF8)
near the beginning of the script. This might render some conversions useless, though.
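Instead of deleting the line, the call could be guarded so that older tagpy versions skip it. The sketch below uses stand-in classes rather than tagpy itself, so every name except setDefaultTextEncoding is hypothetical.

```python
def set_default_text_encoding(factory, encoding):
    """Call factory.setDefaultTextEncoding(encoding) if available; report whether it ran."""
    setter = getattr(factory, "setDefaultTextEncoding", None)
    if setter is None:
        return False   # older binding: nothing was set, conversions may be affected
    setter(encoding)
    return True


class OldFactory:
    """Stand-in for a frame factory that lacks the method."""


class NewFactory:
    """Stand-in for a frame factory that provides the method."""

    def __init__(self):
        self.encoding = None

    def setDefaultTextEncoding(self, encoding):
        self.encoding = encoding


print(set_default_text_encoding(OldFactory(), "UTF8"))   # False
print(set_default_text_encoding(NewFactory(), "UTF8"))   # True
```

In the script itself the arguments would be id3v2.FrameFactory.instance() and tagpy.StringType.UTF8, as in the line quoted above.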
emnik
13 years ago
Nice work!
chrisKA
13 years ago
But if you open the Amarok "metadata" dialog where you can see the tag info you should be at least able to see the changed data immediately.
If not and you have removed the line as I proposed as an alternate (though bad) solution, then the tags haven't been converted properly.
This script could be made nicer, with a few more simple options, but I hope it is usable for a start.
chrisKA
13 years ago
defaultEncodings = {'windows1250': ('slovak', ),
    'windows1256': ('arabic', ), 'cp874': ('thai', ),
    'gb2312': ('chinese', ), 'big5': ('chinese', ), 'gbk': ('chinese', ),
    'euc_jp': ('japanese', ), 'shift_jis': ('japanese', ),
    'euc_kr': ('korean', ),
    'iso8859_1': ('english', 'german', 'french', 'africaans', 'spanish',
        'danish', 'swahili', 'finnish', 'portuguese', 'norwegian',
        'swedish', 'rumantsch', 'catalan', 'basque', 'latin'),
    'iso8859_2': ('bosnian', 'polish'), 'iso8859_6': ('arabic', ),
    'iso88597': ('greek', ), 'iso8859_8': ('hebrew', )}
    # 'tscii': ('tamil', ) # seems not to be supported in python
Adding your language will mean: a) making its encoding show up in the selector, and b) making textcat guess your language.
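Adding an entry could look roughly like the sketch below. The helper name is made up, the table is a shortened copy, and whether the language name matches one of textcat's language models is an assumption that has to be checked separately.

```python
import codecs

# Shortened, illustrative copy of the table; not the script's full list.
default_encodings = {"iso8859_2": ("bosnian", "polish"), "euc_kr": ("korean",)}


def add_language(table, encoding, language):
    """Register language under encoding, after checking that Python knows the codec."""
    codecs.lookup(encoding)   # raises LookupError for an unknown codec name
    existing = tuple(table.get(encoding, ()))
    if language not in existing:
        table[encoding] = existing + (language,)


add_language(default_encodings, "cp1251", "russian")    # new encoding
add_language(default_encodings, "iso8859_2", "czech")   # extend an existing entry
print(default_encodings)
```

Checking the codec name first gives an early error instead of a failure later, at conversion time.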
As nobody has commented yet, your feedback is appreciated. Does this script work for you?