Batch-converting Chinese subtitles to UTF-8
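The program below detects the encoding of a subtitle file and converts it to UTF-8. For example, files in GBK or UTF-16 can both be converted to UTF-8.
Save the Ruby file below as ~/convert.rb and install the charlock_holmes gem (gem install charlock_holmes).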
require "charlock_holmes"
require "pathname"

# Path of the file to check, passed as the first command-line argument.
file = ARGV.first
abort "usage: ruby convert.rb FILE" if file.nil?
input_path = Pathname.new(file)

# Only handle regular files with a subtitle extension (case-insensitive).
if input_path.file? && %w[.ass .ssa .srt].include?(input_path.extname.downcase)
  warn input_path
  # Read raw bytes so Ruby does not assume an encoding before detection.
  content = File.binread(input_path)
  detection = CharlockHolmes::EncodingDetector.detect(content)
  encoding = detection && detection[:encoding]
  if encoding && encoding != "UTF-8"
    warn "#{encoding} -> UTF-8"
    converted = CharlockHolmes::Converter.convert(content, encoding, "UTF-8")
    # Write the result next to the original as NAME.utf8.EXT.
    new_file_name = input_path.sub_ext(".utf8#{input_path.extname}")
    File.write(new_file_name, converted)
    # Keep the original file, renamed with a .orig suffix.
    File.rename(file, "#{file}.orig")
  end
end
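To see the conversion step in isolation, here is a minimal sketch that uses only Ruby's built-in String#encode instead of the gem. It assumes the source encoding is already known, so it does no detection at all. The helper name to_utf8 is hypothetical and not part of the script above.

```ruby
# Hypothetical helper: re-encode raw bytes from a known source encoding to UTF-8.
# Unlike the script above, it does not detect anything; the caller names the encoding.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Simulate raw bytes as they would be read from a GBK-encoded subtitle file.
gbk_bytes = "字幕".encode("GBK").b

utf8_text = to_utf8(gbk_bytes, "GBK")
puts utf8_text
puts utf8_text.encoding
```

In practice the encoding is rarely known in advance, which is why the script relies on charlock_holmes to guess it first.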
On macOS, change into the directory that contains the subtitles and run:
find . -exec ruby ~/convert.rb '{}' \;
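If find is not available, the same traversal could be sketched in Ruby itself. The example below only lists the files that the script would pick up, using the same three extensions. The helper name subtitle_files and the constant SUBTITLE_EXTENSIONS are assumptions made for illustration.

```ruby
require "pathname"

# Extensions the conversion script accepts.
SUBTITLE_EXTENSIONS = %w[.ass .ssa .srt].freeze

# Hypothetical helper: recursively collect subtitle files under `dir`.
def subtitle_files(dir)
  Pathname.new(dir).find.select do |path|
    path.file? && SUBTITLE_EXTENSIONS.include?(path.extname.downcase)
  end
end

# List candidates under the current directory without modifying anything.
subtitle_files(".").each { |path| puts path }
```

Each listed path could then be passed to ~/convert.rb one at a time, just as find does with -exec.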