Batch-converting Chinese subtitles to UTF-8

The program below detects the encoding of each subtitle file and converts it to UTF-8, so files in encodings such as GBK or UTF-16 end up as UTF-8.
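
Conceptually, the conversion reinterprets the file's bytes in the detected encoding and re-encodes them as UTF-8. As a rough illustration using only Ruby's built-in String#force_encoding and String#encode (not charlock_holmes), the hypothetical helper to_utf8 below replaces ICU-based detection with a much cruder heuristic, which is purely our assumption: a byte-order mark means UTF-16, bytes that are already valid UTF-8 are kept, and anything else is treated as GBK.

```ruby
# Hypothetical helper: a stdlib-only stand-in for charlock_holmes, for illustration.
# Heuristic (an assumption, far cruder than ICU's detector):
#   1. a UTF-16 byte-order mark selects UTF-16LE or UTF-16BE,
#   2. bytes that already form valid UTF-8 are returned unchanged,
#   3. anything else is assumed to be GBK.
def to_utf8(bytes)
  bytes = bytes.b # binary copy, so we inspect raw bytes only

  if bytes.start_with?("\xFF\xFE".b)
    # Little-endian BOM: drop it and decode the rest as UTF-16LE
    bytes.byteslice(2..).force_encoding("UTF-16LE").encode("UTF-8")
  elsif bytes.start_with?("\xFE\xFF".b)
    # Big-endian BOM: drop it and decode the rest as UTF-16BE
    bytes.byteslice(2..).force_encoding("UTF-16BE").encode("UTF-8")
  elsif bytes.dup.force_encoding("UTF-8").valid_encoding?
    # Already UTF-8: just relabel the bytes
    bytes.force_encoding("UTF-8")
  else
    # Fallback guess; raises Encoding::InvalidByteSequenceError if not valid GBK
    bytes.force_encoding("GBK").encode("UTF-8")
  end
end

gbk_bytes = "你好".encode("GBK")
puts to_utf8(gbk_bytes)
```

UTF-16 without a BOM, or files in Big5 or Shift_JIS, would be misread by this heuristic, which is why the actual script relies on a real detector.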

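Save the Ruby script below as ~/convert.rb, and install the charlock_holmes gem (gem install charlock_holmes), which performs the encoding detection and conversion.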

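require "charlock_holmes"
require "pathname"

file = ARGV.first
abort "usage: ruby convert.rb SUBTITLE_FILE" unless file

input_path = Pathname.new(file)

# Only handle regular files with a subtitle extension (case-insensitive)
if input_path.file? && %w[.ass .ssa .srt].include?(input_path.extname.downcase)
  warn input_path

  # Read raw bytes so Ruby does not reinterpret the data before detection
  content = File.binread(input_path)

  # detect may return nil when it cannot make a guess
  detection = CharlockHolmes::EncodingDetector.detect(content)
  encoding = detection && detection[:encoding]

  # Skip files that are already UTF-8 or whose encoding is unknown
  if encoding && encoding != "UTF-8"
    warn "#{encoding} -> UTF-8"
    converted = CharlockHolmes::Converter.convert(content, encoding, "UTF-8")

    # foo.srt -> foo.utf8.srt, written next to the original
    ext = input_path.extname
    new_file_name = input_path.dirname + "#{input_path.basename(ext)}.utf8#{ext}"
    File.write(new_file_name, converted)

    # Keep the original file as foo.srt.orig
    File.rename(input_path, "#{input_path}.orig")
  end
end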
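
For every file that needs converting, the script leaves two files behind: the UTF-8 copy with .utf8 inserted before the extension, and the original renamed with an .orig suffix. The hypothetical helper output_names below (not part of the original script) reproduces that naming with the same Pathname calls, which may be handy for previewing what a run will produce without touching any files.

```ruby
require "pathname"

# Hypothetical helper mirroring convert.rb's naming scheme.
# Returns [path of the UTF-8 copy, path the original is renamed to].
def output_names(path)
  input = Pathname.new(path)
  ext = input.extname                      # e.g. ".srt"
  converted = input.dirname + "#{input.basename(ext)}.utf8#{ext}"
  [converted.to_s, "#{input}.orig"]
end

p output_names("Season1/ep01.srt")
# => ["Season1/ep01.utf8.srt", "Season1/ep01.srt.orig"]
```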

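On macOS, cd into the directory containing the subtitles and run the command below. find also descends into subdirectories; the script skips anything that is not a .ass, .ssa or .srt file.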

find . -type f -exec ruby ~/convert.rb '{}' \;
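
The find command starts a Ruby process for every file it encounters. A stdlib-only alternative, sketched below under the assumption that the script lives at ~/convert.rb, collects subtitle files with Dir.glob and applies the same case-insensitive extension filter first, so Ruby is only spawned for actual subtitles. The names subtitle_files and convert_all are hypothetical.

```ruby
# Extensions handled by convert.rb
SUBTITLE_EXTS = %w[.ass .ssa .srt].freeze

# Hypothetical helper: recursively collect subtitle files under root,
# matching extensions case-insensitively like convert.rb does.
def subtitle_files(root)
  Dir.glob("**/*", base: root)
     .map { |rel| File.join(root, rel) }
     .select { |path| File.file?(path) && SUBTITLE_EXTS.include?(File.extname(path).downcase) }
     .sort
end

# Hypothetical driver: run convert.rb once per subtitle file, as find -exec does.
def convert_all(root, script = File.expand_path("~/convert.rb"))
  subtitle_files(root).each { |path| system("ruby", script, path) }
end
```

Calling convert_all(".") from the subtitle directory would then do the same work as the find command. Note that Dir.glob("**/*") skips hidden (dot) files, whereas find does not.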