mkv2sup - Auto export and determine forced subtitles

mgutt
Posts: 141
Joined: Sun May 05, 2019 6:38 pm

Post by mgutt »

I finished this "little" Bash script that automatically exports the subtitles of all your MKVs and determines which of them are forced. A subtitle counts as forced if its file size is below a limit (<3MB works for 99% of my Blu-ray rips) or if the subtitle track is already named "Forced", for example because you named it in MakeMKV:
2019-11-19 22_43_17.jpg
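The two rules can be sketched as a small standalone function. This is only an illustration: the name is_forced and its arguments are made up here, and the 3,000,000 byte default simply mirrors the sub_forced_max_size setting.

```bash
#!/bin/bash
# Sketch: decide whether an exported subtitle file looks like a forced track.
# is_forced is a hypothetical helper, not part of mkv2sup itself.
# Arguments: <file> <track name or "und"> [max size in bytes]
is_forced() {
    local file="$1" track_name="$2" max_bytes="${3:-3000000}"
    # Rule 1: the track is already named "Forced" (case-insensitive)
    if [[ "${track_name,,}" == "forced" ]]; then
        return 0
    fi
    # Tracks with any other name (Regular, SDH, ...) are never treated as forced
    if [[ "${track_name,,}" != "und" ]]; then
        return 1
    fi
    # Rule 2: an unnamed track that is small enough
    local size
    size=$(stat -c %s "$file") || return 1
    [[ "$size" -le "$max_bytes" ]]
}
```

Example: is_forced "Ben Hur (1959).track3.ger.und.sup" "und" && echo "forced"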

Each time the script is executed, it processes the next MKV file that has no exported subtitles yet, so you need to run it repeatedly through a cron job or a task scheduler. You can also copy & paste the script directly into the command line.
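If you do not want to set up a cron job, a small wrapper could call the script in a loop until it reports that nothing is left. A minimal sketch, assuming the script is saved somewhere as a file; run_until_done, the run limit and the script path in the comments are made up for illustration. The "done" message is the one mkv2sup prints itself.

```bash
#!/bin/bash
# Sketch: call a command repeatedly until it reports that nothing is left.
# run_until_done is a hypothetical helper, not part of mkv2sup.
#
# Alternative with cron (the script path is an assumption):
#   */10 * * * * /bin/bash /volume1/scripts/mkv2sup.sh
run_until_done() {
    local max_runs="$1"; shift
    local done_msg="No mkv files found or all subtitles have been exported!"
    local run output
    for (( run = 1; run <= max_runs; run++ )); do
        # run the given command and capture everything it prints
        output="$("$@" 2>&1)" || true
        printf '%s\n' "$output"
        if [[ "$output" == *"$done_msg"* ]]; then
            return 0
        fi
    done
    return 1 # gave up after max_runs runs
}
```

Example: run_until_done 50 /bin/bash /volume1/scripts/mkv2sup.sh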

P.S. Please vote for auto-naming of forced subtitles.

Settings
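- movies_path: The path to your movie collection
- docker_config_path: The host folder that holds your Docker container configs. The script mounts "<docker_config_path>/mkvtoolnix_mkv2sub" into the container as /config (only relevant if Docker is used)
- sub_langs: "all" exports all subtitle languages, "ger,eng,tur" exports only these languages. The first language is used as the default language
- sub_forced_max_size: Maximum file size of a forced subtitle (default "3MB"). You may want to reduce it for TV show episodes
- preserve_sup_files: If true, exported SUP files that are not forced are kept instead of being deleted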
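How two of these settings are interpreted can be sketched like this. The helper names size_to_bytes and lang_wanted are made up for this example. Note that the script itself uses a plain substring match for the language, while this sketch matches whole comma-separated entries, which is slightly stricter.

```bash
#!/bin/bash
# Sketch of how the settings are interpreted (helper names are hypothetical).

# "3MB" or "2.5 MB" -> bytes, using 1 MB = 1,000,000 bytes as the script does
size_to_bytes() {
    local number="${1//[!0-9.]/}" # keep digits and the decimal point only
    awk -v n="$number" 'BEGIN { printf "%d\n", n * 1000000 }'
}

# succeeds if a track language is selected by a sub_langs value
lang_wanted() {
    local sub_langs="$1" lang="$2"
    if [[ "$sub_langs" == "all" ]]; then
        return 0
    fi
    [[ ",$sub_langs," == *",$lang,"* ]]
}

size_to_bytes "3MB"
lang_wanted "ger,eng,tur" "eng" && echo "eng is exported"
```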

Requirements
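- Linux
- MKVToolNix (mkvmerge and mkvextract) installed locally, or Docker with the jlesage/mkvtoolnix container (Docker is optional since v0.3)
- Movie collection in subfolders like "/volume1/movies/Ben Hur (1959)/Ben Hur (1959).mkv"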
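To see quickly which of the two backends would be used on your machine, something like the following could be run first. It is only a sketch; pick_backend is a made-up name, and the order (local MKVToolNix first, Docker second) follows what the script does.

```bash
#!/bin/bash
# Sketch: report which backend is available on this machine.
# pick_backend is a hypothetical helper, not part of mkv2sup.
pick_backend() {
    if command -v mkvmerge >/dev/null 2>&1 && command -v mkvextract >/dev/null 2>&1; then
        echo "local"   # MKVToolNix is installed directly
    elif command -v docker >/dev/null 2>&1; then
        echo "docker"  # fall back to the MKVToolNix container
    else
        echo "none"    # the script would stop with an error
    fi
}

echo "Backend: $(pick_backend)"
```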

Script

#!/bin/bash
# #####################################
# mkv2sup v0.5
# 
# Notes:
# mkv2sup automatically exports all subtitles of all MKV files in a specific folder.
# After that it determines the forced subtitles. MKVs without (compatible) subtitles will be skipped.
# It works only with MKVs in subfolders like /volume1/movies/Ben Hur/Ben Hur.mkv
# 
# Changelog:
# 0.5
# - "#!/bin/bash" added to the head of the script to force the usage of the correct interpreter
# 0.4
# - check mkv modification time to ensure it's not currently being written by another app
# 0.3
# - Docker is now optional
# - No exported SUP will be deleted if the new option <preserve_sup_files> is enabled
# - Check if mkvtoolnix docker container is in use by other bash script before killing it
# 0.2
# - Bug fix: Changed some exit status codes
# - Bug fix: Some file names containing dots were cut
# - Bug fix: Now all <sub_langs> of forced subtitles are renamed and not only those with <default_lang>
# - Bug fix: Named SUP files that were only skipped, will now be deleted, too
# 0.1
# - first release
# 
# Todo:
# - update subtitle track names in mkv file
# - set forced track as default in mkv file
# - how to solve doubles (two forced subtitle tracks in the same language, at the moment those will be tried to be renamed, but this fails as there already exists one)
# - add support for DVDs S_VOBSUB subtitles
# - determine all SRT subtitles (Regular, SDH, etc.) by using word/char matching
# - while writing SUP files check if one contains the word "Forced" and is <default_lang>
# - skip MKV files that are currently written
# #####################################
# 
# ######### Settings ##################
movies_path="/volume1/Movies/"
docker_config_path="/volume1/docker"
sub_langs="ger,eng,tur" # Use "all" to preserve all subtitle languages. Note: The first language is set as default.
sub_forced_max_size="3MB"
preserve_sup_files=true
# #####################################
# 
# ######### Script ####################
# check user settings
movies_path=$([[ "${movies_path: -1}" == "/" ]] && echo "${movies_path%?}" || echo "$movies_path")
docker_config_path=$([[ "${docker_config_path: -1}" != "/" ]] && echo "${docker_config_path}/" || echo "$docker_config_path")
default_lang="${sub_langs:0:3}" # first language is used as default language
sub_forced_max_size="${sub_forced_max_size//[!0-9.]/}" # float filtering (https://stackoverflow.com/a/19724571/318765)
sub_forced_max_size=$(awk "BEGIN { print $sub_forced_max_size*1000000}") # convert MB to Bytes
mkv_path=""
function exitus() {
    exit_status=$1
    # check if container exists
    if [[ -x "$(command -v docker)" ]] && [[ "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then
        # stop container only if it's not in use (by another shell script)
        mkvtoolnix_cpu_usage="$(docker stats mkvtoolnix_mkv2sub --no-stream --format "{{.CPUPerc}}")"
        # if [[ ${mkvtoolnix_cpu_usage%.*} -lt 1 ]]; then
            # we do not stop the container as our script is not race-condition safe!
            # echo "Stop mkvtoolnix container"
            # docker stop mkvtoolnix_mkv2sub
            # docker rm mkvtoolnix_mkv2sub
        # fi
    fi
    exit $exit_status
}
function mkv_getinfo() {
    # check if mkvtoolnix exists
    if [[ -x "$(command -v mkvmerge)" ]]; then
        echo "mkvtoolnix will be used to fetch tracks information"
        mkv_info="$(mkvmerge -J "$mkv_path")"
        return # the result is passed through the global $mkv_info ("return" only accepts a numeric status)
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "Docker will be used to fetch tracks information"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        mkv_info="$(docker exec mkvtoolnix_mkv2sub /usr/bin/mkvmerge -J "$docker_mkv_path")"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}
function mkv_extract() {
    # check if mkvtoolnix exists
    if [[ -x "$(command -v mkvmerge)" ]]; then
        echo "mkvtoolnix will be used to extract tracks"
        mkvextract "$mkv_path" "${mkvextract_options[@]}" # run directly so the progress stays visible and $mkv_info is not overwritten
        return
    # check if docker exists
    elif [[ -x "$(command -v docker)" ]]; then
        echo "mkvtoolnix@docker will be used to extract tracks"
        # check if mkvtoolnix container exists
        if [[ ! "$(docker ps -q -f name=mkvtoolnix_mkv2sub)" ]]; then # https://stackoverflow.com/a/38576401/318765
            # check for blocking container
            if [[ "$(docker ps -aq -f status=exited -f name=mkvtoolnix_mkv2sub)" ]]; then
                docker rm mkvtoolnix_mkv2sub
            fi
            echo "mkvtoolnix container needs to be started"
            # start mkvtoolnix container
            docker_options=(
                run -d
                --name=mkvtoolnix_mkv2sub
                -e TZ=Europe/Berlin
                -v "${docker_config_path}mkvtoolnix_mkv2sub:/config:rw"
                -v "${movies_path}:/storage:rw"
                jlesage/mkvtoolnix
            )
            echo "docker ${docker_options[@]}"
            docker "${docker_options[@]}"
        fi
        echo "docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract $docker_mkv_path ${mkvextract_options[@]}"
        docker exec mkvtoolnix_mkv2sub /usr/bin/mkvextract "$docker_mkv_path" "${mkvextract_options[@]}"
        return
    fi
    echo "mkvtoolnix and docker do not exist!"
    exitus 1
}
# get next mkv file
shopt -s nullglob # avoid empty directory errors (https://unix.stackexchange.com/questions/56051/avoiding-errors-due-to-unexpanded-asterisk)
for movie_path in "$movies_path"/*; do
    mkv_folder="$(basename "$movie_path")"
    echo "Parsing '$mkv_folder'..."
    for mkv_path in "$movie_path"/*.mkv; do
        mkv_basename=$(basename "$mkv_path")
        mkv_filename=${mkv_basename%.*}
        file_time=$(stat -c %Y "$mkv_path") # file modification time
        file_time=$((file_time+120)) # skip files that were modified within the last 120 seconds
        current_time=$(date +%s) # actual timestamp
        if [[ $file_time -gt $current_time ]]; then
            continue
        fi
        for sup_path in "$movie_path"/*.sup; do
            sup_basename=$(basename "$sup_path")
            if [[ $sup_basename == *"$mkv_filename."* ]]; then
                # skip this mkv file because its sup subtitle has been found
                continue 2
            fi
        done
        for srt_path in "$movie_path"/*.srt; do
            srt_basename=$(basename "$srt_path")
            if [[ $srt_basename == *"$mkv_filename."* ]]; then
                # skip this mkv file because its srt subtitle has been found
                continue 2
            fi
        done
        echo "'$mkv_path' has been found."
        # we found an mkv file without subtitle files
        docker_mkv_path="/storage/${mkv_folder}/${mkv_basename}"
        break 2;
    done
done
shopt -u nullglob # it's important to reset this setting (https://unix.stackexchange.com/questions/534858/why-does-shopt-s-nullglob-remove-a-string-with-question-mark-in-an-array-elemen)
# no mkv file found
if [[ -z $docker_mkv_path ]]; then
    echo "No mkv files found or all subtitles have been exported!"
    exitus 0
fi
mkv_getinfo # uses $mkv_path, fills $mkv_info
if [[ -z $mkv_info ]]; then
    echo "Error while fetching tracks information with mkvmerge"
    exitus 1
fi
echo "Informations of all tracks have been obtained."
# parse info
sub_track_ids=(); track_langs=(); track_names=(); track_codec_ids=();
while read -r line ; do
    echo "$line"
    # Note: we did not use "jq -r" to parse JSON as it needs installation
    # "$line" is quoted everywhere to avoid word splitting and globbing of track data
    track_codec_name=$(echo "$line" | grep -oP '^.*?(?=\")')
    track_id=$(echo "$line" | grep -oP '(?<="id": )[0-9]+')
    track_bits=$(echo "$line" | grep -oP '(?<="audio_bits_per_sample": )[0-9]+')
    track_channels=$(echo "$line" | grep -oP '(?<="audio_channels": )[0-9]+')
    track_codec_id=$(echo "$line" | grep -oP '(?<="codec_id": ").*?[^\\](?=\",)')
    track_lang=$(echo "$line" | grep -oP '(?<="language": ")[a-z]+')
    track_name=$(echo "$line" | grep -oP '(?<="track_name": ").*?[^\\](?=\",)') # most flexible way of getting a JSON value (https://stackoverflow.com/a/6852427/318765)
    track_default=$(echo "$line" | grep -oP '(?<="default_track": )(true|false)')
    track_forced=$(echo "$line" | grep -oP '(?<="forced_track": )(true|false)')
    track_type=$(echo "$line" | grep -oP '(?<=")[a-z]+$')
    # collect track langs
    if [[ -n $track_lang ]]; then
        track_langs[$track_id]=$track_lang
    else
        track_langs[$track_id]='und' # und = undetermined
    fi
    # collect track names
    if [[ -n $track_name ]]; then
        track_names[$track_id]=$track_name
    else
        track_names[$track_id]='und'
    fi
    # collect codec ids
    if [[ -n $track_codec_id ]]; then
        track_codec_ids[$track_id]=$track_codec_id
    else
        track_codec_ids[$track_id]='und'
    fi
    # collect subtitles in preferred languages
    if [[ $track_type == "subtitles" ]] && [[ $track_codec_id == "S_HDMV/PGS" ]]; then
        if [[ $sub_langs == "all" ]] || [[ $sub_langs == *"$track_lang"* ]]; then
            sub_track_ids+=("$track_id")
        fi
    fi
done < <(echo "$mkv_info" | 
        tr -d '\n' | # we need to remove line breaks with "tr" to force grep to return one-liners
        grep -oP '(?<=codec": ").*?"type": "[a-z]+') # Regex is faster than looping through all lines
# create empty sup file if mkv file does not contain any subtitles (by that it will be skipped in next turn)
if [[ ${#sub_track_ids[@]} -eq 0 ]];then
    empty_sup_filename="${movies_path}/${mkv_folder}/${mkv_filename}.nosubs.sup"
    echo "The empty SUP file '${empty_sup_filename}' will be created to skip MKV file '${mkv_path}' in the next turn as it does not contain any (compatible) subtitles."
    touch "$empty_sup_filename"
    exitus 0
fi
# build mkvextract export parameter
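sup_out_dir="/storage/${mkv_folder}" # path as seen from inside the docker container
[[ -x "$(command -v mkvmerge)" ]] && sup_out_dir="${movies_path}/${mkv_folder}" # local mkvextract writes to the real folder
mkvextract_options=(tracks)
for track_id in "${sub_track_ids[@]}"; do
    # file naming scheme "Movie_Name.[Language_Code].forced.ext" adopted from Plex (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    mkvextract_options+=("${track_id}:${sup_out_dir}/${mkv_filename}.track${track_id}.${track_langs[$track_id]}.${track_names[$track_id]}.sup")
done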
# export all subtitles
mkv_extract # uses mkv_path, docker_mkv_path, mkvextract_options, movies_path
echo "Successfully extracted all subtitles"
# determine forced subtitle
shopt -s nullglob
shopt -s nocasematch # insensitive string comparison (https://stackoverflow.com/a/14138301/318765)
forced_found=false
for sup_path in "${movie_path}"/*.sup; do
    # get path parts
    sup_dirname=$(dirname "$sup_path")
    sup_basename=$(basename "$sup_path")
    sup_filename=${sup_basename%.*.*.*.*} # (filename).track[0-9].<lang>.<name>.sup
    sup_extension=${sup_basename/#"$sup_filename"./} # filename.(track[0-9].<lang>.<name>.sup)
    # fetch track data through filename
    IFS='.' # set internal field separator to dot (default is whitespace)
    read -ra track_data <<< "$sup_extension" # explode to array (https://stackoverflow.com/a/918931/318765)
    unset IFS; # unset internal field separator
    track_id=${track_data[0]}
    track_id=${track_id/track/} # remove the word "track"
    track_lang=${track_data[1]}
    track_name=${track_data[2]}
    # skip SUP files with wrong naming scheme
    if [[ -n ${track_id//[0-9]/} ]]; then
        echo "'$track_id' is not a track id"
        continue
    fi
    if [[ ${#track_lang} -lt 2 ]] || [[ ${#track_lang} -gt 3 ]] || [[ -n "${track_lang//[a-zA-Z]/}" ]]; then
        echo "'$track_lang' is not a track lang"
        continue
    fi
    if [[ -n ${track_name//[a-zA-Z \']/} ]]; then
        echo "'$track_name' is not a track name"
        continue
    fi
    # set Plex compatible filename (https://support.plex.tv/articles/200471133-adding-local-subtitles-to-your-media/#toc-3)
    sup_filename_new="${sup_dirname}/${sup_filename}.${track_lang}.forced.sup"
    # determine by track name
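    # the braces are required: without them every track would be renamed to "forced" when sub_langs is "all"
    if [[ $track_name == "forced" ]] && { [[ $sub_langs == "all" ]] || [[ $sub_langs == *"$track_lang"* ]]; }; then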
        forced_found=true
        mv "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # skip subtitle tracks that already have names (like "Regular", "SDH", etc.)
    if [[ $track_name != "und" ]];then
        if [[ $preserve_sup_files == "false" ]]; then
            rm -rf "$sup_path"
            echo "'$sup_path' has been deleted"
        fi
        continue
    fi
    # determine by filesize
    filesize=$(stat -c%s "$sup_path")
    if [ $sub_forced_max_size -ge $filesize ]; then
        forced_found=true
        echo "'$sup_path' is small enough to be a forced subtitle"
        mv "$sup_path" "$sup_filename_new"
        # cp --backup "$sup_path" "$sup_filename_new"
        echo "'$sup_path' has been renamed to '$sup_filename_new'"
        continue
    fi
    # delete all other exported subtitles
    if [[ $preserve_sup_files == "false" ]]; then
        rm -rf "$sup_path"
        echo "'$sup_path' has been deleted"
    fi
done
shopt -u nocasematch
shopt -u nullglob
# create empty sup file if mkv does not contain at least one forced subtitle (by that it will be skipped in next turn)
if [[ $preserve_sup_files != "true" ]] && [[ $forced_found != "true" ]]; then
    empty_sup_filename="${movies_path}/${mkv_folder}/${mkv_filename}.noforced.sup"
    echo "The empty SUP file '${empty_sup_filename}' has been created to skip MKV file '${mkv_path}' in the next turn as it does not contain forced subtitles."
    touch "$empty_sup_filename"
    exitus 0
fi
exitus 0
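
The script parses the output of "mkvmerge -J" with grep so that nothing else has to be installed. If jq happens to be available, the same list of PGS subtitle tracks could be read more robustly. A minimal sketch, assuming jq is installed; list_pgs_tracks is a made-up name, and the JSON fields used here are the ones the script already greps for (id, type, codec_id, language, track_name).

```bash
#!/bin/bash
# Sketch: list PGS subtitle tracks from "mkvmerge -J" JSON read on stdin.
# Assumes jq is installed; list_pgs_tracks is a hypothetical helper.
# Output: one tab separated line per track: <id> <language> <track name>
# Missing language or track name is reported as "und", like in mkv2sup.
list_pgs_tracks() {
    jq -r '
        .tracks[]
        | select(.type == "subtitles" and .properties.codec_id == "S_HDMV/PGS")
        | [ .id, (.properties.language // "und"), (.properties.track_name // "und") ]
        | @tsv
    '
}
# Usage: mkvmerge -J "Ben Hur (1959).mkv" | list_pgs_tracks
```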
Last edited by mgutt on Tue Jun 02, 2020 9:31 am, edited 10 times in total.

Re: mkv2sup - Auto export and determine forced subtitles

Post by mgutt »

mkv2sup exports all subtitles:
2019-11-19 22_01_30.jpg

then determines the forced subtitles:
2019-11-19 22_01_40.jpg
Last edited by mgutt on Tue Nov 19, 2019 11:06 pm, edited 1 time in total.

Re: mkv2sup - Auto export and determine forced subtitles

Post by mgutt »

Finally you can drag & drop the SUP files into Subtitle Edit and convert them to SRT:
2019-11-19 23_33_13.jpg

The SUP file naming is based on Plex:
https://support.plex.tv/articles/200471 ... dia/#toc-3
Movies/Movie_Name (Release Date).[Language_Code].forced.ext
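
How an exported file name is mapped to this scheme can be sketched as follows. plex_forced_name is a made-up helper for illustration; the input pattern "<movie>.track<id>.<lang>.<name>.sup" is the one mkv2sup writes during the export.

```bash
#!/bin/bash
# Sketch: turn an exported SUP name into the Plex-style forced name.
# plex_forced_name is a hypothetical helper, not part of mkv2sup.
plex_forced_name() {
    local base="$1"
    local movie="${base%.*.*.*.*}" # strip ".track<id>.<lang>.<name>.sup"
    local rest="${base#"$movie".}" # "track<id>.<lang>.<name>.sup"
    local lang
    # split on dots: the first field is the track id, the second the language
    IFS='.' read -r _ lang _ <<< "$rest"
    printf '%s.%s.forced.sup\n' "$movie" "$lang"
}

plex_forced_name "Ben Hur (1959).track3.ger.und.sup"
# prints: Ben Hur (1959).ger.forced.sup
```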

Re: mkv2sup - Auto export and determine forced subtitles

Post by mgutt »

Multiple MKVs per movie are supported as well:

While exporting all subtitles:
2019-11-20 00_30_26.jpg
Result:
2019-11-20 00_32_49.jpg

Because of that, TV shows are supported as well, but only if the episodes are not located in season subfolders, as mkv2sup does not crawl sub-subfolders (not yet, maybe in a later release). So these will be processed:
"V:\Serien\Gilmore Girls (2000)\s01e01 Alles auf Anfang.mkv"
"V:\Serien\Gilmore Girls (2000)\s01e02 Ein klassischer Fehlstart.mkv"
"V:\Serien\Gilmore Girls (2000)\s01e03 Familie mit Handicap.mkv"
...
"V:\Serien\Gilmore Girls (2000)\s02e05 Ein schwerer Fall.mkv"
"V:\Serien\Gilmore Girls (2000)\s02e01 Der Antrag.mkv"
"V:\Serien\Gilmore Girls (2000)\s02e02 Nicht ohne meine Mutter.mkv"
But these will not:
"V:\Serien\Gilmore Girls (2000)\Staffel 1\s01e01 Alles auf Anfang.mkv"
"V:\Serien\Gilmore Girls (2000)\Staffel 1\s01e02 Ein klassischer Fehlstart.mkv"
"V:\Serien\Gilmore Girls (2000)\Staffel 1\s01e03 Familie mit Handicap.mkv"
...
"V:\Serien\Gilmore Girls (2000)\Staffel 2\s02e05 Ein schwerer Fall.mkv"
"V:\Serien\Gilmore Girls (2000)\Staffel 2\s02e01 Der Antrag.mkv"
"V:\Serien\Gilmore Girls (2000)\Staffel 2\s02e02 Nicht ohne meine Mutter.mkv"
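
One way the season subfolder limitation could be lifted is to collect the MKV files with find instead of a single-level glob. This is only a sketch and not what mkv2sup currently does; list_mkvs is a made-up name.

```bash
#!/bin/bash
# Sketch: list MKV files one or two levels below the collection root, so that
# both "Show/episode.mkv" and "Show/Season/episode.mkv" are found.
# list_mkvs is a hypothetical helper, not part of mkv2sup.
list_mkvs() {
    local root="$1"
    # -mindepth 2 skips files lying directly in the root folder
    find "$root" -mindepth 2 -maxdepth 3 -type f -name '*.mkv' | sort
}
# Usage: list_mkvs "/volume1/Movies"
```

The rest of the script would also have to build docker_mkv_path and the SUP paths from the path relative to movies_path instead of from a single folder name.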

Re: mkv2sup - Auto export and determine forced subtitles

Post by mgutt »

Version 0.2 has been released:

# 0.2
# - Bug fix: Changed some exit status codes
# - Bug fix: Some file names containing dots were cut
# - Bug fix: Now all <sub_langs> of forced subtitles are renamed and not only those with <default_lang>
# - Bug fix: Named SUP files that were only skipped, will now be deleted, too
Some new to-dos:
# - use mkvtoolnix instead of docker container (if installed)
# - add support for DVDs S_VOBSUB subtitles
# - determine all SRT subtitles (Regular, SDH, etc.) by using word/char matching
The last idea could be realized through a separate Bash script. I'm not sure at the moment.
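
A very rough idea of what such char matching could look like for SDH tracks: count the lines that contain bracketed sound descriptions. This is an untested heuristic and only a sketch; looks_like_sdh and the default threshold of 5 lines are arbitrary assumptions.

```bash
#!/bin/bash
# Sketch: guess whether an SRT file is an SDH track by counting lines that
# contain bracketed descriptions such as "[door slams]" or "(laughs)".
# looks_like_sdh and the default threshold are arbitrary assumptions.
looks_like_sdh() {
    local srt="$1" min_hits="${2:-5}" hits
    # grep -c prints the number of matching lines (and fails if there are none)
    hits=$(grep -cE '\[[^]]+\]|\([^)]+\)' "$srt") || true
    [[ "${hits:-0}" -ge "$min_hits" ]]
}
# Usage: looks_like_sdh "Ben Hur (1959).eng.srt" && echo "probably SDH"
```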

Re: mkv2sup - Auto export and determine forced subtitles

Post by mgutt »

Version 0.5 has been released. Updates since 0.2:

# 0.5
# - "#!/bin/bash" added to the head of the script to force the usage of the correct interpreter
# 0.4
# - check mkv modification time to ensure it's not currently being written by another app
# 0.3
# - Docker is now optional
# - No exported SUP will be deleted if the new option <preserve_sup_files> is enabled
# - Check if mkvtoolnix docker container is in use by other bash script before killing it
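
The modification time check from 0.4 boils down to the following. A minimal sketch; is_recently_modified is a made-up name, the 120 seconds are the value used in the script, and "stat -c" assumes GNU stat as found on Linux.

```bash
#!/bin/bash
# Sketch of the 0.4 check: treat a file as "probably still being written"
# if it was modified within the last N seconds.
# is_recently_modified is a hypothetical helper, not part of mkv2sup.
is_recently_modified() {
    local file="$1" grace="${2:-120}"
    local mtime now
    mtime=$(stat -c %Y "$file") || return 1 # modification time as epoch seconds
    now=$(date +%s)
    (( mtime + grace > now ))
}
# Usage: is_recently_modified "movie.mkv" && echo "skip for now"
```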