Batch / Command Line Conversion of SUP to SRT

mgutt
Posts: 125
Joined: Sun May 05, 2019 6:38 pm

Batch / Command Line Conversion of SUP to SRT

Post by mgutt » Sat Nov 23, 2019 12:18 am

Sometimes it's really funny. I've been ripping for several weeks now. During this time I tried to find out whether any app can batch convert multiple graphical VobSub (DVD) and PGS SUP (Blu-ray) subtitles to SRT (text). It seemed that no such solution exists, and even the most popular OCR tool, Subtitle Edit, states in its FAQ:
Subtitle Edit can also convert subtitles via command line (text formats only)
So day after day I opened every single MKV in Subtitle Edit, looked for the forced subtitle, started the OCR manually and set the correct SRT filename. This cost me so much time.

And today I tried the "Batch convert..." feature in the GUI under "Tools" and wait... what...
[Attachment: 2019-11-22 23_50_53.jpg]

Subtitle Edit can automatically convert/OCR a complete folder of SUP subtitles recursively with one click :shock:

Did you know that? In combination with the automatic forced SUP export this saves me so much time! I mean, I'm doing nothing now. 8)

I never thought this would work, as the batch dialog shows none of the options that are usually part of the manual OCR tool, like the language setting, the engine mode, word guessing, etc. I mean, which settings does Subtitle Edit use in this mode, and why the hell is nobody mentioning this feature?! Crazy.

yorick
Posts: 16
Joined: Sun Nov 04, 2018 12:38 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by yorick » Wed Jan 15, 2020 1:00 pm

Neat!

How many of those .srt files do you need to manually correct after?

I’ve found that for very small forced subtitles without any “unusual” words, Subtitle Edit is spot on. But throw in some place names or given names and it gets confused. Likewise, it likes to add spaces before punctuation.

It’s a neat feature, but you’re likely to have to go back and edit these SRT files to clean up the errors. And because you batch converted, you no longer have the color coding that shows where Subtitle Edit guessed and what it guessed at, which might make the corrections more time-consuming.
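
The stray spaces before punctuation mentioned above are the kind of error that can be cleaned up mechanically after a batch run. A minimal Python sketch, assuming plain SubRip files; the function name `tidy_srt_text` is made up for illustration and is not part of Subtitle Edit. It only touches the text lines and leaves index and timing lines alone. It would be wrong for languages that deliberately put a space before "!" or "?" (French, for example), and it cannot fix misread names.

```python
import re

# Spaces or tabs that sit directly before a punctuation mark.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([,.;:!?])")
# SubRip timing line, e.g. "00:00:01,000 --> 00:00:03,000".
_TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> ")


def tidy_srt_text(srt: str) -> str:
    """Remove spaces before punctuation in the text lines of an SRT string."""
    cleaned = []
    for line in srt.splitlines():
        if line.strip().isdigit() or _TIMING_LINE.match(line):
            # Index and timing lines are passed through unchanged.
            cleaned.append(line)
        else:
            cleaned.append(_SPACE_BEFORE_PUNCT.sub(r"\1", line))
    return "\n".join(cleaned)
```

Running it over each converted file and diffing the result against the original gives a quick list of the lines it changed.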

Grauhaar
Posts: 531
Joined: Thu Sep 15, 2016 3:46 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by Grauhaar » Wed Jan 15, 2020 2:42 pm

The only tool that can extract forced-flagged subtitles from a track is "BD sup2sub" (works for Blu-ray and DVD). Extract the subtitle file (from the DVD rip) and load it into "BD sup2sub". Look at the message that tells you how many forced-flagged subtitles are in the track. If there are any, use Save; an option to save only the forced-flagged subtitles is now shown. Specify the file name and you have it. Easy task. It can also be used with the batch interface to automate these steps.
Another interesting way to find out which track is the forced one is to run MediaInfo on the track created by MakeMKV. If the file was processed by Handbrake, the Elementcount is lost, but remuxing with MKVToolNix brings it back. The MediaInfo listing (in JSON format display; maybe other formats work too) shows an Elementcount for each subtitle track. For DVD tracks this is in most cases the exact number of subtitles; for Blu-ray it should be divided by 2 to get an approximate number of subtitles. I think the Elementcount is the number of blocks used in the MKV file. Because Blu-ray subtitles are larger (resolution and the 256 entries for color mapping), one subtitle uses two blocks. I found that dividing by 2 is very realistic for Blu-rays.
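
The divide-by-2 rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the JSON layout (`media` / `track` / `@type` / `ID`) and the key name `ElementCount` are assumed here and should be checked against your own MediaInfo output, the sample numbers are invented, and the function name is hypothetical.

```python
import json

# Invented sample in the assumed shape of MediaInfo's JSON output.
SAMPLE = json.dumps({
    "media": {
        "track": [
            {"@type": "General"},
            {"@type": "Text", "ID": "3", "ElementCount": "2480"},
            {"@type": "Text", "ID": "4", "ElementCount": "36"},
        ]
    }
})


def estimate_subtitle_counts(mediainfo_json: str, is_bluray: bool) -> dict:
    """Map subtitle track ID to an estimated number of subtitles."""
    report = json.loads(mediainfo_json)
    estimates = {}
    for track in report["media"]["track"]:
        # Only subtitle ("Text") tracks that actually carry a count.
        if track.get("@type") != "Text" or "ElementCount" not in track:
            continue
        count = int(track["ElementCount"])
        track_id = track.get("ID", str(len(estimates)))
        # Rule of thumb from the post: Blu-ray uses about two blocks
        # per subtitle, DVD about one.
        estimates[track_id] = count // 2 if is_bluray else count
    return estimates
```

With these estimates, a track that has only a few dozen subtitles next to one with over a thousand is a likely candidate for the forced track.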
Good Luck :)
_____________________________________________________________
Useful MakeMKV links: FAQs - Debug Log - Buy - Expiration of beta key
Two Blu-ray (UHD) Drives LG LG BH16NS55 with Libredrive Firmware 1.04

mgutt
Posts: 125
Joined: Sun May 05, 2019 6:38 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by mgutt » Wed Jan 15, 2020 3:13 pm

yorick wrote:
Wed Jan 15, 2020 1:00 pm
How many of those .srt files do you need to manually correct after?

I’ve found that for very small forced subtitles without any “unusual” words, Subtitle Edit is spot on. But throw in some place names or given names and it gets confused. Likewise, it likes to add spaces before punctuation.
Which language and which engine are you using? Don't forget: you need to install a recent Tesseract version and select the correct language by OCR'ing one file manually. After that, the batch tool reuses this setting. You cannot select the language or engine through the batch tool. This is something like a bug:
https://github.com/SubtitleEdit/subtitl ... -561696725

I OCR German texts and need to correct maybe 1 out of 20 forced subtitles, and only for really small typos; for example, it sometimes recognizes a combination of letters like "IVI" as the letter "M".
Last edited by mgutt on Wed Jan 15, 2020 3:38 pm, edited 1 time in total.

mgutt
Posts: 125
Joined: Sun May 05, 2019 6:38 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by mgutt » Wed Jan 15, 2020 3:37 pm

Grauhaar wrote:
Wed Jan 15, 2020 2:42 pm
.
The listing from MediaInfo shows (in JSON format display, maybe other works also) an Elementcount for each subtitle track. For DVD tracks this is in most cases the exact number of subtitles, for Blu-ray this should be divided by 2 to have an approx number of subtitles.
OK, but this looks more complicated, as you need to build a script that exports the MediaInfo output and parses the results. With mkvextract and the file size it's an easy task. Exporting all SUP files from the MKV takes the same time as opening an MKV with Subtitle Edit. So it's "export all" with mkvextract and open only those SUP files that are smaller than 3 MB. That's how my batch script works:
viewtopic.php?f=10&t=20931
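
The size filter described above could look roughly like this in Python. It is a sketch, not the linked batch script: the function name is hypothetical, the SUP files are assumed to have been exported already, and reading "3MB" as 3 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumed interpretation of the 3 MB threshold from the post.
SIZE_LIMIT_BYTES = 3 * 1024 * 1024


def find_small_sup_files(folder, limit=SIZE_LIMIT_BYTES):
    """Return all .sup files below `limit` bytes, searching recursively."""
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*.sup")
        if path.is_file() and path.stat().st_size < limit
    )
```

The returned paths are the likely forced-subtitle candidates to run through OCR. Note that on case-sensitive file systems the pattern `*.sup` will not match files ending in `.SUP`.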

Grauhaar
Posts: 531
Joined: Thu Sep 15, 2016 3:46 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by Grauhaar » Wed Jan 15, 2020 5:13 pm

For me the question is what SRT subtitles (with corrections) are needed for if the VobSub version can be created and used.

And in many cases there is no extra subtitle track for forced-flagged subtitles on DVDs, so the forced-flagged subtitles (if any) must be extracted from the main track. This can be done with BD sup2sub without losing quality and without using a subtitle editor and OCR detection with some errors in it.

But, it's your solution
Good Luck :)

mgutt
Posts: 125
Joined: Sun May 05, 2019 6:38 pm

Re: Batch / Command Line Conversion of SUP to SRT

Post by mgutt » Wed Jan 15, 2020 6:30 pm

I need to OCR the subtitles to SRT to avoid Emby/Plex transcoding the video in order to burn in the subtitles. Not many clients can display graphical subtitles directly; my Samsung TV, for example, cannot. With Kodi it's easier, as displaying an additional layer on top of the movie is part of the application. But Kodi needs its own hardware for the client and lacks many of the features that are part of Emby/Plex (remote play, offline play, offline transcoding, easy multi-user, easy age restrictions, etc.).

But in the end you have to check the subtitles in both situations, as Kodi needs a "required flag" to be able to autoplay the forced subtitle, so in comparison there is not really a difference.
