oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Convert encoding and format in realtime (WEBVTT support)

Fri Jan 30, 2015 10:29 am

Hi guys,


UPDATE: For use in your program, check the docs for XMLRPC SearchSubtitles(): http://trac.opensubtitles.org/projects/ ... hSubtitles


With everything moving to online streaming and such, we have implemented realtime conversion support.
Example URL:
http://www.opensubtitles.org/en/subtitl ... -button-cs

URL of subtitle file:
http://dl.opensubtitles.org/en/download/file/1954602263

Change of character encoding
Normally, when you download a file, you get back the same file that was uploaded. You don't know the encoding, and that causes a mess. Now you can try to change the encoding like this:

URL for changing encoding:
http://dl.opensubtitles.org/en/download ... 1954602263

notes:
- We are using PHP iconv() and sometimes it fails, so don't rely on this. It is sure to fail if we don't know the input encoding (cannot detect it) - check the XML-RPC results :)
- You can use any iconv()-supported encoding as the target encoding (list at the end); you can pass it in lower case...
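
If the server-side conversion fails, the same idea can be tried locally. Below is a minimal sketch in Python, assuming you already know the source encoding (for example from the XML-RPC results). It uses Python's built-in codecs rather than iconv(), so the set of supported encoding names is not identical to the list at the end of this post; the function name `reencode` is made up for the example.

```python
def reencode(data: bytes, src_encoding: str, dst_encoding: str = "utf-8") -> bytes:
    """Decode bytes from a known source encoding and encode them to the target one."""
    # Strict decoding raises UnicodeDecodeError when src_encoding is wrong,
    # which is the local equivalent of the conversion "sometimes failing".
    text = data.decode(src_encoding)
    return text.encode(dst_encoding)


# A Czech sample line stored as CP1250, as an uploader might have saved it.
original = "Příliš žluťoučký kůň".encode("cp1250")
converted = reencode(original, "cp1250", "utf-8")
print(converted.decode("utf-8"))
```

Passing the wrong source encoding either raises an error or silently produces garbage, which is why knowing the input encoding matters so much.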

Change of subtitle format
- For now, only conversion from the "srt" format to WebVTT is supported.
- VTT is always UTF-8, so there is no need to pass subencoding with it...
- We are sending proper HTTP headers...
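
For anyone who prefers to convert on the client side, here is a rough sketch of what an SRT to WebVTT conversion involves: add the `WEBVTT` header and switch the millisecond separator in the timing lines from a comma to a dot. This is an illustration only, not the code running on the server, and `srt_to_vtt` is a hypothetical name. It assumes the SRT text has already been decoded to a string.

```python
import re

# SRT timestamps use a comma before the milliseconds; WebVTT uses a dot.
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text (already decoded to str) into WebVTT text."""
    # Drop a leading BOM if present and normalize line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    lines = []
    for line in body.split("\n"):
        # Only touch timing lines, so commas in the dialogue stay untouched.
        if "-->" in line:
            line = _TIMING.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines).strip() + "\n"


sample = "1\r\n00:00:01,000 --> 00:00:02,500\r\nHello\r\n\r\n"
print(srt_to_vtt(sample))
```

The SRT cue numbers are kept, since WebVTT accepts them as optional cue identifiers.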

URL for converting to webvtt:
http://dl.opensubtitles.org/en/download ... 1954602263

If you are interested, it would also be possible to support realtime shifting of subtitles and so on...


iconv encodings:

Code:

ARMSCII-7 AST166-7 AST_34.005 ARMSCII-8 AST166-8 AST_34.002 ARMSCII-8A AST166-A AST_34.002_A ATARIST ATARI BIG5-2003 BIG5-E BIG5E BIG-5 BIG-FIVE BIG5 BIG5-ETEN BIG5ETEN BIGFIVE CN-BIG5 CSBIG5 BIG5-HKSCS BIG5-HKSCS:2004 BIG5HKSCS BIG5-IBM BIG5-PLUS BIG-5+ BIG5+ C99 CP037 037 EBCDIC-CP-CA EBCDIC-CP-NL EBCDIC-CP-US EBCDIC-CP-WT IBM037 CP038 038 EBCDIC-INT IBM038 CP10000 10000 CP10000_MACROMAN CP10006 10006 CP10006_MACGREEK CP10007 10007 CP10007_MACCYRILLIC MS-MAC-CYRILLIC CP10029 10029 CP10029_MACLATIN2 CP1006 1006 MSCP1006 CP10079 10079 CP10079_MACICELANDIC CP10081 10081 CP10081_MACTURKISH CP1026 1026 IBM1026 CP1046 1046 IBM1046 CP1124 1124 IBM1124 CP1125 1125 IBM1125 CP1129 1129 IBM1129 CP1131 1131 IBM1131 CP1133 1133 IBM-CP1133 IBM1133 CP1161 1161 CSIBM1161 IBM-1161 IBM1161 CP1162 1162 CSIBM1162 IBM-1162 IBM1162 MSCP874 WINDOWS-874 CP1163 1163 CSIBM1163 IBM-1163 IBM1163 CP1250 1250 MS-EE MSCP1250 WINDOWS-1250 CP1251 1251 MS-CYRL MSCP1251 WINDOWS-1251 CP1252 1252 MS-ANSI MSCP1252 WINDOWS-1252 CP1253 1253 MS-GREEK MSCP1253 WINDOWS-1253 CP1254 1254 MS-TURK MSCP1254 WINDOWS-1254 CP1255 1255 MS-HEBR MSCP1255 WINDOWS-1255 CP1256 1256 MS-ARAB MSCP1256 WINDOWS-1256 CP1257 1257 MSCP1257 WINBALTRIM WINDOWS-1257 CP1258 1258 MSCP1258 WINDOWS-1258 CP273 273 IBM273 CP274 274 EBCDIC-BE IBM274 CP275 275 EBCDIC-BR IBM275 CP277 277 EBCDIC-CP-DK EBCDIC-CP-NO IBM277 CP278 278 EBCDIC-CP-FI EBCDIC-CP-SE IBM278 CP280 280 EBCDIC-CP-IT IBM280 CP281 281 EBCDIC-JP-E IBM281 CP284 284 EBCDIC-CP-ES IBM284 CP285 285 EBCDIC-CP-GB IBM285 CP290 290 EBCDIC-JP-KANA IBM290 CP297 297 EBCDIC-CP-FR IBM297 CP420 420 EBCDIC-CP-AR1 IBM420 CP423 423 EBCDIC-CP-GR IBM423 CP424 424 EBCDIC-CP-HE IBM424 CP437 437 CSPC8CODEPAGE437 IBM437 CP500 500 EBCDIC-CP-BE EBCDIC-CP-CH IBM500 CP50220 50220 MSCP50220 WINDOWS-50220 CP50221 50221 MSCP50221 WINDOWS-50221 CP50222 50222 MSCP50222 WINDOWS-50222 CP51932 51932 MS51932 MSCP51932 WINDOWS-51932 CP737 737 MSCP737 CP775 775 CSPC775BALTIC MSCP775 CP850 850 
CSPC850MULTILINGUAL IBM850 CP851 851 IBM851 CP852 852 CSPC852 IBM852 CP853 853 IBM853 CP855 855 CSIBM855 IBM855 CP856 856 MSCP856 CP857 857 CSIBM857 IBM857 CP858 858 IBM858 CP860 860 CSIBM860 IBM860 CP861 861 CP-IS CSIBM861 IBM861 CP862 CSPC862LATINHEBREW IBM862 CP863 863 CSIBM863 IBM863 CP864 864 CSIBM864 IBM864 CP865 865 CSIBM865 IBM865 CP866 866 CSIBM866 MSCP866 CP868 868 CP-AR IBM868 CP869 869 CP-GR CSIBM869 IBM869 CP870 870 EBCDIC-CP-ROECE EBCDIC-CP-YU IBM870 CP871 871 EBCDIC-CP-IS IBM871 CP874 874 IBM874 WINDOWS-874 CP875 875 MSCP875 CP880 880 EBCDIC-CYRILLIC IBM880 CP891 891 IBM891 CP903 903 IBM903 CP904 904 IBM904 CP905 905 EBCDIC-CP-TR IBM905 CP918 918 EBCDIC-CP-AR2 IBM918 CP922 922 IBM922 CP932 932 CSWINDOWS31J MS932 MSCP932 SHIFT_JIS-MS SJIS-MS SJIS-OPEN SJIS-WIN WINDOWS-31J WINDOWS-932 CP936 936 MSCP936 WINDOWS-936 CP942 942 IBM942 942C CP942C IBM942C CP943 943 IBM943 943C CP943C IBM943C CP949 949 MSCP949 UHC CP950 950 MSCP950 CTEXT DECHANYU DEC-HANYU DEC_HANYU DECMCS DEC-MCS DEC_MCS EBCDIC-AT-DE-A EBCDIC-AT-DE EBCDIC-CA-FR EBCDIC-DK-NO-A EBCDIC-DK-NO EBCDIC-ES-A EBCDIC-ES-S EBCDIC-ES EBCDIC-FI-SE-A EBCDIC-FI-SE EBCDIC-FR EBCDIC-IT EBCDIC-PT EBCDIC-UK EUC-CN CN-GB CSGB3212 EUCCN GB2312 EUC-JIS-2004 EUC-JISX0213 EUC-JP-MS EUCJP-MS EUCJP-OPEN EUCJP-WIN EUCJPMS EUC-JP CSEUCPKDFMTJAPANESE EUCJP IBM-EUCJP EUC-KR CSEUCKR CSKSC56011987 EUCKR ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 EUC-TW CNS11643 CSEUCTW EUCTW GB12345 GB18030 GBK GEORGIAN-ACADEMY-OLDCAPITAL GEORGIAN-ACADEMY GEO8-BPG GEORGIAN-ILIA GEORGIAN-RS GEORGIAN-PS-OLDCAPITAL GEORGIAN-PS GEO8-GOV GEO8STD GEORGIAN-STD HP-ROMAN8 CSHPROMAN8 R8 ROMAN8 HZ HZ-GB-2312 HZ-GB2312 HZ8 ISO-2022-CN-EXT ISO2022-CNEXT ISO-2022-CN CSISO2022CN ISO2022-CN ISO-2022-JP-1 ISO2022-JP1 ISO-2022-JP-2 CSISO2022JP2 ISO2022-JP2 ISO-2022-JP-2004 ISO-2022-JP-3 ISO2022-JP2004 ISO2022-JP3 ISO-2022-JP CSISO2022JP ISO2022-JP ISO-2022-KR CSISO2022KR ISO2022-KR ISO-8859-1 CP819 CSISOLATIN1 IBM819 ISO-IR-100 ISO8859-1 
ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN6 ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 ISO-8859-11 ISO-IR-166 ISO8859-11 ISO_8859-11 TIS-620 TIS.2533-1 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 ISO_8859-13:1998 L7 LATIN7 ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8 CP923 IBM923 ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 L9 LATIN9 ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10 ISO-8859-2 CP912 CSISOLATIN2 IBM912 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 ISO-8859-3 CP913 CSISOLATIN3 IBM913 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 ISO-8859-4 CP914 CSISOLATIN4 IBM914 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 ISO-8859-5 CP915 CSISOLATINCYRILLIC CYRILLIC IBM915 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 ISO-8859-6 ARABIC ASMO-708 CP1089 CSISOLATINARABIC ECMA-114 IBM1089 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 ISO-8859-7 CP813 CSISOLATINGREEK ECMA-118 ELOT_928 GREEK GREEK8 IBM813 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 ISO-8859-8 CP916 CSISOLATINHEBREW HEBREW IBM916 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 ISO-8859-9 CP920 CSISOLATIN5 IBM920 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 ISO-IR-165 ISO646-BASIC:1983 ISO_646.BASIC:1983 REF REF ISO646-BASIC@1983 ISO646-BASIC:1983 ISO646-CA CA CSA7-1 CSA_Z243.4-1985-1 ISO-IR-121 CSA7-2 CSA_Z243.4-1985-2 ISO-IR-122 ISO646-CA2 ISO646-CN CN CSISO57GB1988 GB_1988-80 ISO-IR-57 ISO646-CU CUBA ISO-IR-151 NC_NC00-10:81 ISO646-DE DE DIN_66003 ISO-IR-21 ISO646-DK DK DS2089 DS_2089 ISO646-ES ES ISO-IR-17 ES2 ISO-IR-85 ISO646-ES2 ISO646-FR FR ISO-IR-69 NF_Z_62-010 ISO-IR-25 ISO646-FR1 NF_Z_62-010_(1973) ISO646-GB BS_4730 ISO-IR-4 ISO646-HU HU ISO-IR-86 MSZ_7795.3 ISO646-IRV:1983 IRV ISO-IR-2 ISO646-IRV@1983 ISO646-IRV:1983 ISO646-IT ISO-IR-15 IT ISO646-JP-OCR-B 
ISO-IR-92 JIS_C6229-1984-B JP-OCR-B ISO646-JP CSISO14JISC6220RO ISO-IR-14 JIS_C6220-1969-RO JP ISO646-KR KSC5636 ISO646-NO ISO-IR-60 NO NS_4551-1 ISO-IR-61 ISO646-NO2 NO2 NS_4551-2 ISO646-PT ISO-IR-16 PT ISO-IR-84 ISO646-PT2 PT2 ISO646-SE FI ISO-IR-10 ISO646-FI SE SEN_850200_B ISO-IR-11 ISO646-SE2 SE2 SEN_850200_C ISO646-US 646 ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 CSASCII IBM367 ISO-IR-6 ISO_646.IRV:1991 US US-ASCII ISO646-YU ISO-IR-141 JS JUS_I.B1.002 JAVA JISX0201-KANA CSHALFWIDTHKATAKANA JISX0201 JISX0201-1976 JIS_X0201 X0201 JISX0208:1990 CSISO87JISX0208 ISO-IR-87 JIS0208 JISX0208-1990 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 JIS_X0208:1990 X0208 JISX0208@1990 JISX0208:1990 JOHAB CP1361 KOI7-SWITCHED KOI7 ISO-5427 ISO-IR-37 ISO_5427 KOI-7 KOI8-C KOI8-E ECMA-CYRILLIC ISO-IR-111 KOI8-R KOI8-RU KOI8-T KOI8-U KOI8 CP878 KOI-8 KZ1048 CSKZ1048 KZ-1048 RK1048 STRK1048-2022 MACARABIC MACCELTIC MACCENTEURO MACCENTRALEUROPE MACCROATIAN MACCYRILLIC MAC-CYRILLIC MACUKRAINE MACUKRAINIAN MACDEVANAGA ISCII-DEV MACDEVANAGARI MACDINGBATS MACFARSI MACGAELIC MACGREEK MACGUJARATI MACGURMUKHI MACHEBREW MACICELAND MACINUIT MACKEYBOARD MACROMAN CSMACINTOSH MAC MACINTOSH MACROMANIA MACROMANIAN MACSYMBOL MACTHAI MACTURKISH MULELAO-1 NEXTSTEP PTCP154 CP154 CSPTCP154 CYRILLIC-ASIAN PARATYPE-154 PT-154 PT154 RISCOS-LATIN1 SHIFT_JIS-2004 SHIFT_JIS TCVN5712-1 TCVN TCVN-5712 TCVN-5712-1:1993 VN-1 TDS565 ISO-IR-230 UTF-16-INTERNAL UCS-2-INTERNAL UTF-16-SWAPPED UCS-2-SWAPPED UTF-16 UNICODE UTF16 CSUNICODE CSUNICODE11 ISO-10646-UCS-2 UCS-2 UCS-2BE UNICODE-1-1 UNICODEBIG UTF-16BE UTF16BE UCS-2LE UNICODELITTLE UTF-16LE UTF16LE UTF-32-INTERNAL UCS-4-INTERNAL UTF-32-SWAPPED UCS-4-SWAPPED UTF-32 CSUCS4 ISO-10646-UCS-4 UCS-4 UCS-4BE UTF-32BE UTF32BE UCS-4LE UTF-32LE UTF32LE UTF-7 CSUNICODE11UTF7 UNICODE-1-1-UTF-7 UTF7 UTF-8 UTF8 VIQR VISCII CSVISCII VISCII1.1-1 VSCII ZW

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: Convert encoding and format in realtime (WEBVTT support)

Mon Mar 16, 2015 6:18 pm

Yes, I think it is very interesting.

I've been thinking about it lately. You can find a lot of "problems" or inconveniences that could be solved with dynamic download generation. That would mean more processing power. I don't know what your server load is, but I think it should be considered. Take into account that you could dramatically reduce your storage needs. Right now you can find several versions of the same subtitle with different synchronization or encoding.

Another thing you could handle is "foreign parts only": viewtopic.php?f=1&t=3094

Or the problem with 24fps/25fps versions. You could generate the timings dynamically.

The key is having a canonical format (for example TTML) that you use for storage, and generating every download from it dynamically. That way you could store more information, like who is speaking, a very useful feature for the hearing impaired and one I usually miss because SRT does not support it.

I've been doing some research on this subject because sometimes I make more than one subtitle for the same film. It is very likely that the timings are the same for every language, so again you are saving space and avoiding redundancy.

Well, perhaps it's too much dreaming but progress is made of dreams, isn't it?

oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Convert encoding and format in realtime (WEBVTT support)

Mon Apr 20, 2015 7:32 am

Well, storing subtitles in some meta-format and then creating subtitles on the fly would be good. It would also be possible to change the FPS (we need to know the source FPS) or the timing. Maybe one day...
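
The arithmetic behind an FPS change or a timing shift is simple once the source FPS is known. A minimal sketch, with hypothetical function names, working on timestamps in milliseconds:

```python
def rescale_ms(time_ms: int, source_fps: float, target_fps: float) -> int:
    """Map a timestamp timed against source_fps onto a release running at target_fps."""
    # A given frame is shown at frame / fps, so the same frame moves from
    # time t to t * source_fps / target_fps.
    return round(time_ms * source_fps / target_fps)


def shift_ms(time_ms: int, offset_ms: int) -> int:
    """Apply a constant delay (positive) or advance (negative), clamped at zero."""
    return max(0, time_ms + offset_ms)


# A line at 1:00 in a 25 fps release lands at 1:02.5 in the 24 fps release.
print(rescale_ms(60_000, 25.0, 24.0))
```

Applying `rescale_ms` to the start and end of every cue converts a whole subtitle between the 24 fps and 25 fps versions.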

MarcoTC
Subtitles Admin
Posts: 44
Joined: Sat Jun 13, 2015 5:15 am

Re: Convert encoding and format in realtime (WEBVTT support)

Sat Jun 13, 2015 5:52 am

oss wrote:
Change of character encoding
Normally, when you download a file, you get back the same file that was uploaded. You don't know the encoding, and that causes a mess. Now you can try to change the encoding like this:

URL for changing encoding:
http://dl.opensubtitles.org/en/download ... 1954602263

notes:
- We are using PHP iconv() and sometimes it fails, so don't rely on this. It is sure to fail if we don't know the input encoding (cannot detect it) - check the XML-RPC results :)

Hi,

Wow, this sounds like an awesome help for the encoding troubles and headaches.

I have a few questions:

- Your post is a few months old. Do you already have any insight into how well it is working? Is it only a few files whose encoding you cannot detect, or the majority?
- Is this going to be implemented in the XML-RPC DownloadSubtitles method?
- You said 'check xml-rpc results'; what do you mean by that? Is this an undocumented API feature, or do you return an XML error if it could not convert correctly? If it's the latter, do you have an example?

thanks!

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: Convert encoding and format in realtime (WEBVTT support)

Thu Jul 23, 2015 9:36 am

I'm trying to do something useful with this.

But to use the "http://dl.opensubtitles.org/en/download/file/" URL you need to know the file id.
How do I get the file id from subtitle id?

Isn't there a more user friendly way to do this? :)

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Convert encoding and format in realtime (WEBVTT support)

Thu Jul 23, 2015 10:49 am

How do I get the file id from subtitle id?
Using the web site:
- If the subtitle ID is 1234567, go to http://www.opensubtitles.org/en/subtitles/1234567
- On the page that appears, find the subtitle file name, for example Movie.Name.2016.Bluray.720p.x264.ReLeASe.srt (123456 bytes)
- The underlying link will be, for example, http://dl.opensubtitles.org/en/download/file/9876543210
- File ID is 9876543210

I know, a pretty clumsy way.
Maybe Oss can provide something smarter.
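
If you already have the download link from the page, the last step can at least be scripted. A small sketch, assuming the link always has the `/download/file/<digits>` shape shown in the example above; `file_id_from_url` is a made-up helper, not part of any API.

```python
from typing import Optional
from urllib.parse import urlparse


def file_id_from_url(url: str) -> Optional[str]:
    """Return the numeric file ID from a .../download/file/<id> link, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect the path to end in .../download/file/<digits>.
    if len(parts) >= 3 and parts[-3:-1] == ["download", "file"] and parts[-1].isdigit():
        return parts[-1]
    return None


print(file_id_from_url("http://dl.opensubtitles.org/en/download/file/9876543210"))
```

A subtitle page URL, which carries the subtitle ID rather than the file ID, returns `None`.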

oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Convert encoding and format in realtime (WEBVTT support)

Thu Jul 23, 2015 11:37 am

Use SearchSubtitles() of XMLRPC. It is also possible to add /xml at the end of the URL and then find what you need, but that's not the correct way.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: Convert encoding and format in realtime (WEBVTT support)

Fri Jul 24, 2015 10:27 am

it works!

But I think I'll keep doing it as always: I'll convert it myself. At least until there is a friendlier way :)

Charles09
Posts: 3
Joined: Tue Jul 28, 2015 8:36 am

Re: Convert encoding and format in realtime (WEBVTT support)

Tue Jul 28, 2015 8:46 am

Hi oss,

First of all, thank you for this great feature, it works great :D
Will there be a function in the future to convert other formats (besides SRT) to WebVTT?

oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Convert encoding and format in realtime (WEBVTT support)

Tue Jul 28, 2015 9:29 am

Hi,

Yes, this is possible (from sub, for example), but we need to know the FPS. If somebody could make or find a library for converting formats (srt, sub, vtt, etc.), that would be great.

Charles09
Posts: 3
Joined: Tue Jul 28, 2015 8:36 am

Re: Convert encoding and format in realtime (WEBVTT support)

Tue Jul 28, 2015 5:43 pm

Thanks for the reply. I found a Perl script: https://github.com/robelix/sub2srt/blob/master/sub2srt

supported formats:

Code:

    if ($format eq "subrip") {
        conv_subrip();
    } elsif ($format eq "microdvd") {
        conv_microdvd();
    } elsif ($format eq "txtsub") {
        conv_txtsub();
    } elsif ($format eq "mpl2") {
        conv_mpl2();
    } elsif ($format eq "tmp") {
        conv_tmp();
    } elsif ($format eq "srt") {
        print "Input file is already subviewer srt format.\n";
    }

oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Convert encoding and format in realtime (WEBVTT support)

Tue Jul 28, 2015 8:05 pm

Well, I would like to avoid calling Perl, but if there is no other way... I can do it.

Charles09
Posts: 3
Joined: Tue Jul 28, 2015 8:36 am

Re: Convert encoding and format in realtime (WEBVTT support)

Thu Jul 30, 2015 2:20 pm

If you want, I can rewrite it in PHP :)

oss
Site Admin
Posts: 5879
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Convert encoding and format in realtime (WEBVTT support)

Thu Jul 30, 2015 3:52 pm

Well, it would be great to have some PHP lib which can handle everything. If you rewrite this, it will help for sure, just write enough tests :) Then I can implement this, no problem.
