hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

new XML subtitle application

Wed Jul 01, 2015 1:02 am

Lately I've been developing my own XML subtitle application, or language, which I call SubtitleML or STML.

This is not about XML-RPC or the OpenSubtitles API, but I thought it fit better here than in the general section.

Why? Because I needed (or wanted) something more than the SRT format could offer. I was reading about TTML: http://www.w3.org/TR/ttaf1-dfxp/

Why not use something that is already done and complete? Because TTML lacks some functionality I wanted: the possibility of storing several languages in the same file. If you think about it, the timings will usually be the same for every language. Besides, I'm relatively new to XML and I didn't see a way to extend or reuse TTML for my purposes. It's not that it can't be done; it possibly can.

The language is now almost ready. I have a working XML Schema for the application and an XSL transformation to convert STML to SRT. The opposite direction, converting SRT to STML, I've been doing with Emacs macros and some Lisp code. Not very elegant, but it works. It would be great to have the same for the SSA format; I'm working on it.
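
For readers without the Emacs setup, here is a minimal Python sketch of the SRT-to-STML direction. It is not the macros or Lisp code mentioned above. The element and attribute names ("subtitles", "body", "st", "sst", "begin", "end", "br") are taken from the sample file shown later in the thread; the function name `srt_to_stml` and the regular expression are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree's spelling of the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# One SRT cue: index line, timing line, then text up to a blank line.
CUE_RE = re.compile(
    r"\d+[ \t]*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})[ \t]*\n"
    r"(.*?)(?=\n[ \t]*\n|\Z)",
    re.DOTALL,
)

SRT_SAMPLE = """1
00:03:49,400 --> 00:03:51,800
Thank God for bar cars.

2
00:04:05,200 --> 00:04:07,800
a bridesmaid
with makeup and a dress.
"""


def srt_to_stml(srt_text, lang):
    """Build a <subtitles> tree with one <st>/<sst> pair per SRT cue."""
    root = ET.Element("subtitles")
    body = ET.SubElement(root, "body")
    normalized = srt_text.replace("\r\n", "\n").strip()
    for begin, end, text in CUE_RE.findall(normalized):
        st = ET.SubElement(body, "st", {"begin": begin, "end": end})
        sst = ET.SubElement(st, "sst", {XML_LANG: lang})
        lines = text.strip().split("\n")
        sst.text = lines[0]
        # Line breaks inside a cue become <br/> elements, as in the sample file.
        for line in lines[1:]:
            ET.SubElement(sst, "br").tail = line
    return root


print(ET.tostring(srt_to_stml(SRT_SAMPLE, "en"), encoding="unicode"))
```

Each further language would then be merged into the same "st" elements as an extra "sst" child; that merge step is not shown here.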

Basically what I'm asking for is some ideas and suggestions to enhance the language and at the same time I'm offering my work to anyone who could be interested. You should know XML, Schema, XSLT and XPath.

The next step is to develop some application using the language. I hope that will come.
Last edited by hector on Fri Jul 24, 2015 12:49 pm, edited 1 time in total.

oss
Site Admin
Posts: 5885
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: new XML subtitle application

Wed Jul 01, 2015 8:33 am

Isn't it better to use some STANDARD format?

What would be the use case for what you are developing, apart from your own programs?

I know XML/XSLT and so on (all of OS is built on that).

hector

Re: new XML subtitle application

Wed Jul 01, 2015 10:14 am

Yes, it is better. That's why I tried TTML. But as I said, I didn't know how to extend it or use it for my purposes.
Besides, XML is standard enough. The situation was much worse with binary formats.

With this format I write a master file with all the translations, usually 3 at most. Then I can automatically generate the SRT for each language. With 3 or 4 variables you can customise the output. Those would be:

- lang: language to generate
- hi: normal or hearing-impaired (it includes environment sounds and all that stuff)
- lyrics: include soundtrack lyrics

and anything else you can imagine. All those personal-preference questions (viewtopic.php?f=1&t=12599&start=30) can be settled by setting the corresponding variable.
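
A rough Python sketch of how the "lang" and "hi" variables could drive the output. This is not the XSL transformation itself. It assumes the element names from the sample file posted later in the thread, assumes that cues marked ttm:role="x-environment" belong only in the hearing-impaired output, and assumes the "ttm" prefix is bound to the TTML metadata namespace. The "lyrics" variable is left out because the thread does not show how lyrics are marked up.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: the ttm prefix is bound to the TTML metadata namespace.
TTM_ROLE = "{http://www.w3.org/ns/ttml#metadata}role"

STML = """<subtitles xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <body>
    <st begin="00:03:49,400" end="00:03:51,800">
      <sst xml:lang="en">Thank God for bar cars.</sst>
      <sst xml:lang="es">Qué suerte que haya coches-bar.</sst>
    </st>
    <st begin="00:03:54,000" end="00:04:06,000" ttm:role="x-environment">
      <sst xml:lang="en">Church bell</sst>
      <sst xml:lang="es">Campana de la iglesia</sst>
    </st>
  </body>
</subtitles>"""


def stml_to_srt(stml_text, lang, hi=False):
    """Return SRT text for one language; hi=True keeps environment sounds."""
    root = ET.fromstring(stml_text)
    cues = []
    for st in root.iter("st"):
        # Assumed rule: environment-sound cues are hearing-impaired only.
        if not hi and st.get(TTM_ROLE) == "x-environment":
            continue
        for sst in st.findall("sst"):
            if sst.get(XML_LANG) == lang:
                # Collapse whitespace; <br/> handling is omitted in this sketch.
                text = " ".join("".join(sst.itertext()).split())
                cues.append((st.get("begin"), st.get("end"), text))
    blocks = [f"{i}\n{b} --> {e}\n{t}\n" for i, (b, e, t) in enumerate(cues, 1)]
    return "\n".join(blocks)


print(stml_to_srt(STML, "es"))
print(stml_to_srt(STML, "en", hi=True))
```

The same master file yields a different SRT for each combination of arguments, which is the point of keeping the variables outside the data.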

Perhaps I'm reinventing the wheel. I know XML is designed with modularity as one of the key design principles. But I just didn't know how to extend TTML. What to do? Get the TTML Schema and modify it? Or use the "import" element in my Schema?

hector

Re: new XML subtitle application

Wed Jul 01, 2015 10:18 am

I didn't conceive it as an interchange format, that is, something to be used by media players. I think of it rather as a storage and authoring format.

oss

Re: new XML subtitle application

Thu Jul 02, 2015 9:14 am

I am not sure what language your program is written in, but nowadays I would prefer JSON over XML. It is true, though, that XSLT helps a lot here.

hector

Re: new XML subtitle application

Thu Jul 02, 2015 12:14 pm

You can call me "crazy" but I don't have a program yet. Only the language. I write it as text with my text editor.

Here is an extract of the XML Schema. This would be the document element:

Code:

<xs:element name="subtitles">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="head" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="metadata" type="stml:metadata.type" minOccurs="0"/>
            <xs:element ref="ttm:agent" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="body">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element ref="stml:div"/>
            <xs:element ref="stml:st"/>
          </xs:choice>
          <xs:attribute ref="xml:space"/>
          <xs:attributeGroup ref="stml:audiolang.attrib.class"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Where possible I tried to follow TTML. Sometimes I use some attributes and elements taken from it. For example, "ttm:agent". One of the issues I had trouble with was choosing between local and global elements.

Basically, it has a "head" with metadata, like TTML, and a "body" with the subtitles. Instead of the "p" element from TTML I used "st", which carries the timings. Inside "st" you can have several translations, which are the "sst" elements (not shown here), each with an "xml:lang" attribute.

hector

Re: new XML subtitle application

Fri Jul 03, 2015 10:01 am

Perhaps I should have started with this, so you can get a general idea of what I mean.
This is what a typical STML file looks like:

Code:

<subtitles>
  <head>
    <video>
      <type>feature-film</type>
      <title>28 days</title>
    </video>
    ⋮
  </head>
  <body>
    <st begin="00:03:49,400" end="00:03:51,800" agent="g" off-screen="yes">
      <sst xml:lang="en">Thank God for bar cars.</sst>
      <sst xml:lang="es">Qué suerte que haya coches-bar.</sst>
      <sst xml:lang="de">Gott sei dank gibt es Barwagen.</sst>
    </st>
    <st begin="00:03:54,000" end="00:04:06,000" ttm:role="x-environment">
      <sst xml:lang="en">Church bell</sst>
      <sst xml:lang="es">Campana de la iglesia</sst>
      <sst xml:lang="de">Kirchenglocke</sst>
    </st>
    <st begin="00:04:00,500" end="00:04:02,200">
      <sst xml:lang="en">You're late!</sst>
      <sst xml:lang="es">¡Llegas tarde!</sst>
      <sst xml:lang="de">Du kommst zu spät!</sst>
    </st>
    <st begin="00:04:02,500" end="00:04:04,300" agent="g">
      <sst xml:lang="en">Jasper, this is…</sst>
      <sst xml:lang="es">Jasper, esta es…</sst>
      <sst xml:lang="de">Jasper, das ist…</sst>
    </st>
    <st begin="00:04:05,200" end="00:04:07,800" agent="g">
      <sst xml:lang="en">a bridesmaid<br/>with makeup and a dress.</sst>
      <sst xml:lang="es">una dama de honor<br/>maquillada y vestida.</sst>
    </st>
  </body>
</subtitles>
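
In the sample above the last cue has no "de" entry. Since all translations share one "st" element, such gaps are easy to detect. A minimal Python sketch follows, using two cues from the sample; the function name `missing_translations` is a hypothetical helper, not part of STML.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

STML = """<subtitles>
  <body>
    <st begin="00:04:00,500" end="00:04:02,200">
      <sst xml:lang="en">You're late!</sst>
      <sst xml:lang="es">¡Llegas tarde!</sst>
      <sst xml:lang="de">Du kommst zu spät!</sst>
    </st>
    <st begin="00:04:05,200" end="00:04:07,800" agent="g">
      <sst xml:lang="en">a bridesmaid<br/>with makeup and a dress.</sst>
      <sst xml:lang="es">una dama de honor<br/>maquillada y vestida.</sst>
    </st>
  </body>
</subtitles>"""


def missing_translations(stml_text, langs):
    """List (begin, lang) pairs for cues that lack a translation."""
    root = ET.fromstring(stml_text)
    gaps = []
    for st in root.iter("st"):
        present = {sst.get(XML_LANG) for sst in st.findall("sst")}
        for lang in langs:
            if lang not in present:
                gaps.append((st.get("begin"), lang))
    return gaps


print(missing_translations(STML, ["en", "es", "de"]))
```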

hector

Re: new XML subtitle application

Wed Jul 22, 2015 5:05 pm

Now I can generate both SRT and SSA.

My main concern now is the language itself. I think I've got some design flaws. For example, to denote a dialog I do it this way:

Code:

<st begin="00:00:01.000" end="00:00:05.000">
  <sst xml:lang="en">
    <span agent="george">What time is it?</span>
    <span agent="mary">I don't know.</span>
  </sst>
</st>

This would produce the following SRT:

00:00:01,000 --> 00:00:05,000
- What time is it?
- I don't know.
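
A small Python sketch of that rendering rule, not the actual XSLT. It assumes that a dash is added only when an "sst" holds more than one "span", and that the dotted timestamps are rewritten with commas for SRT. The function name `cue_to_srt` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ST = """<st begin="00:00:01.000" end="00:00:05.000">
  <sst xml:lang="en">
    <span agent="george">What time is it?</span>
    <span agent="mary">I don't know.</span>
  </sst>
</st>"""


def flatten(element):
    """Join all text inside an element and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def cue_to_srt(st, lang):
    """Render one <st> as an SRT timing line plus text lines."""
    sst = next(s for s in st.findall("sst") if s.get(XML_LANG) == lang)
    spans = sst.findall("span")
    if len(spans) > 1:
        # Assumed rule: several speakers in one cue get one dashed line each.
        lines = ["- " + flatten(span) for span in spans]
    else:
        lines = [flatten(sst)]
    begin = st.get("begin").replace(".", ",")
    end = st.get("end").replace(".", ",")
    return f"{begin} --> {end}\n" + "\n".join(lines)


print(cue_to_srt(ET.fromstring(ST), "en"))
```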

The problem comes up when there are several languages: I have to repeat the "agent" in every instance of "sst".

Code:

<st begin="00:00:01.000" end="00:00:05.000">
  <sst xml:lang="en">
    <span agent="george">What time is it?</span>
    <span agent="mary">I don't know.</span>
  </sst>
  <sst xml:lang="es">
    <span agent="george">¿Qué hora es?</span>
    <span agent="mary">No lo sé.</span>
  </sst>
</st>

I'm not quite sure, but I suspect this is not a good design.

Please, if there's someone out there who knows XML and XSL and wants to help, I would appreciate some comments or ideas.

MarcoTC
Subtitles Admin
Posts: 44
Joined: Sat Jun 13, 2015 5:15 am

Re: new XML subtitle application

Sat Aug 01, 2015 8:12 am

Where you say 'language', you mean 'format'.
You're on a brave quest designing a new format but I would like to suggest you look into JSON and forget about the rest.
If it's done smart, simple and flexible in JSON, you have a lot better chance people are actually going to pick it up.
My 2 cents.
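
For comparison, a sketch of how one cue from the sample earlier in the thread might look as JSON, built and round-tripped in Python. Nothing in the thread defines a JSON layout; every key name here ("head", "body", "begin", "end", "text") is a hypothetical choice.

```python
import json

# One cue with its translations keyed by language code (hypothetical layout).
cue = {
    "begin": "00:04:00,500",
    "end": "00:04:02,200",
    "text": {
        "en": "You're late!",
        "es": "¡Llegas tarde!",
        "de": "Du kommst zu spät!",
    },
}

document = {
    "head": {"video": {"type": "feature-film", "title": "28 days"}},
    "body": [cue],
}

encoded = json.dumps(document, ensure_ascii=False, indent=2)
decoded = json.loads(encoded)
print(encoded)
```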

hector

Re: new XML subtitle application

Tue Aug 04, 2015 5:37 pm

Yes. You can say "format". In XML jargon it is called "XML application".

I don't know any application that stores its data in JSON format. Besides I don't think JSON is so powerful. Can you store a tree structure with JSON? With XML you have it out of the box. And you have XSLT which is the main reason I'm using it. You can convert easily (more or less) to any format: SRT, SSA, WebVTT, SUB...

hector

Re: new XML subtitle application

Sat Aug 08, 2015 7:28 pm

And one more reason to use this format: films with different versions or cuts. Think of it: a single file storing all translations for every version of a film without redundancy, and supporting various subtitle flavours (normal, hearing-impaired, with/without lyrics, etc.). Perhaps the only drawback is that I'm the only person using it right now :( But I think it could be useful.

But I guess this is as always: if it proves to be useful it will expand. Otherwise it won't.

For the time being it is useful to me and that makes it worth the effort.

watdafox
Posts: 27
Joined: Fri Aug 07, 2015 3:13 am

Re: new XML subtitle application

Sat Aug 08, 2015 8:03 pm

It would be kinda hard to maintain, wouldn't it? I mean, we can already rule out end users using these files, as they will be 14MB each (35 langs x 2 releases x 2 hearing variants x 100 KB), maybe more. And on the distributor side, updating it means manipulating a huge file, which is harder than having a db with 140 entries. Not to forget that on a db you can also store metadata, where your xml file would be subs only.

Sidenote: SSA can store multiple langs.

hector

Re: new XML subtitle application

Sun Aug 09, 2015 4:25 pm

"we can already rule out end users using these files, as they will be 14MB each"
It isn't much bigger than the same information in 35 SRT files with the benefit that you don't have redundancy:

35 langs * (100 KB + 2 alternative releases * 6 KB + 2 KB hearing information)

You share the common information between different cuts and flavours. The only overhead is the markup (XML tags) and this goes away when you compress it.
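
The two estimates can be compared with a few lines of Python. All figures are the ones quoted in the two posts; the 6 KB and 2 KB deltas are the poster's own rough guesses, and markup overhead is ignored.

```python
LANGS = 35
BASE_KB = 100            # one complete subtitle track
RELEASES = 2
HEARING_VARIANTS = 2

# Every language/release/hearing combination stored as a full copy.
full_copies_kb = LANGS * RELEASES * HEARING_VARIANTS * BASE_KB

# Shared storage: one base track per language plus small deltas.
ALT_RELEASE_DELTA_KB = 6
HI_EXTRA_KB = 2
shared_kb = LANGS * (BASE_KB + 2 * ALT_RELEASE_DELTA_KB + HI_EXTRA_KB)

print(full_copies_kb, "KB when every combination is a full copy")
print(shared_kb, "KB when common cues are shared")
```

Under these assumptions the full-copy estimate is 14000 KB and the shared one is 3990 KB, before compression.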
"Not to forget that on a db you can also store metadata, where your xml file would be subs only"
No. You can store anything you want. Now I store some metadata about the film like title, id, etc. and the author/translator of the subtitle. But you can easily get rid of it.
"Sidenote: SSA can store multiple langs"
I didn't know. Thanks.
