Fri Apr 16, 2010 10:50 pm
The easiest approach would be for the subtitlers, uploaders and hashers to assign the metadata themselves, either explicitly upon upload (optional) or inferred from the recorded file names (movie name or subtitle name).
What could be done:
1.-To modify subtitle entries
1.1.-If they belong to a TV series IMDBID, flag them as such.
1.2.-Parse the file name, after removing the TV series name from it, for season/episode(s).
1.3.-Allow a method to search by IMDBID (of the series), season and episode(s).
2.-To modify moviehash entries
2.1.-Similar to 1.1
2.2.-Similar to 1.2
2.3.-Similar to the SeenCount used for IMDB identification of movie hashes, store the possible Season/Episode combinations seen for a given moviehash.
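As a rough illustration of 2.3, here is a minimal Python sketch of a SeenCount-style tally per moviehash. The in-memory dictionary, the function names and the sample hash are all made up for the example; a real implementation would live in the database.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stand-in for a database table:
# moviehash -> Counter of (season, episode) -> number of times seen
seen_counts = defaultdict(Counter)

def record_sighting(moviehash, season, episode):
    """Count one more upload that paired this hash with this season/episode."""
    seen_counts[moviehash][(season, episode)] += 1

def best_guess(moviehash):
    """Return the most frequently seen (season, episode), or None if unknown."""
    counts = seen_counts.get(moviehash)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up hash value, used only for illustration
record_sighting("0123456789abcdef", 2, 5)
record_sighting("0123456789abcdef", 2, 5)
record_sighting("0123456789abcdef", 2, 6)  # one mislabeled upload
print(best_guess("0123456789abcdef"))  # (2, 5)
```

A single wrong upload gets outvoted by the correct ones, which is the same margin-of-error trade-off the movie ID counts already have.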
After this, the metadata would start to flow into the database. It wouldn't be infallible, but it would be at least as accurate as the current information (that is, the margin of error would be similar to what we see today for hashes matching the wrong movie ID or having the wrong language defined). Names that don't match any of the parsers wouldn't be considered valid and wouldn't be counted.
After this, "Season" and "Episode(s)" could become parameters of the SearchSubtitles method and fields in the web search. It would also be easy to link to IMDB, TheTVDB or TVRage for the rest of the information on the subtitle page, if you wanted to.
Currently the search methods are optimized for movies but are counterproductive for TV episodes. The changes above are not that big (although the initial harvest and parsing of existing subtitles would be a big task, of course).
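To make the idea concrete, a search request with the proposed parameters might look roughly like the Python sketch below. The `season` and `episode` field names are my assumption for this proposal, not existing SearchSubtitles fields, and the IMDB ID is a placeholder. The function only builds the query structs and does not call the server.

```python
def build_episode_queries(series_imdbid, season, episodes, sublanguageid="eng"):
    """Build one search struct per episode (hypothetical 'season'/'episode' fields)."""
    if season < 0:
        raise ValueError("season must not be negative")
    if not episodes:
        raise ValueError("at least one episode is required")
    return [
        {
            "sublanguageid": sublanguageid,
            "imdbid": str(series_imdbid),  # IMDB ID of the series, not of the episode
            "season": season,              # assumed field name
            "episode": episode,            # assumed field name
        }
        for episode in episodes
    ]

# A double episode (season 1, episodes 1-2) becomes two structs
queries = build_episode_queries("0123456", 1, [1, 2])
print(len(queries))  # 2
```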
For my own programs, this is how I parse TV show names:
A show can be one of:
-EPI - 1 episode, regular show - Regular episodes
-EP2 - 2 episodes, regular show - Season Starters, Season Enders
-DAY - Daily Shows - i.e. Talk Shows, News Casts
-SPC - Special - i.e. Documentaries, behind-the-scenes, mid-seasons
-PRT - Multi-parters - i.e. Miniseries
The parsers I use are (they can probably be simplified, but I have had no need to do so yet):
EP2:
"^(.*) S([0-9][0-9]*)E([0-9][0-9]*)-E([0-9][0-9]*).*$"
"^(.*) S([0-9][0-9]*)E([0-9][0-9]*) E([0-9][0-9]*).*$"
"^(.*) S([0-9][0-9]*)E([0-9][0-9]*)E([0-9][0-9]*).*$"
"^(.*) ([0-9][0-9]*)x([0-9][0-9]*)-([0-9][0-9]*).*$"
"^(.*) season ([0-9][0-9]*) ep ([0-9][0-9]*)-([0-9][0-9]*).*$"
Season \2
Episodes \3-\4
EPI:
"^(.*) season ([0-9][0-9]*) ep ([0-9][0-9]*).*$"
"^(.*) S([0-9][0-9]*)E([0-9][0-9]*).*$"
"^(.*) ([0-9][0-9]*)x([0-9][0-9]*).*$"
Season \2
Episode \3
DAY:
"^(.*) (19[0-9]{2}|20[0-1][0-9]) (0[0-9]|1[0-2]) ([0-2][0-9]|3[0-1]).*$"
"^(.*) (0[0-9]|1[0-2]) ([0-2][0-9]|3[0-1]) (19[0-9]{2}|20[0-1][0-9]).*$"
Year: \2 (\4 in the second pattern)
Month: \3 (\2 in the second pattern)
Day: \4 (\3 in the second pattern)
SPC:
"^(.*) S([0-9][0-9]*).*$"
"^(.*) E([0-9][0-9]*).*$"
Special ID: \2
PRT:
"^(.*) Part ([0-9][0-9]*)[ ]*[Oo][Ff][ ]*([0-9][0-9]*).*$"
"^(.*) ([0-9][0-9]*)[ ]*[Oo][Ff][ ]*([0-9][0-9]*).*$"
"^(.*) Part[ ]*([0-9][0-9]*).*$"
Part #: \2
Parts: \3 (not captured by the third pattern)
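For anyone who wants to try these, here is a minimal Python sketch that applies the patterns above and returns the first match. The patterns are copied as listed; the trial order (EP2, EPI, DAY, SPC, PRT, first match wins), the field names and the `parse_name` function are assumptions of this example. It also assumes dots and underscores in the file name have already been replaced by spaces.

```python
import re

EP2 = ("season", "first_episode", "last_episode")
EPI = ("season", "episode")
PRT = ("part", "parts")

# (type, pattern, names for groups \2, \3, ...); the order is an assumption
RULES = [
    ("EP2", r"^(.*) S([0-9][0-9]*)E([0-9][0-9]*)-E([0-9][0-9]*).*$", EP2),
    ("EP2", r"^(.*) S([0-9][0-9]*)E([0-9][0-9]*) E([0-9][0-9]*).*$", EP2),
    ("EP2", r"^(.*) S([0-9][0-9]*)E([0-9][0-9]*)E([0-9][0-9]*).*$", EP2),
    ("EP2", r"^(.*) ([0-9][0-9]*)x([0-9][0-9]*)-([0-9][0-9]*).*$", EP2),
    ("EP2", r"^(.*) season ([0-9][0-9]*) ep ([0-9][0-9]*)-([0-9][0-9]*).*$", EP2),
    ("EPI", r"^(.*) season ([0-9][0-9]*) ep ([0-9][0-9]*).*$", EPI),
    ("EPI", r"^(.*) S([0-9][0-9]*)E([0-9][0-9]*).*$", EPI),
    ("EPI", r"^(.*) ([0-9][0-9]*)x([0-9][0-9]*).*$", EPI),
    ("DAY", r"^(.*) (19[0-9]{2}|20[0-1][0-9]) (0[0-9]|1[0-2]) ([0-2][0-9]|3[0-1]).*$",
     ("year", "month", "day")),
    ("DAY", r"^(.*) (0[0-9]|1[0-2]) ([0-2][0-9]|3[0-1]) (19[0-9]{2}|20[0-1][0-9]).*$",
     ("month", "day", "year")),
    ("SPC", r"^(.*) S([0-9][0-9]*).*$", ("special_id",)),
    ("SPC", r"^(.*) E([0-9][0-9]*).*$", ("special_id",)),
    ("PRT", r"^(.*) Part ([0-9][0-9]*)[ ]*[Oo][Ff][ ]*([0-9][0-9]*).*$", PRT),
    ("PRT", r"^(.*) ([0-9][0-9]*)[ ]*[Oo][Ff][ ]*([0-9][0-9]*).*$", PRT),
    ("PRT", r"^(.*) Part[ ]*([0-9][0-9]*).*$", ("part",)),
]

def parse_name(name):
    """Return a dict for the first matching pattern, or None if nothing fits."""
    for kind, pattern, fields in RULES:
        match = re.match(pattern, name)
        if match:
            result = {"type": kind, "show": match.group(1)}
            # Group 1 is the show name; numbered fields start at group 2
            for index, field in enumerate(fields, start=2):
                result[field] = int(match.group(index))
            return result
    return None  # names that fit no parser are not counted

print(parse_name("Lost S02E05 HDTV XviD"))
# {'type': 'EPI', 'show': 'Lost', 'season': 2, 'episode': 5}
```

The order matters: the EP2 patterns have to be tried before EPI, and EPI before SPC, because the shorter patterns also match the longer names.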
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].