
RameshAditya / scoper, Hacker News




Fuzzy and Semantic Caption-Based Searching for YouTube Videos


What Scoper is

Scoper is a Python script that takes a YouTube URL and a user query string as inputs, and returns the timestamps in the video where the caption text closely matches the user's query.

For example, for the video of Apple's October 2018 event, if you were to query "Photoshop for ipad", you'd see the following output:

    photoshop on ipad. 1h 6m 29s
    for. 54m 16s
    ipad. 50m 37s
    photoshop. 1h 14m 8s
    this is a historic center for 3m 48s
    would love to play it for you 4m 50s
    pro users but designed for all 7m 52s
    exactly what you're looking for, 8m 0s
    go and use for everything they 8m 52s
    product line for years to come, 9m 29s
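The timestamps in this output follow an hours/minutes/seconds pattern. As a minimal sketch, assuming caption start times arrive as a number of seconds, a helper like the one below could produce that format. The name format_timestamp is hypothetical and is not taken from Scoper's source.

```python
def format_timestamp(seconds: float) -> str:
    """Render a caption start time (in seconds) as e.g. '1h 6m 29s'."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    # Under one hour, omit the hours field, as in '8m 0s' above
    return f"{minutes}m {secs}s"


if __name__ == "__main__":
    print(format_timestamp(3989))  # 3989 s = 1 h + 6 min + 29 s
```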

How Scoper works

Scoper works in the following steps.

  • Extract captions and timestamps from the YouTube URL
  • Preprocess the user query and train a Word2Vec model
  • Query over the captions and find the best match. This is done in one of two ways, as chosen by the user:
    • Fuzzy searching

      • Scoper enables you to query over the video’s captions by using fuzzy matching algorithms.
      • This means it searches for the most relevant captions in terms of spelling and finds the nearest match.
      • This is done using variants of the Levenshtein distance algorithm.
      • Supports multiple languages.
    • Semantic searching

      • Scoper also enables you to query over the video’s captions using semantic sentence similarity algorithms.
      • The performance of semantic searching depends heavily on the dataset used to train the Word2Vec model.
      • By default, the Brown corpus is used to train the Word2Vec model, and a modified word mover's distance algorithm is used to evaluate sentence-to-sentence similarity.
      • For non-English queries, the user will have to provide their own dataset.
  • Map back the chosen captions to the original timestamps and return them
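The fuzzy path through the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Scoper's actual code: it uses the plain Levenshtein distance rather than the variants Scoper relies on, it assumes captions are available as (text, start_seconds) pairs, and the names levenshtein, similarity, and fuzzy_search are hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance, computed row by row with dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(query: str, caption: str) -> float:
    """Normalise the distance to a 0..1 score, where 1.0 means identical."""
    q, c = query.lower().strip(), caption.lower().strip()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(q, c) / longest


def fuzzy_search(
    query: str,
    captions: List[Tuple[str, float]],
    limit: int = 10,
) -> List[Tuple[str, float, float]]:
    """Score every caption and return the best `limit` with their start times."""
    scored = [(text, start, similarity(query, text)) for text, start in captions]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Caption texts are from the example output; start times are in seconds.
    sample = [
        ("this is a historic center for", 228.0),
        ("photoshop on ipad.", 3989.0),
        ("would love to play it for you", 290.0),
    ]
    for text, start, score in fuzzy_search("Photoshop for ipad", sample, limit=2):
        print(f"{text} @ {start:.0f}s (score {score:.2f})")
```

Because each result keeps its start time next to its score, mapping the chosen captions back to timestamps needs no second lookup.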

How to use Scoper

Shell usage

Web GUI usage


CLI usage

    >python -W ignore --video --mode FUZZY --limit 10 --language en
    Enter query string: prjct airo

    air. 9m 0s
    project aero, our new augmented 1h 6m 7s
    well, with project aero, now you 1h 9m 54s
    we also showed you project aero, 1h (M) s
    pro. (M) s
    ipad pro and it protects both 57m 15s
    tap. 59m 52s
    so now with photoshop, project 1h (M) s
    products, every ipad pro is made 1h (M) s
    previous air. (m) s

    >python -W ignore --video --mode SEMANTIC --limit 10 --language en
    Enter query string: i can't wait to introduce you

    i am thrilled to be able to tell (m) s
    you're going to be amazed by 1h (m) s
    powered by the all-new a (x) (m) s
    but since this is an x chip, it 51m 51s
    in fact, this new a 12 x has more 51m 55s
    i can't wait for you to get your (m) s
    just like in the xr, we call it (m) s
    the a 12 x bionic has an all-new 53m 1s
    a few days ago and they're live (m) s
    and all of the new features of 21m 47s
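SEMANTIC mode ranks captions by meaning rather than spelling. As a rough illustration only, the sketch below scores sentences with a relaxed word mover's distance, a simplification in which each word is matched to its nearest word in the other sentence. It is not Scoper's modified algorithm, whose details are not given here. The 2-D vectors are hand-made stand-ins for trained Word2Vec embeddings, and TOY_VECTORS, relaxed_wmd, and semantic_search are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical toy embeddings; a real run would use trained Word2Vec vectors.
TOY_VECTORS: Dict[str, Vector] = {
    "thrilled": (0.90, 0.10),
    "excited": (0.85, 0.15),
    "introduce": (0.20, 0.90),
    "tell": (0.25, 0.80),
    "chip": (-0.80, -0.50),
    "processor": (-0.75, -0.55),
}


def _one_sided(source: List[str], target: List[str],
               vectors: Dict[str, Vector]) -> float:
    """Average distance from each source word to its nearest target word."""
    return sum(
        min(math.dist(vectors[s], vectors[t]) for t in target) for s in source
    ) / len(source)


def relaxed_wmd(a: str, b: str, vectors: Dict[str, Vector]) -> float:
    """Relaxed word mover's distance; lower means more similar."""
    words_a = [w for w in a.lower().split() if w in vectors]
    words_b = [w for w in b.lower().split() if w in vectors]
    if not words_a or not words_b:
        return math.inf  # nothing comparable in the vocabulary
    # Take the larger of the two one-sided costs.
    return max(_one_sided(words_a, words_b, vectors),
               _one_sided(words_b, words_a, vectors))


def semantic_search(
    query: str,
    captions: List[Tuple[str, float]],
    vectors: Dict[str, Vector],
    limit: int = 10,
) -> List[Tuple[str, float, float]]:
    """Return the `limit` captions closest in meaning, with start times."""
    scored = [(text, start, relaxed_wmd(query, text, vectors))
              for text, start in captions]
    scored.sort(key=lambda item: item[2])  # smallest distance first
    return scored[:limit]


if __name__ == "__main__":
    sample = [("new chip processor", 10.0), ("thrilled to tell", 20.0)]
    for text, start, dist in semantic_search("excited to introduce", sample,
                                             TOY_VECTORS, limit=1):
        print(f"{text} @ {start:.0f}s (distance {dist:.3f})")
```

Words missing from the vocabulary are skipped, which is one reason the quality of the training corpus matters so much for this mode.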

To do

  • Improve the sentence similarity algorithm
  • Include out-of-the-box support for use of pretrained word embeddings
  • Include support for general audio searching using SpeechRecognition APIs to generate a corpus from non-captioned audios

Support Me

If you liked this, leave a star!⭐️

If you liked this and also liked my other work, be sure to follow me for more!
