Video metadata

Tournesol manages a database to store videos' metadata. The metadata are imported and updated on Tournesol via youtube-dl.

A video's metadata is a tuple < video_id, title, channel, description, views, likes, date, language, uploader, thumbnail, status, last_update, download_failures >.
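As an illustration, the tuple can be written as a record type. This is only a sketch: the class name VideoMetadata and all field types are assumptions, since this page does not specify how the values are stored. The timestamp field is named last_update, as in the descriptions on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VideoMetadata:
    """Hypothetical record for one video; types are assumed, not documented."""

    video_id: str
    title: str
    channel: str
    description: str
    views: int
    likes: int
    date: str
    language: str
    uploader: str
    thumbnail: str
    status: str  # "public", "unlisted", "private" or "removed"
    last_update: Optional[datetime] = None  # no successful update yet
    download_failures: int = 0
```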

These metadata are currently used to improve contributors' experience and to enable keyword search on the search page.

Description of the metadata

The value video_id is the video identifier. It is used throughout the platform, e.g., in the Tournesol main database.

The value status indicates whether the video is public, unlisted, private or removed. It is used to determine which videos are recommended on the search page or selected for rating on the rating page.

The value channel is the account that uploaded the video.

The value last_update is the timestamp of the latest successful update of the video's metadata.

Metadata import

When a contributor adds a new video to Tournesol, either by adding it to the rate-later list or by pasting its URL into the rate page, Tournesol checks whether the video is already in the video metadata database. If it is not, Tournesol checks whether the identifier fits the YouTube video identifier format. If it does, Tournesol calls youtube-dl to import the video metadata.
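The decision steps above can be sketched as follows. The names looks_like_video_id and ensure_metadata are hypothetical, the database is modelled as a plain dictionary, and fetch stands in for the call to youtube-dl. The identifier pattern (11 characters among letters, digits, "-" and "_") is an assumption based on how YouTube identifiers commonly look; the exact check used by Tournesol is not given on this page.

```python
import re
from typing import Callable, Dict

# Assumed format: 11 characters among letters, digits, "-" and "_".
VIDEO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def looks_like_video_id(video_id: str) -> bool:
    """Return True if the whole string matches the assumed identifier format."""
    return VIDEO_ID_PATTERN.fullmatch(video_id) is not None


def ensure_metadata(
    video_id: str,
    database: Dict[str, dict],
    fetch: Callable[[str], dict],
) -> str:
    """Return "known", "rejected" or "imported" for the given identifier."""
    if video_id in database:
        return "known"  # metadata already stored, nothing to import
    if not looks_like_video_id(video_id):
        return "rejected"  # malformed identifier, no download attempted
    database[video_id] = fetch(video_id)  # stand-in for the youtube-dl call
    return "imported"
```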

To mitigate download failures, Tournesol retries with exponential backoff. A timer starts when the first call is made, with wait_time = 1 minute. If the download has not succeeded after wait_time, a second attempt is made and wait_time is doubled. This repeats for as long as no download succeeds: a new attempt is made once wait_time has elapsed since the latest attempt, and wait_time is doubled again. Once wait_time exceeds 1 week, the download attempts are abandoned.
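The retry schedule can be made concrete with a small generator. This is a sketch of one reading of the rule above, in which a retry is scheduled only while wait_time is at most 1 week; the function name retry_waits and its parameters are hypothetical.

```python
from datetime import timedelta
from typing import Iterator


def retry_waits(
    first_wait: timedelta = timedelta(minutes=1),
    max_wait: timedelta = timedelta(weeks=1),
) -> Iterator[timedelta]:
    """Yield the delay before each retry, doubling until it exceeds max_wait."""
    wait_time = first_wait
    while wait_time <= max_wait:
        yield wait_time
        wait_time = wait_time * 2


waits = list(retry_waits())
```

Under this reading, the first attempt is followed by at most 14 retries. The last retry comes 8192 minutes (about 5.7 days) after the previous one, and the whole schedule spans 16383 minutes, or roughly 11.4 days.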

Metadata update

Every 20 minutes, a video is selected at random from the video metadata database, and youtube-dl is called to download its metadata. This interval is chosen to avoid having requests rejected by YouTube.

If the download is successful, then the metadata of the video are updated, as well as last_update, and download_failures is set to 0.

If the download is unsuccessful, then download_failures is incremented.
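One update step, as described in this section, might look like the sketch below. The function update_random_video is hypothetical, records are modelled as dictionaries, and fetch stands in for the call to youtube-dl; it is assumed here that a failed download raises an exception. Scheduling the step every 20 minutes is left to the caller.

```python
import random
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


def update_random_video(
    database: Dict[str, dict],
    fetch: Callable[[str], dict],
    now: Optional[datetime] = None,
) -> str:
    """Refresh one randomly chosen video and return its identifier."""
    video_id = random.choice(list(database))
    record = database[video_id]
    try:
        fresh = fetch(video_id)  # stand-in for the youtube-dl call
    except Exception:
        # Unsuccessful download: only the failure counter changes.
        record["download_failures"] = record.get("download_failures", 0) + 1
        return video_id
    # Successful download: refresh metadata, timestamp and failure counter.
    record.update(fresh)
    record["last_update"] = now or datetime.now(timezone.utc)
    record["download_failures"] = 0
    return video_id
```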