Tournesol maintains a database of video metadata. The metadata are imported and updated via youtube-dl.
A video's metadata is a tuple < video_id, title, channel, description, views, likes, date, language, uploader, thumbnail, status, last_update, download_failures >.
These metadata are currently used to improve the contributor's experience and to enable keyword search on the search page.
Description of the metadata
The value channel is the account that uploaded the video.
The value last_update is the timestamp of the latest successful update of the video's metadata.
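As an illustration, the tuple can be written as a Python dataclass. This is only a sketch: the class name VideoMetadata, the field types, and the default values are assumptions made for the example, not the actual Tournesol schema. The field names follow the tuple above, with last_update spelled as in the descriptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VideoMetadata:
    """One record of the video metadata database (types are assumptions)."""

    video_id: str
    title: str = ""
    channel: str = ""            # account that uploaded the video
    description: str = ""
    views: int = 0
    likes: int = 0
    date: Optional[datetime] = None          # type assumed for the example
    language: str = ""
    uploader: str = ""
    thumbnail: str = ""          # assumed to be stored as a URL
    status: str = ""
    last_update: Optional[datetime] = None   # latest successful metadata update
    download_failures: int = 0   # incremented on each failed refresh
```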
When a contributor adds a new video to Tournesol, either by adding it to the rate-later list or by pasting its URL on the rate page, Tournesol checks whether the video is already in the video metadata database. If it is not, Tournesol checks whether the identifier fits the YouTube video identifier format. If it does, Tournesol calls youtube-dl to import the video's metadata.
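The import decision described above can be sketched as follows. The identifier pattern (11 characters among letters, digits, "-" and "_") is an assumption about the YouTube format, and the names is_valid_video_id, ensure_metadata, database, and fetch are hypothetical. In particular, fetch stands in for the call to youtube-dl and is passed in as a plain function.

```python
import re
from typing import Callable, Dict

# Assumption: a YouTube video identifier is 11 characters drawn from
# letters, digits, "-" and "_".
VIDEO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def is_valid_video_id(video_id: str) -> bool:
    """Check that the whole string fits the assumed identifier format."""
    return VIDEO_ID_PATTERN.fullmatch(video_id) is not None


def ensure_metadata(
    video_id: str,
    database: Dict[str, dict],
    fetch: Callable[[str], dict],
) -> bool:
    """Return True if the video's metadata is in the database after the call.

    `fetch` is a hypothetical stand-in for the youtube-dl download.
    """
    if video_id in database:
        return True      # already known: nothing to import
    if not is_valid_video_id(video_id):
        return False     # malformed identifier: no download attempted
    database[video_id] = fetch(video_id)
    return True
```

Passing the downloader as an argument keeps the sketch independent of youtube-dl itself and makes the three branches easy to exercise in isolation.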
To mitigate failures, Tournesol retries the download with exponential backoff. A clock is started when the first call is made. After wait_time = 1 minute, a second download attempt is made and wait_time is doubled. As long as no download has succeeded, a new attempt is made wait_time after the latest attempt, and wait_time is doubled again. Once wait_time > 1 week, the download attempts are abandoned.
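The retry schedule can be made concrete with a small generator. This is a sketch under one reading of the rule above: the delay before each retry is yielded, then doubled, and the schedule stops as soon as the doubled value exceeds one week. The names retry_delays, INITIAL_WAIT, and MAX_WAIT are hypothetical.

```python
from datetime import timedelta
from typing import Iterator

INITIAL_WAIT = timedelta(minutes=1)
MAX_WAIT = timedelta(weeks=1)


def retry_delays(
    initial: timedelta = INITIAL_WAIT,
    max_wait: timedelta = MAX_WAIT,
) -> Iterator[timedelta]:
    """Yield the delay before each retry, doubling after every attempt.

    The generator stops once wait_time exceeds max_wait, which is the
    point at which the download attempts are abandoned.
    """
    wait_time = initial
    while wait_time <= max_wait:
        yield wait_time
        wait_time *= 2


delays = list(retry_delays())
total = sum(delays, timedelta())  # time between the first and last attempt
```

Under this reading there are 14 retries after the initial attempt. The last one comes 8192 minutes (about 5.7 days) after the previous attempt, and the whole schedule spans 16383 minutes, a little over 11 days.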
Every 20 minutes, a video is selected at random from the video metadata database, and youtube-dl is called to download its metadata again. This interval is chosen to avoid having requests rejected by YouTube.
If the download is successful, then the metadata of the video are updated, as well as last_update, and download_failures is set to 0.
If the download is unsuccessful, then download_failures is incremented.
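One refresh step could look like the sketch below. It assumes that records are stored as dictionaries keyed by video identifier, that a failed download raises an exception, and that timestamps are kept in UTC. The names refresh_random_video, database, and fetch are hypothetical, and fetch again stands in for the youtube-dl call. Running the step every 20 minutes is left to whatever scheduler is in use.

```python
import random
from datetime import datetime, timezone
from typing import Callable, Dict


def refresh_random_video(
    database: Dict[str, dict],
    fetch: Callable[[str], dict],
) -> str:
    """Refresh the metadata of one randomly chosen video; return its id."""
    video_id = random.choice(list(database))
    record = database[video_id]
    try:
        fresh = fetch(video_id)
    except Exception:
        # Unsuccessful download: keep the old metadata, count the failure.
        record["download_failures"] = record.get("download_failures", 0) + 1
        return video_id
    # Successful download: update metadata and reset the failure counter.
    record.update(fresh)
    record["last_update"] = datetime.now(timezone.utc)
    record["download_failures"] = 0
    return video_id
```

Because the failure branch leaves the stored metadata untouched, a video that temporarily cannot be downloaded keeps its last known title and description.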