TubeKit
A YouTube Crawling Toolkit

Tools

In addition to TubeKit, which incorporates a suite of tools for query-based YouTube crawling, we have developed a few small tools that let you grab various kinds of information from YouTube without running queries.

To use a tool, click on its name, which should display the code in your browser. Save that file, remove the '.txt' extension, and you are good to go! Some of the tools require database support and/or additional tools. Files for database configuration and connection are also provided below.

Tool Usage Notes
Extract YouTube video URLs
> php extractYTVideoURLs.php yturls.txt mylist.txt

This PHP script takes a set of YouTube URLs (or URLs of almost any webpage) and extracts the embedded URLs that point to YouTube videos. You can then pass the generated list (here, 'mylist.txt') to the 'Harvest videos' tool to collect various attributes of those videos, or to the 'Download YouTube videos' tool to download them.

You don't need any database support for this. Just put one URL per line in 'yturls.txt', provide the name of the output file (here, 'mylist.txt'), and run it.
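
The extraction step can be sketched in Python as below. This is not the code of 'extractYTVideoURLs.php'; it is a minimal illustration that assumes video links take the forms youtube.com/watch?v=ID, youtube.com/v/ID, youtube.com/embed/ID or youtu.be/ID, with 11-character IDs made of letters, digits, '_' and '-'. The function names are hypothetical.

```python
import re
import sys
import urllib.request

# Assumed link forms: youtube.com/watch?v=ID, youtube.com/v/ID (old embeds),
# youtube.com/embed/ID and youtu.be/ID, each with an 11-character video ID.
YT_VIDEO_RE = re.compile(
    r"https?://(?:www\.)?"
    r"(?:youtube\.com/(?:watch\?v=|v/|embed/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


def extract_video_urls(html):
    """Return one canonical watch URL per distinct video ID, in first-seen order."""
    urls = []
    for video_id in YT_VIDEO_RE.findall(html):
        url = "http://www.youtube.com/watch?v=" + video_id
        if url not in urls:
            urls.append(url)
    return urls


def main(page_list_path, out_path):
    # One page URL per line, as in 'yturls.txt'.
    with open(page_list_path) as f:
        pages = [line.strip() for line in f if line.strip()]
    found = []
    for page in pages:
        with urllib.request.urlopen(page) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        for url in extract_video_urls(html):
            if url not in found:
                found.append(url)
    # Write one video URL per line, e.g. to 'mylist.txt'.
    with open(out_path, "w") as out:
        for url in found:
            out.write(url + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python extract_yt_urls.py yturls.txt mylist.txt` (the file name is hypothetical). Each distinct video is written once, in the order it was first found, so the output can feed the other tools directly.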

Download YouTube videos
> php downloadYTVideos.php mylist.txt

Copy and paste the Python script from here and save it as 'youtube-dl' in the current directory. Then put the URLs of the YouTube videos in a text file and pass the name of that file as an argument on the command line. You can write these URLs manually or use the output of the 'Extract YouTube video URLs' tool.

You don't need any database support for this. You do, however, need Python 2.4 or higher to run 'youtube-dl', which this PHP script uses to download the videos.
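
A rough Python sketch of the batching this tool performs, assuming it simply runs 'youtube-dl' once for each URL in the list; the PHP script's actual invocation and options are not documented here. 'youtube-dl' takes a video URL as a positional argument, while the helper names and the dry-run switch below are hypothetical.

```python
import subprocess
import sys


def read_url_list(path):
    """Return the non-blank lines of a URL list file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_command(url, python="python", script="youtube-dl"):
    # Assumption: 'youtube-dl' is the script saved in the current directory,
    # run with a Python interpreter, with the video URL as its argument.
    return [python, script, url]


def download_all(list_path, dry_run=False):
    """Run youtube-dl once per URL; with dry_run=True only print the commands."""
    commands = [build_command(url) for url in read_url_list(list_path)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # A failed download is reported but does not stop the rest of the batch.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("download failed: " + cmd[-1], file=sys.stderr)
    return commands


if __name__ == "__main__" and len(sys.argv) == 2:
    download_all(sys.argv[1])
```

Running each URL as a separate process keeps one broken video from aborting the whole list.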

Harvest videos
> php harvestYTVideos.php mylist.txt

This PHP script collects a set of attributes for YouTube videos. Put the URLs of those videos in a text file and pass the name of that file as an argument on the command line. You can write these URLs manually or use the output of the 'Extract YouTube video URLs' tool.

The harvested results go into a MySQL database, so you need to create the database and the appropriate table before running this script (see Database configuration below). You also need a database connection file, MagpieRSS, and the 'parseRSS.php' file (see below). Before running the script, open it and set $magpieRSSLocation.
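
Harvesting works per video, so each URL in the list has to be reduced to a video ID at some point. The sketch below shows one way to do that with Python's standard library. It assumes watch URLs carry the ID in the 'v' query parameter and short youtu.be links carry it in the path; it illustrates the idea and is not the PHP script's own parsing. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Assumes watch URLs keep the ID in the 'v' query parameter and short
    youtu.be links keep it in the path.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/") or None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


def video_ids_from_file(path):
    """Collect distinct IDs from a URL list such as 'mylist.txt', keeping order."""
    ids = []
    with open(path) as f:
        for line in f:
            vid = video_id_from_url(line) if line.strip() else None
            if vid and vid not in ids:
                ids.append(vid)
    return ids
```

Deduplicating by ID rather than by URL means two links to the same video (for example, with different extra parameters) are harvested only once.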

Collect comments
> php collectComments.php yt_once yt_comments

This PHP script collects comments for all the videos in your MySQL table (typically populated by the 'Harvest videos' tool).

First make sure you have a table containing entries generated by the 'Harvest videos' tool. Then create a new table for storing the comments (see the Database configuration file). Run this tool with the names of these two tables, in that order: the videos table first and the comments table second (here, 'yt_once' and 'yt_comments').
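
The flow described above (read video IDs from the first table, store comments in the second) can be sketched in Python. SQLite stands in for MySQL so the example runs without a server, the column names (video_id, author, comment) are hypothetical, and fetch_comments is a placeholder for whatever actually retrieves comments from YouTube.

```python
import re
import sqlite3

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name):
    # Table names cannot be bound as SQL parameters, so validate them
    # before interpolating them into the query text.
    if not _IDENT.match(name):
        raise ValueError("invalid table name: %r" % name)
    return name


def collect_comments(conn, videos_table, comments_table, fetch_comments):
    """Fetch comments for every video in videos_table and store them.

    fetch_comments is a hypothetical callable: video_id -> list of
    (author, text) pairs. Column names are assumptions, not the real schema.
    Returns the number of comments stored.
    """
    videos = _checked(videos_table)
    comments = _checked(comments_table)
    # Materialize the IDs first so the cursor is not reused while inserting.
    video_ids = [row[0] for row in conn.execute("SELECT video_id FROM %s" % videos)]
    stored = 0
    for vid in video_ids:
        rows = [(vid, author, text) for author, text in fetch_comments(vid)]
        conn.executemany(
            "INSERT INTO %s (video_id, author, comment) VALUES (?, ?, ?)" % comments,
            rows,
        )
        stored += len(rows)
    conn.commit()
    return stored
```

Only the table names are interpolated into the SQL, after validation; the comment values themselves are always bound through '?' placeholders.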

Harvest profiles
> php harvestYTProfiles.php

This PHP script reads usernames from the table populated by the 'Harvest videos' tool and collects a set of attributes from each user's profile.

You need to create a table in your MySQL database to store the data collected by this script (see Database configuration below). You also need a database connection file (see below).
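
A sketch of how the list of users to look up might be derived, in Python with SQLite standing in for MySQL. The table and column names (yt_once.author, yt_profiles.username) are hypothetical. Skipping users who already appear in the profiles table lets an interrupted run resume without fetching the same profiles twice.

```python
import sqlite3

# Hypothetical names: yt_once.author holds each video's uploader and
# yt_profiles.username holds users whose profiles are already stored.
# NOT EXISTS (rather than NOT IN) stays correct if yt_profiles has NULLs.
PENDING_USERS_SQL = """
    SELECT DISTINCT author FROM yt_once
    WHERE author IS NOT NULL
      AND NOT EXISTS (
          SELECT 1 FROM yt_profiles p WHERE p.username = yt_once.author
      )
    ORDER BY author
"""


def pending_usernames(conn):
    """Usernames seen in harvested videos whose profiles are not yet stored."""
    return [row[0] for row in conn.execute(PENDING_USERS_SQL)]
```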

Parsers
Download the MagpieRSS parser.

Copy and paste this file and save it as 'parseRSS.php'.

Store these files in the same directory from which you run the other scripts.
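
MagpieRSS is a PHP library, so the following is only a Python analogue of the kind of work 'parseRSS.php' delegates to it, not a port of that file. It pulls the title, link and description out of each <item> in a plain RSS 2.0 document using the standard library's ElementTree; namespaced extension elements are ignored in this sketch, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link', 'description'} dicts, one per <item>.

    Missing child elements become empty strings.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items
```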

Database configuration
This text file explains how to prepare the MySQL database used by the tools listed here. It contains instructions only; you don't need to download it.
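
For illustration only, here is what preparing such a database might look like, sketched in Python with SQLite instead of MySQL. The names 'yt_once' and 'yt_comments' come from the Collect comments example; 'yt_profiles' and every column list are hypothetical, since the real schema is defined in the configuration file. MySQL syntax differs slightly (for example, AUTO_INCREMENT instead of SQLite's AUTOINCREMENT).

```python
import sqlite3

# Hypothetical schema: the real column lists are defined in the database
# configuration file. SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS yt_once (
    video_id   TEXT PRIMARY KEY,
    title      TEXT,
    author     TEXT,
    view_count INTEGER
);
CREATE TABLE IF NOT EXISTS yt_comments (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id TEXT NOT NULL REFERENCES yt_once(video_id),
    author   TEXT,
    comment  TEXT
);
CREATE TABLE IF NOT EXISTS yt_profiles (
    username TEXT PRIMARY KEY
);
"""


def prepare_database(path=":memory:"):
    """Open (or create) the database at path and create any missing tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because every statement uses IF NOT EXISTS, running the setup again against an existing database leaves collected data untouched.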

Database connection
Save this file as 'connect.php' and store it in the same directory from which you run the scripts that need a database connection.
