Hack of the day: downloading VOICEROID実況 from Nicovideo
実況? Is it something edible?
Lately I’ve been watching a lot of VOICEROID実況 (じっきょう, jikkyou, literally “commentary”) videos from the rather famous (in Japan) video service ニコニコ動画, better known as “Nicovideo”. In this case, the commentary actually refers to games: they’re basically a Japanese version of the Let’s Play videos that are common on sites like YouTube.
The difference from “regular” videos lies in the “VOICEROID” term: this is the name of a TTS program developed by AH Software using an engine devised by a company called AI Inc. The name is derived from the very famous VOCALOID singing software. Like in VOCALOID, many different voices have been created, each associated with a specific character. This software is used to have these characters talk and provide commentary on the game being shown. Depending on the video and the uploader, these comments may range from comedy to more serious themes, and some authors have even created stories featuring the characters in the game they are playing.
This in turn shapes the characters beyond the original designs by AHS into the realm of “secondary creations”, to use a term borrowed from Re:CREATORS. That’s what makes these videos interesting for me (and in addition, it’s still a good way to keep my Japanese up to speed).
The problem
The video interface of Nicovideo sucks. Seriously. Until recently, it didn’t even offer 1080p, and most of the features (including advanced seeking, etc.) are locked behind the premium account (which, however, also grants access to other bits like live events). In addition, when the website is under heavy traffic, watching videos can be a true pain. Luckily, youtube-dl supports downloading from Nicovideo, barring some bugs.
The situation got more complicated recently, because Nicovideo became the target of a DDoS from outside Japan. Their response was to shut off access from outside Japan for a number of days. I could’ve just waited it out, but I wanted to work around the problem. So I started to think about it.
The implementation
The first ingredient in the recipe was getting a cheap VPS located in Japan. Linode offered one in their Tokyo 2 datacenter, so I signed up for their $5 offering. I didn’t need either processing power or storage: it would just exist as a “hop” to Nico. For the image, I chose openSUSE Leap 42.3, as it’s the distribution I’m most familiar with. I did a stock minimal install, but I used the distro-supplied kernel instead of Linode’s (there’s a reason, which I’ll show afterwards).
Then, I needed some form of VPN to allow access from my home network. I thought about openVPN, but since I’ve been testing and using WireGuard with great satisfaction, I settled for that. WireGuard is much simpler to configure than openVPN, doesn’t require daemons, and routing uses the stock Linux tools like iproute2. It also has support for LEDE and OpenWRT, which meant I could hook it up to my Turris Omnia.
First of all, I added the relevant repositories:
# zypper ar -f obs://network:vpn vpn
# zypper in wireguard wireguard-tools
This installed both the tools (wg and wg-quick) and the kernel module required by WireGuard (that’s why I needed a stock distro kernel).
Then, I needed a firewall:
# zypper in firewalld
# systemctl start firewalld
# firewall-cmd --add-service=ssh
# firewall-cmd --zone=public --change-interface=eth0
# firewall-cmd --zone=public --change-interface=eth0 --permanent
# firewall-cmd --add-service=ssh --permanent
# firewall-cmd --zone=internal --add-masquerade
# firewall-cmd --zone=internal --add-masquerade --permanent
Afterwards, I had to configure WireGuard:
# mkdir /etc/wireguard
# chmod 0700 /etc/wireguard
# umask 077 # Don't make files group or world accessible
# wg genkey > /etc/wireguard/wg0.key # this generates a private key
# cat /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub
Then I edited /etc/wireguard/wg0.conf with the details of the interface:
[Interface]
PreUp = firewall-cmd --add-port=51820/udp
PostDown = firewall-cmd --remove-port=51820/udp
ListenPort = 51820
PrivateKey = <my private key>
Address = 10.67.53.10/32
MTU = 1500 # Different from default, see below
[Peer]
PublicKey = <my public key>
AllowedIPs = 10.67.53.0/24,192.168.35.0/24
Endpoint = <home address IP>:51820
“AllowedIPs” in WireGuard lists the destination IPs that are allowed through the tunnel to that peer (note that routing must be set up separately, although wg-quick handles that for you).
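To make the matching concrete, here is a minimal Python sketch that checks whether an address falls inside the networks listed in the config above. It only mimics the idea with the standard `ipaddress` module and is not how WireGuard implements the lookup internally; the function name `allowed_through_tunnel` is made up for illustration.

```python
import ipaddress

# Values copied from the AllowedIPs line of the [Peer] section above.
ALLOWED_IPS = ["10.67.53.0/24", "192.168.35.0/24"]


def allowed_through_tunnel(address, allowed_ips=ALLOWED_IPS):
    """Return True if `address` falls inside one of the allowed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in allowed_ips)


if __name__ == "__main__":
    # Two addresses inside the listed networks and one outside of them.
    for candidate in ("10.67.53.10", "192.168.35.20", "8.8.8.8"):
        print(candidate, allowed_through_tunnel(candidate))
```

Anything outside those two networks would not be sent to (or accepted from) this peer.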
Afterwards I had to tweak the firewall to ensure that:
- The wg0 interface was masqueraded (for packets coming from my own LAN)
- Packets could go from wg0 to eth0 and vice versa
- MSS clamping was applied
Some of the commands below may be redundant, but firewalld wasn’t really meant to be used like this (I removed the --permanent lines for brevity).
# firewall-cmd --zone=internal --change-interface=wg0
# firewall-cmd --direct --passthrough ipv4 -t nat -A POSTROUTING -s 10.67.53.0/24 -o eth0 -j MASQUERADE
# firewall-cmd --direct --add-rule ipv4 filter FORWARD 0 -i wg0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter FORWARD 0 -i eth0 -o wg0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --direct --passthrough ipv4 -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
Then, I brought the interface up:
# systemctl start wg-quick@wg0
# systemctl enable wg-quick@wg0
I set the MTU specifically to 1500, because lower values set by wg-quick would cause packet fragmentation and packets would go nowhere (I spent a lot of time with tcpdump before figuring it out).
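For reference, this is roughly the arithmetic behind the numbers involved (a sketch, not a diagnosis of the fragmentation described above). It assumes a 1500-byte link and fixed-size headers without options. As far as I know, wg-quick derives its default by subtracting 80 bytes from the MTU of the outgoing interface, which matches the IPv6 case below. The function names are made up for illustration.

```python
# Header sizes in bytes (fixed-size headers, no options).
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20
# WireGuard data message: 16 bytes of header plus a 16-byte authentication tag.
WIREGUARD_OVERHEAD = 32


def tunnel_mtu(link_mtu, outer_ipv6=True):
    """Largest inner packet that fits in one outer packet of `link_mtu` bytes."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WIREGUARD_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload of an IPv4 packet that fits in `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print("tunnel MTU, outer IPv6:", tunnel_mtu(1500))
    print("tunnel MTU, outer IPv4:", tunnel_mtu(1500, outer_ipv6=False))
    print("MSS for a 1420-byte MTU:", tcp_mss(1420))
    print("MSS for a 1500-byte MTU:", tcp_mss(1500))
```

MSS clamping, as set up in the firewall rules, rewrites the MSS announced in forwarded SYN packets so that TCP segments fit the smallest MTU along the path.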
On the Turris Omnia side, I already had WireGuard configured. It was just a matter of adding a few lines to /etc/config/network and restarting the network itself:
config wireguard_wg0
    option public_key '<public-key>'
    list allowed_ips '10.67.53.10/32'
    list allowed_ips '<nicovideo IP block>'
    option endpoint_host '<linode public IP>'
    option endpoint_port '51820'
    option persistent_keepalive '60'
    option route_allowed_ips '1'
While the DDoS was in effect, I routed data for Nicovideo through the VPN, thus bypassing the block. Now that it works so well, I might consider expanding it to work around some programs (games) that rely on Japanese IPs, like Girls Trinary.
Admittedly, it wasn’t enough, even after the DDoS was over. Given that the VPS has a higher speed link than my own connection when it comes to Japan, why not leverage that?
To do so, I installed a couple more packages:
# zypper in rsync python3 python3-pip youtube-dl
The last package required enabling the Packman repository through YaST beforehand.
Then, I installed sarge, which wasn’t available in the distro, through pip:
# pip3 install sarge --prefix /usr/local
And then it was a matter of hacking around a “simple” script. This would fetch one or multiple video URLs (including Nicovideo’s “mylist”, similar to YT’s playlists), pass them through youtube-dl, then rsync them to the NAS I have at home (and delete them afterwards). It makes use of youtube-dl’s “hooks”, which are executed when a video has been downloaded.
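As a minimal illustration of how such a hook behaves, the sketch below simulates the calls youtube-dl makes: the hook receives a dict whose "status" key is "downloading", "finished" or "error", plus the "filename" being written. The helper `make_transfer_hook` is hypothetical, and a plain list stands in for the rsync call so the example runs without youtube-dl or a NAS.

```python
def make_transfer_hook(transferred):
    """Build a progress hook that records finished downloads.

    `transferred` is a plain list standing in for the real rsync call.
    """
    def hook(status):
        # The hook is called many times per video; only act once the
        # download has completed.
        if status["status"] == "finished":
            transferred.append(status["filename"])
    return hook


if __name__ == "__main__":
    done = []
    hook = make_transfer_hook(done)
    # Simulated sequence of calls for a single video.
    hook({"status": "downloading", "filename": "video.mp4"})
    hook({"status": "finished", "filename": "video.mp4"})
    print(done)
```

In the real script, the hook is handed to youtube-dl through the "progress_hooks" key of the options dictionary.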
The script is provided at the bottom of the post (BSD licensed). Note the total absence of error checking: it was a “hack” as the title of the post implies. It worked for me: it may or may not work for you. It might even kill every kitten in the world or bring the Great Old Ones to this planet. Exercise caution.
Afterwards, there was just the matter of filling in the Nicovideo download credentials, as login is required to view videos. To do so, I created a .netrc file in the home directory of the download user:
machine niconico login <my login> password <my password>
Set permissions to 0600, and it’s done (or できた! if I were to use Japanese).
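A quick way to sanity-check such a file is Python’s standard `netrc` module. The sketch below writes a throwaway file with made-up credentials instead of touching the real ~/.netrc; the helper names `read_credentials` and `is_private` are hypothetical, and the permission check assumes a POSIX system.

```python
import netrc
import os
import stat
import tempfile


def read_credentials(path, machine="niconico"):
    """Return (login, password) for `machine` from the netrc file at `path`."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        raise KeyError("no entry for machine {!r}".format(machine))
    login, _account, password = entry
    return login, password


def is_private(path):
    """True if group and others have no permissions on `path` (e.g. 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as handle:
            # Made-up credentials, for illustration only.
            handle.write("machine niconico login alice password s3cret\n")
        os.chmod(path, 0o600)
        print(read_credentials(path), is_private(path))
```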
Then I just need to invoke the script with one or more URLs and it will download and transfer things to my NAS. Magic!
#!/usr/bin/python3
# Copyright 2018 Luca Beltrame
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
from pathlib import Path

import sarge
import youtube_dl

MY_NAS_IP = "127.0.0.1"


def download_hook(params):
    """Progress hook: rsync a file to the NAS once its download finishes."""
    if params["status"] == "finished":
        destination = params["filename"]
        cmd = ("/usr/bin/rsync -aP "
               "{0} {1}:/home/storage/video/nico/")
        # shell_format quotes the arguments for safe use in a shell command
        cmd = sarge.shell_format(cmd, destination, MY_NAS_IP)
        print(cmd)
        sarge.run(cmd)


def manual_rsync(filename):
    """Transfer an already downloaded file and remove the local copy."""
    cmd = ("/usr/bin/rsync -aP --remove-source-files "
           "{0} {1}:/home/storage/video/nico/")
    cmd = sarge.shell_format(cmd, filename, MY_NAS_IP)
    print(cmd)
    sarge.run(cmd)


def file_downloaded(ydl, params):
    filename = ydl.prepare_filename(params)
    return Path(filename).exists()


def check_mylist(ydl, params):
    """Get all video files from a Nico's mylist, skipping
    already downloaded ones.
    """
    playlist_start = 1
    filenames = list()
    for entry in params["entries"]:
        filename = ydl.prepare_filename(entry)
        if Path(filename).exists():
            playlist_start += 1
            manual_rsync(filename)
            continue
        filenames.append(filename)
    return playlist_start, filenames


def main():
    youtube_params = {"usenetrc": True,
                      "progress_hooks": [download_hook]}
    check_params = {"simulate": True,
                    "usenetrc": True, "quiet": True}

    parser = argparse.ArgumentParser()
    parser.add_argument("url", nargs="+")
    options = parser.parse_args()
    urls = options.url

    # Check filenames
    to_download = list()

    # Iterate over a copy, since URLs may be removed from the list
    for url in list(urls):
        # Simulate download once to get metadata
        with youtube_dl.YoutubeDL(check_params) as ydl:
            result = ydl.extract_info(url)

            if "mylist" in url:
                playlist_start, filenames = check_mylist(ydl, result)
                if not filenames:
                    urls.remove(url)
                    continue
                to_download.extend(filenames)
                # FIXME: Alters this globally for all downloads
                youtube_params["playliststart"] = playlist_start
            else:
                filename = ydl.prepare_filename(result)
                if Path(filename).exists():
                    urls.remove(url)
                    manual_rsync(filename)
                    continue
                to_download.append(filename)

    if not urls:
        return

    # Keep on retrying to work around youtube-dl's behavior with nico
    while True:
        try:
            with youtube_dl.YoutubeDL(youtube_params) as ydl:
                res = ydl.download(urls)
        except youtube_dl.utils.DownloadError:
            res = -1  # DANGEROUS: retries forever if the error persists
        if res == 0:
            break

    # Files were already copied by download_hook; remove the local copies
    for item in to_download:
        if Path(item).exists():
            Path(item).unlink()


if __name__ == "__main__":
    main()
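
The retry loop in the script never gives up, which is why it is marked as dangerous. A capped variant could look like the sketch below. It is not part of the original script: the helper `retry` and its `attempts` and `delay` parameters are made up for illustration, and `action` stands in for the youtube-dl download call, which returns 0 on success.

```python
import time


def retry(action, attempts=5, delay=0.0):
    """Call `action` until it returns 0 or `attempts` is exhausted.

    Exceptions count as failed attempts. Returns True on success.
    """
    for attempt in range(1, attempts + 1):
        try:
            if action() == 0:
                return True
        except Exception as exc:  # broad on purpose: this is only a sketch
            print("attempt {} failed: {}".format(attempt, exc))
        # Wait before the next attempt (no wait with the default delay).
        time.sleep(delay)
    return False


if __name__ == "__main__":
    outcomes = iter([1, 1, 0])  # two failures, then success
    print(retry(lambda: next(outcomes)))
```

With something like this, a persistent failure ends the script after a handful of attempts instead of hammering the site indefinitely.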