
Retrieving a Web Site for Offline View Using Wget

February 23rd, 2013

I’ve volunteered to do a few minor CSS tweaks on a third-party Web site, implemented in Java and hosted on Heroku. Given the tiny scope of my task, figuring out how to build the entire thing and run it locally or on a staging instance would have been overkill, so I looked for a way to create a static local mirror of the site. That turned out to be less straightforward than running wget --mirror Home-Page-URL.

First, the Web site in question has its stylesheets and other static files served from a CDN (content delivery network). It also relies on third-party services for Web fonts, video streaming, and chat.

Second, it has some really big downloads. Fortunately, they are served from a subdomain.

Third, it has a blog section, also in a subdomain, which uses a completely different stylesheet that I did not need to touch.

To cut a long story short, here is the wget command line that worked for me:

wget --mirror \
  --page-requisites \
  --convert-links \
  --span-hosts \
  --domains domain-list \
  --reject pattern-list \
  Home-Page-URL

and here is the explanation:

--mirror
Enable infinite recursion and time-stamping.
--page-requisites
Also download files required to view the web page: images, stylesheets and so on.
--convert-links
Edit links in the downloaded documents so as to enable offline viewing. This includes links to page requisites. As a result, links to files that were also downloaded point to the local copies, and all other links are replaced with complete URLs.
--span-hosts
Permit recursion and retrieval of page requisites to span across hosts. (Use with caution, or you may end up downloading the entire Internet.)
--domains domain-list
Restrict the list of domains to download files from. In my case, those were the “www” subdomain of the Web site being mirrored and the domain of the CDN serving its static files.

Example: --domains www.example.com,somecdn.net

--reject pattern-list
Do not mirror certain files. pattern-list is a comma-separated list of file name suffixes or patterns.

Example: --reject mp3,ogg
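Putting the pieces together, the whole command can be wrapped in a small shell function. This is only a sketch: the name mirror_site is made up for illustration, and the URL, domain list, and suffix list in the example call are the placeholder values used above, not those of the real site.

```shell
#!/bin/sh
# Sketch: mirror a site for offline viewing with the options explained above.
# mirror_site is a hypothetical helper name, not part of wget.
#   $1 - home page URL
#   $2 - comma-separated list of domains to download from
#   $3 - comma-separated list of file name suffixes or patterns to skip
mirror_site() {
  wget --mirror \
    --page-requisites \
    --convert-links \
    --span-hosts \
    --domains "$2" \
    --reject "$3" \
    "$1"
}

# Example call (placeholder URL and lists):
# mirror_site http://www.example.com/ www.example.com,somecdn.net mp3,ogg
```

Quoting the arguments keeps the comma-separated lists intact even if a pattern contains characters the shell would otherwise expand.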

KB2756872 Windows 8 Update Fails to Install? Remove Realtek Audio Drivers

December 22nd, 2012

My notebook, just upgraded to Windows 8, has spent quite some time trying to install the KB2756872 update and rolling back after failures.

For the record, I have run delmigprov.exe accompanying the standalone download of that update, but that alone did not help.

The solution in my case was the removal of Realtek audio drivers. (It is interesting that audio has continued to function, so now I am not sure whether I need those drivers at all.)

It’s Been a Long Time Since I Had To Patch a Binary Executable…

December 11th, 2012

Needed to push a file created on a notebook the other night to a Git repo on a desktop, both running Windows and connected to my home router. Thought the git protocol would work. It did not, due to a bug in msysgit. (TL;DR: the bug has been open since March 2010, but nobody so far has volunteered to find the root cause and remedy it, or to sponsor such an effort. The only sensible workaround is to recompile msysgit from source with the side-band-64k protocol capability disabled, as the older side-band does not exhibit the problem, but the newer, faster alternative always takes precedence if both client and server support it.)

Followed the advice from Eli Billauer’s blog and patched git.exe on my desktop, which plays the role of a “central” Git server.

Here is a patch script that worked for me. It requires Gsar for Windows from the GnuWin32 collection; Cygwin likely includes gsar too.

@echo off
copy /-y git.exe git.exe~
if errorlevel 1 goto copyfailed
gsar -o -sside-band-64k -rKW6YzEZbBv584 git.exe
if errorlevel 1 goto patchfailed
echo git.exe successfully patched
goto quit
:copyfailed
echo Could not create a backup copy of git.exe
goto quit
:patchfailed
echo Could not patch git.exe with GSAR
goto quit
:quit

The value of the -r option is just a random 13-character string generated by DuckDuckGo using the query password 13. You may wish to use different values on each Windows machine you may be pushing to using the git protocol.
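If you have a Unix-like shell at hand (Cygwin, say, or any Linux box), the replacement string can also be generated locally instead of asking a search engine. The sketch below is an assumption on my part rather than what I actually used: random_token is a made-up helper name, and it sticks to plain letters and digits. It only prints the gsar command line; it does not touch git.exe.

```shell
#!/bin/sh
# Sketch: produce a random alphanumeric string to use as the gsar replacement.
# random_token is a hypothetical helper name.
#   $1 - desired length (defaults to 13, the length of "side-band-64k")
random_token() {
  # Read 1 KiB of random bytes, keep only letters and digits,
  # then cut the result down to the requested length.
  dd if=/dev/urandom bs=1024 count=1 2>/dev/null |
    LC_ALL=C tr -dc 'A-Za-z0-9' |
    cut -c "1-${1:-13}"
}

search='side-band-64k'
token=$(random_token "${#search}")

# An in-place binary patch must not change the file size,
# so make sure both strings have the same length before using the token.
if [ "${#token}" -eq "${#search}" ]; then
  echo "gsar -o -s$search -r$token git.exe"
else
  echo "could not generate a ${#search}-character token" >&2
fi
```

The length check matters because the replacement overwrites the capability name inside the executable, so a shorter or longer string would shift everything after it.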

Outlook Macro to Nicely Format Skype Chat Excerpts

December 1st, 2012

My day job involves a lot of communication, mostly via email and Skype IM. From time to time, I need to file an important excerpt from a Skype chat for later retrieval, or email it to a customer, partner, or colleague.

For years, I would select that excerpt, copy it to the clipboard, and paste it into a new Exchange Mail or Post item.

However, what got pasted was unformatted plain text, way harder to read than the original chat displayed in Skype:

[Screenshot: raw paste]

I used to format the lengthier excerpts manually, out of respect for the recipients and/or future readers. Tedious work.

Earlier this year, I proposed celebrating our company’s 13th anniversary with a hackathon. Excelsior Hack Day I was a success, and I used it as a chance to automate away one bit of the routine part of my work.

My solution

Skype IM Pretty Printer is a VBA macro for Microsoft Outlook that takes a Skype chat from the clipboard, formats it nicely, and pastes it into a new HTML message:

[Screenshot: paste using Skype IM Pretty Printer]

If you want to give Skype IM Pretty Printer a shot, I have open sourced it under the MIT/X11 license. You can fork it on GitHub or visit the official page for download and installation instructions.

Running Online Python Tutor in a Local Linux VM

October 29th, 2012

Online Python Tutor (OPT) enables first-year CS students to watch the nicely visualized execution of their Python programs step-by-step.

A fresh edX student very much liked OPT but had two problems with its online nature: sometimes the OPT Web site was not responding, sometimes she had no Internet connection. Fortunately, OPT is open sourced on GitHub, so I was able to set it up on her Windows notebook as follows:

OPT runs on Google App Engine, but there is a local development server in the GAE SDK. I’ve set it up on top of a small VirtualBox VM running Linux, so as to minimize interference with other software and simplify migration.

  1. Set up or clone a baseline Linux VM. I had a baseline Ubuntu 12.04 LTS disk image already, so I just followed my own VirtualBox VM cloning recipe.
  2. In the meantime, download the Linux version of the Google App Engine SDK for Python:

    wget -c http://googleappengine.googlecode.com/files/google_appengine_1.7.3.zip

    (look up the current URL on the SDK download page)

  3. Fetch the latest version of OPT from GitHub:

    wget -c -O online-python-tutor.zip https://github.com/pgbovine/OnlinePythonTutor/zipball/master
  4. It turned out that Ubuntu 12.04 Server has Python installed even in the minimal configuration. I had to install unzip though:

    sudo apt-get install unzip
  5. Unpack both packages. I have chosen to put them under /opt (pun not intended – that is where FHS says you should put optional packages):

    cd /opt
    sudo unzip ~/google_appengine_1.7.3.zip
    sudo unzip ~/online-python-tutor.zip
    
  6. (optional) Rename the OPT directory:

    sudo mv pgbovine-OnlinePythonTutor-c4880ea online-python-tutor
  7. Try running OPT:

    sudo /opt/google_appengine/dev_appserver.py \
      -a 0.0.0.0 \
      -p 80 \
      --skip_sdk_update_check \
      /opt/online-python-tutor/v3
    

    There will be warnings about the unavailability of some APIs and such, but OPT apparently does not use those, so you may ignore the warnings.

  8. Try connecting to the VM from your browser. You should see the main OPT screen and be able to use it.

    Check that it works, then get back to the VM console/terminal and press Ctrl-C to shut down the development server.

  9. Finally, make OPT start automatically on boot. On Ubuntu and other Upstart-enabled systems, add a .conf file to /etc/init:

    sudoedit /etc/init/pythontutor.conf

    with the following content (change installation directories if necessary):

    start on runlevel [2345]
    stop on runlevel [!2345]
    
    expect fork
    exec /opt/google_appengine/dev_appserver.py \
      --skip_sdk_update_check  \
      -a 0.0.0.0 -p 80 \
      /opt/online-python-tutor/v3 &
    
  10. Start the pythontutor job:

    sudo start pythontutor

    If this time you cannot connect to OPT from your browser, look for clues in /var/log/upstart/pythontutor.log.
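The suffix of the directory name in step 6 identifies the revision that was downloaded, so it will differ from the one shown above when you fetch a later version of OPT. A glob saves looking it up. The following is a minimal sketch, assuming exactly one such directory exists; rename_opt_dir is a made-up helper name.

```shell
#!/bin/sh
# Sketch: rename the unpacked OPT directory without typing the revision suffix.
# rename_opt_dir is a hypothetical helper name.
#   $1 - directory the zipball was unpacked into (e.g. /opt)
rename_opt_dir() {
  # GitHub zipballs unpack into <owner>-<repo>-<revision>; match it with a glob.
  set -- "$1" "$1"/pgbovine-OnlinePythonTutor-*
  # Expect exactly one match; bail out if there is none or several.
  if [ "$#" -ne 2 ] || [ ! -d "$2" ]; then
    echo "expected exactly one pgbovine-OnlinePythonTutor-* directory in $1" >&2
    return 1
  fi
  mv "$2" "$1/online-python-tutor"
}

# Example call (run with sufficient privileges if the target is /opt):
# rename_opt_dir /opt
```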

Now that everything is working, you may wish to reduce the amount of RAM allocated to the VM. 128 MB is more than enough to run a copy of OPT just for the user connecting from the host, but watch memory use if you install, say, a shared copy for your class.
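The memory setting can be changed in the VirtualBox GUI or, with the VM powered off, from the command line on the host. Here is a sketch for a Unix-like shell; set_vm_memory and the VM name opt-vm are placeholders of mine, and on a Windows host the same VBoxManage arguments should work at the command prompt.

```shell
#!/bin/sh
# Sketch: change the amount of RAM allocated to a powered-off VirtualBox VM.
# set_vm_memory is a hypothetical helper name.
#   $1 - VM name or UUID as known to VirtualBox
#   $2 - memory size in megabytes
set_vm_memory() {
  VBoxManage modifyvm "$1" --memory "$2"
}

# Example call (placeholder VM name):
# set_vm_memory opt-vm 128
```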

Running programs on Linux boot up

September 29th, 2012

The other day I needed to configure a Linux VM to run a few programs at system startup. It turned out that there is no single way to do that which works across all major Linux distros and Unix flavors.

Read the rest of this entry »

Pushing Files from Windows to Linux/Unix Hosts with cwRsync

July 16th, 2012

“Use the best tool for the job” is a great principle. I, however, reserve the right to define which tool is the best when the person doing the job is going to be me. That is why I develop my Web properties, such as this blog, on a Windows PC, as I am more comfortable with Windows as a desktop platform, but for a very similar reason I run them on a Linux VPS.

This in particular means I need to deploy from Windows to Linux. At first, I manually copied the new and changed files using the WinSCP plugin for FAR, and that was okay while there were just a few files. I also have a staging environment, a VirtualBox VM that more or less replicates my VPS setup, so I could have set up some shared folders on that VM and then used rsync to push changes from staging to production. But instead I have set up rsync to push files right from Windows to either staging or production. Here is how you can do that too:

Read the rest of this entry »

Protecting Downloads From Hotlinking – The Soft Way

December 20th, 2011

The Story

Once upon a time, on a Web site 23 hops away from my home PC, there was a free software download that required registration. An email with the download page URL was sent to the visitor after registration. The download page contained usage instructions and a direct, static URL of the download in the form http://host/download/file.

Someone had registered and published the latter URL in a public directory, so people started to download the file directly, without seeing even the download page, let alone the registration form.

The URL of the download was changed, but the story repeated itself in a couple of weeks.
Read the rest of this entry »


After Fifteen Years, the Kitchen Entertainment Problem is Finally Solved

December 18th, 2011

I had assembled a complete set of HiFi components, albeit Soviet-made, well before our marriage. (More precisely, I had no CD player then, but I owned a reel-to-reel tape recorder if you know what that means.) Nevertheless, ever since moving out of my parents’ apartment back in 1996, we would say to each other every once in a while: “We gotta get us something to enjoy music in the kitchen.”

To give you some context, most people in Russia live in apartments, and in most apartments here the kitchen is a completely separate room, too small for a full-size audio system. Some people would remove the wall between the kitchen and the living room, but if that wall is part of the building structure, as in our case, you are out of luck.

During our occasional visits to electronics stores, you know, the real, brick-and-mortar ones, we would look at those classic tape+FM boomboxes and the flashy all-in-one systems, but we did not quite like any of them. Then a friend upgraded his CD player so we started collecting CDs, and at about the same time all the good FM stations went south, so we decided we’d buy a CD microsystem. Then the MP3 boom came, so I started looking at MP3-enabled boomboxes. Then all of a sudden everyone who went to trade shows had a surplus of USB thumb drives. Now, of course, we needed something with a USB port. Then I read about Internet radio receivers and thought that was what we should get, but for some strange reason they were not available in Russia at that time – usually we get our hands on new gadgets quickly thanks to the proximity to China. ;)

And then I realized we only need one simple gadget to finally solve the problem.

Here is the setup we ended up with:

- Internet connection
- WiFi router
- PC holding a copy of our CD collection
- iPad
- Bluetooth speakers (that was the missing piece)

Isn’t it amazing that so many technologies had to emerge, commoditize and consumerize to let me get some jazz with my morning coffee?

iTunes Automation: Convert FLAC Audio Files to Apple Lossless

December 15th, 2011

Update 07-Jul-2013: My iTunes for Windows automation scripts have got a home page at github.io: http://dmitryleskov.github.io/iTunesScripts/, and the latest release can now always be found at https://github.com/dmitryleskov/iTunesScripts/releases.

[Image: CD cabinet]

If this is not your first visit to my blog, you may recall that I ripped all my audio CDs and put them away in boxes last year. My music collection now occupies just a tiny corner of a 1.5 TB hard drive. (I do not own THAT many CDs, as you may guess.) Not a big deal in itself these days, but the reason for blogging about that was the small utility I had written to help EAC deal with UTF-8 encoded freedb entries. Now the time has come to share with you another FLAC tool that I have created for myself. Hope you will find it handy.

Basically, I wanted to use iTunes Remote to control the playback on the PC and occasionally copy some tracks to my iPad. Problem is, iTunes does not support FLAC, so it looked as if I had to use a third-party plugin or a converter. But then I discovered the iTunes COM for Windows SDK and put together a script that imported my entire FLAC collection into iTunes in Apple Lossless format in just a few hours. (That’s on an i5 desktop and, mind you, I do not have THAT many CDs, so your mileage will vary.)

The script needs two binaries from the official FLAC command-line tools, flac.exe and metaflac.exe, and a small helper utility to convert metadata from UTF-8 to UTF-16.

For your convenience, dear reader, I have put together a package containing all those dependencies, but I must first warn you that my solution currently has a few limitations:

  • It is not capable of processing albums ripped or downloaded as a single FLAC file. You must split them into individual tracks before conversion.
  • The FLAC files that have any of the metatags ARTIST, ALBUM, and TITLE missing are skipped.
  • Tracks are imported in Apple Lossless (ALAC) format only.

If you need any of these fixed, drop me a line in the comments or fork my script on GitHub – it is available under the MIT/X11 license.

Otherwise, grab the latest release from GitHub, watch the screencast, and enjoy!
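To find out in advance which files would fall under the second limitation above, you can query the tags with metaflac. The sketch below only approximates the check my script performs and is written for a Unix-like shell such as Cygwin; missing_tags is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report which of the required tags a FLAC file lacks.
# missing_tags is a hypothetical helper name; metaflac must be on the PATH.
#   $1 - path to a FLAC file
# Prints the names of the missing tags, one per line; prints nothing
# if ARTIST, ALBUM, and TITLE are all present.
missing_tags() {
  for tag in ARTIST ALBUM TITLE; do
    # metaflac prints NAME=value for each matching tag, or nothing at all.
    if [ -z "$(metaflac --show-tag="$tag" "$1")" ]; then
      echo "$tag"
    fi
  done
}

# Example: list the files in the current directory that would be skipped.
# for f in *.flac; do
#   [ -n "$(missing_tags "$f")" ] && echo "$f"
# done
```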

Update 19-Feb-2012: Recorded a screencast for the most straightforward usage scenario.

Update 02-Feb-2013: Fixed a bug with invalid track numbers causing crashes.

Update 15-Jun-2013: Fixed a bug that caused the script to abort after converting a FLAC file with the readonly attribute set. Credit for finding the root cause and suggesting a fix goes to Dave Granic – thank you, Dave!

Update 14-Jul-2013: Fixed a bug: metadata-less FLAC files caused the script to crash.