List of HTML pages in a Jekyll site

I’m porting a few sites from Jekyll over to WordPress and needed a list of their URLs in order to add redirects (HTTP 301s). A bit of UNIX-fu made this simple.

Find all HTML pages in the generated site:

find _site/ -type f -iname "*.html" > url-list.txt

Edit the file so that the _site/ prefix on each line becomes the base URL of the new WordPress site, like: http://example.com/

emacs url-list.txt
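
If you would rather not do the edit by hand, the same substitution can be scripted. This is only a sketch: rewrite_urls is a made-up helper name, http://example.com/ is the placeholder domain from above, and it assumes every line in url-list.txt starts with _site/ as produced by the find command.

```shell
# Hypothetical helper: swap the leading "_site/" of each path for the new
# site's base URL. http://example.com/ is a placeholder, not a real target.
rewrite_urls() {
  # "/*" also absorbs the doubled slash that some find implementations emit
  # when the starting directory is given with a trailing slash.
  sed 's|^_site/*|http://example.com/|'
}
```

Running rewrite_urls < url-list.txt > url-list.new.txt writes the rewritten list to a second file, so the result can be checked before it replaces the original.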

 

Pipe each URL into curl so that the WordPress blog receives a request for every one. The Redirection plugin logs each request, so I can go through the log and add a redirect for each URL. (-I tells curl to send a HEAD request and show only the HTTP headers.)

cat url-list.txt | xargs curl -I

 

Finally, once the redirects are set up, I can rerun the last command and check that every URL redirects correctly:

HTTP/1.1 301 Moved Permanently
Date: Fri, 15 Apr 2011 23:24:33 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.7
Location: /category/blog/page/2/
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
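
With more than a handful of URLs, the full headers get hard to scan. As a rough sketch, the output can be condensed to one line per response showing just the status code and the Location header. summarize_headers is a hypothetical helper name, and it assumes every response begins with an HTTP/ status line, as in the output above.

```shell
# Hypothetical helper: reduce `curl -I` output read from stdin to
# "<status> <location>" per response, with "-" when there is no Location.
summarize_headers() {
  awk '
    { sub(/\r$/, "") }   # drop the carriage return that ends each header line
    /^HTTP\// {          # a status line starts a new response
      if (seen) print status, loc
      status = $2; loc = "-"; seen = 1
    }
    tolower($1) == "location:" { loc = $2 }
    END { if (seen) print status, loc }
  '
}
```

It would be used as cat url-list.txt | xargs curl -sI | summarize_headers, where -s only silences curl's progress meter. The summary lines should come out in the same order as the URLs in url-list.txt, so any line that does not start with 301 points at a redirect that still needs adding.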