I got a request from a client today. They had a site made up mostly of static HTML pages (with a smattering of PHP and a WordPress blog). They wanted to add <meta> description tags to their pages. Excellent idea. They helpfully sent along a Word doc containing the URL of each page and the description they wanted for it. Of course, this was a list of 30 discrete URLs, which would be something of a pain to do by hand with copy and paste. Like most humans, I hate repetitive tasks like this, so I figured I should write a script to do it for me.
Luckily for me, the URLs mapped directly to the HTML/PHP files themselves (no fancy routing), and the list looked something like this:
http://www.someurl.com/ourproducts.html This page is about wonderful unicorns that enable your business to churn out molten rivers of gold
http://www.someurl.com/aboutus.html We're a gang of very exciting entrepreneurs making fabulous widgets in our Elven caves.
(etc.)
This gave me an idea: I could export the Word doc to plain text and write a script to parse that textfile, grabbing the filenames and descriptions. The script would then open each file and drop the matching <meta> tag into it.
To make my life easier, I pre-processed the textfile in TextMate (any text editor will do), using find/replace to convert each URL into just the filename wrapped in tokens. So http://www.someurl.com/ourproducts.html became @ourproducts.html@. Wrapping the filenames in @s made it easy to tell them apart from the descriptions.
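If you'd rather skip the editor, the same pre-processing is a single regex substitution. Here is a minimal Ruby sketch, assuming every line starts with an absolute http(s) URL followed by a space and then the description; `tokenize_line` is a hypothetical helper name, not part of my script.

```ruby
# Hypothetical helper (assumption: each line is "<absolute URL> <description>").
# Drops the scheme and host and wraps the remaining path in @ tokens,
# e.g. "http://www.someurl.com/aboutus.html ..." becomes "@aboutus.html@ ...".
# Lines that don't start with a URL are returned unchanged.
def tokenize_line(line)
  line.sub(%r{\Ahttps?://[^/\s]+/(\S+)}) { "@#{$1}@" }
end

examples = [
  "http://www.someurl.com/ourproducts.html This page is about wonderful unicorns",
  "http://www.someurl.com/aboutus.html We're a gang of very exciting entrepreneurs"
]
examples.each { |line| puts tokenize_line(line) }
```

Mapping `tokenize_line` over `File.readlines` of the exported text and writing the joined result out would produce the tokenized textfile. Paths in subdirectories keep their folders (`/a/b.php` becomes `@a/b.php@`), which still works as long as the URLs map straight onto files.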
I then used TextMate’s Find in Project command to add an empty <meta name="description" content="" /> tag to every file. Again, this made it easier for my script to place the descriptions later. To pull this off, I wrote a regex that finds the <title> tag on each page and adds the empty <meta> tag right after it:
find: <title>(.*?)<\/title>
replace: <title>$1</title>\n<meta name="description" content="" />
I’ve been playing around with Ruby and Python lately to expand my horizons beyond PHP, and I decided to use Ruby for this particular task. The following script reads my textfile line by line, grabbing filenames and descriptions. For each filename it opens the file, finds the empty <meta> tag, and drops in the description using gsub!.
I ran it from TextMate with Command-R, but I could just as easily have added a shebang and run it from the command line. Since the filenames are relative, it needs to run from the site’s root directory. It worked like a champ:
require 'cgi'

# Open a file and look for the empty meta tag;
# if found, drop the (HTML-escaped) description into it
def get_and_open_file(filename, desc)
  # build the new tag once, escaping quotes and ampersands in the description
  newline = '<meta name="description" content="' + CGI.escapeHTML(desc) + '" />'
  File.open(filename, "r+") do |f| # the block form closes the file for us
    lines = f.readlines
    lines.each do |it|
      # replace the empty meta tag with the filled-in one; the block form keeps
      # gsub! from treating backslashes in the description as back-references
      puts it if it.gsub!(/<meta name="description" content="" \/>/) { newline }
    end
    f.pos = 0           # back to start
    f.write(lines.join) # write out modified lines (on Ruby 1.9+, print would write the Array's inspect form)
    f.truncate(f.pos)   # truncate the file to its new length
  end
rescue SystemCallError => e
  # report missing or unreadable files instead of silently skipping them
  warn "Skipping #{filename}: #{e.message}"
end

# Open the textfile and parse it line by line looking for
# filenames & descriptions with a regex: /@(.*?)@(.*)/
File.foreach("Metadata.txt") do |line|
  if line =~ /@(.*?)@(.*)/
    filename = $1
    desc = $2.strip # drop the space after the closing @ (and any stray \r)
    get_and_open_file(filename, desc)
  end
end
I know it’s not the prettiest Ruby code (and I’m sure I could have done it more efficiently), but it did what I needed in a very short amount of time. I find I’m using regexes all the time lately, and they’re really helping me automate a lot of very boring tasks.
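On the efficiency front, the TextMate placeholder step and the script could be folded into a single pass. What follows is a minimal sketch of that idea, not what I actually ran. It assumes each page has a </title> tag, that Metadata.txt uses the same @filename@ format, and that it runs from the site root. `add_description` is a hypothetical helper name. It also escapes the description with CGI.escapeHTML so quotes or ampersands can't break the attribute.

```ruby
require 'cgi'

# Hypothetical helper: returns the page with a filled-in description tag
# inserted right after the first </title>. Pages that already have a
# description tag are returned unchanged.
def add_description(html, desc)
  return html if html =~ /<meta\s+name="description"/i
  tag = %(<meta name="description" content="#{CGI.escapeHTML(desc)}" />)
  # Block form: keeps the original </title> casing and stops backslashes
  # in the description from being read as back-references
  html.sub(%r{</title>}i) { |m| "#{m}\n#{tag}" }
end

# Only touch the site when the metadata file is present
# (assumption: filenames are relative to the current directory).
if File.exist?("Metadata.txt")
  File.foreach("Metadata.txt") do |line|
    next unless line =~ /@(.*?)@(.*)/
    filename, desc = $1, $2.strip
    unless File.exist?(filename)
      warn "Skipping missing file #{filename}"
      next
    end
    File.write(filename, add_description(File.read(filename), desc))
  end
end
```

Checking for an existing description makes the script safe to rerun: a second pass leaves already-tagged pages alone instead of stacking duplicate tags.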