In yet another move towards giving my left brain complete control, I went ahead and took the plunge into GitHub recently. All the cool kids seemed to be doing it, and I figured it was high time I tried to contribute something back to the open source community. Whether they want or need my contributions is a different matter. Amazingly, Andy Dawson foolishly accepted a pull request from me and merged my patch into an open source project, so if your MongoDB Datasource for CakePHP breaks, you now have someone to blame. Yeah, it's small potatoes, but I was pretty stoked, and now my code will work its way into tens of projects. Whether the kool kids will let me hang out with them is another story.
So my first actual project is also pretty minor, but it's one of those odd scripts I put together to do some basic maintenance on projects before releasing them to a client. Lately I've been handing off static HTML/CSS templates to one of the agencies I work with. When developing a template I often end up with a lot of unused images in my /img directory (such as backgrounds that didn't work out), and it's something of a pain to round them up for deletion.
Initially I hacked up a small Ruby script that did this for a single HTML file and a single CSS file:
#!/usr/bin/env ruby
# Removes all images in img/ that are not used in one HTML file or one CSS file
css_file  = 'css/your_css_file_here.css'
html_file = 'your_html_file_here.html'
used = []
# search CSS for url(../img/name.ext) references, quoted or not
File.foreach(css_file) do |line|
  used.concat(line.scan(/url\(\s*['"]?\.\.\/img\/([^'")\s]+)['"]?\s*\)/).flatten)
end
# search HTML for <img src="img/name.ext"> references
File.foreach(html_file) do |line|
  used.concat(line.scan(/<img[^>]*\ssrc=["']img\/([^"']+)["']/).flatten)
end
# change to img dir and diff its files against the images in use
Dir.chdir('img') do
  kills = Dir.glob('*').select { |f| File.file?(f) } - used.uniq
  # anything left over is not referenced, so delete it
  kills.each { |k| File.delete(k) }
end
Yeah, not the prettiest thing in the world, but it worked. The regexes tore through the files line by line, storing image references in an array that was then diffed against the contents of the /img directory. Any files left over were nuked.
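The heart of that script is plain Ruby array difference: everything on disk minus everything referenced. A minimal sketch of just that step, assuming a flat image folder; the method name `unused_images` is hypothetical, and it returns the list instead of deleting anything so you can eyeball it first:

```ruby
# Hypothetical helper: list the files in img_dir that are not in `referenced`.
# Only names are returned; nothing is deleted.
def unused_images(img_dir, referenced)
  # Dir.children skips "." and ".."; File.file? drops subdirectories
  on_disk = Dir.children(img_dir).select { |f| File.file?(File.join(img_dir, f)) }
  # Array#- removes every element that also appears in the other array
  (on_disk - referenced.uniq).sort
end
```

Once the list looks right, something like `unused_images('img', used).each { |f| File.delete(File.join('img', f)) }` does the actual nuking.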
So while it worked, it only handled single files and didn't take @import rules into account. What I really wanted was a script that could walk through all of the HTML documents in a folder, search them for images, and also search them for <link> tags pointing to CSS files. It would then walk those CSS files for images, following any @import rules. Only images that were actually in use would be kept. If a CSS file referenced images but was never linked from a page, those images would get trashed as well (this could be dangerous, but it works for me). This leaves me with a nice tidy package I can zip up and send to clients without worrying about stray images.
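A rough sketch of that walk, assuming all pages sit in one folder and pull in stylesheets with `<link ... href="....css">`. To stay dependency-free it swaps in regular expressions where the real script uses Nokogiri and css_parser, so the patterns are approximations (no query strings, no comments, no `<style>` blocks), and the method and constant names are hypothetical. Each reference is resolved relative to the file that contains it, because a `url()` inside a stylesheet is relative to that stylesheet, not to the page.

```ruby
require 'set'

# Regex stand-ins for a real HTML/CSS parser (approximations only).
IMG_SRC    = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i
LINK_HREF  = /<link\b[^>]*?\bhref\s*=\s*["']([^"']+\.css)["']/i
CSS_URL    = /url\(\s*["']?([^"')\s]+)["']?\s*\)/i
CSS_IMPORT = /@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?/i

# Follow one stylesheet and its @import chain, collecting absolute paths.
def images_in_css(css_path, used, seen)
  # skip missing files, and stop on import cycles (Set#add? returns nil if present)
  return unless File.file?(css_path) && seen.add?(css_path)
  css = File.read(css_path)
  dir = File.dirname(css_path)
  css.scan(CSS_IMPORT).flatten.each do |ref|
    images_in_css(File.expand_path(ref, dir), used, seen)
  end
  css.scan(CSS_URL).flatten.each do |ref|
    next if ref.start_with?('data:', 'http:', 'https:', '//')
    used << File.expand_path(ref, dir)
  end
end

# Collect every image referenced by the HTML pages in root and the CSS they link.
def used_images(root)
  used = Set.new
  seen = Set.new
  Dir.glob(File.join(root, '*.html')).each do |page|
    html = File.read(page)
    dir  = File.dirname(page)
    html.scan(IMG_SRC).flatten.each { |ref| used << File.expand_path(ref, dir) }
    html.scan(LINK_HREF).flatten.each do |ref|
      images_in_css(File.expand_path(ref, dir), used, seen)
    end
  end
  used
end
```

A stylesheet that no page links is never visited, so its images never land in the set, which matches the "unlinked CSS means trashed images" behaviour described above. The deletion candidates are then `Dir.glob(File.expand_path('img/*', root)).reject { |f| used.include?(f) }`.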
This of course meant rewriting the whole thing. The new script relies on Nokogiri for silky smooth HTML parsing and on a customized version of the css_parser gem. The original css_parser didn't support local files, so I made some patches to allow it to do so. It's a hack, but it does the trick. It looks like the original gem maintainer, Alex Dunae, has been working on this feature, but it wasn't live on stage, so to speak.
Perhaps you'll find this useful. Perhaps you can adapt it to do something else useful for you.
My next task will be to build in simple option parsing for easy use from the command line, and to add proper unit tests, something I haven't explored yet with Ruby.
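For the option parsing, Ruby's standard-library OptionParser is one obvious fit. A minimal sketch with hypothetical flags (`--dry-run`, `--img-dir`) and a hypothetical `parse_options` method; the script doesn't have any of these today:

```ruby
require 'optparse'

# Hypothetical command-line options for the image cleanup script.
def parse_options(argv)
  options = { dry_run: false, img_dir: 'img' }
  OptionParser.new do |opts|
    opts.banner = 'Usage: cleanup_images [options] [ROOT]'
    opts.on('-n', '--dry-run', 'List unused images without deleting them') do
      options[:dry_run] = true
    end
    opts.on('-i', '--img-dir DIR', 'Image folder relative to ROOT (default: img)') do |dir|
      options[:img_dir] = dir
    end
  end.parse!(argv) # parse! strips recognised options, leaving positional args
  options[:root] = argv.first || '.'
  options
end
```

Keeping the parsing in a method that takes an array, rather than reading ARGV directly, also makes it easy to cover with unit tests later.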