Research Buzz today has a good bit of advice for Internet researchers on how to use Google to find spreadsheets posted on the web. The primary focus of the piece is using the filetype: operator to search for Microsoft Excel and other spreadsheets. For those not familiar with the filetype: operator, it tells Google to look for pages whose URL ends in the given extension (such as “XLS”), but it does not validate the file type.
From Research Buzz:
filetype:weatherwax inurl:weatherwax
filetype:feathers inurl:feathers
filetype:hamburger inurl:hamburger
filetype:montypython inurl:montypython
(You can’t use the filetype: syntax alone in a search, but you can work around that by teaming it with the inurl: syntax.)
They left out my favorite companion to the filetype: operator, and that is the site: operator. By combining these two, you can search for all files of a particular type on a given web site. For example, the following entry in the Google search box will search for all of the Microsoft Excel documents in the att.com domain, which is owned by AT&T.
filetype:xls site:att.com
I have found a TON of great intelligence doing this. It surprises me how careless people can be, sometimes making sensitive corporate information available to the public Internet audience. Competitive intelligence professionals would do well to proactively run this and related searches on their competitors' sites. Look primarily for Excel, PowerPoint and Word documents for fun and profit. Conversely, an exercise in counterintelligence would be to run these searches on your own domain to make sure no documents that shouldn’t be public are posted to the Internet.
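If you run these searches regularly, it may help to generate the query strings rather than type them by hand. The following is only a rough sketch in Python: the function names build_queries and search_url, the default list of extensions (xls, ppt, doc), and the choice to format each query as a Google search URL are my own assumptions for illustration, not anything from Research Buzz or Google.

```python
from urllib.parse import urlencode

# Assumed default extensions for Excel, PowerPoint and Word documents.
DEFAULT_FILETYPES = ("xls", "ppt", "doc")


def build_queries(domain, filetypes=DEFAULT_FILETYPES):
    """Return one 'filetype:EXT site:DOMAIN' query string per extension."""
    return [f"filetype:{ext} site:{domain}" for ext in filetypes]


def search_url(query):
    """Format a query as a Google search URL (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Print each query and a URL you can paste into a browser.
    for query in build_queries("att.com"):
        print(query)
        print(search_url(query))
```

Swap in your own domain for the counterintelligence check, or pass a different tuple of extensions as the second argument to cover other file types.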