Category Archives: Tips & Tricks

Screaming Frog Tips & Tricks

This is an aide memoire for myself, but if you have found really useful Screaming Frog tips to get the most out of the tool, please leave a comment or email me at hello@jacknorell.com.

If I include the tip, you’ll get a link back to your blog or Twitter handle.

Find unlinked mentions

If you have a long list of URLs where your brand or product is mentioned, and you want to find out which pages do or don’t link back to you, this custom filter in Screaming Frog will be useful:

<a [^>]*?href=(“|’)?http(?:s)?://(?:www\.)?YOURSITE\.com[^>]*?>

(updated to better match a link tag)

Go to Configuration and select Custom, then enter like this:

Regular expression used in Screaming Frog custom filter
Use regex with a custom filter

Check if page is linking to your site

By using “Contains” in the Custom settings, the filter will find if the page is currently linking to your site, though in this version not whether it is nofollowed.

If you have a modified RegEx to check for nofollow links, please do share in comments.

Match anything including New Lines with RegEx

Within Screaming Frog, the expression .* matches everything except new lines and returns, so if the HTML you want to match looks like this:

<div name=”examplediv”>
<span>Extract This Text</span>

That default expression won’t work. However, if you use (?s).* instead, it will work just fine.

Regex for Inclusion or Exclusion must match full URL string

Screaming Frog only reliably matches on the full URL evaluated. This means that this regex won’t work in all cases:

/\d{12}$

What this would normally do is include or exclude all auction listing pages on Ebay.co.uk that look like:

http://www.ebay.co.uk/itm/[auction-name]-/141736421954

That regex should match the twelve-digit auction ID number ending the URL. However, a range of these will still be included when crawling. Updating the regex to the following, to evaluate the whole URL, will work:

.*/\d{12}$

Extract anchor text from links to a site

This is a bit messy, but if a page is linking to example.com, it will extract the text from ‘example.com’ to the next closing anchor tag </a>.

This may at times capture other HTML tags that wrap the anchor text itself inside the <a> tag, if those tags have spaces before or after as the regex currently does not handle spaces well. You can find these easily by running a =find(“<“, [cell reference]) to locate any tag opening sharp brackets, then clean up manually.

href=.*?example\.com.*?>(?:<.*?>)*?(.*?)(?:<.*?>)*?</a>

Note that only the first link’s anchor text will be extracted. If the page has more than one link to example.com, you will need to use a script to break up the page source code and iterate through the resulting text strings.

If you have a more sophisticated working example, please share in the comments.

This Screaming Frog tips article is regularly updated

More Screaming Frog tips & tricks will be added over time.

Using Google Translate in Your Google Spreadsheets

translation

Google Translate works very well together with Spreadsheets to turn whatever language you don’t read into your own (or English of course). Once you’re acquainted with the functions used, you’ll quickly be able to modify your original text into whichever language you require. With a bit of clever work, you could automate processes, by connecting your sheet with If This Then That (IFTTT). Below I’m providing two examples of applications I’ve found useful.

But first, we’ll review the formulas.

The formulas

Google spreadsheets has two formulas to help you both translate and identify the language of text within a column.

=GOOGLETRANSLATE(text, [source_language],[target_language])

You can also set the [source_language] to auto-detect by using “auto” instead of a source language code, like this:

=GOOGLETRANSLATE(text, “auto”,”en”) to translate these anchor texts into English.

You don’t need to set the target language, as it will default to the language used in the spreadsheet.

The second formula can help you filter by language:

=DETECTLANGUAGE(text_or_range)

Full list of 2 letter ISO language codes on Wikipedia.

Translating backlink anchor text

In my consulting work, my team and I often come across backlinks in a range of languages and alphabets. Of course, this makes it difficult to evaluate backlink profiles: Is that anchor text a Brand, Compound, Money or Other term in our classification? Rather than just shrug our shoulders and chuck all of these incomprehensible text snippets in either Money or Other, I decided that using Google Spreadsheets to translate the lot would be more helpful.

To ensure that I got a broad selection of non-English anchor texts, I pulled the backlinks for Aliexpress.com. As they’re in mostly Chinese, it came in handy for the example. Removing the unnecessary columns, we are left with this:

Aliexpress backlinks
Backlink examples for Aliexpress.com

By using the formula =GOOGLETRANSLATE(D8, “auto”, “en”) in the appropriate columns, we’ll end up with a translated text.

GoogleTranslate formula Aliexpress
Translating with auto-detect to English

Copying the formula down the sheet, and waiting a few moments, we end up with results. I also translated the link source page titles to further illustrate how useful these functions are:

Translated Aliexpress results
The anchor texts and page titles translated

In our work, we would now be easily able to classify the anchor texts in the right groupings.

Auto-translating Google Reader replacement

While Google Reader is no more, Spreadsheets can use the ImportFeed formula to import RSS or Atom feeds.

=ImportFeed(URL, [feedQuery | itemQuery], [headers], [numItems]). Formula arguments are the following:

URL is the url of the RSS or ATOM feed.

feedQuery/itemQuery is one of the following query strings: “feed”, “feed title”, “feed author”, “feed description”, “feed url”, “items”, “items author”, “items title”, “items summary”, “items url”, or “items created”. The feed queries return feed properties; the feed’s title, the feed’s author, etc. If you want the feed data, do an “items” request.

  1. the “feed” query returns a single row with all of the feed information
  2. the “feed ” query returns a single cell with the requested feed information
  3. the “items” query returns a full table, with all of the item information about each item in the feed
  4. the “items ” query returns a single column with the requested information about each item
  5. using a “feed” query, the numItems parameter isn’t necessary and is replaced by the option headers param
  6. with an “items” query, the numItems parameter is expected as the third parameter, and headers as the fourth
  7. headers – “true” if column headers is desired. This will add an extra row to the top of the output labeling each column of the output

Building the spreadsheet

I decided to grab content from Spin Sucks for this example:

ImportFeed formula example
Pulling in the Spin Sucks RSS feed
ImportFeed Results
The ImportFeed formula output

Now for translating the contents of the feed. I picked Swedish (my birth language) by using the formula =GOOGLETRANSLATE(E4, “auto”, “sv”)

GoogleTranslate to Swedish
Formula to translate source into Swedish

And the results are predictably poor but understandable Swedish:

GoogleTranslate Swedish Results
Translating feed text into Swedish

The above is of course a very basic implementation of the formulas, but gives you a starting point to develop from.

Other useful import queries

IMPORTXML: Imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.

IMPORTRANGE: Imports a range of cells from a specified spreadsheet.

IMPORTHTML: Imports data from a table or list within an HTML page.

IMPORTDATA: Imports data at a given url in .csv (comma-separated value) or .tsv (tab-separated value) format.