I would definitely avoid this on a production website, as it’s incredibly user un-friendly and looks a bit dodgy. After all, your visitors arrive to read / learn from your content. Often that involves taking notes / clippings for use elsewhere.
That regex should match the twelve-digit auction ID number ending the URL. However, a range of these will still be included when crawling. Updating the regex to the following, to evaluate the whole URL, will work:
Extract anchor text from links to a site
This is a bit messy, but if a page is linking to example.com, it will extract the text from ‘example.com’ to the next closing anchor tag </a>.
This may at times capture other HTML tags that wrap the anchor text itself inside the <a> tag, if those tags have spaces before or after as the regex currently does not handle spaces well. You can find these easily by running a =find(“<“, [cell reference]) to locate any tag opening sharp brackets, then clean up manually.
Note that only the first link’s anchor text will be extracted. If the page has more than one link to example.com, you will need to use a script to break up the page source code and iterate through the resulting text strings.
If you have a more sophisticated working example, please share in the comments.
This Screaming Frog tips article is regularly updated
Google Translate works very well together with Spreadsheets to turn whatever language you don’t read into your own (or English of course). Once you’re acquainted with the functions used, you’ll quickly be able to modify your original text into whichever language you require. With a bit of clever work, you could automate processes, by connecting your sheet with If This Then That (IFTTT). Below I’m providing two examples of applications I’ve found useful.
But first, we’ll review the formulas.
Google spreadsheets has two formulas to help you both translate and identify the language of text within a column.
In my consulting work, my team and I often come across backlinks in a range of languages and alphabets. Of course, this makes it difficult to evaluate backlink profiles: Is that anchor text a Brand, Compound, Money or Other term in our classification? Rather than just shrug our shoulders and chuck all of these incomprehensible text snippets in either Money or Other, I decided that using Google Spreadsheets to translate the lot would be more helpful.
To ensure that I got a broad selection of non-English anchor texts, I pulled the backlinks for Aliexpress.com. As they’re in mostly Chinese, it came in handy for the example. Removing the unnecessary columns, we are left with this:
By using the formula =GOOGLETRANSLATE(D8, “auto”, “en”) in the appropriate columns, we’ll end up with a translated text.
Copying the formula down the sheet, and waiting a few moments, we end up with results. I also translated the link source page titles to further illustrate how useful these functions are:
In our work, we would now be easily able to classify the anchor texts in the right groupings.
Auto-translating Google Reader replacement
While Google Reader is no more, Spreadsheets can use the ImportFeed formula to import RSS or Atom feeds.
=ImportFeed(URL, [feedQuery | itemQuery], [headers], [numItems]). Formula arguments are the following:
URL is the url of the RSS or ATOM feed.
feedQuery/itemQuery is one of the following query strings: “feed”, “feed title”, “feed author”, “feed description”, “feed url”, “items”, “items author”, “items title”, “items summary”, “items url”, or “items created”. The feed queries return feed properties; the feed’s title, the feed’s author, etc. If you want the feed data, do an “items” request.
the “feed” query returns a single row with all of the feed information
the “feed ” query returns a single cell with the requested feed information
the “items” query returns a full table, with all of the item information about each item in the feed
the “items ” query returns a single column with the requested information about each item
using a “feed” query, the numItems parameter isn’t necessary and is replaced by the option headers param
with an “items” query, the numItems parameter is expected as the third parameter, and headers as the fourth
headers – “true” if column headers is desired. This will add an extra row to the top of the output labeling each column of the output
Building the spreadsheet
I decided to grab content from Spin Sucks for this example:
Now for translating the contents of the feed. I picked Swedish (my birth language) by using the formula =GOOGLETRANSLATE(E4, “auto”, “sv”)
And the results are predictably poor but understandable Swedish:
The above is of course a very basic implementation of the formulas, but gives you a starting point to develop from.
Other useful import queries
IMPORTXML: Imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.
IMPORTRANGE: Imports a range of cells from a specified spreadsheet.
IMPORTHTML: Imports data from a table or list within an HTML page.
IMPORTDATA: Imports data at a given url in .csv (comma-separated value) or .tsv (tab-separated value) format.