Home » Useful Articles » How to scrape a Shopify store into Google Sheets

How to scrape a Shopify store into Google Sheets

5/5 - (1 vote)

(2)
By eMagicOne
article-img-sect-1

If you need to scrape a Shopify store to get product price, reviews or other data, you may do it yourself using the Google Sheets IMPORTXML formula. It is pretty easy and doesn't require programming knowledge. Though, if you need to extract all product details at large scale, you may face the limitations of the tool making it impossible to solve your task.

article-sect-1

In this article, we’ll review the possibilities and limitations of scraping Shopify and other eCommerce stores with the Google Sheets functionality. Also, we’ll look at an alternative solution allowing you to get complete product data for your specific business task.

Using Google Sheets IMPORTXML formula

A little bit of theory about the IMPORTXML function from the Google Help Center: “IMPORTXML imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.” Sample usage:

IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//a/@href")

IMPORTXML(A2,B2)

The IMPORTXML formula is formatted with two arguments:

  • URL of the webpage you intend to extract data from.
  • XPath of the element in which the data is contained.

XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document. Further in the article we’ll show you how to identify the XPath of the page element (product name, price, description, etc.)

Now we’ll show you how to apply this formula in practice to extract product details from a Shopify store.

Steps to scrape a Shopify store to Google Sheets

Here is a product page we’ll try to scrape:

Shopify Store to be Scraped to Google Sheets

Shopify Store to be Scraped to Google Sheets

Let’s begin with retrieving product title:

  1. Create a Google Sheet.
  2. Add a column name and paste the URL.
  3. Copying Shopify Store URL to a Google Sheet

    Copy the Shopify Store URL to a Google Sheet

  4. Get XPath for the product title. For that, select the title, right click to bring up the menu and click “Inspect”:
  5. Shopify Product Title Extraction Inspect

    Select the Shopify product title extraction and in the context menu click “Inspect”

    In the Chrome dev tool, you’ll see the highlighted text with tags. Right click here and in the menu press Copy > Copy XPath:

    Shopify Title Scraping Copy XPath

    Copy XPath to clipboard

    Paste it to the corresponding cell:

    Copying Shopify Product Title XPath to Google Sheets

    Copy title XPath to Google Sheets

  6. Apply the formula =IMPORTXML (URL, XPath_query) and press Enter.
  7. *Select corresponding cells for URL and XPath_query. In our example its cell A2 for the URL, and B2 for the XPath:

    Shopify Product Title Scraped to Google Sheets

    Apply the formula and press Enter to get the product title scraped to Google Sheets

Now when the product’s name is in the sheet, we can extract the product’s price and review in the same way. For that, repeat steps 2-4 for each entity. Here what we’ve got in our document:

Shopify Product Details Scraped to Google Sheets

Repeat the steps to scrape all product details

Why does the “Imported content is empty” issue occur

When we try to extract image URL in the same way, though, we get the “Imported content is empty” issue (it can also be “The imported XML content can not be parsed”):

Shopify Scraping to Google Sheets Error

The “Imported content is empty” error

These issues may be caused by the incorrect XPath. Try to select the element more precisely and copy its Xpath again. If the error persists, the reason may be that you are trying to extract dynamic content. And Google Sheets formula doesn’t work for dynamic websites.

Dynamic content includes such kinds of UI interaction on the page as drop-downs with product variants, “See more” sections, multiple images, pagination and other. To detect dynamic elements, try disabling Java for the page. We’ve disabled JavaScript for our source sample page and the image isn’t displayed now:

Shopify Site Disabled Java

Image isn’t displayed when Javascript is disabled

That was the reason why we couldn’t fetch its URL. Thus we’ve stumbled upon one of the limitations of scraping an eCommerce website with the IMPORTXML function.

Limitations of the IMPORTXML function

The biggest limitation of the import formula is that you can’t scrape dynamic pages that load the content through Java/API.

Google Sheets and other static web scrapers can only scrape the page if its content is displayed on the initial page request, but not the content is generated after.

Other limitations you may face:

  • It’s not always easy to identify the correct XPath query.
  • IMPORTXML & Sheets cannot be used to scrape large quantities of data at scale (it will stop functioning).
  • Import formulas cannot work with data behind a password wall.

If you plan on importing the extracted data into your online store, it would also take time to clean up the data and prepare the correct file structure for import.

An alternative way of scraping Shopify and other eCommerce websites to Google Sheets

In cases when it’s needed to scrape multiple dynamic websites with different structures and big data sets, a more robust solution has to be used. For cases like these, you can rely on the eScraper data extraction service.

eScraper combines powerful data extraction possibilities and manual tweaking of the data according to the client’s specific needs.

Here is what you get:

  • Extraction of required data (specify it in the order form what fields you need in the result Google Sheet or attach a file template.)
  • Scraping of all data including dynamic elements: product variations, multiple products images, pages with pagination, etc.
  • Scraping of password-protected websites (given you’ve provided the login details).
  • Cleaned up data in the format ready for import. You can order a file for import to Shopify, WooCommerce, PrestaShop, Magento or other shopping carts, ERP systems, etc. Just specify the format and we’ll tweak scraped data accordingly.

Here is an example of data extracted from our sample website with eScraper. As you can see, eScraper could fetch all product data including multiple image URLs and image Alt-tags:

Shopify Product Details All Scraped to Google Sheets by eScraper

All product details can be scraped to Google Sheets by eScraper

Here is the same data formatted for Shopify import. In this file, we’ve added the columns from the Shopify import template and filled them with the scraped data. Product description comes with HTML tags as required:

Shopify Scraped Data Import

Scraped data formatted for import to Shopify

Try eScraper today for free! Submit your request via the online order form and get all data extracted to Google Sheets hassle-free!

Was this article helpful?

5/5 - (1 vote)

Scrape Shopify to Google Sheets with eScraper

Try eScraper

COMMENTS
Rick
Rick • 12.29.2021 at 16:18

Your tutorial helped me to scrape some more content for my eCommerce store. Thank you so much. But frankly say for more complicated task Google Sheet and IMPORTXML is more like 5th wheel to my car)))

eScraper • 01.03.2022 at 11:45

Glad to know the article was helpful to you. More complicated tasks, though, can be solved with more advanced tools. If you have such a task, we’ll be glad to help. Fill in the request form and we’ll prepare a free test file for you.

img-form