How to scrape a Shopify store into Google Sheets
If you need to scrape a Shopify store to get product price, reviews or other data, you may do it yourself using the Google Sheets IMPORTXML formula. It is pretty easy and doesn't require programming knowledge. Though, if you need to extract all product details at large scale, you may face the limitations of the tool making it impossible to solve your task.
In this article, we’ll review the possibilities and limitations of scraping Shopify and other eCommerce stores with the Google Sheets functionality. Also, we’ll look at an alternative solution allowing you to get complete product data for your specific business task.
Using Google Sheets IMPORTXML formula
A little bit of theory about the IMPORTXML function from the Google Help Center: “IMPORTXML imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.” Sample usage:
The IMPORTXML formula is formatted with two arguments:
- URL of the webpage you intend to extract data from.
- XPath of the element in which the data is contained.
XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document. Further in the article we’ll show you how to identify the XPath of the page element (product name, price, description, etc.)
Now we’ll show you how to apply this formula in practice to extract product details from a Shopify store.
Steps to scrape a Shopify store to Google Sheets
Here is a product page we’ll try to scrape:
Shopify Store to be Scraped to Google Sheets
Let’s begin with retrieving product title:
- Create a Google Sheet.
- Add a column name and paste the URL.
- Get XPath for the product title. For that, select the title, right click to bring up the menu and click “Inspect”:
- Apply the formula
=IMPORTXML (URL, XPath_query)and press Enter.
Copy the Shopify Store URL to a Google Sheet
Select the Shopify product title extraction and in the context menu click “Inspect”
In the Chrome dev tool, you’ll see the highlighted text with tags. Right click here and in the menu press Copy > Copy XPath:
Copy XPath to clipboard
Paste it to the corresponding cell:
Copy title XPath to Google Sheets
*Select corresponding cells for URL and XPath_query. In our example its cell A2 for the URL, and B2 for the XPath:
Apply the formula and press Enter to get the product title scraped to Google Sheets
Now when the product’s name is in the sheet, we can extract the product’s price and review in the same way. For that, repeat steps 2-4 for each entity. Here what we’ve got in our document:
Repeat the steps to scrape all product details
Why does the “Imported content is empty” issue occur
When we try to extract image URL in the same way, though, we get the “Imported content is empty” issue (it can also be “The imported XML content can not be parsed”):
The “Imported content is empty” error
These issues may be caused by the incorrect XPath. Try to select the element more precisely and copy its Xpath again. If the error persists, the reason may be that you are trying to extract dynamic content. And Google Sheets formula doesn’t work for dynamic websites.
That was the reason why we couldn’t fetch its URL. Thus we’ve stumbled upon one of the limitations of scraping an eCommerce website with the IMPORTXML function.
Limitations of the IMPORTXML function
The biggest limitation of the import formula is that you can’t scrape dynamic pages that load the content through Java/API.
Google Sheets and other static web scrapers can only scrape the page if its content is displayed on the initial page request, but not the content is generated after.
Other limitations you may face:
- It’s not always easy to identify the correct XPath query.
- IMPORTXML & Sheets cannot be used to scrape large quantities of data at scale (it will stop functioning).
- Import formulas cannot work with data behind a password wall.
If you plan on importing the extracted data into your online store, it would also take time to clean up the data and prepare the correct file structure for import.
An alternative way of scraping Shopify and other eCommerce websites to Google Sheets
In cases when it’s needed to scrape multiple dynamic websites with different structures and big data sets, a more robust solution has to be used. For cases like these, you can rely on the eScraper data extraction service.
eScraper combines powerful data extraction possibilities and manual tweaking of the data according to the client’s specific needs.
Here is what you get:
- Extraction of required data (specify it in the order form what fields you need in the result Google Sheet or attach a file template.)
- Scraping of all data including dynamic elements: product variations, multiple products images, pages with pagination, etc.
- Scraping of password-protected websites (given you’ve provided the login details).
- Cleaned up data in the format ready for import. You can order a file for import to Shopify, WooCommerce, PrestaShop, Magento or other shopping carts, ERP systems, etc. Just specify the format and we’ll tweak scraped data accordingly.
Here is an example of data extracted from our sample website with eScraper. As you can see, eScraper could fetch all product data including multiple image URLs and image Alt-tags:
All product details can be scraped to Google Sheets by eScraper
Here is the same data formatted for Shopify import. In this file, we’ve added the columns from the Shopify import template and filled them with the scraped data. Product description comes with HTML tags as required:
Scraped data formatted for import to Shopify
Try eScraper today for free! Submit your request via the online order form and get all data extracted to Google Sheets hassle-free!