In this article, we’ll review the possibilities and limitations of scraping Shopify and other eCommerce stores with the Google Sheets functionality. Also, we’ll look at an alternative solution allowing you to get complete product data for your specific business task.
A little bit of theory about the IMPORTXML function from the Google Help Center: “IMPORTXML imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.” Sample usage:
IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//a/@href")
IMPORTXML(A2,B2)
The IMPORTXML formula is formatted with two arguments:
XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document. Further in the article we’ll show you how to identify the XPath of the page element (product name, price, description, etc.)
Now we’ll show you how to apply this formula in practice to extract product details from a Shopify store.
Here is a product page we’ll try to scrape:
Shopify Store to be Scraped to Google Sheets
Let’s begin with retrieving product title:
Copy the Shopify Store URL to a Google Sheet
Select the Shopify product title extraction and in the context menu click “Inspect”
In the Chrome dev tool, you’ll see the highlighted text with tags. Right click here and in the menu press Copy > Copy XPath:
Copy XPath to clipboard
Paste it to the corresponding cell:
Copy title XPath to Google Sheets
=IMPORTXML (URL, XPath_query)
and press Enter. *Select corresponding cells for URL and XPath_query. In our example its cell A2 for the URL, and B2 for the XPath:
Apply the formula and press Enter to get the product title scraped to Google Sheets
Now when the product’s name is in the sheet, we can extract the product’s price and review in the same way. For that, repeat steps 2-4 for each entity. Here what we’ve got in our document:
Repeat the steps to scrape all product details
When we try to extract image URL in the same way, though, we get the “Imported content is empty” issue (it can also be “The imported XML content can not be parsed”):
The “Imported content is empty” error
These issues may be caused by the incorrect XPath. Try to select the element more precisely and copy its Xpath again. If the error persists, the reason may be that you are trying to extract dynamic content. And Google Sheets formula doesn’t work for dynamic websites.
Dynamic content includes such kinds of UI interaction on the page as drop-downs with product variants, “See more” sections, multiple images, pagination and other. To detect dynamic elements, try disabling Java for the page. We’ve disabled JavaScript for our source sample page and the image isn’t displayed now:
Image isn’t displayed when Javascript is disabled
That was the reason why we couldn’t fetch its URL. Thus we’ve stumbled upon one of the limitations of scraping an eCommerce website with the IMPORTXML function.
The biggest limitation of the import formula is that you can’t scrape dynamic pages that load the content through Java/API.
Google Sheets and other static web scrapers can only scrape the page if its content is displayed on the initial page request, but not the content is generated after.
Other limitations you may face:
If you plan on importing the extracted data into your online store, it would also take time to clean up the data and prepare the correct file structure for import.
In cases when it’s needed to scrape multiple dynamic websites with different structures and big data sets, a more robust solution has to be used. For cases like these, you can rely on the eScraper data extraction service.
eScraper combines powerful data extraction possibilities and manual tweaking of the data according to the client’s specific needs.
Here is what you get:
Here is an example of data extracted from our sample website with eScraper. As you can see, eScraper could fetch all product data including multiple image URLs and image Alt-tags:
All product details can be scraped to Google Sheets by eScraper
Here is the same data formatted for Shopify import. In this file, we’ve added the columns from the Shopify import template and filled them with the scraped data. Product description comes with HTML tags as required:
Scraped data formatted for import to Shopify
Try eScraper today for free! Submit your request via the online order form and get all data extracted to Google Sheets hassle-free!
Do you need to extract product data from any marketplace like Alibaba, Amazon, FaceBook or…
Do you need to import products from any site, and you do not have a…
Store owners most often use eScraper to extract data for their eCommerce needs like building…
November 4, 2022 - eScraper announces the release of the eScraper data extraction software tailored…
Are you looking for a method to not only scrape Shopify image URLs but also…
And if you are interested in scraping Australian car marketplaces, chances are you’re looking for…
Your tutorial helped me to scrape some more content for my eCommerce store. Thank you so much. But frankly say for more complicated task Google Sheet and IMPORTXML is more like 5th wheel to my car)))
Glad to know the article was helpful to you. More complicated tasks, though, can be solved with more advanced tools. If you have such a task, we’ll be glad to help. Fill in the request form and we’ll prepare a free test file for you.