SerpApi code challenge by rob-mccormick · Pull Request #394 · serpapi/code-challenge

rob-mccormick · 2026-06-25T10:25:58Z

Extract artwork from Google SERP page

Given the path to a Google SERP HTML file, FileScraper returns a JSON list of artworks. Each artwork includes its:

name
extensions
link
image

On the search results page, some artworks are shown with thumbnail images. For those artworks, the thumbnail is returned as the artwork image.
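As a rough illustration of the response shape, the sketch below builds one artwork entry and serializes it. The four field names and the top-level "artworks" key come from this PR; the values are illustrative only, and representing a missing thumbnail as nil is an assumption.

```ruby
require "json"

# Illustrative values only; the field names come from the description above.
artwork = {
  "name" => "The Starry Night",
  "extensions" => ["1889"],
  "link" => "https://www.google.com/search?q=The+Starry+Night", # placeholder URL
  "image" => nil # assumed representation when no thumbnail was rendered
}

# The tests in this PR read the list from the "artworks" key.
response = { "artworks" => [artwork] }

puts JSON.pretty_generate(response)
```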

rob-mccormick · 2026-06-25T10:31:32Z

@@ -0,0 +1,53 @@
+# frozen_string_literal: true
+
+require "ferrum"


I used Ferrum as the web driver because it's headless by default and simple to use (for this challenge at least).

However, it has yet to reach a 1.0 release, so Selenium WebDriver could be used as an alternative.

rob-mccormick · 2026-06-25T10:34:19Z

+
+    document = Nokogiri::HTML(html)
+
+    artworks = document.css("g-loading-icon + div").children


This selection could be done entirely in CSS with g-loading-icon + div > *.

I used the children method instead because it makes it clearer what is being selected than the > * selector does.

rob-mccormick · 2026-06-25T10:35:28Z

+    artworks = document.css("g-loading-icon + div").children
+
+    result = artworks.map do |artwork|
+      extensions = artwork.css("img + div").children.map do |extension|


As with artworks above, this selection could be done entirely in CSS with img + div > *.

I used the children method because it is clearer and consistent with the approach above.

rob-mccormick · 2026-06-25T10:42:47Z

+
+  private
+
+  def self.extract_html(file_path)


Initially I parsed the HTML files with Nokogiri alone, but found that I needed a browser to execute the JavaScript that loads the thumbnail images. So I switched to using a web driver to render the page.

I thought a separate method for extracting the HTML was useful because:

It could be extended to scrape the page in different ways, such as with or without a web driver.

Having the web driver code contained within this method makes it simpler to change the web driver (e.g. switch Ferrum for Selenium).
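The two points above could be sketched roughly as follows. This is a hypothetical layout, not the PR's actual code: the STATIC and RENDERED constants and the renderer: keyword are names invented for illustration. Ferrum is required lazily so the static path works on a machine without Chrome.

```ruby
# frozen_string_literal: true

# Hypothetical sketch; the constants and the `renderer:` keyword are
# assumptions made for illustration.
class FileScraper
  # Reads the file as-is; no JavaScript is executed.
  STATIC = ->(file_path) { File.read(file_path) }

  # Renders the file in headless Chrome via Ferrum so scripts run first.
  RENDERED = lambda do |file_path|
    require "ferrum" # loaded lazily, only when a browser is needed
    browser = Ferrum::Browser.new
    begin
      browser.go_to("file://#{File.expand_path(file_path)}")
      browser.body # HTML of the page after JavaScript has executed
    ensure
      browser.quit
    end
  end

  # Swapping the web driver means passing (or defining) another callable.
  def self.extract_html(file_path, renderer: RENDERED)
    renderer.call(file_path)
  end
end
```

A call such as FileScraper.extract_html(path, renderer: FileScraper::STATIC) would skip the browser entirely. The method is left public here so it can be called directly; note that a bare private does not apply to methods defined with def self., so hiding it would need private_class_method :extract_html.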

rob-mccormick · 2026-06-25T10:56:54Z

+
+      if file_path == "./files/van-gogh-paintings.html"
+        it "produces the expected response" do
+          @response["artworks"].each.with_index do |artwork, index|


This test could be simplified to:

expect(@response).to eq(expected_response)

I used the per-artwork approach because it is easier to debug: asserting on the whole response produces a single wall of text on failure.
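To illustrate why item-by-item comparison helps, here is a small plain-Ruby sketch outside of RSpec. The artwork_mismatches helper is hypothetical and not part of this PR; it reports the index and field of each difference instead of one large diff.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of the PR: compares artworks one by one
# and describes each mismatch by index and field.
def artwork_mismatches(actual, expected)
  mismatches = []
  expected.each_with_index do |expected_artwork, index|
    actual_artwork = actual[index] || {}
    expected_artwork.each do |field, expected_value|
      actual_value = actual_artwork[field]
      next if actual_value == expected_value

      mismatches << "artworks[#{index}].#{field}: " \
                    "expected #{expected_value.inspect}, got #{actual_value.inspect}"
    end
  end
  mismatches
end

# Made-up data: the second artwork is missing its image.
expected = [{ "name" => "A", "image" => "x" }, { "name" => "B", "image" => "y" }]
actual   = [{ "name" => "A", "image" => "x" }, { "name" => "B", "image" => nil }]

puts artwork_mismatches(actual, expected)
```

Each message points at a single artwork and field, which is the same benefit the per-artwork RSpec loop gives over a whole-response comparison.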

rob-mccormick added 15 commits June 24, 2026 21:21

Add and initialize RSpec

ed3e3f2

Add nokogiri

fc17485

Add basic tests

1267ae4

Scrape all but the thumbnail image

7a69e1c

Update image scraping and add expected json test

9106403

Improve test for expected response

95431f6

Fix error with empty extensions in artwork

61c218c

Add ferrum gem

82bc0be

Collect html document after JS execution

05247a7

Add 2 similar results pages

3f9dd52

Update tests to test all sample pages

d3abc1c

Update selectors to work in all example pages

a9d76ec

Make conditional clearer

b317e05

Add code comment

f7c964d

Improve expected response test and fix bug

325e46f

rob-mccormick wants to merge 15 commits into serpapi:master from rob-mccormick:master
