From c26bb4df689a377af40c4fc8ff6f2de2daea482d Mon Sep 17 00:00:00 2001 From: Arto Gahr Date: Thu, 14 May 2026 20:24:43 +0200 Subject: [PATCH 1/4] docs: fix academy ikea highest price example prices --- .../03_devtools_extracting_data.md | 6 +++--- .../scraping_basics_python/03_devtools_extracting_data.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md index e774b7e1d7..fde2d094c2 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md @@ -81,7 +81,7 @@ In the next lesson, we'll start with our Node.js project. First we'll be figurin ### Extract the price of IKEA's most expensive artificial plant -At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use the [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number. +At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. 
Finally, use the [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number (you may need [`replace()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) to handle spaces).
Solution @@ -93,8 +93,8 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a 1. Notice that the price is structured into two elements, with the integer separated from the currency, under a class named `plp-price__integer`. This structure is convenient for extracting the value. 1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. This returns the element representing the first price in the listing. Since `document.querySelector()` returns the first matching element, it directly selects the most expensive plant's price. 1. Save the element in a variable by executing `price = document.querySelector('.plp-price__integer')`. - 1. Convert the price text into a number by executing `parseInt(price.textContent)`. - 1. At the time of writing, this returns `699`, meaning [699 SEK](https://www.google.com/search?q=699%20sek). + 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. Note that `replace(' ', '')` removes spaces from the price string before converting it to a number. + 1. At the time of writing, this returns `1299`, meaning [1 299 SEK](https://www.google.com/search?q=1299%20sek).
diff --git a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md index f864362f8a..66f04d391d 100644 --- a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md @@ -78,7 +78,7 @@ In the next lesson, we'll start with our Python project. First we'll be figuring ### Extract the price of IKEA's most expensive artificial plant -At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number. +At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number (you may need [`replace()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) to handle spaces).
Solution @@ -90,8 +90,8 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a 1. Notice that the price is structured into two elements, with the integer separated from the currency, under a class named `plp-price__integer`. This structure is convenient for extracting the value. 1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. This returns the element representing the first price in the listing. Since `document.querySelector()` returns the first matching element, it directly selects the most expensive plant's price. 1. Save the element in a variable by executing `price = document.querySelector('.plp-price__integer')`. - 1. Convert the price text into a number by executing `parseInt(price.textContent)`. - 1. At the time of writing, this returns `699`, meaning [699 SEK](https://www.google.com/search?q=699%20sek). + 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. Note that `replace(' ', '')` removes spaces from the price string before converting it to a number. + 1. At the time of writing, this returns `1299`, meaning [1 299 SEK](https://www.google.com/search?q=1299%20sek).
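A note on the `replace(' ', '')` call introduced in the solution step above: when given a string pattern, JavaScript's `replace()` swaps out only the first match. That is enough for `1 299`, but not for a price with two separators. The sketch below compares it with a regular expression that strips every whitespace character. The helper name `parsePrice` and the sample strings are illustrative assumptions, not taken from IKEA's page.

```javascript
// Hypothetical helper, for illustration only: strips all whitespace
// (including non-breaking spaces) before parsing the integer.
function parsePrice(text) {
  return parseInt(text.replace(/\s/g, ''), 10);
}

// Assumed sample strings; the text on the real page may differ.
const oneSeparator = '1 299';
const twoSeparators = '1 299 000';

// A string pattern replaces only the first occurrence.
console.log(parseInt(oneSeparator.replace(' ', ''), 10)); // 1299
console.log(parseInt(twoSeparators.replace(' ', ''), 10)); // 1299

// The regular expression handles both cases.
console.log(parsePrice(oneSeparator)); // 1299
console.log(parsePrice(twoSeparators)); // 1299000
```

If the page happens to use a non-breaking space as the thousand separator, the `\s` class would cover that as well, whereas `' '` would not match it.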
From 9cf81820389cdf3ca5b4f24ed70970519be3edc7 Mon Sep 17 00:00:00 2001 From: Arto Gahr Date: Fri, 15 May 2026 10:23:16 +0200 Subject: [PATCH 2/4] docs: the guardian first post might not have a lead paragraph --- .../scraping_basics_javascript/03_devtools_extracting_data.md | 4 ++-- .../scraping_basics_python/03_devtools_extracting_data.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md index fde2d094c2..bc5ad52138 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md @@ -119,7 +119,7 @@ On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selecto ### Extract details about the first post on Guardian's F1 news -On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph, and URL of the associated photo. +On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph (if it has one), and URL of the associated photo. ![F1 news page](../scraping_basics/images/devtools-exercise-guardian2.png) @@ -132,7 +132,7 @@ On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), 1. Notice that the markup does not provide clear, reusable class names for this task. The structure uses generic tag names and randomized classes, requiring you to rely on the element hierarchy and order instead. 1. In the **Console**, execute `post = document.querySelector('#maincontent ul li')`. 
This returns the element representing the first post. 1. Extract the post's title by executing `post.querySelector('h3').textContent`. - 1. Extract the lead paragraph by executing `post.querySelector('span div').textContent`. + 1. Extract the lead paragraph (if it has one) by executing `post.querySelector('span div').textContent`. 1. Extract the photo URL by executing `post.querySelector('img').src`. diff --git a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md index 66f04d391d..853a067997 100644 --- a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md @@ -116,7 +116,7 @@ On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selecto ### Extract details about the first post on Guardian's F1 news -On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph, and URL of the associated photo. +On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph (if it has one), and URL of the associated photo. ![F1 news page](../scraping_basics/images/devtools-exercise-guardian2.png) @@ -129,7 +129,7 @@ On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), 1. Notice that the markup does not provide clear, reusable class names for this task. The structure uses generic tag names and randomized classes, requiring you to rely on the element hierarchy and order instead. 1. In the **Console**, execute `post = document.querySelector('#maincontent ul li')`. 
This returns the element representing the first post. 1. Extract the post's title by executing `post.querySelector('h3').textContent`. - 1. Extract the lead paragraph by executing `post.querySelector('span div').textContent`. + 1. Extract the lead paragraph (if it has one) by executing `post.querySelector('span div').textContent`. 1. Extract the photo URL by executing `post.querySelector('img').src`. From cd76c6f9f1431bc40f615f69356b139a3069813a Mon Sep 17 00:00:00 2001 From: Arto Gahr Date: Sat, 16 May 2026 11:40:53 +0200 Subject: [PATCH 3/4] docs: fix broken cheerio docs links --- .../scraping_basics_javascript/06_locating_elements.md | 8 ++++---- .../scraping_basics_javascript/07_extracting_data.md | 2 +- .../scraping_basics_legacy/crawling/finding_links.md | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md b/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md index fedd418abd..b4d1f8124f 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md +++ b/sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md @@ -39,7 +39,7 @@ if (response.ok) { } ``` -Calling [`toArray()`](https://cheerio.js.org/docs/api/classes/Cheerio#toarray) converts the Cheerio selection to a standard JavaScript array. We can then loop over that array and process each selected element. +Calling [`toArray()`](https://cheerio.js.org/docs/api/classes/cheerio#toarray) converts the Cheerio selection to a standard JavaScript array. We can then loop over that array and process each selected element. Cheerio requires us to wrap each element with `$()` again before we can work with it further, and then we call `.text()`. 
If we run the code, it… well, it definitely prints _something_… @@ -136,7 +136,7 @@ When translated to a tree of JavaScript objects, the element with class `price` - a `span` HTML element, - a textual node representing the actual amount and possibly also white space. -We can use Cheerio's [`.contents()`](https://cheerio.js.org/docs/api/classes/Cheerio#contents) method to access individual nodes. It returns a list of nodes like this: +We can use Cheerio's [`.contents()`](https://cheerio.js.org/docs/api/classes/cheerio#contents) method to access individual nodes. It returns a list of nodes like this: ```text LoadedCheerio { @@ -197,7 +197,7 @@ if (response.ok) { } ``` -We're enjoying the fact that Cheerio selections provide utility methods for accessing items, such as [`.first()`](https://cheerio.js.org/docs/api/classes/Cheerio#first) or [`.last()`](https://cheerio.js.org/docs/api/classes/Cheerio#last). If we run the scraper now, it should print prices as only amounts: +We're enjoying the fact that Cheerio selections provide utility methods for accessing items, such as [`.first()`](https://cheerio.js.org/docs/api/classes/cheerio#first) or [`.last()`](https://cheerio.js.org/docs/api/classes/cheerio#last). If we run the scraper now, it should print prices as only amounts: ```text $ node index.js @@ -237,7 +237,7 @@ Macao, China :::tip Need a nudge? -You may want to check out Cheerio's [`.eq()`](https://cheerio.js.org/docs/api/classes/Cheerio#eq). +You may want to check out Cheerio's [`.eq()`](https://cheerio.js.org/docs/api/classes/cheerio#eq). 
::: diff --git a/sources/academy/webscraping/scraping_basics_javascript/07_extracting_data.md b/sources/academy/webscraping/scraping_basics_javascript/07_extracting_data.md index 74b440b4bc..d728da6954 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/07_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_javascript/07_extracting_data.md @@ -290,7 +290,7 @@ Hamilton reveals distress over ‘devastating’ groundhog accident at Canadian :::tip Need a nudge? - HTML's `time` element can have an attribute `datetime`, which [contains data in a machine-readable format](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/time), such as the ISO 8601. -- Cheerio gives you [.attr()](https://cheerio.js.org/docs/api/classes/Cheerio#attr) to access attributes. +- Cheerio gives you [.attr()](https://cheerio.js.org/docs/api/classes/cheerio#attr) to access attributes. - In JavaScript you can use an ISO 8601 string to create a [`Date`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date) object. - To get the date, you can call `.toDateString()` on `Date` objects. diff --git a/sources/academy/webscraping/scraping_basics_legacy/crawling/finding_links.md b/sources/academy/webscraping/scraping_basics_legacy/crawling/finding_links.md index d185e8b01a..3b9c03aad0 100644 --- a/sources/academy/webscraping/scraping_basics_legacy/crawling/finding_links.md +++ b/sources/academy/webscraping/scraping_basics_legacy/crawling/finding_links.md @@ -57,7 +57,7 @@ We'll start from a boilerplate that's very similar to the scraper we built in [B {Example} -Aside from importing libraries and downloading HTML, we load the HTML into Cheerio and then use it to retrieve all the `<a>` elements. After that, we iterate over the collected links and print their `href` attributes, which we access using the [`.attr()`](https://cheerio.js.org/docs/api/classes/Cheerio#attr) method. 
+Aside from importing libraries and downloading HTML, we load the HTML into Cheerio and then use it to retrieve all the `<a>` elements. After that, we iterate over the collected links and print their `href` attributes, which we access using the [`.attr()`](https://cheerio.js.org/docs/api/classes/cheerio#attr) method. When you run the above code, you'll see quite a lot of links in the terminal. Some of them may look wrong, because they don't start with the regular `https://` protocol. We'll learn what to do with them in the following lessons. From b8e7768854e90c9f903282faa045114797683f0b Mon Sep 17 00:00:00 2001 From: Arto Gahr Date: Mon, 18 May 2026 17:24:45 +0200 Subject: [PATCH 4/4] docs: add more relevant links to the ikea scraping task --- .../scraping_basics_javascript/03_devtools_extracting_data.md | 2 +- .../scraping_basics_python/03_devtools_extracting_data.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md index bc5ad52138..da431dfc19 100644 --- a/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_javascript/03_devtools_extracting_data.md @@ -93,7 +93,7 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a 1. Notice that the price is structured into two elements, with the integer separated from the currency, under a class named `plp-price__integer`. This structure is convenient for extracting the value. 1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. This returns the element representing the first price in the listing. Since `document.querySelector()` returns the first matching element, it directly selects the most expensive plant's price. 1. 
Save the element in a variable by executing `price = document.querySelector('.plp-price__integer')`. - 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. Note that `replace(' ', '')` removes spaces from the price string before converting it to a number. + 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. The price text contains spaces as thousand separators, so `.replace(' ', '')` strips them before parsing - a technique you'll explore further in the [Extracting data](./07_extracting_data.md#removing-dollar-sign-and-commas) lesson. 1. At the time of writing, this returns `1299`, meaning [1 299 SEK](https://www.google.com/search?q=1299%20sek). diff --git a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md index 853a067997..ad21d99501 100644 --- a/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md +++ b/sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md @@ -90,7 +90,7 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a 1. Notice that the price is structured into two elements, with the integer separated from the currency, under a class named `plp-price__integer`. This structure is convenient for extracting the value. 1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. This returns the element representing the first price in the listing. Since `document.querySelector()` returns the first matching element, it directly selects the most expensive plant's price. 1. Save the element in a variable by executing `price = document.querySelector('.plp-price__integer')`. - 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. 
Note that `replace(' ', '')` removes spaces from the price string before converting it to a number. + 1. Convert the price text into a number by executing `parseInt(price.textContent.replace(' ', ''))`. The price text contains spaces as thousand separators, so `.replace(' ', '')` strips them before parsing - a technique you'll explore further in the [Extracting data](./07_extracting_data.md#removing-dollar-sign-and-commas) lesson. 1. At the time of writing, this returns `1299`, meaning [1 299 SEK](https://www.google.com/search?q=1299%20sek).
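Returning to the Guardian exercise touched in patch 2/4: when the first post has no lead paragraph, `post.querySelector('span div')` returns `null`, and reading `.textContent` from `null` throws a `TypeError`. A minimal sketch of a guard using optional chaining follows. The helper name `textOrNull` and the stand-in `fakePost` object are hypothetical, introduced only so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: returns the trimmed text of the first match,
// or null when nothing matches the selector.
function textOrNull(root, selector) {
  const text = root.querySelector(selector)?.textContent;
  return text == null ? null : text.trim();
}

// Stand-in for a DOM element so the sketch runs in Node.js too.
// In the browser Console, pass the real `post` element instead.
const fakePost = {
  querySelector(selector) {
    return selector === 'h3' ? { textContent: ' Example title ' } : null;
  },
};

console.log(textOrNull(fakePost, 'h3'));
console.log(textOrNull(fakePost, 'span div'));
```

In the Console, the equivalent one-liner would be `post.querySelector('span div')?.textContent`, which evaluates to `undefined` instead of throwing when the element is missing.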