Using JS to scrape tables

  • Last Post 27 October 2020
burque505 posted this 22 October 2020

Here's one way (of many) to scrape a table without using the 'Get Table' activity from the Toolbox. The table scraped is the w3schools "customers" table, a frequently used example.

w3schools table page

JavaScript used:

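// Select every row of the table whose id is "customers"
var items = document.querySelectorAll('#customers > tbody > tr');
var docStr = "";
items.forEach(function(element){
    // A row's innerText separates its cells with tab characters
    docStr += element.innerText + "\n";
});

// The script's return value is handed back to the Intellibot flow
return docStr;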
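To get comma-separated text straight from the browser, the loop can swap the tabs itself. The sketch below wraps that loop in a function so it can be tried outside a page. The name `rowsToCommaText` is hypothetical, and the sketch assumes no cell contains a comma or a tab of its own.

```javascript
// Hypothetical variant of the loop above: replace each row's tab separators
// with commas before appending, so the script returns comma-separated text.
// `rows` is anything with forEach whose items expose `innerText`
// (in the browser, the NodeList returned by querySelectorAll).
function rowsToCommaText(rows) {
  var docStr = "";
  rows.forEach(function (element) {
    // The /g flag replaces every tab in the row, not just the first one
    docStr += element.innerText.replace(/\t/g, ",") + "\n";
  });
  return docStr;
}

// In the browser script this would be:
//   return rowsToCommaText(document.querySelectorAll('#customers > tbody > tr'));
```

The regular expression with the `g` flag is used because `String.prototype.replace` with a plain string pattern only replaces the first occurrence.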

Debug message box (the text is still tab-separated; you could instead replace the tabs with commas in the JavaScript itself and skip both the MessageBox and the Replace activity from the String Utilities):

A view of the resulting .CSV file as opened in Excel (columns manually formatted for viewing):
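A plain tab-to-comma swap produces a broken .CSV as soon as a cell contains a comma, a double quote, or a line break. A more defensive sketch reads each cell through the standard `rows` and `cells` table collections and quotes fields the usual CSV way (wrap the field in double quotes and double any embedded quotes). The helper name `rowsToCsv` is hypothetical; it accepts plain objects shaped like table rows so it can be tested outside a browser.

```javascript
// Hypothetical helper: turn table rows into CSV text.
// Accepts anything shaped like HTMLTableRowElement: an object with a `cells`
// collection whose items expose `textContent`.
function rowsToCsv(rows) {
  // Trim source whitespace, then quote the field only when it contains
  // a comma, double quote, or line break, doubling embedded double quotes.
  function quote(field) {
    var text = field.trim();
    if (/[",\r\n]/.test(text)) {
      return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
  }

  return Array.from(rows)
    .map(function (row) {
      return Array.from(row.cells)
        .map(function (cell) { return quote(cell.textContent); })
        .join(",");
    })
    .join("\n");
}

// In the browser script this would be:
//   return rowsToCsv(document.querySelector('#customers').rows);
```

`textContent` is used rather than `innerText` so the helper does not depend on how the page is rendered; the trim removes the whitespace that `textContent` picks up from the HTML source.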

Regards,
burque505

Intellibot Support posted this 24 October 2020

Hi,

Thanks for the post. It's good to see you trying different options. One suggestion from my end: instead of using a static Wait For Time (8 seconds to load), you can use the WaitForCreate component available for the WebScreen (HTML Tables), so the flow copes with the page taking either more or less time to load. WaitForCreate does not necessarily wait for the full configured time: if the page loads before then, it returns True immediately.
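The behaviour described above (returning True as soon as the page is ready rather than always sleeping for a fixed time) is essentially polling with a timeout. The following is a rough plain-JavaScript analogue, not Intellibot's implementation; `waitForCondition` and its parameters are hypothetical, and resolving to `false` on timeout is an assumption of this sketch.

```javascript
// Rough analogue of a "wait for create" step; NOT Intellibot's implementation.
// Resolves to true as soon as check() returns truthy, or to false
// (an assumption of this sketch) once timeoutMs has elapsed.
function waitForCondition(check, timeoutMs, intervalMs) {
  var interval = intervalMs || 100;
  var start = Date.now();
  return new Promise(function (resolve) {
    (function poll() {
      if (check()) {
        resolve(true);  // condition met: return at once, no fixed wait
      } else if (Date.now() - start >= timeoutMs) {
        resolve(false); // gave up after the full timeout
      } else {
        setTimeout(poll, interval); // try again shortly
      }
    })();
  });
}
```

In a page, the check could be `function () { return document.querySelector('#customers') !== null; }` with a timeout of 8000 ms, matching the 8-second static wait mentioned above, except that it finishes early whenever the table appears sooner.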

 

Thanks & Regards

Intellibot Support

burque505 posted this 24 October 2020

Thank you for the excellent suggestion!

"WaitForCreate" would make automations much more robust. I would really appreciate a brief example, though. The docs are not very thorough on that method. It would appear from just looking at it that a loop of some kind would be required, so control doesn't pass until the Boolean output is "True". Might you be able to help in that regard?

EDIT: It appears a loop is not required. The screenshot below shows the execution times.

 

By the way, I believe the ability to execute JavaScript in the browser is one of the best components of Intellibot. It adds flexibility not found in many other RPA platforms.

Best regards,

burque505

Intellibot Support posted this 27 October 2020

Hi,

The WaitForCreate component lets you configure a wait for a specific web page to load completely before execution continues.

You can design the flow around the Boolean value returned by the WaitForCreate method, followed by a decision component.
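In plain JavaScript terms, this wait-then-decide flow amounts to awaiting a Boolean and branching on it. The sketch below is hypothetical: `runAfterWait`, `waitStep`, `scrape`, and `onTimeout` are illustrative names, not Intellibot APIs, standing in for WaitForCreate and the two branches of the decision component.

```javascript
// Hypothetical sketch: await a Boolean-returning wait step, then branch on it,
// as a decision component would on the WaitForCreate output.
// Promise.resolve lets waitStep return either a plain Boolean or a Promise.
function runAfterWait(waitStep, scrape, onTimeout) {
  return Promise.resolve(waitStep()).then(function (ready) {
    // "True" branch: read the table; "False" branch: log, retry, or stop
    return ready ? scrape() : onTimeout();
  });
}
```

Because the wait step itself blocks until it has an answer, no extra loop is needed around it; the flow simply takes one branch or the other.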

Please see the screenshot below for reference.



Thanks & Regards,
Intellibot Support.

 

RepCat posted this 27 October 2020

Thanks for the screenshot!

Regards,
RepCat
