AutoTools Easily parse data from a website (even if it needs you to be logged in) with AutoTools Regex

Learn how you can easily get data from a website by parsing its contents while being logged in

  1. joaomgcd

    joaomgcd Administrator Staff Member

    Joined:
    Feb 3, 2015
    Messages:
    9,479
    Likes Received:
    806
  2. Maussschubser

    Maussschubser New Member

    Joined:
    Jan 22, 2020
    Messages:
    8
    Likes Received:
    0
    Hi,
    I just tried to parse the travel duration from a Google Maps route, but I can't seem to build the correct regex. :(
    This is the route for example.
    The snippet I'm aiming at is
    Code (Text):
    <div class="section-directions-trip-duration delay-light" jstcache="324" jsan="7.section-directions-trip-duration,7.delay-light"> <span jstcache="325">7 Std. 50 Min.</span> <span style="display:none" jstcache="326">schätzungsweise <span jstcache="327"></span></span> </div>
    This is the regex I put together with https://www.debuggex.com/.
    Code (Text):
    7.section-directions-trip-duration,7.+?[^<]+=".+?">([^<]+)
    Group 1 is recognized properly in Debuggex.
    Then I change that to
    Code (Text):
    7.section-directions-trip-duration,7.+?[^<]+=".+?">(?<dura>[^.]+)
    to assign it to $dura().
    When I run the task I get the error "Regex doesn't match text".
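A minimal sketch for checking the two patterns offline against the snippet above. It uses Python's `re` module as a stand-in, so it is an assumption that the engine behind AutoTools Regex behaves the same way; Python also spells named groups `(?P<dura>...)` rather than `(?<dura>...)`. The helper `capture` and the constant names are hypothetical.

```python
import re

# Snippet copied from the post above.
SNIPPET = (
    '<div class="section-directions-trip-duration delay-light" jstcache="324" '
    'jsan="7.section-directions-trip-duration,7.delay-light"> '
    '<span jstcache="325">7 Std. 50 Min.</span> '
    '<span style="display:none" jstcache="326">schätzungsweise '
    '<span jstcache="327"></span></span> </div>'
)

# Shared leading part of both patterns from the post.
PREFIX = r'7.section-directions-trip-duration,7.+?[^<]+=".+?">'

UNNAMED = re.compile(PREFIX + r'([^<]+)')           # first pattern
NAMED_DOT = re.compile(PREFIX + r'(?P<dura>[^.]+)')  # second pattern, Python syntax
NAMED_LT = re.compile(PREFIX + r'(?P<dura>[^<]+)')   # named group, original character class


def capture(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


unnamed_result = capture(UNNAMED, SNIPPET)
named_dot_result = capture(NAMED_DOT, SNIPPET)
named_lt_result = capture(NAMED_LT, SNIPPET)

print(repr(unnamed_result))    # '7 Std. 50 Min.'
print(repr(named_dot_result))  # '7 Std'
print(repr(named_lt_result))   # '7 Std. 50 Min.'
```

In this stand-in, both patterns match the snippet, but `[^.]+` stops at the first period, so the named version captures only `7 Std`. Keeping `[^<]+` inside the named group preserves the full duration.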

    My intention is to check several Google Maps routes and determine the one with the shortest travel time.
    I want to put this on a button and run it before departure so I can pick the best route.
    I know there is a Google Maps API that makes this easier, but its free usage is time-limited,
    and it seems this approach does the job as well. No need to feed Google more of my data.
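Once the durations have been extracted, picking the fastest route comes down to converting each string to minutes and taking the minimum. The sketch below assumes the German labels shown in the snippet (`Std.` for hours, `Min.` for minutes); the route names and the second and third duration values are made up for illustration, and `duration_to_minutes` and `fastest_route` are hypothetical helpers.

```python
import re

# Accepts strings such as "7 Std. 50 Min.", "50 Min." or "8 Std.".
DURATION = re.compile(r'^\s*(?:(\d+)\s*Std\.)?\s*(?:(\d+)\s*Min\.)?\s*$')


def duration_to_minutes(text):
    """Convert a duration string to a total number of minutes."""
    m = DURATION.match(text)
    if not m or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2) or 0)
    return hours * 60 + minutes


def fastest_route(durations):
    """Return the name of the route with the shortest duration."""
    return min(durations, key=lambda name: duration_to_minutes(durations[name]))


# Hypothetical results for three routes.
routes = {
    "route_a": "7 Std. 50 Min.",
    "route_b": "7 Std. 35 Min.",
    "route_c": "8 Std.",
}

best = fastest_route(routes)
print(best, duration_to_minutes(routes[best]))  # route_b 455
```

Other locales would need their own labels in the pattern, since the page text follows the language Google Maps is served in.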

    Thanks in advance for your assistance! :)
     
  3. joaomgcd

    joaomgcd Administrator Staff Member

    Hi. Did you try enabling the option to run JavaScript in the AutoTools Regex action? That Google Maps page most likely needs it :)
     
  4. Maussschubser

    Maussschubser New Member

    Yes, I pretty much tried every option available. :confused:
     
  5. joaomgcd

    joaomgcd Administrator Staff Member

    Wait, you changed
    Code (Text):
    ([^<]+)
    to
    Code (Text):
    (?<dura>[^.]+)
    Why did you change the matching part? :) Maybe that's the issue?
     
  6. Maussschubser

    Maussschubser New Member

    The change doesn't really matter.
    It will either catch the . after Min. or not.
     
  7. joaomgcd

    joaomgcd Administrator Staff Member

    Just to confirm, if you don't set the group name it'll work?
     
  8. Maussschubser

    Maussschubser New Member

    Well, at least I think it does. :D
    This is what debuggex is showing.
    I followed your example to find the correct regex but, admittedly, that might be an issue.
     

  9. joaomgcd

    joaomgcd Administrator Staff Member

    Thank you, but I mean in Tasker :) If you use the first regex in Tasker do you get the results you expect? Meaning, can you get the result in the existing regex output variables?
     
  10. Maussschubser

    Maussschubser New Member

    Unfortunately I don't know how to check that!
    How to check:
    1. What is the content that is downloaded via the link passed to tasker (the route link)?
    2. What regex is "grabbed" if I don't assign it to a variable (the first regex you refer to)?
    3. What is the content of the variable?
     
  11. joaomgcd

    joaomgcd Administrator Staff Member

    1. Can you please export your task's description (not XML) so I can take a look? Long-click the task in Tasker -> export description. Thanks in advance.
    2. The action will always output %regexmatch and %regexgroups() variables :) You can check those
    3. same as above
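To see what "full match" versus "groups" means before checking the Tasker variables, the same distinction can be reproduced offline. This is a sketch in Python; treating `%regexmatch` as the whole matched text and `%regexgroups()` as the list of capture groups is an assumption based on the variable names, and `describe_match` is a hypothetical helper.

```python
import re


def describe_match(pattern, text):
    """Return (full_match, groups), or (None, []) when nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None, []
    return m.group(0), list(m.groups())


sample = '<span jstcache="325">7 Std. 50 Min.</span>'

full, groups = describe_match(r'<span[^>]*>([^<]+)', sample)
print(full)    # <span jstcache="325">7 Std. 50 Min.
print(groups)  # ['7 Std. 50 Min.']

# When the text does not contain the pattern, both outputs are empty.
print(describe_match(r'<span[^>]*>([^<]+)', '<div></div>'))  # (None, [])
```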
     
  12. Maussschubser

    Maussschubser New Member

    Thanks for clarifying.
    %regexmatch doesn't seem to contain a match.
    So I saved the page locally and searched the saved HTML with the regex, but found no match.
    There is also no simple and obvious pattern for a regex.
    I tried to remember how I found the snippet and it seems I got it from within the DOM inspector.
    Any idea how to parse the site in a similar way as the DOM inspector does?
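The difference between the downloaded source and the DOM inspector view can be illustrated with two strings. Both documents below are hypothetical stand-ins, not the real Google Maps markup: one mimics a raw page whose content is built by a script at runtime, the other mimics the DOM after the script has run. `find_duration` is a hypothetical helper.

```python
import re

# Hypothetical raw source: the pane is empty until a script fills it.
RAW_SOURCE = """<html><body>
<div id="pane"></div>
<script>/* builds the directions pane at runtime */</script>
</body></html>"""

# Hypothetical rendered DOM, similar to what a DOM inspector displays.
RENDERED_DOM = """<html><body>
<div id="pane"><div class="section-directions-trip-duration delay-light">
<span jstcache="325">7 Std. 50 Min.</span></div></div>
</body></html>"""

PATTERN = re.compile(
    r'section-directions-trip-duration[^>]*>\s*<span[^>]*>([^<]+)'
)


def find_duration(html):
    """Return the captured duration text, or None if the markup is absent."""
    m = PATTERN.search(html)
    return m.group(1) if m else None


print(find_duration(RAW_SOURCE))    # None
print(find_duration(RENDERED_DOM))  # 7 Std. 50 Min.
```

A regex can only match text that is actually in the string it receives, so a pattern derived from the inspector view fails on source that has not been rendered yet.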
     
  13. joaomgcd

    joaomgcd Administrator Staff Member

    Did you try using the HTML Read action instead? :) That's precisely meant to parse HTML pages....
     
  14. Maussschubser

    Maussschubser New Member

    OK, good idea! I just set up an html read action.
    The parsed content still seems to differ from what I see in the DOM inspector (see the first picture).
    The jsoup demo is also not showing the content I expected. If html read is parsing in the same manner it will not work.
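For comparison, this is roughly what a parser-based extraction looks like when the markup is present. The sketch uses Python's standard `html.parser` instead of jsoup, so it is only an analogy to what the HTML Read action does; `DurationExtractor` and `extract_duration` are hypothetical names. In tools that accept CSS selectors, something like `div.section-directions-trip-duration span` would express the same idea.

```python
from html.parser import HTMLParser


class DurationExtractor(HTMLParser):
    """Collects the text of the first <span> inside the target <div>."""

    TARGET_CLASS = "section-directions-trip-duration"

    def __init__(self):
        super().__init__()
        self.in_target_div = False
        self.in_span = False
        self.duration = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.TARGET_CLASS in classes:
            self.in_target_div = True
        elif tag == "span" and self.in_target_div and self.duration is None:
            self.in_span = True

    def handle_data(self, data):
        # Keep the first non-blank text found inside the span.
        if self.in_span and data.strip():
            self.duration = data.strip()
            self.in_span = False

    def handle_endtag(self, tag):
        # Simplification: nested <div> elements are not tracked.
        if tag == "span":
            self.in_span = False
        elif tag == "div":
            self.in_target_div = False


def extract_duration(html):
    parser = DurationExtractor()
    parser.feed(html)
    return parser.duration


snippet = (
    '<div class="section-directions-trip-duration delay-light" jstcache="324"> '
    '<span jstcache="325">7 Std. 50 Min.</span> '
    '<span style="display:none" jstcache="326">schätzungsweise '
    '<span jstcache="327"></span></span> </div>'
)

print(extract_duration(snippet))                      # 7 Std. 50 Min.
print(extract_duration("<html><body></body></html>"))  # None
```

Like the regex approach, this returns nothing when the element is only created by a script after the page loads.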
     

  15. joaomgcd

    joaomgcd Administrator Staff Member

    Yeah, I'm guessing that most of the content is rendered via JavaScript, which may be problematic...
     
  16. Maussschubser

    Maussschubser New Member

    But AutoTools Regex has an option to "Use Javascript".
    Shouldn't that do the job? :confused:
     
  17. joaomgcd

    joaomgcd Administrator Staff Member

    Yep, it should in theory!
     
