Marth said:
nero said:

It uses lynx to dump the page source and grep to find the line containing the ranking. Then it uses sed to show only the number. I can share how I did it so we can make both crawlers better. I don't have great programming skills, but it gets the job done. I can't run it every time because Amazon then tells me I'm a robot, so it fails to obtain the numbers xD
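
Roughly, the grep and sed part could look like the sketch below. This is not the actual script, and the helper name `extract_rank` is made up for illustration. It assumes the ranking shows up in the page source as something like "#1,234 in ...", which may not match what Amazon really serves.

```shell
# Hypothetical helper: reads page source on stdin and prints the first
# ranking number it finds, with thousands separators removed.
# Assumption: the ranking is written as "#1,234 in <category>".
extract_rank() {
    grep '#[0-9][0-9,]* in ' |                       # keep lines that contain a ranking
        head -n 1 |                                  # only the first such line
        sed 's/.*#\([0-9][0-9,]*\) in .*/\1/' |      # strip everything but the number
        tr -d ','                                    # 1,234 becomes 1234
}
```

It would be fed with something like `lynx -source "$URL" | extract_rank`, where `$URL` is a placeholder for the product page.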

That's interesting, we use a similar approach but in different environments. My program is a Windows console application written in C#. I also save the source code and then filter for the ranking/name of the entries. I tried using the native WebClient in C#, but ran into the same problem as you: Amazon wants to check whether I'm a robot.

I found a silly workaround: instead of accessing the Amazon page directly, I use input manipulation to save the source code, because Amazon does not care about you being a robot if your browser has cookies and a logged-in profile. It's not beautiful, but I have to access the page 5 times in quick succession and needed to avoid this check at all costs. ^^

Great workaround. I will try logging in and saving cookies in a console environment to see if it works.