
Why is JSoup timing out at random places in my code?



I am currently trying to use JSoup in Java to scrape retrosheet.org for a baseball coding project I am working on.

I make multiple JSoup connections in my code, and some of them are made in a loop (and are therefore executed many times). In total, my program makes hundreds of connections to scrape the necessary data.

The program works for ~5 seconds but then hangs on a connection (a different one each time). When I then try to open the website separately in my browser, it will not load. What could be causing this? Is there a problem with making too many connections?

Here is an example of a connection I am making (all connections follow this same format).

    doc = Jsoup.connect("https://www.retrosheet.org/boxesetc/index.html").maxBodySize(0).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15").get();

This is the error I am getting











Tags: java, web-scraping, connection, timeout, jsoup







asked Mar 8 at 23:33 by Jacob Snyder (edited Mar 8 at 23:43)






















1 Answer

This is almost certainly load protection on the target website's side: it detects too many requests from the same IP and either blocks that IP for a while or throttles the number of connections/requests from it. That is also why you can't open the website in your browser. It has nothing to do with JSoup or Java; connections/requests from your IP to the target website are being blocked or throttled.








answered Mar 8 at 23:50 by mvmn












• Is there a way around this? Thank you for the answer.
  – Jacob Snyder, Mar 9 at 0:00











• Well, you could throttle your requests, e.g. by inserting delays into the code that makes them. You could also implement retries (optionally with a delay between retries as well). There might also be a problem with the number of connections you create: JSoup will probably not reuse connections, whereas Commons HTTPClient with a pooling connection manager will. You could retrieve the HTML via Commons HTTPClient and then use JSoup for parsing only (not using its HTTP client capabilities). Best of all, do all of this (delays + retries + Commons HTTPClient for retrieval).
  – mvmn, Mar 9 at 0:04
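The delay and retry suggestions in this comment could be sketched roughly as follows. `PoliteFetcher`, its constructor parameters, and the linear back-off are hypothetical choices made for illustration; none of them come from JSoup or from the original post, and suitable delay values depend on what the target site tolerates.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of JSoup): pauses before every attempt and retries on failure. */
final class PoliteFetcher {
    private final long delayMillis; // base pause before each attempt (assumed value, tune per site)
    private final int maxAttempts;  // total attempts per task, including the first one

    PoliteFetcher(long delayMillis, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delayMillis = delayMillis;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Runs the task after sleeping delayMillis * attemptNumber, so every request is
     * throttled and each retry waits longer than the previous attempt did.
     * Rethrows the last failure once maxAttempts is exhausted.
     */
    <T> T fetch(Callable<T> task) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Thread.sleep(delayMillis * attempt);
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With JSoup on the classpath, a call might look like `Document doc = new PoliteFetcher(1000, 3).fetch(() -> Jsoup.connect(url).get());`, where the one-second base delay is only a guess.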











• Here's the method to parse a String as HTML via JSoup (the base URL parameter is there to let JSoup produce absolute URLs from relative ones, BTW): jsoup.org/apidocs/org/jsoup/…
  – mvmn, Mar 9 at 0:06
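A minimal sketch of the "retrieve separately, parse with JSoup" idea is shown below. It swaps in the JDK's built-in `java.net.http.HttpClient` (Java 11+) in place of the Commons HTTPClient mentioned in the comments, purely so the example needs no extra dependency. `PageDownloader`, `buildRequest`, `fetchHtml`, and the timeout values are hypothetical names and assumptions, not anything from the original post.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: downloads pages through one shared client so connections can be reused. */
final class PageDownloader {
    // One client for the whole program; the JDK client keeps connections alive between requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))      // assumed value
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds a GET request with a per-request timeout and a User-Agent header. */
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))         // assumed value
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    /** Returns the response body as a String; parsing is left to the caller. */
    static String fetchHtml(String url, String userAgent) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url, userAgent), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

The returned string could then be handed to JSoup for parsing only, for instance with `Jsoup.parse(html, url)`, where the second argument serves as the base URL for resolving relative links.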












• P.S. If my answer properly addresses your problem, would you mind upvoting it and/or marking it as the correct answer? Thanks!
  – mvmn, Mar 9 at 11:41
















