


Unable to paginate through all Bing API results






























I'm currently using the Bing Web Search API v7 to query Bing for search results. As per the API docs, the count and offset parameters are used to paginate through the results, the total number of which is given in the response itself by the value of totalEstimatedMatches.



As below from the documentation:




totalEstimatedMatches: The estimated number of webpages that are relevant to the query. Use this number along with the count and offset query parameters to page the results.
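A minimal sketch of that paging scheme, for illustration only. It assumes that offset is the number of results to skip and count is the page size; `page_offsets` is a hypothetical helper name, and it only generates request parameters without calling the API.

```python
def page_offsets(total_estimated_matches, count=50):
    """Yield (offset, count) pairs covering total_estimated_matches results.

    Hypothetical helper: offset is the number of results to skip,
    count is the number of results requested per page.
    """
    offset = 0
    while offset < total_estimated_matches:
        yield offset, count
        offset += count


pages = list(page_offsets(330_000, count=50))
print(len(pages))   # number of requests needed to walk the full estimate
print(pages[:3])    # first three (offset, count) pairs
```

Walking the whole estimate this way only works if the service keeps returning fresh results for every offset.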




This seems to work up to a point, after which the API just continues to return the exact same results over and over, regardless of the values of count and offset.



In my specific case, totalEstimatedMatches was 330,000. With a count of 50 (i.e. 50 results per request), the results begin repeating at around offset 700, i.e. 3,500 results into the estimated 330,000.



Playing with the Bing front end, I have noticed similar behaviour once the page count gets sufficiently high, e.g.:




  • https://www.bing.com/search?q=feed%3amp3&first=1&FORM=PERE - initial search, estimated 51,000 results


  • https://www.bing.com/search?q=feed%3amp3&first=1000&FORM=PERE - first = 1000, should get results 1000 to 1010 but returns the same results as the URL below


  • https://www.bing.com/search?q=feed%3amp3&first=2000&FORM=PERE - first = 2000, should get results 2000 to 2010 but returns the same results as the URL above

Am I using the API incorrectly, or is this just some sort of limitation or bug in which totalEstimatedMatches is way off?










      bing bing-api
















      asked May 21 '18 at 13:05









      user783836






















          2 Answers
            totalEstimatedMatches gives the total number of matches for the query across the web, and that number includes duplicate results and near-duplicate content.

            To optimize indexing, all search engines restrict results to the top N webpages. This is what you are seeing. The behavior is consistent across all search engines, since nearly all users change the query, select a webpage, or abandon the search within 2-3 result pages.

            In short, this is not a bug or an incorrect implementation; it is an index optimization that restricts you from getting more results. If you really need more results, you can run the related searches and append the unique webpages.
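One way to read the "append the unique webpages" suggestion is sketched below. It assumes you have already collected a list of result URLs for the original query and for each related search; `merge_unique_urls` is a hypothetical helper for illustration, not part of the Bing API, and the example URLs are made up.

```python
def merge_unique_urls(*url_lists):
    """Concatenate several URL lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Made-up result lists standing in for the original and a related search.
original = ["https://a.example/1", "https://a.example/2"]
related = ["https://a.example/2", "https://b.example/3"]
print(merge_unique_urls(original, related))
```

Order is preserved, so results from the original query stay ahead of anything contributed by the related searches.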

















            answered May 21 '18 at 17:09









            Ronak












            • I would have thought the API would allow full access to the index though even if the site doesn't?

              – user783836
              May 21 '18 at 19:23











            • The API is actually a manifestation of the site, with the convenience of TPS, filters, and sorting. So, unfortunately, the API won't provide all the results either. This is in fact true across all search engines and APIs.

              – Ronak
              May 22 '18 at 18:11

















                Technically this isn't a direct answer to the question as asked. Hopefully it's still helpful as a way to paginate efficiently through Bing's API without relying on the "totalEstimatedMatches" return value, which, as the other answer explains, can behave really unpredictably.
                Here's some Python:



                class ApiWorker(object):
                    def __init__(self, q):
                        self.q = q
                        self.offset = 0
                        self.result_hashes = set()
                        self.finished = False

                    def calc_next_offset(self, resp_urls):
                        before_adding = len(self.result_hashes)
                        # A set silently drops hashes it already holds, so its size
                        # only grows when the page contained unseen URLs.
                        self.result_hashes.update(hash(i) for i in resp_urls)
                        after_adding = len(self.result_hashes)
                        if after_adding == before_adding:
                            # Either we got a page of duplicates or we got no results back.
                            self.finished = True
                        else:
                            self.offset += len(resp_urls)

                    def page_through_results(self, *args, **kwargs):
                        while not self.finished:
                            # Placeholders: fill in your own request and storage code.
                            new_resp_urls = ...<call_logic>...
                            self.calc_next_offset(new_resp_urls)
                            ...<save logic>...
                        print(f'All unique results for q={self.q} have been obtained.')


                This will stop paginating as soon as a full response of duplicates has been obtained.
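For a version that runs end to end, here is a minimal sketch of the same stop-on-duplicates idea. It is not the code above: the request logic is injected as a callable, URLs are stored directly instead of their hashes, and `DedupPager`, `fetch_page`, and `fake_fetch` are all hypothetical names. `fake_fetch` only imitates the repeating behaviour described in the question; it does not call Bing.

```python
class DedupPager:
    """Page through results until a page contributes no unseen URLs."""

    def __init__(self, fetch_page, count=50):
        # fetch_page(offset, count) -> list of URLs (hypothetical callable)
        self.fetch_page = fetch_page
        self.count = count
        self.offset = 0
        self.seen = set()

    def run(self):
        unique = []
        while True:
            urls = self.fetch_page(self.offset, self.count)
            new = []
            for url in urls:
                if url not in self.seen:
                    self.seen.add(url)
                    new.append(url)
            if not new:
                # Empty page or nothing but repeats: stop paging.
                break
            unique.extend(new)
            self.offset += len(urls)
        return unique


def fake_fetch(offset, count, limit=120):
    """Stand-in for the API: serves `limit` distinct URLs, then repeats the last page."""
    start = min(offset, limit - count)
    return [f"https://example.com/{i}" for i in range(start, start + count)]


results = DedupPager(fake_fetch, count=50).run()
print(len(results))
```

Passing the fetch function in keeps the paging logic testable without network access; a real fetcher would wrap whatever HTTP call you already use.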

















                answered Mar 8 at 1:24









                Rob Truxal


























