Scrape a table from a non-HTML website with R, yet examples shown are for HTML













0















I have an issue. I am trying to scrape the two tables from a non-HTML website.
This is the website:



https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp



I suspect I am following an approach I should not, but I cannot find an answer. This is what I have tried:



library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)


And I get this error:




Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'




I thought that, even though it is an ASP page, I could still use the function readHTMLTable. Is there any alternative to this? I haven't found one yet and have struggled for hours to get something out.










r web-scraping














edited Mar 7 at 14:52







GaB

















asked Mar 7 at 14:03









GaB

1058












  • It isn't a "non-HTML" site. Maybe the HTML is rendered dynamically, in which case there are tools like Selenium that can help and are documented in other SO questions

    – camille
    Mar 7 at 15:14
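camille's point can be checked directly. A minimal sketch, assuming the rvest package: count the `<table>` elements in the HTML that the server actually returns, since `read_html()` does not run any JavaScript. If the count is 0 while the browser clearly shows tables, the tables are probably built by JavaScript and a browser-driving tool (for example RSelenium) would be needed; if it is above 0, plain rvest should be enough. The helper name `count_static_tables` is hypothetical.

```r
library(rvest)

# Hypothetical helper: count the <table> elements in the raw HTML as
# delivered by the server (read_html() executes no JavaScript).
# `html` can be a URL or a string of literal HTML.
count_static_tables <- function(html) {
  page <- read_html(html)
  length(html_nodes(page, "table"))
}

# Usage (needs network access):
# url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"
# count_static_tables(url)
```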





























1 Answer


















3



























Actually, that's pretty straightforward (based on @lukeA's answer):



library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
#> Code Main Specialty Title
#> 1 Surgical Specialties Surgical Specialties Surgical Specialties
#> 2 100 GENERAL SURGERY
#> 3 101 UROLOGY
#> 4 110 TRAUMA & ORTHOPAEDICS
#> 5 120 ENT
#> 6 130 OPHTHALMOLOGY


Selectorgadget can be installed here: Selectorgadget by Hadley Wickham
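For the original readHTMLTable() attempt: the error most likely comes from the XML package not fetching https:// URLs itself, so it treated the URL string as if it were HTML text (hence the "XML content does not seem to be XML" warning) and then passed NULL on to readHTMLTable(). A sketch, assuming the httr and XML packages, that downloads the page with httr first and hands XML only the HTML text; both helper names are hypothetical.

```r
library(httr)
library(XML)

# Hypothetical helper: download a page with httr (which supports https)
# and return its body as a single character string.
fetch_html_text <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # fail loudly on HTTP errors such as 404
  content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: parse HTML text (not a URL) with XML and return
# the first table in the document as a data frame.
read_first_html_table <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, which = 1)
}

# Usage (needs network access):
# url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"
# poptable <- read_first_html_table(fetch_html_text(url))
```

Splitting download and parsing this way also makes it easy to save the downloaded text and inspect it when a table is missing.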

















answered Mar 7 at 14:11









ha_pu

2578












  • I see, it worked. Thank you. I hadn't found any previous questions about this, yet it was there. Could you please help: is the ASP website HTML? lukeA's answer shows this for an HTML table. What is the difference?

    – GaB
    Mar 7 at 14:50






  • I guess that all websites are HTML pages with various elements. The tricky part is to identify the "names" of the elements that you are interested in. That's what you can use SelectorGadget for. With SelectorGadget I figured out that the relevant tables are "table" nodes, and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

    – ha_pu
    Mar 7 at 14:56












  • Thank you, ha_pu.

    – GaB
    Mar 12 at 14:18
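ha_pu's comment describes the general pattern: select nodes with a CSS selector, then extract them with html_table() if they are tables or with html_text() otherwise. A small offline sketch of that pattern, assuming rvest; the HTML snippet is made up (its rows are copied from the answer's printed output) so the example runs without network access.

```r
library(rvest)

# Made-up HTML standing in for a real page; read_html() treats a string
# that contains HTML tags as literal HTML rather than as a URL or path.
snippet <- '<html><body>
  <p>Main specialty codes</p>
  <table>
    <tr><th>Code</th><th>Main Specialty Title</th></tr>
    <tr><td>100</td><td>GENERAL SURGERY</td></tr>
    <tr><td>101</td><td>UROLOGY</td></tr>
  </table>
</body></html>'

page <- read_html(snippet)

# Step 1: select nodes with a CSS selector (what SelectorGadget helps find).
table_nodes <- html_nodes(page, "table")
para_nodes  <- html_nodes(page, "p")

# Step 2a: a <table> node becomes a data frame; the <th> row is used as header.
codes <- html_table(table_nodes[[1]])

# Step 2b: any other node can be reduced to its text.
intro <- html_text(para_nodes)
```

In recent rvest versions html_nodes() is superseded by html_elements(), which behaves the same way here, and html_table() returns a tibble rather than a plain data frame.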
















