Scrape a table from a non-HTML website with R when the examples shown are for HTML


I am trying to scrape the two tables from what I believe is a non-HTML website.
This is the website:

https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp

I suspect I am following an approach that does not apply here, but I cannot find an answer. This is what I have tried:

library(tidyverse)
library(rvest)
library(XML)
library(httr)



url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

poptable <- readHTMLTable(url, which = 1)

And get this error:

Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message: XML content does not seem to be XML:
'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'

I thought that, regardless of the .asp page type, I could still use the function readHTMLTable. Is there an alternative? I have not found one yet and have struggled for hours to get anything out.

edited Mar 7 at 14:52

asked Mar 7 at 14:03

GaB


It isn't a "non-HTML" site. The HTML may be rendered dynamically, in which case tools like Selenium can help; they are documented in other SO questions.

– camille
Mar 7 at 15:14
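
One likely cause of the error in the question, offered as an assumption rather than a confirmed diagnosis: the XML package fetches URLs through libxml2's built-in HTTP client, which does not handle https://, so readHTMLTable never receives the page. A minimal sketch of a workaround is to download the page with httr and hand the text to the parser. The helper names parse_first_table and fetch_first_table, and the inline sample markup, are hypothetical and only for illustration.

```r
library(XML)
library(httr)

# Hypothetical helper: parse HTML that is already in memory and
# return its first table as a data frame.
parse_first_table <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, which = 1)
}

# Hypothetical helper: download the page over HTTPS with httr,
# then parse the returned text.
fetch_first_table <- function(url) {
  response <- GET(url)
  stop_for_status(response)
  parse_first_table(content(response, as = "text", encoding = "UTF-8"))
}

# Offline check with made-up sample markup (not the real NHS page).
sample_html <- "<html><body><table>
  <thead><tr><th>Code</th><th>Title</th></tr></thead>
  <tbody>
    <tr><td>100</td><td>GENERAL SURGERY</td></tr>
    <tr><td>101</td><td>UROLOGY</td></tr>
  </tbody>
</table></body></html>"

first_table <- parse_first_table(sample_html)
```

With network access, `poptable <- fetch_first_table(url)` would be the equivalent of the call in the question, using the same `url` variable.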

r web-scraping


1 Answer

Actually, that's pretty straightforward (based on @lukeA's answer):

library(rvest)

url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"

page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
 Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY

Selectorgadget can be installed here: Selectorgadget by Hadley Wickham
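
Since the question asks for both tables, the same approach can be sketched for every table on a page at once. The HTML below is a made-up stand-in for the downloaded page (the second table's name and contents are assumptions, not taken from the real site), so the example runs without network access.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: two small tables.
html <- '<html><body>
<table>
  <tr><th>Code</th><th>Main Specialty Title</th></tr>
  <tr><td>100</td><td>GENERAL SURGERY</td></tr>
  <tr><td>101</td><td>UROLOGY</td></tr>
</table>
<table>
  <tr><th>Code</th><th>Treatment Function Title</th></tr>
  <tr><td>100</td><td>General Surgery</td></tr>
</table>
</body></html>'

page <- read_html(html)               # also accepts a URL, as in the answer
nodes <- html_nodes(page, "table")    # one node per <table> element
tables <- html_table(nodes)           # list with one data frame per table

main_specialty <- tables[[1]]
treatment_function <- tables[[2]]
```

Calling html_table on the whole node set returns a list, so each table can be picked out by position instead of repeating the call for nodes[[1]], nodes[[2]], and so on.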

answered Mar 7 at 14:11

ha_pu


I see, it worked. Thank you. I had not found this in any previous question, yet it was there. Can you please help me understand: is the .asp website an HTML page? lukeA's answer shows this for an HTML table. What is the difference?

– GaB
Mar 7 at 14:50


I guess that all websites are HTML pages with various elements. The tricky part is to identify the "names" of the elements that you are interested in. That's what you can use Selectorgadget for. With Selectorgadget I figured out that the relevant tables are "table" nodes, and that's what I put into the html_nodes command. If you are not interested in tables, you might use html_text instead of html_table in the next step.

– ha_pu
Mar 7 at 14:56

Thank you, ha_pu.

– GaB
Mar 12 at 14:18
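
The distinction ha_pu draws between html_table and html_text can be illustrated with a small sketch. The markup is invented for the example; only the rvest functions are the ones named in the comments above.

```r
library(rvest)

# Made-up page with a heading and one table.
page <- read_html('<html><body>
<h1>Main Specialty Codes</h1>
<table>
  <tr><th>Code</th><th>Title</th></tr>
  <tr><td>120</td><td>ENT</td></tr>
</table>
</body></html>')

# Non-tabular element: select it by its tag name and extract the text.
heading <- html_text(html_nodes(page, "h1"))

# Tabular element: select it the same way, but convert it to a data frame.
codes <- html_table(html_nodes(page, "table")[[1]])
```

The selector passed to html_nodes ("h1", "table", or whatever Selectorgadget suggests) decides which elements are picked; the function applied afterwards decides whether they come back as plain text or as a data frame.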
