Azure Databricks Spark XML Library - Trying to read xml files
I am trying to create a Databricks notebook that reads an XML file from Azure Data Lake and converts it to Parquet. I got the spark-xml library from https://github.com/databricks/spark-xml and followed the example provided on GitHub, but I am not able to get it working.
    df = (spark.read.format("xml")
          .option("rootTag", "catalog")
          .option("rowTag", "book")
          .load("adl://mysandbox.azuredatalakestore.net/Source/catalog.xml"))
Exception details:
    java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
Stack trace:
    /databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
        164             self.options(**options)
        165         if isinstance(path, basestring):
    --> 166             return self._df(self._jreader.load(path))
        167         elif path is not None:
        168             if type(path) != list:

    /databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1255         answer = self.gateway_client.send_command(command)
       1256         return_value = get_return_value(
    -> 1257             answer, self.gateway_client, self.target_id, self.name)
       1258
Are there any other dependencies I need to define for parsing the XML? I appreciate the help.
apache-spark azure-databricks
asked Mar 8 at 16:20
Satya Azure
1 Answer
Phew, I finally got the issue resolved. The error message gives no details about the cause, but the problem was a version mismatch: the spark-xml library I had installed was built for a different Scala version than the one running on the cluster. I replaced the library with a build that matches my cluster's Scala version, and the problem was resolved. Hope this helps someone having the same issue.
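For anyone wanting to check this on their own cluster, here is a minimal sketch of how the matching library name can be worked out. It relies on the usual Maven naming convention, where the Scala binary version (for example 2.11 or 2.12) is appended to the artifact name. The helper names and the library version "0.5.0" are only illustrative assumptions, not something taken from my setup, so substitute the release you actually intend to install. A class name ending in `$class`, as in the error above, usually points to a jar compiled for Scala 2.11 or earlier being loaded on a Scala 2.12 runtime.

```python
# Hypothetical helpers (not part of spark-xml or Databricks) that build the
# Maven coordinate of spark-xml for a given cluster Scala version.

def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.11.12' to its binary version '2.11'."""
    parts = full_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"Unrecognised Scala version: {full_version!r}")
    return ".".join(parts[:2])


def spark_xml_coordinate(scala_version: str, library_version: str) -> str:
    """Build 'group:artifact_<scala binary version>:version' for spark-xml."""
    binary = scala_binary_version(scala_version)
    return f"com.databricks:spark-xml_{binary}:{library_version}"


# In a notebook, the cluster's Scala version can usually be read through the
# py4j gateway. Note that _jvm is an internal PySpark attribute, so treat this
# line as an assumption that may not hold on every runtime:
#
#   scala_version = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
#   print(spark_xml_coordinate(scala_version, "0.5.0"))  # "0.5.0" is a placeholder
```

The resulting coordinate is what you would enter when attaching the library to the cluster from Maven. If the suffix of the installed jar (`_2.11`, `_2.12`, ...) differs from the cluster's Scala binary version, that is the mismatch described above.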
answered Mar 9 at 15:41
Satya Azure