Azure Databricks Spark XML Library - Trying to read XML files

I am trying to create a Databricks notebook that reads an XML file from Azure Data Lake and converts it to Parquet. I got the spark-xml library from https://github.com/databricks/spark-xml and followed the example in the GitHub repository, but I am not able to get it working.

df = (spark.read.format("xml")
 .option("rootTag","catalog") 
 .option("rowTag", "book") 
 .load("adl://mysandbox.azuredatalakestore.net/Source/catalog.xml"))


 Exception Details:

 java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

 StackTrace:

 /databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
     164 self.options(**options)
     165 if isinstance(path, basestring):
 --> 166 return self._df(self._jreader.load(path))
     167 elif path is not None:
     168 if type(path) != list:

 /databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
    1255 answer = self.gateway_client.send_command(command)
    1256 return_value = get_return_value(
 -> 1257 answer, self.gateway_client, self.target_id, self.name)
    1258

Are there any other dependencies I need to define for parsing the XML? I appreciate the help.

asked Mar 8 at 16:20

Satya Azure

apache-spark azure-databricks

1 Answer

Phew, I finally got the issue resolved. The error message doesn't give any details about the cause, but the problem was a version mismatch between the spark-xml library and the Scala version of the cluster. I updated the library to match my cluster's Scala version and the problem was resolved. I hope this helps someone with the same issue.
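For anyone checking the same thing, here is a minimal sketch of how the matching library coordinate could be worked out. It assumes the spark-xml artifacts follow the usual Scala naming convention, with the Scala binary version as a suffix (for example `spark-xml_2.11`). The helper names `scala_binary_version` and `spark_xml_coordinate` are hypothetical and written only for illustration, and the library version passed in is a placeholder to replace with whichever release you actually install.

```python
def scala_binary_version(full_version: str) -> str:
    """Return the binary-compatibility prefix, e.g. '2.11' for '2.11.12'."""
    parts = full_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected Scala version: {full_version!r}")
    return ".".join(parts[:2])


def spark_xml_coordinate(scala_version: str, library_version: str) -> str:
    """Build a Maven coordinate whose Scala suffix matches the cluster.

    Assumption: the artifact is published as
    com.databricks:spark-xml_<scala binary version>:<library version>.
    """
    suffix = scala_binary_version(scala_version)
    return f"com.databricks:spark-xml_{suffix}:{library_version}"


# In a notebook, the cluster's Scala version can usually be read through the
# py4j gateway (this relies on the private _jvm attribute, so treat it as an
# assumption rather than a stable API):
#   scala_version = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
# Here a literal value stands in for that call.
coordinate = spark_xml_coordinate("2.11.12", "0.4.1")
print(coordinate)
```

If the coordinate attached to the cluster has a different Scala suffix than the one printed here, that mismatch is a likely cause of a `NoClassDefFoundError` like the one in the question.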

answered Mar 9 at 15:41

Satya Azure
