Azure Databricks Spark XML Library - Trying to read xml files



I am trying to create a Databricks notebook that reads an XML file from Azure Data Lake and converts it to Parquet. I got the spark-xml library from https://github.com/databricks/spark-xml and followed the example provided on GitHub, but I cannot get it working.



df = (spark.read.format("xml")
      .option("rootTag", "catalog")
      .option("rowTag", "book")
      .load("adl://mysandbox.azuredatalakestore.net/Source/catalog.xml"))


Exception details:

java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

Stack trace:

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    164         self.options(**options)
    165         if isinstance(path, basestring):
--> 166             return self._df(self._jreader.load(path))
    167         elif path is not None:
    168             if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258


Are there any other dependencies I need to define to parse the XML? I would appreciate any help.

Tags: apache-spark, azure-databricks

asked Mar 8 at 16:20 by Satya Azure

          1 Answer

Phew, I finally got the issue resolved. The error message gives no detail about the cause, but the problem was a version mismatch: the spark-xml library I had installed was built for a different Scala version than the one running on my cluster. I replaced it with the spark-xml build that matches my cluster's Scala version, and the problem went away. Hope this helps someone with the same issue.
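To make the version check concrete, here is a minimal sketch in plain Python (no Spark needed). It assumes spark-xml artifacts follow the usual Maven naming pattern `com.databricks:spark-xml_<scala-binary-version>:<library-version>`. The helper names are hypothetical, and the library version `0.5.0` is only an illustrative value, not the version used in the fix above.

```python
import re
from typing import Optional


def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.11.12' to its binary version '2.11'."""
    parts = full_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unrecognised Scala version: {full_version!r}")
    return ".".join(parts[:2])


def spark_xml_coordinate(scala_version: str, library_version: str) -> str:
    """Build the Maven coordinate of the spark-xml build for a given Scala version.

    library_version is supplied by the caller; '0.5.0' below is an assumed example.
    """
    return f"com.databricks:spark-xml_{scala_binary_version(scala_version)}:{library_version}"


def artifact_scala_version(artifact: str) -> Optional[str]:
    """Extract the Scala suffix from a jar name or Maven coordinate, if present."""
    match = re.search(r"_(\d+\.\d+)(?=[-:]|$)", artifact)
    return match.group(1) if match else None


def matches_cluster(artifact: str, cluster_scala_version: str) -> bool:
    """Return True only if the artifact's Scala suffix equals the cluster's binary version."""
    return artifact_scala_version(artifact) == scala_binary_version(cluster_scala_version)


if __name__ == "__main__":
    # Hypothetical cluster running Scala 2.11.12
    print(spark_xml_coordinate("2.11.12", "0.5.0"))
    print(matches_cluster("spark-xml_2.10-0.4.1.jar", "2.11.12"))
```

In a notebook, one way to find the cluster's Scala version is `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`, assuming the JVM gateway is accessible from Python. The resulting coordinate is what you would enter when attaching the library to the cluster from Maven.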






answered Mar 9 at 15:41 by Satya Azure