How to convert Row to Dataset<Row> in Spark Java
I'm iterating over a Dataset<Row> using ForeachFunction. During the iteration, I don't know how to append some custom columns to the Row and add the result to another Dataset<Row> in Spark Java.
Code:
    groupedDataset.foreach((ForeachFunction<Row>) row -> {
        double average = //some value
        // the Row has four columns
        // All I want is to have a new Dataset<Row> with specific columns
        // from the Row i.e row(0),row(1),row(3) and average value
        Dataset<Row> newDs = row.getString("ID"),row.getString("time"),row.getInt("value"),average;
    });
I have tried a lot, but I haven't been able to solve it.
Thank you!
java apache-spark apache-spark-sql
edited Mar 7 at 17:40 by gudok
asked Mar 7 at 14:23 by Vignesh
1 Answer
Rows are not meant to be modified directly (it is possible, but not convenient). When manipulating dataframes (Datasets of Rows), you are supposed to use the SparkSQL API, for two main reasons: 1. it is easy to use; 2. it allows Spark to perform a lot of optimizations on your queries.
Here is an example that seems close to what you are trying to achieve. I create a dataset with three columns (id, str, id5), then use a select to keep id and id5, add their average as a new column, and discard the remaining column (str). Let me know if you need more details.
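import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// Reuse the active session or create one (the master is assumed to be configured externally).
SparkSession spark = SparkSession.builder().getOrCreate();
// Build a sample dataset with three columns: id, str and id5.
Dataset<Row> data = spark
    .range(10)
    .select(col("id").as("id"),
            col("id").cast("string").as("str"),
            col("id").plus(5).as("id5"));
data.show();
// Keep id and id5, drop str, and add the average of id and id5.
Dataset<Row> result = data
    .select(col("id"), col("id5"),
            col("id").plus(col("id5")).divide(2).as("avg"));
result.show();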
which yields (only the first three rows of each table are shown):
+---+---+---+
| id|str|id5|
+---+---+---+
|  0|  0|  5|
|  1|  1|  6|
|  2|  2|  7|
+---+---+---+
+---+---+---+
| id|id5|avg|
+---+---+---+
|  0|  5|2.5|
|  1|  6|3.5|
|  2|  7|4.5|
+---+---+---+
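As a further illustration (a sketch, not taken from the answer itself), the same select-based approach could be applied to the columns named in the question. The question does not say how the average is computed, so this sketch assumes it is the mean of value per ID and uses a window function for it. The class name AppendAverageColumn, the method withAverage, the column name "other" for the unused third column, and the sample rows are all hypothetical.

```java
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AppendAverageColumn {

    // Keeps ID, time and value, and appends the mean of "value" per ID as "average".
    // Assumption: the question's "average" is a per-ID mean; swap in another
    // column expression if it is computed differently.
    static Dataset<Row> withAverage(Dataset<Row> input) {
        return input.select(
                col("ID"),
                col("time"),
                col("value"),
                avg(col("value")).over(Window.partitionBy(col("ID"))).as("average"));
    }

    public static void main(String[] args) {
        // Local session for demonstration purposes only.
        SparkSession spark = SparkSession.builder()
                .appName("append-average-column")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical four-column schema: ID (0), time (1), other (2), value (3).
        StructType schema = new StructType()
                .add("ID", DataTypes.StringType)
                .add("time", DataTypes.StringType)
                .add("other", DataTypes.StringType)
                .add("value", DataTypes.IntegerType);

        // Made-up sample rows standing in for groupedDataset.
        List<Row> rows = Arrays.asList(
                RowFactory.create("a", "t1", "x", 10),
                RowFactory.create("a", "t2", "y", 20),
                RowFactory.create("b", "t1", "z", 5));

        Dataset<Row> groupedDataset = spark.createDataFrame(rows, schema);

        // No foreach and no per-row Dataset: the new column is declared once
        // for the whole dataset.
        withAverage(groupedDataset).show();

        spark.stop();
    }
}
```

The point of the design is that nothing is collected row by row: the new Dataset<Row> is described as a projection of the old one, and Spark plans and executes it as a whole.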
answered Mar 7 at 16:50 by Oli