Efficient way to get most recent of many transaction nodes connected to a single account node by date
I have a large number of nodes representing accounts, which we could label as, say, (a :Account). Each (:Account) can potentially have tens of thousands of (t :Transaction) nodes connected to it, each representing the data for a transaction that involved that account.
The (:Transaction) nodes have a date property. Given a query date, what would be the most efficient way to get, for each (a :Account), the latest (:Transaction) node that occurs on or before the query date? This could be one way to do it:
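// run for all address nodes
match (a :Address)
optional match (a)-->(t :Transaction)
where t.timestamp <= date("2014-03-07")
// aggregates are not allowed in WHERE, so compute the latest timestamp per address first
with a, max(t.timestamp) as latest
// then fetch the transaction(s) with that timestamp (t is null if none qualify)
optional match (a)-->(t :Transaction)
where t.timestamp = latest
return a, t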
However, I'm not sure this method is very efficient when the number of (t) connected to each (a) becomes very large. Is there a way to write the query, or to index the database, such that the query time scales linearly with the number of accounts, no matter how many transactions are connected to those account nodes?
For disclosure, I posted a version of this question on the neo4j community forum, but I'm hoping the greater traffic on this site gives this question more exposure.
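To experiment with the queries in this thread, a tiny hypothetical dataset can be created as below. It follows the queries here in using the Address label and a timestamp property stored as a Cypher date; the HAS_TRANSACTION relationship type and the id property are assumptions made only for illustration.

```cypher
// Hypothetical sample data: HAS_TRANSACTION and id are illustrative assumptions.
CREATE (a1:Address {id: 'a1'}), (a2:Address {id: 'a2'})
CREATE (a1)-[:HAS_TRANSACTION]->(:Transaction {timestamp: date('2014-03-01')}),
       (a1)-[:HAS_TRANSACTION]->(:Transaction {timestamp: date('2014-03-06')}),
       (a1)-[:HAS_TRANSACTION]->(:Transaction {timestamp: date('2014-03-09')}),
       (a2)-[:HAS_TRANSACTION]->(:Transaction {timestamp: date('2014-02-20')});
```

With a query date of 2014-03-07, the expected result is the 2014-03-06 transaction for a1 and the 2014-02-20 transaction for a2; the 2014-03-09 transaction falls after the query date and should be ignored.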
graph neo4j cypher
edited Mar 8 at 1:17 by Simon O'Hanlon
asked Mar 7 at 23:37 by Simon O'Hanlon
Your query will only return the result for a single Account (or, since you used OPTIONAL, perhaps a null Account). Did you want the result for every Account (that has a matching transaction)? – cybersam, Mar 8 at 0:54
Hi @cybersam - yes, sorry, I should have mentioned. I do want to run it for all addresses. – Simon O'Hanlon, Mar 8 at 1:10
1 Answer
In neo4j 3.5, a new "index-backed ORDER BY" optimization was added. If you create a "native" index (see here for the details), the index is stored in sorted order, so an ORDER BY clause on the indexed property does not have to do any actual sorting when the planner uses that index.
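The index-creation statement used in this answer is the Neo4j 3.5 form, which was removed in Neo4j 5. A sketch of the equivalent declaration for Neo4j 4.1 and later is shown here; the index name transaction_timestamp is a hypothetical choice, and whether a given version backs ORDER BY with this index is worth confirming in that version's documentation.

```cypher
// Neo4j 4.1+ syntax for an index on :Transaction(timestamp).
// "transaction_timestamp" is a hypothetical index name.
CREATE INDEX transaction_timestamp IF NOT EXISTS
FOR (t:Transaction) ON (t.timestamp);

// Wait until the index is online (or the procedure's timeout expires)
// before measuring query performance.
CALL db.awaitIndexes();
```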
So, assuming that you have created an index on :Transaction(timestamp), like so:
CREATE INDEX ON :Transaction(timestamp);
then, in neo4j 3.5+, this query (with an optional hint to use that index) should avoid any sorting when finding the Transaction with the maximum timestamp for each Address:
MATCH (a:Address)-->(t:Transaction)
USING INDEX t:Transaction(timestamp)
WHERE t.timestamp <= date("2014-03-07")
WITH a, t
ORDER BY t.timestamp DESC
RETURN a, COLLECT(t)[0] AS transaction
This query should do the following:
- Use the index to get all Transaction nodes with an appropriate timestamp (in descending order, without sorting).
- Get the Address nodes related to each Transaction.
- For each distinct Address node, create a list of all the related Transaction nodes (in descending timestamp order, without sorting), and get the first one from the list.
- Return each distinct Address node and its most recent appropriate Transaction node.
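One way to check whether the planner really skips the sort is to profile the query. If the index-backed ORDER BY applies, the plan should include an index seek on :Transaction(timestamp) and no separate Sort operator; if a Sort operator does appear, the optimization was not used. This is a sketch of the check, not a guarantee of what a particular version's planner will choose.

```cypher
// PROFILE runs the query and reports the executed plan with rows and db hits;
// EXPLAIN shows the plan without running the query.
PROFILE
MATCH (a:Address)-->(t:Transaction)
USING INDEX t:Transaction(timestamp)
WHERE t.timestamp <= date("2014-03-07")
WITH a, t
ORDER BY t.timestamp DESC
RETURN a, COLLECT(t)[0] AS transaction
```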
This query will scale linearly with the number of appropriate Transactions. If your use case permits it, you could get faster results by reducing the number of appropriate Transactions, by also putting a lower bound in your WHERE clause.
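A sketch of the same query with a lower bound added. The 2014-02-05 bound is a hypothetical value; note that, because the MATCH is not optional, any Address with no transaction inside the window is dropped from the result, so the bound has to suit your data.

```cypher
// Same query with a hypothetical lower bound; only transactions between
// 2014-02-05 and 2014-03-07 (inclusive) are read from the index.
MATCH (a:Address)-->(t:Transaction)
USING INDEX t:Transaction(timestamp)
WHERE t.timestamp >= date("2014-02-05")
  AND t.timestamp <= date("2014-03-07")
WITH a, t
ORDER BY t.timestamp DESC
// Addresses without a transaction in the window produce no row.
RETURN a, COLLECT(t)[0] AS transaction
```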
edited Mar 8 at 3:22
answered Mar 8 at 3:16 by cybersam