Bigquery to get Speed based on time/location data (rows above/below current row)How to get the current time in PythonGet current time in milliseconds in Python?BigQuery Data Location Settingget list of pandas dataframe columns based on data typeCondition based on nth row above or below current row - PythonIs there a way to resample time series data in BigQuery?BigQuery speed - time from __SHUFFLE1_REPARTITION0BigQuery: SELECT in WHERE-clause with filter based on a value in the current rowbigquery for time series dataPandas time operations
Does the fruit of Mantra Japa automatically go to Indra if Japa Samarpana Mantra is not chanted?
Why are electrically insulating heatsinks so rare? Is it just cost?
How is it possible to have an ability score that is less than 3?
Risk of getting Chronic Wasting Disease (CWD) in the United States?
"to be prejudice towards/against someone" vs "to be prejudiced against/towards someone"
How much RAM could one put in a typical 80386 setup?
How is the claim "I am in New York only if I am in America" the same as "If I am in New York, then I am in America?
LaTeX closing $ signs makes cursor jump
Why was the small council so happy for Tyrion to become the Master of Coin?
How do we improve the relationship with a client software team that performs poorly and is becoming less collaborative?
Today is the Center
Why can't I see bouncing of a switch on an oscilloscope?
Why dont electromagnetic waves interact with each other?
What are these boxed doors outside store fronts in New York?
Approximately how much travel time was saved by the opening of the Suez Canal in 1869?
Service Entrance Breakers Rain Shield
Font hinting is lost in Chrome-like browsers (for some languages )
Did Shadowfax go to Valinor?
Minkowski space
What would happen to a modern skyscraper if it rains micro blackholes?
Are the number of citations and number of published articles the most important criteria for a tenure promotion?
Fencing style for blades that can attack from a distance
Why not use SQL instead of GraphQL?
strToHex ( string to its hex representation as string)
Bigquery to get Speed based on time/location data (rows above/below current row)
How to get the current time in PythonGet current time in milliseconds in Python?BigQuery Data Location Settingget list of pandas dataframe columns based on data typeCondition based on nth row above or below current row - PythonIs there a way to resample time series data in BigQuery?BigQuery speed - time from __SHUFFLE1_REPARTITION0BigQuery: SELECT in WHERE-clause with filter based on a value in the current rowbigquery for time series dataPandas time operations
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
add a comment |
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
I have a table in Bigquery with tracking data for Nascar drivers (dummy data for a project I am working on). The x and y coordinates are taken 10 times a second. The capture_frame
signifies the current frame, and each sequential capture_frame
should be 100 milliseconds apart because the data is taken every 100 ms.
I want to calculate each driver's speed per lap. I know how to do this in pandas but I think this is possible in bigquery. To calculate speed, I am looking at 2 rows before capture_frame
and 2 rows after and then dividing by the difference in epoch time, which should be 400 milliseconds.
Here is an example of a few capture frames for 1 race for one driver for the first lap. There are a few hundred capture frames per lap and then 20 drivers mixed in as well, but it is easier to understand if we look at just one driver/race/lap.
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| Race | Capture | Lap | Driver | … | X | Y | Epoch_time | Delta_dist | Curr_speed |
| | _frame | | | | | | | | |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 1 | 1 | Logano | …. | 2.1 | 1 | 1552089720 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 2 | 1 | Logano | … | 2.2 | 1.1 | 1552089820 | NULL | Null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 3 | 1 | Logano | … | 2.22 | 1.2 | 1552089920 | 2.265921446 | 0.005664804 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 4 | 1 | Logano | .. | 3.22 | 1.5 | 1552090020 | 3.124163888 | 0.00781041 |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 5 | 1 | Logano | .. | 4.22 | 1.8 | 1552090120 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
| I500 | 6 | 1 | Logano | .. | 5.22 | 1.9 | 1552090220 | NULL | null |
+------+---------+-----+--------+----+------+-----+------------+-------------+-------------+
The delta_dist
for frame 3 is calculated by sqrt((4.22-2.1)^2 + (1.8-1)^2)/1
and the the curr_speed
is that number divided by 400. The first/last 2 distances and speeds of the race will be null because there are no prior x or y coordinates which is okay as there isn't really any speed when you are .1 second from starting or stopping.
In pandas I would do (this is not great code as I am just bringing each driver and race in on its own):
#laps_per_race dictionary with num laps per race
for driver in driver_list:
for race in race_list:
driver_race_query = “SELECT * from nascar_xyz where driver=driver and Race=race”.format(driver=driver, race=race)
df_entire_race = client.query(driver_race_query).to_dataframe()
num_laps = laps_per_race[race]
for lap in num_laps:
#get subset of dataframe just for this lap
df = df_entire_race.loc[df_entire_race['Lap'] == lap]
df.sort_values(‘Epoch_time’, inplace=True)
df[‘prev_x’] = df[‘X’].shift(2)
df[‘next_x’] = df[‘X’].shift(-2)
df[‘prev_y’] = df[‘Y’].shift(2)
df[‘next_y’] = df[‘Y’].shift(-2)
#this is just distance function sqrt((x2-x1)^2 + (y2-y1)^2)
df['delta_dist'] = np.sqrt((df[‘X’].shift(-2) - df[‘X’].shift(2))**2 + (df[‘Y’].shift(-2) - df[‘Y’].shift(2))**2))
#400.0 is the time actual difference
df['Curr_speed'] = df['delta_dist']/400.0
I think in my sql query I either have to do a group by or partition by to because I want to look in each race by driver_id
, then lap (if that level of abstraction makes sense). Maybe for the speed and looking capture_frames ahead I can do something with windowing (https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts) or something called lag which seems like the equivalent of .shift()
in pandas.
python sql pandas google-bigquery geospatial
python sql pandas google-bigquery geospatial
edited Mar 9 at 2:58
Pikachu the Purple Wizard
2,06161529
2,06161529
asked Mar 9 at 0:51
H. RoseH. Rose
191
191
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
2
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17
add a comment |
1 Answer
1
active
oldest
votes
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55072921%2fbigquery-to-get-speed-based-on-time-location-data-rows-above-below-current-row%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
add a comment |
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
You are in the right path. I'll take a public dataset of buses moving around Staten Island - and I'll use the geographical distance by looking at their lat,lon:
WITH data AS (
SELECT bus, ST_GeogPoint(longitude, latitude) point
, PARSE_TIMESTAMP('%Y%m%d %H%M%S',FORMAT('%i %06d', day, time)) ts
FROM `fh-bigquery.mta_nyc_si.201410_bustime`
WHERE day=20141014
AND bus IN (7043, 7086, 7076, 2421, 7052, 7071)
)
SELECT *
FROM (
SELECT bus, ts, distance/time speed
FROM (
SELECT bus, ts
, ST_DISTANCE(point, LAG(point, 3) OVER(PARTITION BY bus ORDER BY ts)) distance
, TIMESTAMP_DIFF(ts, LAG(ts, 3) OVER(PARTITION BY bus ORDER BY ts), SECOND) time
FROM data
)
WHERE time IS NOT null
)
WHERE speed < 500
answered Mar 9 at 1:44
Felipe HoffaFelipe Hoffa
22.7k253122
22.7k253122
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55072921%2fbigquery-to-get-speed-based-on-time-location-data-rows-above-below-current-row%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Tangential comment: I'm surprised this question got 3 upvotes in less than 5 minutes.
– Felipe Hoffa
Mar 9 at 1:11
it is not clear - what output you expect to have. can you provide example please, so we can help without speculating too much
– Mikhail Berlyant
Mar 9 at 12:17