problem while storing tweets into csv file
I am working with Python, attempting to store tweets (more precisely, only their date, user, bio, and text) related to a specific keyword in a CSV file.
As I am working with Twitter's free-to-use API, I am limited to 450 tweets every 15 minutes, so I have written code that is supposed to store exactly 450 tweets in 15 minutes.
The problem is that something goes wrong when extracting the tweets, so that at a certain point the same tweet is stored again and again.
Any help would be much appreciated! Thanks in advance.
import time
from twython import Twython, TwythonError, TwythonStreamer

twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET)

sfile = "tweets_" + keyword + todays_date + ".csv"
id_list = [last_id]
count = 0
while count < 3*60*60*2:  # we set the loop to run for 3 hours
    # tweet extract method with the last list item as the max_id
    print("new crawl, max_id:", id_list[-1])
    tweets = twitter.search(q=keyword, count=2, max_id=id_list[-1])["statuses"]
    time.sleep(2)  # 2 seconds rest between api calls (450 allowed within 15min window)
    for status in tweets:
        id_list.append(status["id"])  # append tweet ids
        if status == tweets[0]:
            continue
        if status == tweets[1]:
            date = status["created_at"].encode('utf-8')
            user = status["user"]["screen_name"].encode('utf-8')
            bio = status["user"]["description"].encode('utf-8')
            text = status["text"].encode('utf-8')
            with open(sfile, 'a') as sf:
                sf.write(str(status["id"]) + "|||" + str(date) + "|||" + str(user) + "|||" + str(bio) + "|||" + str(text) + "\n")
            count += 1
            print(count)
            print(date, text)
python csv twitter twython
I would recommend you stick to a standard comma delimiter for your CSV file. If your tweet contains a comma then the field is normally enclosed with quotes. It is also able to cope with newlines. Python's CSV library will handle all of this for you automatically.
– Martin Evans
Mar 8 at 8:55
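To illustrate the comment above, here is a minimal sketch of what `csv.writer` does with a field that contains a comma and a newline (writing to an in-memory buffer just for demonstration): the module quotes the field automatically, so no custom `|||` delimiter is needed.

```python
import csv
import io

# A field containing both a comma and a newline;
# csv.writer encloses it in quotes automatically.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["1111", "Hello, this is a tweet\nwith a comma and a newline"])
print(buffer.getvalue())
# → 1111,"Hello, this is a tweet
# with a comma and a newline"
```

A standard spreadsheet application or `csv.reader` will read the quoted field back as a single value, newline included.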
edited Mar 7 at 16:03
asked Mar 7 at 15:42 by Etienne Numérogliss Drt
1 Answer
You should use Python's CSV library to write your CSV files. It takes a list containing all of the items for a row and automatically adds the delimiters for you. If a value contains a comma, it automatically adds quotes for you (which is how CSV files are meant to work). It can even handle newlines inside a value. If you open the resulting file in a spreadsheet application, you will see it is read in correctly.
Rather than trying to use time.sleep(), a better approach is to work with absolute times. The idea is to take your starting time and add three hours to it; you can then keep looping until this finish_time is reached.
The same approach can be applied to your API call allocation. Keep a counter holding how many calls you have left and count it down. If it reaches 0, stop making calls until the next fifteen-minute slot is reached. timedelta() can be used to add minutes or hours to an existing datetime object. By doing it this way, your times will never slip out of sync.
The following shows a simulation of how you can make things work. You just need to add back your code to get your tweets:
from datetime import datetime, timedelta
import time
import csv
import random   # just for simulating a random ID

fifteen = timedelta(minutes=15)
finish_time = datetime.now() + timedelta(hours=3)

calls_allowed = 450
calls_remaining = calls_allowed
now = datetime.now()
next_allocation = now + fifteen
todays_date = now.strftime("%d_%m_%Y")
ids_seen = set()

with open(f'tweets_{todays_date}.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    while now < finish_time:
        time.sleep(2)
        now = datetime.now()

        if now >= next_allocation:
            next_allocation += fifteen
            calls_remaining = calls_allowed
            print("New call allocation")

        if calls_remaining:
            calls_remaining -= 1
            print(f"Get tweets - {calls_remaining} calls remaining")

            # Simulate a tweet response
            id = random.choice(["1111", "2222", "3333", "4444"])  # pick a random ID
            date = "01.01.2019"
            user = "Fred"
            bio = "I am Fred"
            text = "Hello, this is a tweet\nusing a comma and a newline."

            if id not in ids_seen:
                csv_output.writerow([id, date, user, bio, text])
                ids_seen.add(id)
As for the problem of writing the same tweets repeatedly: you could use a set() to hold all of the IDs that you have written, and then test whether a new tweet's ID has already been seen before writing it again.
edited Mar 8 at 10:52
answered Mar 8 at 10:02 by Martin Evans