problem while storing tweets into csv file
I am working with Python, attempting to store tweets (more precisely, only their date, user, bio, and text) related to a specific keyword in a CSV file.
As I am working with Twitter's free-to-use API, I am limited to 450 tweets every 15 minutes, so I have written code that is supposed to store exactly 450 tweets in 15 minutes.
The problem is that something goes wrong when extracting the tweets, so that at a certain point the same tweet is stored again and again.
Any help would be much appreciated! Thanks in advance.
import time
from twython import Twython, TwythonError, TwythonStreamer

twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET)

sfile = "tweets_" + keyword + todays_date + ".csv"
id_list = [last_id]
count = 0
while count < 3*60*60*2:  # we set the loop to run for 3 hours
    # tweet extract method with the last list item as the max_id
    print("new crawl, max_id:", id_list[-1])
    tweets = twitter.search(q=keyword, count=2, max_id=id_list[-1])["statuses"]
    time.sleep(2)  # 2 seconds rest between api calls (450 allowed within 15min window)
    for status in tweets:
        id_list.append(status["id"])  # append tweet ids
        if status == tweets[0]:
            continue
        if status == tweets[1]:
            date = status["created_at"].encode('utf-8')
            user = status["user"]["screen_name"].encode('utf-8')
            bio = status["user"]["description"].encode('utf-8')
            text = status["text"].encode('utf-8')
            with open(sfile, 'a') as sf:
                sf.write(str(status["id"]) + "|||" + str(date) + "|||" + str(user) + "|||" + str(bio) + "|||" + str(text) + "\n")
            count += 1
            print(count)
            print(date, text)
python csv twitter twython
I would recommend you stick to a standard comma delimiter for your CSV file. If your tweet contains a comma then the field is normally enclosed with quotes. It is also able to cope with newlines. Python's CSV library will handle all of this for you automatically.
– Martin Evans
Mar 8 at 8:55
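To illustrate the comment above, here is a minimal sketch of what `csv.writer` does with a field that contains a comma and a newline (writing to an in-memory buffer just for demonstration): the module quotes the field automatically, so no custom `|||` delimiter is needed.

```python
import csv
import io

# A field containing both a comma and a newline;
# csv.writer encloses it in quotes automatically.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["1111", "Hello, this is a tweet\nwith a comma and a newline"])
print(buffer.getvalue())
# → 1111,"Hello, this is a tweet
# with a comma and a newline"
```

A standard spreadsheet application or `csv.reader` will read the quoted field back as a single value, newline included.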
edited Mar 7 at 16:03
asked Mar 7 at 15:42 by Etienne Numérogliss Drt
1 Answer
You should use Python's CSV library to write your CSV files. It takes a list containing all of the items for a row and automatically adds the delimiters for you. If a value contains a comma, it automatically adds quotes for you (which is how CSV files are meant to work). It can even handle newlines inside a value. If you open the resulting file in a spreadsheet application, you will see it is read in correctly.
Rather than trying to use time.sleep(), a better approach is to work with absolute times. The idea is to take your starting time and add three hours to it; you can then keep looping until this finish_time is reached.
The same approach can be applied to your API call allocation. Keep a counter holding how many calls you have left and count it down. If it reaches 0, stop making calls until the next fifteen-minute slot is reached. timedelta() can be used to add minutes or hours to an existing datetime object. By doing it this way, your times will never slip out of sync.
The following shows a simulation of how you can make things work. You just need to add back your code to get your tweets:
from datetime import datetime, timedelta
import time
import csv
import random   # just for simulating a random ID

fifteen = timedelta(minutes=15)
finish_time = datetime.now() + timedelta(hours=3)

calls_allowed = 450
calls_remaining = calls_allowed
now = datetime.now()
next_allocation = now + fifteen
todays_date = now.strftime("%d_%m_%Y")
ids_seen = set()

with open(f'tweets_{todays_date}.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    while now < finish_time:
        time.sleep(2)
        now = datetime.now()

        if now >= next_allocation:
            next_allocation += fifteen
            calls_remaining = calls_allowed
            print("New call allocation")

        if calls_remaining:
            calls_remaining -= 1
            print(f"Get tweets - {calls_remaining} calls remaining")

            # Simulate a tweet response
            id = random.choice(["1111", "2222", "3333", "4444"])  # pick a random ID
            date = "01.01.2019"
            user = "Fred"
            bio = "I am Fred"
            text = "Hello, this is a tweet\nusing a comma and a newline."

            if id not in ids_seen:
                csv_output.writerow([id, date, user, bio, text])
                ids_seen.add(id)
As for the problem of writing the same tweets repeatedly: you could use a set() to hold all of the IDs that you have written, and then test whether a new tweet's ID has already been seen before writing it again.
edited Mar 8 at 10:52
answered Mar 8 at 10:02 by Martin Evans