[QUESTION] A question about 'Resume' #677
Description
Issue Template
Please use this template!
Initial Check
If the issue is a request please specify that it is a request in the title (Example: [REQUEST] more features). If this is a question regarding 'twint' please specify that it's a question in the title (Example: [QUESTION] What is x?). Please only submit issues related to 'twint'. Thanks.
Make sure you've checked the following:
- [ ] Python version is 3.6;
- [ ] Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
- [ ] I have searched the issues and there are no duplicates of this issue/question/request.
Command Ran
Please provide the exact command ran including the username/search/code so I may reproduce the issue.
import twint

config = twint.Config()
config.Limit = 100000
config.Store_csv = True
config.Search = 'China'
config.Since = '2019-12-1'
config.Until = '2020-1-1'
config.Lang = 'en'
config.Output = '/root/datasets/unprocessed/China12.csv'
# config.Min_likes = 20
twint.run.Search(config)
I got this message:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
Description of Issue
Please use as much detail as possible.
I'm having the same problem as #670, so I won't repeat the details. I guess it is because Twitter has updated its anti-scraping measures. I'm doing research in data science and looking for a way to retrieve a large number of tweets efficiently. When I set Limit to 100,000, the retrieval process always stops at around 20,000.
So I'm considering using 'Resume' as a way to deal with this situation, but I can hardly find any information on your page about how to use it. The docs only mention providing the path of a file containing the scroll id. But what is that file? Is it the CSV file created by twint in the last run? And what is that scroll id? How can I get the so-called scroll id from the last run and store it in a file? I suggest providing a more specific explanation of this in the docs.
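For what it's worth, here is what I guessed the usage might look like, based only on the doc's mention of a file containing the scroll id. The resume file path below (China12_resume.txt) is just something I made up, and I'm not sure whether twint creates that file itself or expects it to already exist with a scroll id inside:
import twint

config = twint.Config()
config.Search = 'China'
config.Lang = 'en'
config.Store_csv = True
config.Output = '/root/datasets/unprocessed/China12.csv'
# My guess: point Resume at a text file; twint would save the scroll id there
# and read it back on the next run so the search continues where it stopped.
config.Resume = '/root/datasets/unprocessed/China12_resume.txt'
twint.run.Search(config)
Is that roughly how it is supposed to work?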
I really appreciate your work. Thank you.
Environment Details
Using Windows, Linux? What OS version? Running this in Anaconda? Jupyter Notebook? Terminal?
CentOS, running from the terminal.