Google Poi Crawler

noodles 19f1b843af md_edit 2 years ago
result 5444ae2209 url_list 3 years ago
utility 8719bd91d8 w 2 years ago
HKS須重爬店家.csv 73f7dbb91c e 2 years ago
README.md 19f1b843af md_edit 2 years ago
category.csv e1c01538c6 w 3 years ago
details.xls 351f83b5b0 w 2 years ago
get_google_id.py cb1ac6f6f2 'edit' 2 years ago
hot_pot.xls 351f83b5b0 w 2 years ago
jared_pureselenium_shop_item_list.py fd9129d878 w 3 years ago
jared_run.py e2dab507f9 test 2 years ago
jared_shop_item_list.py e2dab507f9 test 2 years ago
lat_long_location.csv b696670aac Upload files to '' 3 years ago
lat_long_search.py b696670aac Upload files to '' 3 years ago
linux_loop.sh 976f9dc15c w 2 years ago
location_list.csv b696670aac Upload files to '' 3 years ago
loop_5555.bat 14b721db42 w 2 years ago
loop_6666.bat 14b721db42 w 2 years ago
loop_rep.py e944b9e077 w 2 years ago
loop_storelist.bat cb9bffedba w 2 years ago
run.py 351f83b5b0 w 2 years ago
run2.py 2296d4de40 edit 3 years ago
run3.py 72e5d14360 w 2 years ago
run4.py e8fec9375a w 2 years ago
run5.py 6010d36566 e 2 years ago
shop_item_list.py 295a356f5b w 2 years ago
start.sh 5e08212582 edit 3 years ago
swire_docker_itemlist.py 67186b0079 w 2 years ago
swire_shop_item_list.py 1abe606703 w 2 years ago
swire_shop_review.py 52632418ca review_update 2 years ago

README.md

GooglePoiCrawler


Reviews crawler

Code: swire_shop_review.py

Execute:

python swire_shop_review.py [port] [proxyport]
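The sketch below shows one way the two positional arguments could be read. It is illustrative only: the helper name parse_ports, the assumption that both values are integers, and the sample values 5555 and 6666 are not taken from swire_shop_review.py.

```python
def parse_ports(argv):
    """Return (port, proxyport) as integers from a list of CLI arguments.

    Hypothetical helper: the real script may read its arguments differently.
    """
    if len(argv) != 2:
        raise SystemExit("usage: python swire_shop_review.py [port] [proxyport]")
    try:
        port, proxyport = (int(value) for value in argv)
    except ValueError:
        raise SystemExit("port and proxyport must be integers")
    return port, proxyport


# Example values only; in a script this would be parse_ports(sys.argv[1:]).
port, proxyport = parse_ports(["5555", "6666"])
print(port, proxyport)
```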

DB information

swire_store_list (line 103): stores the shop list.

reviews_table (line 232): stores the crawler results. The function save_js_to_db saves the parsed data into the database, using these columns:

db_columns = ['author_id', 'author_page', 'author_name', 'author_image', 'author_review_count', 'review_time', 'review_content', 'review_image', 'store_review_time', 'store_review']

review_process (line 271): stores the crawler status.
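As a rough illustration of how parsed reviews could be written to reviews_table with these columns, here is a minimal sketch using an in-memory SQLite database. The function save_reviews, the table schema, the extra fid column, and the sample review are assumptions made for this example. The real save_js_to_db and its database backend may differ.

```python
import sqlite3

DB_COLUMNS = [
    'author_id', 'author_page', 'author_name', 'author_image',
    'author_review_count', 'review_time', 'review_content',
    'review_image', 'store_review_time', 'store_review',
]


def save_reviews(conn, fid, reviews):
    """Insert parsed review dicts into reviews_table (hypothetical schema)."""
    columns = ['fid'] + DB_COLUMNS
    placeholders = ', '.join('?' for _ in columns)
    sql = f"INSERT INTO reviews_table ({', '.join(columns)}) VALUES ({placeholders})"
    # Fields missing from a review dict are stored as NULL.
    rows = [[fid] + [review.get(col) for col in DB_COLUMNS] for review in reviews]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


# In-memory database with an assumed all-text schema, for demonstration only.
conn = sqlite3.connect(':memory:')
column_defs = ', '.join(f"{col} TEXT" for col in DB_COLUMNS)
conn.execute(f"CREATE TABLE reviews_table (fid TEXT, {column_defs})")

saved = save_reviews(
    conn,
    'fid-example',
    [{'author_id': 'a1', 'review_content': 'Good noodles'}],
)
```

Parameter placeholders keep the review text out of the SQL string itself, which matters for free-form user content.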

Crawler list

Function get_next_job (line 98): fetches the shop list. Each row needs these columns: ==shop url== (named item_url in the code), ==fid==, and ==shop rating counts== (named user_ratings_total in the code).

==fid== is used as the key in all database tables.
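The required fields can be checked before a job is handed to the crawler. The sketch below is hypothetical: validate_job, the dict-shaped row, and the sample values are not part of the original code. Only the three field names come from the description above.

```python
REQUIRED_FIELDS = ('item_url', 'fid', 'user_ratings_total')


def validate_job(row):
    """Return a job dict with the three required fields, or raise ValueError."""
    missing = [name for name in REQUIRED_FIELDS if row.get(name) in (None, '')]
    if missing:
        raise ValueError(f"job row is missing: {', '.join(missing)}")
    return {
        'item_url': row['item_url'],
        'fid': row['fid'],
        # Stored counts may arrive as strings, so normalise to int.
        'user_ratings_total': int(row['user_ratings_total']),
    }


job = validate_job({
    'item_url': 'https://example.com/shop',
    'fid': 'fid-example',
    'user_ratings_total': '42',
})
```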

Page down function

Function get_reviews (line 205): first checks whether the store has any reviews, then uses the shop rating count divided by three as the number of page-down operations.
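The page-down arithmetic can be sketched as follows. The function name page_down_count is hypothetical, and rounding up is an assumption: the description only says the rating count is divided by three, so the original code may truncate instead.

```python
import math


def page_down_count(user_ratings_total, divisor=3):
    """Number of page-down operations for a store, or 0 if it has no reviews."""
    if not user_ratings_total or user_ratings_total <= 0:
        return 0
    # Assumption: round up so a partial last batch is still scrolled into view.
    return math.ceil(user_ratings_total / divisor)


for total in (0, 9, 10):
    print(total, page_down_count(total))
```

With these assumptions, a store with 10 ratings gets 4 page-down operations, and a store with no ratings is skipped.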