@@ -1,3 +1,27 @@
# GooglePoiCrawler
-Google Poi Crawler
+Google Poi Crawler
+
+## Reviews crawler
+
+Code: `swire_shop_review.py`
+Execute:
+```shell
+python swire_shop_review.py [port] [proxyport]
+```
+
+## DB information
+`swire_store_list` (line 103): stores the shop list
+`reviews_table` (line 232): stores the crawler results; the function `save_js_to_db` saves the parsed data into the database
+> db_columns = ['author_id', 'author_page', 'author_name', 'author_image', 'author_review_count', 'review_time', 'review_content', 'review_image', 'store_review_time', 'store_review']
+`review_process` (line 271): stores the crawler status
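
A minimal sketch of how one parsed review could be written to `reviews_table` with `fid` as the key. It is not the code in `swire_shop_review.py`: the real signature of `save_js_to_db` and the actual database engine are not documented here, so `save_review_row`, `create_reviews_table`, the `TEXT` column types, and the in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3

# Column names copied from the db_columns list in this README.
DB_COLUMNS = [
    "author_id", "author_page", "author_name", "author_image",
    "author_review_count", "review_time", "review_content",
    "review_image", "store_review_time", "store_review",
]


def create_reviews_table(conn):
    """Create a stand-in reviews_table (hypothetical schema: all TEXT columns)."""
    cols = ", ".join(f"{name} TEXT" for name in DB_COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS reviews_table (fid TEXT, {cols})")


def save_review_row(conn, fid, review):
    """Insert one parsed review (a dict) under the given fid.

    Hypothetical helper; keys missing from the dict are stored as NULL.
    """
    names = ["fid"] + DB_COLUMNS
    values = [fid] + [review.get(name) for name in DB_COLUMNS]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO reviews_table ({', '.join(names)}) VALUES ({placeholders})",
        values,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real DB
create_reviews_table(conn)
save_review_row(conn, "example-fid", {"author_name": "A", "review_content": "Nice shop"})
```

Parameterized placeholders (`?`) keep review text containing quotes from breaking the insert statement.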
+
+### Crawler list
+Function `get_next_job` (line 98): gets the shop list. The data must include the columns ==shop url== (called `item_url` in the code), ==fid==, and ==shop rating counts== (called `user_ratings_total` in the code)
+
+Use ==fid== as the key in all DB tables
+
+## Page down function
+Function `get_reviews` (line 205): first checks whether the store has any reviews, then uses the shop rating count divided by three as the number of page-downs.
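
The page-down count can be sketched as below. This is an illustration, not the code from `get_reviews`: the helper name `page_down_count` is hypothetical, and rounding up is an assumption (the script may use integer division instead).

```python
import math


def page_down_count(user_ratings_total, divisor=3):
    """Return how many page-downs to send for a shop.

    Returns 0 when the shop has no ratings, so the caller can skip it.
    Rounding up is an assumption, chosen so the last partial batch is not missed.
    """
    if not user_ratings_total or user_ratings_total <= 0:
        return 0
    return math.ceil(user_ratings_total / divisor)


print(page_down_count(10))  # 10 / 3 rounded up
```

Returning 0 for a shop without ratings covers the "check if the store has reviews first" step in the same helper.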
+
+