noodles 2 years ago
parent
commit
19f1b843af
1 changed file with 25 additions and 1 deletion

+ 25 - 1
README.md

@@ -1,3 +1,27 @@
 # GooglePoiCrawler
 
-Google Poi Crawler
+Google Poi Crawler
+
+## Reviews crawler
+
+Code: `swire_shop_review.py`
+Execute:
+```shell
+python swire_shop_review.py [port] [proxyport]
+```
+
+## DB information
+`swire_store_list` (line 103): stores the shop list
+`reviews_table` (line 232): stores the crawler results; use the function `save_js_to_db` to save the parsed data into the database
+> db_columns = ['author_id','author_page','author_name', 'author_image','author_review_count','review_time','review_content','review_image','store_review_time','store_review']
+`review_process` (line 271): stores the crawler status
+
+### Crawler list
+Function `get_next_job` (line 98): gets the shop list; the data columns must include ==shop url== (called `item_url` in the code), ==fid==, and ==shop rating counts== (called `user_ratings_total` in the code)
+
+Use ==fid== as the key in all DB tables
+
+## Page down function
+Function `get_reviews` (line 205): first checks whether the store has reviews, then uses the shop rating count divided by three as the number of page downs.
+
+
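
Note on the "Page down function" section: the README does not say how the division by three is rounded or what happens for a store with no reviews. A minimal sketch of one reading follows, assuming the count is rounded up and a store without reviews needs no page downs. The helper name `page_down_count` and the constant `REVIEWS_PER_PAGE_DOWN` are hypothetical and do not come from `swire_shop_review.py`.

```python
import math

# Taken from the README: rating count divided by three.
REVIEWS_PER_PAGE_DOWN = 3


def page_down_count(user_ratings_total: int) -> int:
    """Return how many page downs to send for one store (hypothetical helper).

    Assumption: round up so the last partial batch of reviews is still reached.
    """
    # Stores without reviews need no scrolling at all.
    if user_ratings_total <= 0:
        return 0
    return math.ceil(user_ratings_total / REVIEWS_PER_PAGE_DOWN)
```

Under these assumptions a store with 10 ratings would get 4 page downs; if `get_reviews` actually truncates instead, the same store would get 3.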