Once the crawl finishes, a file named items.json appears in the root directory of the tutorial project.
Its contents are as follows:
# Contents of items.json
[
{"title": "\u4e2d\u56fd\u8bba\u6587\u5199\u53d1\u7f51", "desc": "\u4e2d\u56fd\u8bba\u6587\u5199\u53d1\u7f51\u63d0\u4f9b\u514d\u8d39\u8bba\u6587,\u804c\u79f0\u8bba\u6587,\u6bd5\u4e1a\u8bba\u6587,\u7855\u58eb\u8bba\u6587,\u672c\u79d1\u8bba\u6587,MBA\u8bba\u6587,\u7535\u5927\u8bba\u6587,\u8ff0\u804c\u62a5\u544a,\u8bba\u6587\u4e0b\u8f7d,\u5de5\u4f5c\u603b\u7ed3,\u8bba\u6587\u63a8\u8350\u53d1\u8868,\u8bba\u6587\u5199\u4f5c\u6307\u5bfc,\u8bba\u6587\u7ffb\u8bd1\u7b49\u670d\u52a1,\u7f51\u5740www.lwxfw.com", "link": "http://www.dmozdir.org/SiteInformation/?www.lwxfw.com-----13589-----.shtml"},
{"title": "\u4e13\u6ce8\u4ee3\u5199\u8bba\u6587\u7f51,\u8bba\u6587\u4ee3\u5199,\u7855\u58eb\u8bba\u6587\u4ee3\u5199,\u535a\u58eb\u8bba\u6587\u4ee3\u5199", "desc": "\u4e13\u6ce8\u4ee3\u5199\u8bba\u6587\u7f51,\u8bba\u6587\u4ee3\u5199,\u7855\u58eb\u8bba\u6587\u4ee3\u5199,\u535a\u58eb\u8bba\u6587\u4ee3\u5199,\u5404\u7c7b\u804c\u79f0\u8bba\u6587\u4ee3\u5199\u4ee3\u53d1!", "link": "http://www.dmozdir.org/SiteInformation/?www.zzlunwen010.com-----28351-----.shtml"},
{"title": "\u8bba\u6587\u5929\u4e0b", "desc": "\u8bba\u6587\u5929\u4e0b\uff0c\u514d\u8d39\u63d0\u4f9b\uff1a\u8bba\u6587\u8303\u6587\uff0c\u514d\u8d39\u8bba\u6587\uff0c\u8bba\u6587\u5927\u5168\uff0c \u8bba\u6587\u4e0b\u8f7d\uff0c\u8bba\u6587\u683c\u5f0f\uff0c\u8bba\u6587\u63d0\u7eb2\uff0c\u8bba\u6587\u53d1\u8868\uff0c\u8bba\u6587\u5f00\u9898\u62a5\u544a\uff0c\u8bba\u6587\u9898\u76ee\u7b49\u8d44\u6599\u7684\u67e5\u9605\uff0c\u6709\u507f\u63d0\u4f9b\uff1a\u8bba\u6587\u4ee3\u5199\u3001\u4ee3\u53d1\u670d\u52a1\uff01", "link": "http://www.dmozdir.org/SiteInformation/?www.su30.net-----20547-----.shtml"},
{"title": "\u6cb3\u5357\u6559\u5e08\u7f51", "desc": "\u6cb3\u5357\u6559\u5e08\u7f51/\u6cb3\u5357\u6559\u5e08\u8003\u8bd5\u7f51/\u6cb3\u5357\u6559\u5e08\u8d44\u683c\u7f51/\u6cb3\u5357\u6559\u80b2\u4fe1\u606f\u7f51/\u6cb3\u5357\u6559\u5e08\u8d44\u683c\u8bc1\u5386\u5e74\u771f\u9898/\u6cb3\u5357\u6559\u5e08\u8d44\u683c\u8bc1\u590d\u4e60\u8d44\u6599/\u6cb3\u5357\u62db\u6559\u8003\u8bd5\u771f\u9898/\u6cb3\u5357\u62db\u6559\u8003\u8bd5\u590d\u4e60\u8d44\u6599/\u5b66\u4e60\u7b14\u8bb0/\u4e2d\u56fd\u62db\u6559\u7f51/\u6cb3\u5357\u62db\u6559\u7f51/\u6cb3\u5357\u6559\u5e08\u8d44\u683c\u7f51", "link": "http://www.dmozdir.org/SiteInformation/?www.hateacher.com-----31307-----.shtml"},
{"title": "\u4e45\u4e45\u8bba\u6587\u68c0\u6d4b", "desc": "\u4e45\u4e45\u8bba\u6587\u68c0\u6d4b\u7f51\u4e13\u4e1a\u63d0\u4f9b\u514d\u8d39\u8bba\u6587\u68c0\u6d4b\u3001\u8bba\u6587\u68c0\u6d4b\u8f6f\u4ef6\u3001\u8bba\u6587\u6284\u88ad\u68c0\u6d4b\u3001\u77e5\u7f51\u8bba\u6587\u68c0\u6d4b\u3001\u4e07\u65b9\u8bba\u6587\u68c0\u6d4b\u3001\u8bba\u6587\u4fee\u6539\u8d44\u6599\u4ee5\u53ca\u514d\u8d39\u8bba\u6587\u68c0\u6d4b\u7cfb\u7edf\u3002\u8ba9\u60a8\u6bd5\u4e1a\u7b54\u8fa9\u65e0\u5fe7\uff01", "link": "http://www.dmozdir.org/SiteInformation/?www.99fx.net-----38891-----.shtml"},
{"title": "\u674e\u56fd\u65fa\u5de5\u4f5c\u5ba4", "desc": "\u9ad8\u4e09\u653f\u6cbb\u6559\u5b66\uff0c\u653f\u6cbb\u9ad8\u8003\uff0c\u9ad8\u4e2d\u653f\u6cbb\u65b0\u8bfe\u6807\uff0c\u653f\u6cbb\u8bd5\u5377\uff0c\u9ad8\u4e2d\u653f\u6cbb\u7f51\u5740\u3002", "link": "http://www.dmozdir.org/SiteInformation/?www.lgwlncy.com-----12221-----.shtml"},
{"title": "\u7b14\u6746\u5b50\u8bba\u6587", "desc": "\u7b14\u6746\u5b50\u8bba\u6587\u7f51\u63d0\u4f9b\u514d\u8d39\u8bba\u6587\u3001\u6bd5\u4e1a\u8bba\u6587\u3001\u8bba\u6587\u8303\u6587\u3001\u8bba\u6587\u4e0b\u8f7d\u3001\u5404\u4e13\u4e1a\u8bba\u6587\u3001\u5de5\u4f5c\u603b\u7ed3\u3001\u8bba\u6587\u5b9a\u5236\u3001\u53d1\u8868\u8bba\u6587\u3001\u8d2d\u4e70\u8bba\u6587\u3001\u8bba\u6587\u5199\u4f5c\u6307\u5bfc\u7b49\u670d\u52a1", "link": "http://www.dmozdir.org/SiteInformation/?www.bgzlw.com-----45851-----.shtml"},
{"title": "\u4e2d\u56fd\u8bba\u6587\u70ed\u7ebf\u7f51", "desc": "\u4e2d\u56fd\u8bba\u6587\u70ed\u7ebf\u7f51\u63d0\u4f9b\u804c\u79f0\u8bba\u6587\u63a8\u8350\u53d1\u8868\u3001\u7701\u7ea7\u520a\u7269\u3001\u6838\u5fc3\u520a\u7269\u3001CN\u3001ISSN\u520a\u7269\u63a8\u8350\u53d1\u8868\u7b49\u670d\u52a1,\u53ef\u4ee5\u63a8\u8350\u53d1\u8868\u591a\u4e13\u4e1a\u804c\u79f0\u8bba\u6587,\u662f\u60a8\u804c\u79f0\u8bc4\u5ba1\u8bba\u6587\u53d1\u8868\u7684\u6700\u4f73\u4f19\u4f34,\u7f51\u5740www.lwrxw.com", "link": "http://www.dmozdir.org/SiteInformation/?www.lwrxw.com-----15692-----.shtml"},
{"title": "\u5c31\u8981\u5b66\u4e60\u7f51", "desc": "\u5c31\u8981\u5b66\u4e60\u7f51\u662f\u96c6\u6559\u6848\uff0c\u8bfe\u4ef6\uff0c\u8bd5\u5377\uff0c\u6bd5\u4e1a\u8bba\u6587\uff0c\u6559\u5b66\u89c6\u9891\u4e3a\u4e00\u4f53\u7684\u514d\u8d39\u8d44\u6e90\u7f51\u3002", "link": "http://www.dmozdir.org/SiteInformation/?www.62355065.cn-----11960-----.shtml"},
{"title": "\u65b0\u8bba\u6587\u4ee3\u5199\u7f51", "desc": "\u6bd5\u4e1a\u8bba\u6587|\u6bd5\u4e1a\u8bbe\u8ba1|\u6bd5\u4e1a\u8bba\u6587\u8303\u6587|\u8ba1\u7b97\u673a\u6bd5\u4e1a\u8bbe\u8ba1|\u6bd5\u4e1a\u8bba\u6587\u683c\u5f0f\u8303\u6587|\u673a\u68b0\u6bd5\u4e1a\u8bbe\u8ba1|\u884c\u653f\u7ba1\u7406\u6bd5\u4e1a\u8bba\u6587|\u6bd5\u4e1a\u8bbe\u8ba1\u5f00\u9898\u62a5\u544a|\u8ba1\u7b97\u673a\u7f51\u7edc\u6bd5\u4e1a\u8bba\u6587|\u6bd5\u4e1a\u8bbe\u8ba1\u8bba\u6587|\u6bd5\u4e1a\u8bba\u6587\u7f51|\u4ee3\u505a\u6bd5\u4e1a\u8bbe\u8ba1|\u600e\u6837\u5199\u6bd5\u4e1a\u8bba\u6587", "link": "http://www.dmozdir.org/SiteInformation/?www.newlw.com-----25276-----.shtml"},
{"title": "\u5929\u559c\u7f18\u5a5a\u4ecb\u7f51-\u6700\u597d\u7684\u5a5a\u5f81\u5a5a\u4ecb\u7f51\u7ad9", "desc": "\u5929\u559c\u7f18\u5a5a\u4ecb\u5a5a\u5e86\u7f51\u662f\u6d4e\u5357\u6700\u4e13\u4e1a\u7684\u5a5a\u4ecb\u7f51\u7ad9\u3001\u5a5a\u5e86\u7f51\u7ad9\uff0c\u4ea4\u53cb\u7f51\u7ad9\uff0c\u53ca\u6d4e\u5357\u5f81\u5a5a\u3001\u6d4e\u5357\u4ea4\u53cb\u3001\u6d4e\u5357\u5a5a\u4ecb\u3001\u6d4e\u5357\u5e86\u5178\u3001\u6d4e\u5357\u793c\u4eea\u4e8e\u4e00\u4f53\uff0c\u7f51\u4e0b\u6709\u5b9e\u4f53\u5e97\u9762-\u6d4e\u5357\u5e02\u5e02\u4e2d\u533a\u5929\u559c\u7f18\u5a5a\u4ecb\u5a5a\u5e86\u4e2d\u5fc3\uff0c\u4e0d\u5b9a\u671f\u4e3e\u529e\u8054\u8c0a\u6d3b\u52a8\uff0c\u4fdd\u8bc1\u4f1a\u5458\u6210\u529f\u7387", "link": "http://www.dmozdir.org/SiteInformation/?www.love219.com-----14846-----.shtml"},
{"title": "\u6210\u90fd\u76db\u4e16\u9633\u5149\u5a5a\u5e86\u7b56\u5212\u6709\u9650\u516c\u53f8", "desc": "\u8bda\u4fe1\u6295\u8d44\u63a7\u80a1\u96c6\u56e2\u5c5e\u4e8e\u56db\u5ddd\u7701\u5927\u578b\u4f01\u4e1a\u96c6\u56e2\uff0c\u5ddd\u5185\u6392\u4e8e\u524d20\u540d\uff0c\u6ce8\u518c\u8d44\u91d13.5\u4ebf\u5143\uff0c\u62e5\u6709\u56fa\u5b9a\u8d44\u4ea746.5\u4ebf\u3002\u516c\u53f8\u603b\u90e8\u4f4d\u4e8e\u6210\u90fd\u5e02\u81f4\u6c11\u4e1c\u8def1\u53f7\u3002\u5728\u5317\u4eac\u3001\u4e0a\u6d77\u3001\u65b0\u7586\u7b49\u5730\u8bbe\u6709\u5206\u516c\u53f8\u3002\u8bda\u4fe1\u76db\u4e16\u9633\u5149\u5a5a\u5e86\u516c\u53f8\u662f\u5176\u5b50\u516c\u53f8\u3002", "link": "http://www.dmozdir.org/SiteInformation/?www.ssyg520.com-----27215-----.shtml"},
{"title": "\u60c5\u4eba\u7f51", "desc": "\u60c5\u4eba\u7f51\u4ea4\u53cb\u4e2d\u5fc3\u4e3a\u4f60\u63d0\u4f9b\u6700\u4f73\u7684\u7f51\u4e0a\u60c5\u4eba\u4ea4\u53cb\u673a\u4f1a\uff0c\u8db3\u4e0d\u51fa\u6237\u4fbf\u80fd\u8ba9\u4f60\u6709\u66f4\u591a\u7684\u9009\u62e9\uff01", "link": "http://www.dmozdir.org/SiteInformation/?www.591lover.net-----36999-----.shtml"},
{"title": "\u56fd\u9645\u514d\u8d39\u5a5a\u4ecb\u4ea4\u53cb\u7f51\u7ad9-\u76f8\u7ea6100", "desc": "\u56fd\u9645\u514d\u8d39\u5a5a\u4ecb\u4ea4\u53cb\u7f51\u7ad9\u662f\u76f8\u7ea6100\u63d0\u4f9b\u7684\u5b8c\u5168\u514d\u8d39\u7684\u56fd\u9645\u4ea4\u53cb\u7f51\u7ad9\u3002\u4f1a\u5458\u4ee5\u534e\u4eba\u4e3a\u4e3b\u904d\u5e03\u4e94\u6e56\u56db\u6d77,\u6240\u6709\u4f1a\u5458\u5b8c\u5168\u514d\u8d39\u3002\u6240\u6709\u5bfb\u627e\u56fd\u9645\u514d\u8d39\u5a5a\u4ecb\u4ea4\u53cb\u7f51\u7ad9\u7684\u670b\u53cb\u90fd\u80fd\u5728\u56fd\u9645\u4ea4\u53cb\u7f51\u7ad9\u5728\u627e\u5230\u5b8c\u5168\u514d\u8d39\u7684\u56fd\u9645\u514d\u8d39\u5a5a\u4ecb\u4ea4\u53cb\u7f51\u7ad9\u670d\u52a1", "link": "http://www.dmozdir.org/SiteInformation/?www.free-onlinedating.me-----10110-----.shtml"},
{"title": "\u5b89\u5fbd\u5a5a\u5e86\u7f51", "desc": "\u5b89\u5fbd\u5a5a\u5e86\u7f51", "link": "http://www.dmozdir.org/SiteInformation/?www.ahhqw.com-----18983-----.shtml"},
{"title": "\u805a\u7f18\u5317\u6d77\u4ea4\u53cb\u7f51", "desc": "\u805a\u7f18\u5317\u6d77\u4ea4\u53cb\u7f51\u662f\u5317\u6d77\u5730\u533a\u8f83\u89c4\u8303\u7684\u5a5a\u604b\u4ea4\u53cb\u7f51\u7ad9\uff0c\u81f4\u529b\u4e8e\u8425\u9020\u6709\u8da3\u800c\u5b89\u5168\u7684\u7f51\u7edc\u4ea4\u53cb\u793e\u533a\uff0c\u63d0\u4f9b\u641c\u7d22\u3001\u7f8e\u6587\u3001\u7ea6\u4f1a\u3001\u65e5\u8bb0\u3001\u804a\u5929\u3001\u7b49\u591a\u9879\u4ea4\u53cb\u670d\u52a1\u3002\u5e76\u4e0e\u5730\u65b9\u5a5a\u4ecb\u90e8\u95e8\u5efa\u7acb\u4e86\u826f\u597d\u7684\u5408\u4f5c\u5173\u7cfb\u3002", "link": "http://www.dmozdir.org/SiteInformation/?www.jyjjyy.com-----19343-----.shtml"},
{"title": "\u7231\u6211\u5427\u5a5a\u604b\u7f51", "desc": "\u7231\u6211\u5427\u5a5a\u604b\u7f51\u662f\u4e00\u4e2a\u771f\u5b9e\u3001\u4e25\u8083\u3001\u9ad8\u54c1\u4f4d\u7684\u5a5a\u604b\u5e73\u53f0\uff0c\u63d0\u4f9b\u79d1\u5b66\u3001\u9ad8\u6548\u7684\u5168\u7a0b\u670d\u52a1\uff0c\u5e2e\u52a9\u771f\u5fc3\u5bfb\u627e\u7ec8\u8eab\u4f34\u4fa3\u7684\u4eba\u58eb\u5b9e\u73b0\u548c\u8c10\u5a5a\u604b\uff0c\u52aa\u529b\u8425\u9020\u56fd\u5185\u6700\u4e13\u4e1a\u3001\u4e25\u8083\u7684\u5a5a\u604b\u4ea4\u53cb\u5e73", "link": "http://www.dmozdir.org/SiteInformation/?www.lovemeba.com-----9983-----.shtml"},
{"title": "77\u56fd\u9645\u4ea4\u53cb\u7f51", "desc": "\u7eaf\u516c\u76ca\u6027\uff0c\u7231\u5fc3\u793e\u4ea4\u7f51\u7ad9\uff0c\u4e3a\u5e7f\u5927\u9752\u5e74\u53ca\u5355\u8eab\u4eba\u58eb\u63d0\u4f9b\u7684\u5168\u514d\u8d39\u4ea4\u53cb\u5e73\u53f0\u3002", "link": "http://www.dmozdir.org/SiteInformation/?www.77lds.com-----37176-----.shtml"},
{"title": "\u4e1c\u839e\u97e9\u98ce\u5c1a\u5a5a\u7eb1\u6444\u5f71\u5de5\u4f5c\u5ba4", "desc": "\u4e1c\u839e\u97e9\u98ce\u5c1a\u5a5a\u7eb1\u6444\u5f71\u5de5\u4f5c\u5ba4\u662f\u5177\u6709\u72ec\u7279\u7684\u97e9\u56fd\u98ce\u683c\u7684\u4e1c\u839e\u5a5a\u7eb1\u6444\u5f71\u5de5\u4f5c\u5ba4\uff0c\u97e9\u98ce\u5c1a\u4f4d\u4e8e\u4e1c\u839e\u4e1c\u57ce\u533a\u65d7\u5cf0\u8def\u56fd\u6cf0\u5927\u53a610\u53f7,\u6211\u4eec\u6c38\u8fdc\u6ee1\u6000\u521b\u610f\u4e0e\u6e29\u60c5,\u901a\u8fc7\u4e00\u5bf9\u4e00\u7684\u670d\u52a1\u4e3a\u60a8\u63d0\u4f9b\u8d85\u8d8a\u60a8\u671f\u671b", "link": "http://www.dmozdir.org/SiteInformation/?www.dg-hfs.com-----18760-----.shtml"},
{"title": "\u767e\u5408\u5a5a\u793c\u793e\u533a", "desc": "\u767e\u5408\u5a5a\u793c\u793e\u533a\u8ba8\u8bba\u8bdd\u9898\u6db5\u76d6\u5a5a\u7eb1\u7167\u3001\u5a5a\u7eb1\u6444\u5f71\u3001\u5a5a\u793c\u7b79\u5907\u3001\u5a5a\u7eb1\u793c\u670d\u3001\u5a5a\u5e86\u7b49\u65b9\u9762", "link": "http://www.dmozdir.org/SiteInformation/?www.lilywed.cn-----9976-----.shtml"}
]
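The saved file contains exactly the data we want, but every Chinese character is written as a \uXXXX escape sequence. This is not a binary encoding: the JSON exporter escapes non-ASCII characters by default, so the data is intact but hard to read.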
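To check that the \uXXXX sequences in the file above are ordinary JSON string escapes and that nothing was lost, here is a minimal sketch using only the standard json module. The record is a shortened copy of one entry from items.json (the desc field is left out for brevity).

```python
import json

# One record from items.json, shortened to two fields for the demo.
raw = (
    r'{"title": "\u5b89\u5fbd\u5a5a\u5e86\u7f51", '
    r'"link": "http://www.dmozdir.org/SiteInformation/?www.ahhqw.com-----18983-----.shtml"}'
)

# Parsing the JSON text turns the escapes back into real characters.
record = json.loads(raw)
print(record["title"])

# Serializing with the default settings produces the escaped form again.
escaped_title = json.dumps(record["title"])
print(escaped_title)
```

So the file is valid JSON either way; the only question is whether it is readable in a text editor.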
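(When I first wrote this post I had not yet found a fix for this on Python 3 and planned to add one later; pointers from more experienced readers were very welcome. Thanks!)
Haha, the problem is solved now. Read on: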
One thing to explain first: pipelines.py exists precisely to process items, so that is where we handle writing the output file.
Write pipelines.py like this:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import json


class TutorialPipeline(object):

    def __init__(self):
        # Binary mode: process_item encodes each line to UTF-8 itself.
        self.f = open('items.json', 'wb')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text as-is instead of \uXXXX escapes.
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.f.write(line.encode('utf-8'))
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.f.close()
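In the __init__ method we create a file named items.json and open it for binary writing ('wb'), so that the bytes written are exactly the UTF-8 we produce rather than whatever text encoding the platform defaults to.
In the process_item method, each item is serialized to a JSON string with ensure_ascii=False, encoded as UTF-8, and written as one line; finally, the close_spider method closes the file.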
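As a variation on the steps just described, here is a minimal sketch that opens the file in text mode with an explicit encoding, so no manual .encode() call is needed. The class name JsonLinesPipeline and its path argument are my own assumptions for the demo, not part of the tutorial project; open_spider and close_spider are the hooks Scrapy calls when a spider starts and finishes. The last lines exercise the class by hand, outside Scrapy, with a plain dict standing in for an item.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical variant of TutorialPipeline (name and path argument are assumptions)."""

    def __init__(self, path="items.json"):
        self.path = path
        self.f = None

    def open_spider(self, spider):
        # Text mode with an explicit encoding instead of 'wb' plus .encode().
        self.f = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line, with non-ASCII characters left unescaped.
        self.f.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.f.close()


# Exercise the pipeline by hand; no spider object is needed for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "items.json")
pipeline = JsonLinesPipeline(demo_path)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "安徽婚庆网", "desc": "安徽婚庆网"}, spider=None)
pipeline.close_spider(spider=None)

with open(demo_path, encoding="utf-8") as fh:
    written = fh.read()
print(written)
```

Opening the file in open_spider rather than __init__ means the file is only created when a crawl actually starts; either approach should work for this tutorial.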
Next, enable the pipeline in settings.py by adding the following setting:
ITEM_PIPELINES = {
    'tutorial.pipelines.TutorialPipeline': 300,
}
Here, TutorialPipeline is the name of the class defined in pipelines.py.
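The number 300 is the pipeline's priority: Scrapy passes each item through the enabled pipelines in ascending order of these integers, which are conventionally chosen in the 0 to 1000 range. The sketch below only illustrates that ordering with plain Python; CleanTextPipeline is a made-up second pipeline that does not exist in the tutorial project.

```python
# Hypothetical settings: CleanTextPipeline is an assumption for illustration only.
ITEM_PIPELINES = {
    "tutorial.pipelines.TutorialPipeline": 300,
    "tutorial.pipelines.CleanTextPipeline": 100,
}

# Lower numbers run first, so sort the class paths by their priority value.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for position, path in enumerate(run_order, start=1):
    print(position, path, ITEM_PIPELINES[path])
```

With a single pipeline, as in this tutorial, the exact value does not matter.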
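One more thing to note:
Because pipelines.py now creates the output file itself, we no longer pass an output file on the command line, and the command to enter in CMD becomes: scrapy crawl dmoz -t json (as far as I can tell, -t only selects the export format used together with -o, so plain scrapy crawl dmoz should most likely behave the same here).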
C:\Users\XiangyangDai\Desktop\tutorial>scrapy crawl dmoz -t json
2018-12-17 21:43:57 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: tutorial)
2018-12-17 21:43:57 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Windows-10-10.0.17134-SP0
2018-12-17 21:43:57 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2018-12-17 21:43:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-17 21:43:58 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-17 21:43:58 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-17 21:43:58 [scrapy.middleware] INFO: Enabled item pipelines:
['tutorial.pipelines.TutorialPipeline']
2018-12-17 21:43:58 [scrapy.core.engine] INFO: Spider opened
2018-12-17 21:43:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-17 21:43:58 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-17 21:43:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.dmozdir.org/robots.txt> (referer: None)
2018-12-17 21:43:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.dmozdir.org/Category/?SmallPath=230> (referer: None)
2018-12-17 21:43:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.dmozdir.org/Category/?SmallPath=411> (referer: None)
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '中国论文写发网提供免费论文,职称论文,毕业论文,硕士论文,本科论文,MBA论文,电大论文,述职报告,论文下载,工作总结,论文推荐发表,论文写作指导,论文翻译等服务,网址www.lwxfw.com',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.lwxfw.com-----13589-----.shtml',
 'title': '中国论文写发网'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '专注代写论文网,论文代写,硕士论文代写,博士论文代写,各类职称论文代写代发!',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.zzlunwen010.com-----28351-----.shtml',
 'title': '专注代写论文网,论文代写,硕士论文代写,博士论文代写'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '论文天下,免费提供:论文范文,免费论文,论文大全, '
         '论文下载,论文格式,论文提纲,论文发表,论文开题报告,论文题目等资料的查阅,有偿提供:论文代写、代发服务!',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.su30.net-----20547-----.shtml',
 'title': '论文天下'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '河南教师网/河南教师考试网/河南教师资格网/河南教育信息网/河南教师资格证历年真题/河南教师资格证复习资料/河南招教考试真题/河南招教考试复习资料/学习笔记/中国招教网/河南招教网/河南教师资格网',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.hateacher.com-----31307-----.shtml',
 'title': '河南教师网'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '久久论文检测网专业提供免费论文检测、论文检测软件、论文抄袭检测、知网论文检测、万方论文检测、论文修改资料以及免费论文检测系统。让您毕业答辩无忧!',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.99fx.net-----38891-----.shtml',
 'title': '久久论文检测'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '高三政治教学,政治高考,高中政治新课标,政治试卷,高中政治网址。',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.lgwlncy.com-----12221-----.shtml',
 'title': '李国旺工作室'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '笔杆子论文网提供免费论文、毕业论文、论文范文、论文下载、各专业论文、工作总结、论文定制、发表论文、购买论文、论文写作指导等服务',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.bgzlw.com-----45851-----.shtml',
 'title': '笔杆子论文'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '中国论文热线网提供职称论文推荐发表、省级刊物、核心刊物、CN、ISSN刊物推荐发表等服务,可以推荐发表多专业职称论文,是您职称评审论文发表的最佳伙伴,网址www.lwrxw.com',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.lwrxw.com-----15692-----.shtml',
 'title': '中国论文热线网'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '就要学习网是集教案,课件,试卷,毕业论文,教学视频为一体的免费资源网。',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.62355065.cn-----11960-----.shtml',
 'title': '就要学习网'}
2018-12-17 21:43:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=230>
{'desc': '毕业论文|毕业设计|毕业论文范文|计算机毕业设计|毕业论文格式范文|机械毕业设计|行政管理毕业论文|毕业设计开题报告|计算机网络毕业论文|毕业设计论文|毕业论文网|代做毕业设计|怎样写毕业论文',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.newlw.com-----25276-----.shtml',
 'title': '新论文代写网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '天喜缘婚介婚庆网是济南最专业的婚介网站、婚庆网站,交友网站,及济南征婚、济南交友、济南婚介、济南庆典、济南礼仪于一体,网下有实体店面-济南市市中区天喜缘婚介婚庆中心,不定期举办联谊活动,保证会员成功率',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.love219.com-----14846-----.shtml',
 'title': '天喜缘婚介网-最好的婚征婚介网站'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '诚信投资控股集团属于四川省大型企业集团,川内排于前20名,注册资金3.5亿元,拥有固定资产46.5亿。公司总部位于成都市致民东路1号。在北京、上海、新疆等地设有分公司。诚信盛世阳光婚庆公司是其子公司。',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.ssyg520.com-----27215-----.shtml',
 'title': '成都盛世阳光婚庆策划有限公司'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '情人网交友中心为你提供最佳的网上情人交友机会,足不出户便能让你有更多的选择!',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.591lover.net-----36999-----.shtml',
 'title': '情人网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '国际免费婚介交友网站是相约100提供的完全免费的国际交友网站。会员以华人为主遍布五湖四海,所有会员完全免费。所有寻找国际免费婚介交友网站的朋友都能在国际交友网站在找到完全免费的国际免费婚介交友网站服务',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.free-onlinedating.me-----10110-----.shtml',
 'title': '国际免费婚介交友网站-相约100'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '安徽婚庆网',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.ahhqw.com-----18983-----.shtml',
 'title': '安徽婚庆网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '聚缘北海交友网是北海地区较规范的婚恋交友网站,致力于营造有趣而安全的网络交友社区,提供搜索、美文、约会、日记、聊天、等多项交友服务。并与地方婚介部门建立了良好的合作关系。',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.jyjjyy.com-----19343-----.shtml',
 'title': '聚缘北海交友网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '爱我吧婚恋网是一个真实、严肃、高品位的婚恋平台,提供科学、高效的全程服务,帮助真心寻找终身伴侣的人士实现和谐婚恋,努力营造国内最专业、严肃的婚恋交友平',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.lovemeba.com-----9983-----.shtml',
 'title': '爱我吧婚恋网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '纯公益性,爱心社交网站,为广大青年及单身人士提供的全免费交友平台。',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.77lds.com-----37176-----.shtml',
 'title': '77国际交友网'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '东莞韩风尚婚纱摄影工作室是具有独特的韩国风格的东莞婚纱摄影工作室,韩风尚位于东莞东城区旗峰路国泰大厦10号,我们永远满怀创意与温情,通过一对一的服务为您提供超越您期望',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.dg-hfs.com-----18760-----.shtml',
 'title': '东莞韩风尚婚纱摄影工作室'}
2018-12-17 21:43:59 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dmozdir.org/Category/?SmallPath=411>
{'desc': '百合婚礼社区讨论话题涵盖婚纱照、婚纱摄影、婚礼筹备、婚纱礼服、婚庆等方面',
 'link': 'http://www.dmozdir.org/SiteInformation/?www.lilywed.cn-----9976-----.shtml',
 'title': '百合婚礼社区'}
2018-12-17 21:43:59 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-17 21:43:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 698,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 14618,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 17, 13, 43, 59, 33263),
 'item_scraped_count': 20,
 'log_count/DEBUG': 24,
 'log_count/INFO': 7,
 'response_received_count': 3,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2018, 12, 17, 13, 43, 58, 626475)}
2018-12-17 21:43:59 [scrapy.core.engine] INFO: Spider closed (finished)
items.json now contains the following, one JSON object per line:
{"link": "http://www.dmozdir.org/SiteInformation/?www.lwxfw.com-----13589-----.shtml", "title": "中国论文写发网", "desc": "中国论文写发网提供免费论文,职称论文,毕业论文,硕士论文,本科论文,MBA论文,电大论文,述职报告,论文下载,工作总结,论文推荐发表,论文写作指导,论文翻译等服务,网址www.lwxfw.com"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.zzlunwen010.com-----28351-----.shtml", "title": "专注代写论文网,论文代写,硕士论文代写,博士论文代写", "desc": "专注代写论文网,论文代写,硕士论文代写,博士论文代写,各类职称论文代写代发!"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.su30.net-----20547-----.shtml", "title": "论文天下", "desc": "论文天下,免费提供:论文范文,免费论文,论文大全, 论文下载,论文格式,论文提纲,论文发表,论文开题报告,论文题目等资料的查阅,有偿提供:论文代写、代发服务!"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.hateacher.com-----31307-----.shtml", "title": "河南教师网", "desc": "河南教师网/河南教师考试网/河南教师资格网/河南教育信息网/河南教师资格证历年真题/河南教师资格证复习资料/河南招教考试真题/河南招教考试复习资料/学习笔记/中国招教网/河南招教网/河南教师资格网"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.99fx.net-----38891-----.shtml", "title": "久久论文检测", "desc": "久久论文检测网专业提供免费论文检测、论文检测软件、论文抄袭检测、知网论文检测、万方论文检测、论文修改资料以及免费论文检测系统。让您毕业答辩无忧!"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.lgwlncy.com-----12221-----.shtml", "title": "李国旺工作室", "desc": "高三政治教学,政治高考,高中政治新课标,政治试卷,高中政治网址。"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.bgzlw.com-----45851-----.shtml", "title": "笔杆子论文", "desc": "笔杆子论文网提供免费论文、毕业论文、论文范文、论文下载、各专业论文、工作总结、论文定制、发表论文、购买论文、论文写作指导等服务"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.lwrxw.com-----15692-----.shtml", "title": "中国论文热线网", "desc": "中国论文热线网提供职称论文推荐发表、省级刊物、核心刊物、CN、ISSN刊物推荐发表等服务,可以推荐发表多专业职称论文,是您职称评审论文发表的最佳伙伴,网址www.lwrxw.com"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.62355065.cn-----11960-----.shtml", "title": "就要学习网", "desc": "就要学习网是集教案,课件,试卷,毕业论文,教学视频为一体的免费资源网。"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.newlw.com-----25276-----.shtml", "title": "新论文代写网", "desc": "毕业论文|毕业设计|毕业论文范文|计算机毕业设计|毕业论文格式范文|机械毕业设计|行政管理毕业论文|毕业设计开题报告|计算机网络毕业论文|毕业设计论文|毕业论文网|代做毕业设计|怎样写毕业论文"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.love219.com-----14846-----.shtml", "title": "天喜缘婚介网-最好的婚征婚介网站", "desc": "天喜缘婚介婚庆网是济南最专业的婚介网站、婚庆网站,交友网站,及济南征婚、济南交友、济南婚介、济南庆典、济南礼仪于一体,网下有实体店面-济南市市中区天喜缘婚介婚庆中心,不定期举办联谊活动,保证会员成功率"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.ssyg520.com-----27215-----.shtml", "title": "成都盛世阳光婚庆策划有限公司", "desc": "诚信投资控股集团属于四川省大型企业集团,川内排于前20名,注册资金3.5亿元,拥有固定资产46.5亿。公司总部位于成都市致民东路1号。在北京、上海、新疆等地设有分公司。诚信盛世阳光婚庆公司是其子公司。"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.591lover.net-----36999-----.shtml", "title": "情人网", "desc": "情人网交友中心为你提供最佳的网上情人交友机会,足不出户便能让你有更多的选择!"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.free-onlinedating.me-----10110-----.shtml", "title": "国际免费婚介交友网站-相约100", "desc": "国际免费婚介交友网站是相约100提供的完全免费的国际交友网站。会员以华人为主遍布五湖四海,所有会员完全免费。所有寻找国际免费婚介交友网站的朋友都能在国际交友网站在找到完全免费的国际免费婚介交友网站服务"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.ahhqw.com-----18983-----.shtml", "title": "安徽婚庆网", "desc": "安徽婚庆网"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.jyjjyy.com-----19343-----.shtml", "title": "聚缘北海交友网", "desc": "聚缘北海交友网是北海地区较规范的婚恋交友网站,致力于营造有趣而安全的网络交友社区,提供搜索、美文、约会、日记、聊天、等多项交友服务。并与地方婚介部门建立了良好的合作关系。"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.lovemeba.com-----9983-----.shtml", "title": "爱我吧婚恋网", "desc": "爱我吧婚恋网是一个真实、严肃、高品位的婚恋平台,提供科学、高效的全程服务,帮助真心寻找终身伴侣的人士实现和谐婚恋,努力营造国内最专业、严肃的婚恋交友平"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.77lds.com-----37176-----.shtml", "title": "77国际交友网", "desc": "纯公益性,爱心社交网站,为广大青年及单身人士提供的全免费交友平台。"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.dg-hfs.com-----18760-----.shtml", "title": "东莞韩风尚婚纱摄影工作室", "desc": "东莞韩风尚婚纱摄影工作室是具有独特的韩国风格的东莞婚纱摄影工作室,韩风尚位于东莞东城区旗峰路国泰大厦10号,我们永远满怀创意与温情,通过一对一的服务为您提供超越您期望"}
{"link": "http://www.dmozdir.org/SiteInformation/?www.lilywed.cn-----9976-----.shtml", "title": "百合婚礼社区", "desc": "百合婚礼社区讨论话题涵盖婚纱照、婚纱摄影、婚礼筹备、婚纱礼服、婚庆等方面"}
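Note that the file written by the pipeline is no longer a single JSON array: each line is its own JSON object (the "JSON Lines" layout), so json.load on the whole file would fail. A minimal sketch for reading it back, assuming UTF-8 as written by the pipeline; read_json_lines is a helper name I made up, and the demo file holds two shortened records from the output above.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Build a small demo file in the same one-object-per-line layout.
demo_path = os.path.join(tempfile.mkdtemp(), "items.json")
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write('{"link": "http://www.dmozdir.org/SiteInformation/?www.ahhqw.com-----18983-----.shtml", "title": "安徽婚庆网"}\n')
    fh.write('{"link": "http://www.dmozdir.org/SiteInformation/?www.591lover.net-----36999-----.shtml", "title": "情人网"}\n')

items = read_json_lines(demo_path)
titles = [item["title"] for item in items]
print(titles)
```

To read the real output, point read_json_lines at the items.json in the tutorial directory instead of the demo file.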