2020-09-15 11:26:10 [scrapy.extensions.telnet] INFO: Telnet Password: 423034b8342a486e
2020-09-15 11:26:10 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-09-15 11:26:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-09-15 11:26:11 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-09-15 11:26:11 [scrapy.middleware] INFO: Enabled item pipelines:
['demo1.pipelines.ziranweiyuanhuiPipline']
2020-09-15 11:26:11 [scrapy.core.engine] INFO: Spider opened
2020-09-15 11:26:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-15 11:26:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-15 11:26:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://kjj.taiyuan.gov.cn/zfxxgk/gggs/index.shtml> (referer: None)
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/09/07/1008391.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/09/04/1008199.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/08/21/1004590.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/08/13/1001630.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/08/08/999926.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/07/31/997727.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/07/17/993580.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/23/988275.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/22/988019.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/19/987592.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/15/986244.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/15/986238.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/15/986237.shtml
2020-09-15 11:26:11 [root] INFO: This link has already been crawled-----:http://kjj.taiyuan.gov.cn/doc/2020/06/15/986236.shtml
2020-09-15 11:26:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://kjj.taiyuan.gov.cn/doc/2020/09/12/1010113.shtml> (referer: http://kjj.taiyuan.gov.cn/zfxxgk/gggs/index.shtml)
2020-09-15 11:26:12 [scrapy.core.scraper] DEBUG: Scraped from <200 http://kjj.taiyuan.gov.cn/doc/2020/09/12/1010113.shtml>
{'biaoti': '关于征求太原市地方标准《科技成果评价规范(征求意见稿)》意见的通知',
 'laiyuan': '太原市科学技术局',
 'lianjie': 'http://kjj.taiyuan.gov.cn/doc/2020/09/12/1010113.shtml',
 'shijian': '2020-09-12',
 'wenjian': [{'file_name': '1.科技成果评价规范(征求意见稿).doc',
              'file_url': 'http://kjj.taiyuan.gov.cn/uploadfiles/202009/12/2020091222053429459132.doc',
              'new_file': '/2020/09/Yys4ES6z_2020091222053429459132.doc'},
             {'file_name': '2.地方标准征求意见反馈表.doc',
              'file_url': 'http://kjj.taiyuan.gov.cn/uploadfiles/202009/12/2020091221401014098186.doc',
              'new_file': '/2020/09/ucvansUw_2020091221401014098186.doc'}],
 'xiangqing': '<div id="Zoom"> \n'
              ' <!--<$[CONTENT]>start-->\n'
              ' <!--<p style="text-align:center;"><img src="" '
              '/></p>-->\n'
              ' <p></p><p align="justify" style="text-align: justify; '
              'line-height: 200%; text-indent: 0pt; -ms-text-autospace: '
              'ideograph-numeric; -ms-text-justify: inter-ideograph;"><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">各相关单位</span></span><span style="font-size: 11pt;"><span '
              'style="font-family: 宋体;">和个人</span></span><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">:</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><span style="font-size: '
              '11pt;"><span style="font-family: '
              '宋体;">根据国家《地方标准管理办法》要求,现就太原市科学技术局提出,太原技术转移促进中心、山西产业互联网研究院、山西省大众科技评估中心起草的地方标准《科技成果评价规范(征求意见稿)》,向社会公开征求意见,请各有关单位及个人提出意见,并填写《征求意见反馈表》,于2020年10月11日前反馈至市科技局计划处。</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><span style="font-size: '
              '11pt;"><span style="font-family: 宋体;">联 系 人:</span></span><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">张晓军</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><span style="font-size: '
              '11pt;"><span style="font-family: 宋体;">联系电话:</span></span><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">4223750</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><span style="font-size: '
              '11pt;"><span style="font-family: 宋体;">电子邮箱:</span></span><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">cxfz701</span></span><span style="font-size: 11pt;"><span '
              'style="font-family: 宋体;">@1</span></span><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">63</span></span><span style="font-size: 11pt;"><span '
              'style="font-family: 宋体;">.com</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;">\xa0</p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><span style="font-size: '
              '11pt;"><span style="font-family: '
              '宋体;">附</span></span>\xa0\xa0\xa0\xa0<span style="font-size: '
              '11pt;"><span style="font-family: 宋体;">件:</span></span></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><a '
              'href="https://www.sxwikionline.com/gateway/enterprise/file/download/know?path=/home/enterprise/staticrec/policy/2020/09/Yys4ES6z_2020091222053429459132.doc" '
              'target="_blank" '
              'title="1.科技成果评价规范(征求意见稿).doc">1.科技成果评价规范(征求意见稿).doc</a></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 22pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;"><a '
              'href="https://www.sxwikionline.com/gateway/enterprise/file/download/know?path=/home/enterprise/staticrec/policy/2020/09/ucvansUw_2020091221401014098186.doc" '
              'target="_blank" '
              'title="2.地方标准征求意见反馈表.doc">2.地方标准征求意见反馈表.doc</a></p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 0pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;">\xa0</p>\n'
              '\n'
              '<p align="justify" style="text-align: justify; line-height: '
              '200%; text-indent: 0pt; -ms-text-autospace: ideograph-numeric; '
              '-ms-text-justify: inter-ideograph;">\xa0</p>\n'
              '\n'
              '<p align="right" style="text-align: right; line-height: 200%; '
              'text-indent: 0pt; -ms-text-autospace: ideograph-numeric;"><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">太原市科学技术局</span></span></p>\n'
              '\n'
              '<p align="right" style="text-align: right; line-height: 200%; '
              'text-indent: 0pt; -ms-text-autospace: ideograph-numeric;"><span '
              'style="font-size: 11pt;"><span style="font-family: '
              '宋体;">2020年9月12日</span></span></p>\n'
              '\n'
              ' <!--<$[CONTENT]>end--> \n'
              ' </div>'}
2020-09-15 11:26:12 [scrapy.core.engine] INFO: Closing spider (finished)
2020-09-15 11:26:12 [root] INFO: The spider has finished running
2020-09-15 11:26:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 555,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 33217,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 1.491522,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 9, 15, 3, 26, 12, 594548),
 'item_scraped_count': 1,
 'log_count/DEBUG': 3,
 'log_count/INFO': 25,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2020, 9, 15, 3, 26, 11, 103026)}
2020-09-15 11:26:12 [scrapy.core.engine] INFO: Spider closed (finished)
2020-09-16 08:47:17 [scrapy.extensions.telnet] INFO: Telnet Password: d2a8a3ac7c4697ab
2020-09-16 08:47:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-09-16 08:47:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-09-16 08:47:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-09-16 08:47:17 [scrapy.middleware] INFO: Enabled item pipelines:
['demo1.pipelines.ziranweiyuanhuiPipline']
2020-09-16 08:47:17 [scrapy.core.engine] INFO: Spider opened
2020-09-16 08:47:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-16 08:47:17 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6037
2020-09-16 08:47:17 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'demo1',
 'DOWNLOAD_DELAY': 1,
 'LOG_FILE': 'logs/taiyuangongyehexinxihuaju_2020_9.log',
 'NEWSPIDER_MODULE': 'demo1.spiders',
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 400, 403, 404, 408, 302],
 'RETRY_TIMES': True,
 'SPIDER_MODULES': ['demo1.spiders']}