I have a Python script that reads some JSON files and imports their records into MongoDB.
I want it to insert only the records whose `published` date is one month old or less.
My current code is:
```python
import json
import logging
import logging.handlers
import os
import pymongo
from pymongo import MongoClient


def import_json(mongo_server, mongo_port, vuln_folder):
    try:
        logging.info('Connecting to MongoDB')
        client = MongoClient(mongo_server, mongo_port)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        logging.info('Connected to MongoDB')
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + vuln_folder
        filedir = os.chdir(archive_filepath)
        file_count = 0
        for item in os.listdir(filedir):
            if item.endswith('.json'):
                file_name = os.path.abspath(item)
                with open(item, 'r') as currentfile:
                    vuln_counter = 0
                    duplicate_count = 0
                    logging.info('Currently processing ' + item)
                    file_count += 1
                    json_data = currentfile.read()
                    vuln_content = json.loads(json_data)
                    for vuln in vuln_content:
                        try:
                            del vuln['_type']
                            new_vuln = {key: vuln[key] for key in vuln if key != '_source'}
                            new_vuln.update(vuln['_source'])
                            coll.insert(new_vuln, continue_on_error=True)
                            vuln_counter += 1
                        except pymongo.errors.DuplicateKeyError:
                            duplicate_count += 1
                logging.info('Added ' + str(vuln_counter) + ' vulnerabilities for ' + item)
                logging.info('Found ' + str(duplicate_count) + ' duplicate records!')
                os.remove(file_name)
        logging.info('Processed ' + str(file_count) + ' files')
    except Exception as e:
        logging.exception(e)
```

I am thinking that I could use an IF statement (pseudocode!):
```
filter_vuln = if vuln.published = datetime.now -1:
    coll.insert(filter_vuln)
```

Would that drop any records that don't match the condition?
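A runnable version of that condition might look like the sketch below. It assumes `published` is an ISO-8601 string in the form shown in the sample data (`2016-01-24T13:33:41`) and treats "one month" as 30 days; `is_recent` is a hypothetical helper name, not part of any library. Records that fail the check are simply never inserted, so there is nothing to drop afterwards.

```python
from datetime import datetime, timedelta


def is_recent(published, now=None, days=30):
    """Hypothetical helper: True if `published` is within the last `days` days.

    Assumes the "YYYY-MM-DDTHH:MM:SS" format used in the sample JSON and
    that "one month" means 30 days.
    """
    if now is None:
        now = datetime.now()
    published_at = datetime.strptime(published, "%Y-%m-%dT%H:%M:%S")
    return published_at >= now - timedelta(days=days)


# Sketch of use inside the import loop: only matching records reach MongoDB.
# insert_one replaces Collection.insert, deprecated in PyMongo 3 and removed in 4.0.
#     if is_recent(new_vuln["published"]):
#         coll.insert_one(new_vuln)

fixed_now = datetime(2016, 2, 1)  # fixed date so the output is reproducible
print(is_recent("2016-01-24T13:33:41", fixed_now))  # True
print(is_recent("2015-12-01T00:00:00", fixed_now))  # False
```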
The JSON looks like this:
```json
[
  {
    "_index": "bulletins",
    "_type": "bulletin",
    "_id": "OPENWRT-SA-000001",
    "_score": null,
    "lastseen": "2016-09-26T15:45:23",
    "references": "affectedPackage": [
      {
        "OS": "OpenWrt",
        "OSVersion": "15.05",
        "packageVersion": "9.9.8-P3-1",
        "packageFilename": "UNKNOWN",
        "arch": "all",
        "packageName": "bind",
        "operator": "lt"
      }
    ],
    "edition": 1,
    "description": "Some Description",
    "reporter": "OpenWrt Project",
    "published": "2016-01-24T13:33:41",
    "modified": "2016-01-24T13:33:41",
  },
```

Some data has been removed from the above JSON for brevity, as the actual record is quite long, and this is one of the shorter ones!
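Applied to data shaped like the sample above, the filter can run right after `json.loads`, inside the same loop that flattens each record. The sketch below repeats the importer's flattening (drop `_type`, lift `_source` fields to the top level) but uses `.get("_source", {})` because the trimmed sample has no `_source` key; that change, the generator name `recent_records`, and the second record in the sample are all made up for illustration.

```python
import json
from datetime import datetime, timedelta


def recent_records(json_text, now, days=30):
    """Hypothetical generator: yield flattened records published within `days` of `now`."""
    cutoff = now - timedelta(days=days)
    for vuln in json.loads(json_text):
        # Same result as the importer's flattening: drop "_type" and "_source",
        # then merge the "_source" fields (if any) into the top level.
        record = {k: v for k, v in vuln.items() if k not in ("_type", "_source")}
        record.update(vuln.get("_source", {}))
        published = datetime.strptime(record["published"], "%Y-%m-%dT%H:%M:%S")
        if published >= cutoff:
            yield record


# The second record is invented; its "published" date lives under "_source".
sample = '''[
  {"_id": "OPENWRT-SA-000001", "_type": "bulletin", "published": "2016-01-24T13:33:41"},
  {"_id": "OLD-1", "_type": "bulletin", "_source": {"published": "2015-06-01T00:00:00"}}
]'''

for rec in recent_records(sample, now=datetime(2016, 2, 1)):
    print(rec["_id"])  # OPENWRT-SA-000001
```

In the importer, each yielded record would be the one passed to the insert call; older records never reach MongoDB.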
Best answer
I'm guessing that by "within the last month" you mean the last 30 days; in that case you need `timedelta`, as in this example:
```python
from datetime import timedelta, datetime

today = datetime.now()
lastmonth = today - timedelta(days=30)

tests = ['2017-11-21', '2017-10-20']
for date in tests:
    if date >= str(lastmonth):
        print(date)
```

The result is: 2017-11-21 (the output depends on the date the script is run).
That's just an example of how to filter by date.
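One caveat with comparing against `str(lastmonth)`: `str()` on a datetime puts a space between the date and the time, while the `published` values in the question use `T`. On the cutoff day, the string comparison can therefore accept a record that is a few hours too old. The sketch below uses a fixed, made-up timestamp to show the difference; parsing with `strptime`, or comparing against `lastmonth.isoformat()` (which uses the same `T` layout), avoids it.

```python
from datetime import datetime, timedelta

now = datetime(2016, 2, 23, 12, 0, 0)  # fixed "now" so the output is reproducible
lastmonth = now - timedelta(days=30)   # 2016-01-24 12:00:00

published = "2016-01-24T09:00:00"      # 3 hours *before* the cutoff

# str() gives "2016-01-24 12:00:00"; ' ' sorts before 'T', so this wrongly passes.
print(published >= str(lastmonth))  # True

# Parsing the string first gives the correct answer.
print(datetime.strptime(published, "%Y-%m-%dT%H:%M:%S") >= lastmonth)  # False

# isoformat() also uses 'T', so plain string comparison works when both
# strings share exactly the same layout.
print(published >= lastmonth.isoformat())  # False
```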