Python JSON filter by date key and write to new JSON file

I have a Python script that reads through some JSON files and then imports them into MongoDB.

I want it to insert only records whose published date is one month old or less.

My current code is:

```python
import json
import logging
import logging.handlers
import os
import pymongo
from pymongo import MongoClient


def import_json(mongo_server, mongo_port, vuln_folder):
    try:
        logging.info('Connecting to MongoDB')
        client = MongoClient(mongo_server, mongo_port)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        logging.info('Connected to MongoDB')
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + vuln_folder
        filedir = os.chdir(archive_filepath)
        file_count = 0
        for item in os.listdir(filedir):
            if item.endswith('.json'):
                file_name = os.path.abspath(item)
                with open(item, 'r') as currentfile:
                    vuln_counter = 0
                    duplicate_count = 0
                    logging.info('Currently processing ' + item)
                    file_count += 1
                    json_data = currentfile.read()
                    vuln_content = json.loads(json_data)
                    for vuln in vuln_content:
                        try:
                            del vuln['_type']
                            new_vuln = {key: vuln[key] for key in vuln if key != '_source'}
                            new_vuln.update(vuln['_source'])
                            coll.insert(new_vuln, continue_on_error=True)
                            vuln_counter += 1
                        except pymongo.errors.DuplicateKeyError:
                            duplicate_count += 1
                    logging.info('Added ' + str(vuln_counter) + ' vulnerabilities for ' + item)
                    logging.info('Found ' + str(duplicate_count) + ' duplicate records!')
                os.remove(file_name)
        logging.info('Processed ' + str(file_count) + ' files')
    except Exception as e:
        logging.exception(e)
```

I am thinking that I could use an IF statement (pseudocode!):

```
filter_vuln = if vuln.published = datetime.now -1:
    coll.insert(filter_vuln)
```

I am guessing this would drop any records that don't match that condition?

The JSON looks like this:

```
[
  {
    "_index": "bulletins",
    "_type": "bulletin",
    "_id": "OPENWRT-SA-000001",
    "_score": null,
    "lastseen": "2016-09-26T15:45:23",
    "references": "affectedPackage": [
      {
        "OS": "OpenWrt",
        "OSVersion": "15.05",
        "packageVersion": "9.9.8-P3-1",
        "packageFilename": "UNKNOWN",
        "arch": "all",
        "packageName": "bind",
        "operator": "lt"
      }
    ],
    "edition": 1,
    "description": "Some Description",
    "reporter": "OpenWrt Project",
    "published": "2016-01-24T13:33:41",
    "modified": "2016-01-24T13:33:41",
  },
```

Some data has been removed from the above JSON for brevity as the actual record is quite long, and this is one of the shorter ones!
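
For reference, the check in the pseudocode above could be made concrete roughly as follows. This is a minimal sketch, not the asker's code: it assumes `published` uses the `YYYY-MM-DDTHH:MM:SS` format from the sample record, treats "one month" as 30 days, and the helper names `is_recent` and `insert_recent` are hypothetical. It calls PyMongo's `insert_one` (the older `Collection.insert` was deprecated in PyMongo 3 and removed in PyMongo 4). Records that fail the check are simply skipped and never inserted; nothing is deleted from the source file.

```python
from datetime import datetime, timedelta


def is_recent(vuln, days=30, now=None):
    """Return True if vuln['published'] is within the last `days` days.

    Assumption: the value looks like "2016-01-24T13:33:41" (as in the sample).
    """
    now = now or datetime.now()
    published = datetime.strptime(vuln['published'], '%Y-%m-%dT%H:%M:%S')
    return published >= now - timedelta(days=days)


def insert_recent(coll, vuln_content, days=30, now=None):
    """Hypothetical helper: insert only recent records, skip older ones.

    `coll` is anything with an insert_one() method, e.g. a pymongo Collection.
    Returns a tuple (inserted, skipped).
    """
    inserted = skipped = 0
    for vuln in vuln_content:
        # Same flattening as the question's loop: drop '_type', merge '_source'
        vuln.pop('_type', None)
        new_vuln = {key: value for key, value in vuln.items() if key != '_source'}
        new_vuln.update(vuln.get('_source', {}))
        if not is_recent(new_vuln, days, now):
            skipped += 1  # too old: not inserted
            continue
        coll.insert_one(new_vuln)
        inserted += 1
    return inserted, skipped
```

The question's `DuplicateKeyError` handling can wrap the `insert_one` call unchanged.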

Best answer

I'm guessing that by "within the last month" you mean the last 30 days. In that case you need timedelta, as in this example:

```python
from datetime import timedelta, datetime

today = datetime.now()
lastmonth = today - timedelta(days=30)

tests = ['2017-11-21', '2017-10-20']

for date in tests:
    # Parse the string so two datetimes are compared, not two strings
    if datetime.strptime(date, '%Y-%m-%d') >= lastmonth:
        print(date)
```

At the time this answer was written, the result was: 2017-11-21 (the output depends on the current date).

That's just an example of how to filter by date.
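
The question title also asks about writing the filtered records to a new JSON file. Below is a minimal sketch of that, assuming Python 3.7+ (for `datetime.fromisoformat`), ISO 8601 `published` values like `2016-01-24T13:33:41`, a 30-day window, and that the date may sit either at the top level of a record or under `_source`. The names `filter_recent` and `write_recent` are hypothetical.

```python
import json
from datetime import datetime, timedelta


def filter_recent(records, days=30, now=None, key='published'):
    """Keep records whose `key` timestamp falls within the last `days` days."""
    cutoff = (now or datetime.now()) - timedelta(days=days)
    recent = []
    for record in records:
        # Assumption: the date is at the top level or nested under '_source'
        value = record.get(key) or record.get('_source', {}).get(key)
        if value is None:
            continue  # no date to compare against, so skip the record
        if datetime.fromisoformat(value) >= cutoff:
            recent.append(record)
    return recent


def write_recent(in_path, out_path, days=30, now=None):
    """Read a JSON array from in_path, write only the recent records to out_path.

    Returns the number of records written.
    """
    with open(in_path, 'r') as infile:
        records = json.load(infile)
    recent = filter_recent(records, days, now)
    with open(out_path, 'w') as outfile:
        json.dump(recent, outfile, indent=2)
    return len(recent)
```

Parsing the timestamps, rather than comparing raw strings, avoids surprises when the stored format and the cutoff's string form differ in length or precision.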
