Problem description
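Expired files in AWS S3 can be deleted with AWS lifecycle management. However, lifecycle management has a shortcoming: it only manages files uploaded after the lifecycle rule is created and cannot manage files that existed before the rule. To delete those older expired files from S3, a script is needed.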
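For comparison, a lifecycle expiration rule like the one mentioned above can also be created from Python. The sketch below assumes boto3's `put_bucket_lifecycle_configuration`. The helper names and the rule ID `expire-old-files` are hypothetical, chosen for illustration. Note that this call replaces the bucket's entire existing lifecycle configuration, so any rules already on the bucket would need to be included as well.

```python
def build_expiration_rule(prefix, days, rule_id="expire-old-files"):
    """Build a LifecycleConfiguration dict that expires objects under `prefix` after `days` days."""
    # rule_id is a hypothetical name used only for illustration
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": int(days)},
            }
        ]
    }


def apply_expiration_rule(s3_client, bucket_name, prefix, days):
    # Overwrites any lifecycle configuration already attached to the bucket
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_expiration_rule(prefix, days),
    )
```

Usage would look like `apply_expiration_rule(boto3.client("s3"), "my-bucket", "logs/", 30)`, where the bucket name is a placeholder.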
Solution
The script is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import datetime
import sys
from datetime import timedelta

import boto3
import pytz
import yaml


def parse_config_file(config_file):
    # Load the YAML config into a dict; safe_load avoids constructing arbitrary Python objects
    with open(config_file) as stream:
        return yaml.safe_load(stream)


class Clean_S3_File():
    def __init__(self, config):
        # config is the dict parsed from application.yml
        self.aws_access_key_id = config['aws']['aws_access_key_id']
        self.aws_secret_access_key = config['aws']['aws_secret_access_key']
        self.aws_default_region = config['aws']['aws_default_region']
        self.s3_client = boto3.client(
            's3',
            region_name=self.aws_default_region,
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key)

    def Find_Expired_File(self, bucket_name, prefix, expired_day):
        expired_files = []
        now_time = pytz.utc.localize(datetime.datetime.utcnow())
        max_age = timedelta(days=int(expired_day))
        paginator = self.s3_client.get_paginator('list_objects_v2')
        pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
        for page in pages:
            # 'Contents' is absent when the prefix matches no objects
            for item in page.get('Contents', []):
                # LastModified is timezone-aware (UTC); Size > 0 skips zero-byte
                # objects such as "folder" placeholders
                if now_time - item['LastModified'] > max_age and item['Size'] > 0:
                    expired_files.append(item['Key'])
        return expired_files

    def Del_Expired_File(self, bucket_name, delete_files):
        # One DeleteObject request per key
        for item in delete_files:
            self.s3_client.delete_object(Bucket=bucket_name, Key=item)


if __name__ == '__main__':
    bucket_name = sys.argv[1]  # Bucket name
    prefix = sys.argv[2]       # Search path / key prefix
    expired_day = sys.argv[3]  # Delete files older than this many days
    config = parse_config_file('/usr/local/config/application.yml')
    obj = Clean_S3_File(config)
    delete_files = obj.Find_Expired_File(bucket_name, prefix, expired_day)
    obj.Del_Expired_File(bucket_name, delete_files)
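Del_Expired_File sends one DeleteObject request per key, which can be slow for large prefixes. A possible alternative, sketched below, groups keys into DeleteObjects requests, which accept up to 1000 keys each. The helper names and the `dry_run` flag are hypothetical additions for illustration, so the list can be reviewed before anything is removed.

```python
def chunked(keys, size=1000):
    """Yield successive lists of at most `size` keys (DeleteObjects accepts up to 1000)."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def delete_in_batches(s3_client, bucket_name, keys, dry_run=False):
    """Delete `keys` from `bucket_name` in batches; return the keys S3 reported as failed."""
    failed = []
    for batch in chunked(list(keys)):
        if dry_run:
            # Only print what would be deleted; no request is sent
            for key in batch:
                print("would delete:", key)
            continue
        response = s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        # In quiet mode the response lists only the keys that could not be deleted
        failed.extend(err["Key"] for err in response.get("Errors", []))
    return failed
```

In the script, `failed = delete_in_batches(obj.s3_client, bucket_name, delete_files)` could take the place of the `obj.Del_Expired_File(...)` call; running it once with `dry_run=True` first is a cheap safety check.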
About the script:
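1) The script is written in Python and is compatible with Python 3.6.4. Dependencies to install: pytz, boto3 and PyYAML (json is part of the Python standard library).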
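2) The script takes three arguments:
First argument: the bucket name
Second argument: the search path / file key prefix
Third argument: a number of days; files last modified more than this many days ago are deleted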
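The script reads these three arguments from `sys.argv` without validation. A sketch using the standard `argparse` module could check that the day count is a positive integer before any S3 call is made. The function names below are hypothetical; `is_expired` mirrors the age comparison in Find_Expired_File (current UTC time minus LastModified, compared against the day count).

```python
import argparse
import datetime


def positive_int(value):
    """argparse type: accept only integers greater than zero."""
    number = int(value)  # a non-numeric value raises ValueError, which argparse reports
    if number <= 0:
        raise argparse.ArgumentTypeError("days must be a positive integer")
    return number


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Delete S3 objects older than N days")
    parser.add_argument("bucket_name", help="name of the S3 bucket")
    parser.add_argument("prefix", help="search path / key prefix")
    parser.add_argument("expired_day", type=positive_int,
                        help="delete objects older than this many days")
    return parser.parse_args(argv)


def is_expired(last_modified, expired_day, now=None):
    """Return True if the timezone-aware `last_modified` is more than `expired_day` days old."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - last_modified > datetime.timedelta(days=expired_day)
```

With this, `parse_args(["my-bucket", "logs/", "30"])` yields an object whose `expired_day` is already the integer 30, and a value such as `0` is rejected with a usage message.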
3) The script requires a configuration file, located by default at "/usr/local/config/application.yml".
The configuration file format is:
aws:
  aws_access_key_id: *****
  aws_secret_access_key: *****
  aws_default_region: *****
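Once parsed, the YAML above becomes a nested dict with an `aws` key. The hypothetical helper below sketches how those three entries map onto the keyword arguments accepted by `boto3.client` (`region_name`, `aws_access_key_id`, `aws_secret_access_key`), and reports missing entries by name instead of failing with a bare KeyError.

```python
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_default_region")


def client_kwargs_from_config(config):
    """Map the parsed application.yml dict to keyword arguments for boto3.client('s3', ...)."""
    aws = (config or {}).get("aws") or {}
    # Treat absent or empty values as missing
    missing = [key for key in REQUIRED_KEYS if not aws.get(key)]
    if missing:
        raise ValueError("missing keys under 'aws': " + ", ".join(missing))
    return {
        "region_name": aws["aws_default_region"],
        "aws_access_key_id": aws["aws_access_key_id"],
        "aws_secret_access_key": aws["aws_secret_access_key"],
    }
```

A client could then be built with `boto3.client("s3", **client_kwargs_from_config(config))`, where `config` is the dict returned by `yaml.safe_load`.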