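Take OMI Level-3 (L3) gridded data as an example (global coverage, 0.25° × 0.25° resolution, HDF-EOS5 .he5 file format). First, open a file with the gdalinfo command to inspect its metadata, as shown in the figure below: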
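If you would rather list the sub-datasets from a script than read them off the screen, the SUBDATASET_n_NAME / SUBDATASET_n_DESC lines that gdalinfo prints for container formats such as HDF-EOS5 can be parsed with a few lines of Python. This is a minimal sketch; the SAMPLE text is an illustrative excerpt, not real output from any particular file, and list_subdatasets is a hypothetical helper name.

```python
import re


def list_subdatasets(gdalinfo_text):
    """Return (name, description) pairs parsed from the SUBDATASET_n_NAME /
    SUBDATASET_n_DESC lines that gdalinfo prints for a container file."""
    names, descs = {}, {}
    for line in gdalinfo_text.splitlines():
        m = re.match(r'\s*SUBDATASET_(\d+)_(NAME|DESC)=(.*)', line)
        if m is None:
            continue
        idx, kind, value = int(m.group(1)), m.group(2), m.group(3).strip()
        (names if kind == 'NAME' else descs)[idx] = value
    # order by sub-dataset number, matching gdalinfo's 1-based numbering
    return [(names[i], descs.get(i, '')) for i in sorted(names)]


# Illustrative excerpt only; real paths and descriptions come from your file.
SAMPLE = '''Subdatasets:
  SUBDATASET_1_NAME=HDF5:"sample.he5"://HDFEOS/GRIDS/ColumnAmountNO2/Data_Fields/ColumnAmountNO2
  SUBDATASET_1_DESC=[720x1440] ColumnAmountNO2 (32-bit floating-point)
  SUBDATASET_2_NAME=HDF5:"sample.he5"://HDFEOS/GRIDS/ColumnAmountNO2/Data_Fields/ColumnAmountNO2TropCloudScreened
  SUBDATASET_2_DESC=[720x1440] ColumnAmountNO2TropCloudScreened (32-bit floating-point)
'''

for number, (name, desc) in enumerate(list_subdatasets(SAMPLE), start=1):
    print(number, desc)
```

Assuming gdalinfo is on your PATH, the real text can be captured with subprocess.run(['gdalinfo', path], capture_output=True, text=True).stdout and passed to the same function.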
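Here you can see that the OMI L3 gridded data contain four sub-datasets; we will extract the fourth one (ColumnAmountNO2TropCloudScreened) for processing. Look up the information about the target sub-dataset in the gdalinfo output, as shown in the figure below: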
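The script picks the sub-dataset by position (sub_ds[3]), which works as long as the order inside the file never changes. A slightly more defensive alternative, sketched below, matches the field name against the (name, description) pairs that gdal.Dataset.GetSubDatasets() returns. The sub-dataset paths here are illustrative, and find_subdataset is a hypothetical helper, not part of GDAL.

```python
def find_subdataset(subdatasets, field):
    """Return the GDAL name of the sub-dataset whose last path component is
    exactly `field`, given (name, description) pairs such as those returned
    by gdal.Dataset.GetSubDatasets()."""
    for name, _desc in subdatasets:
        # exact match, so 'ColumnAmountNO2Trop' never picks '...TropCloudScreened'
        if name.rsplit('/', 1)[-1] == field:
            return name
    raise KeyError(f'sub-dataset {field!r} not found')


# Illustrative pairs; in the script these would come from src_ds.GetSubDatasets().
pairs = [
    ('HDF5:"sample.he5"://HDFEOS/GRIDS/ColumnAmountNO2/Data_Fields/ColumnAmountNO2Trop',
     '[720x1440] ColumnAmountNO2Trop (32-bit floating-point)'),
    ('HDF5:"sample.he5"://HDFEOS/GRIDS/ColumnAmountNO2/Data_Fields/ColumnAmountNO2TropCloudScreened',
     '[720x1440] ColumnAmountNO2TropCloudScreened (32-bit floating-point)'),
]
print(find_subdataset(pairs, 'ColumnAmountNO2TropCloudScreened'))
```

In the script this would replace the index lookup with something like gdal.Open(find_subdataset(src_ds.GetSubDatasets(), 'ColumnAmountNO2TropCloudScreened')).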
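This shows the MissingValue and FillValue, which need to be handled by setting them to NA (np.nan in the code). It also shows that the offset and scale factor are 0 and 1 respectively, so the stored values need no shifting or scaling. Finally, the unit is molecules per square centimetre (molecules/cm²); converting to grams per square metre brings the values down from the order of 10^15 to a more manageable magnitude. HDF image files from other sensors, such as MODIS, can be processed the same way, so feel free to reuse this if you need it.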
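The cleaning rules above (fill values to NaN, offset/scale, unit conversion) can be sketched per value in plain Python. The defaults mirror the script: fill value -1.2676506e+30, scale 1, offset 0, negatives dropped, and the author's 2e16 upper threshold. The conversion divides by Avogadro's constant (molecules to mol/cm²), multiplies by the molar mass of NO2 (about 46.0055 g/mol, giving g/cm²), then by 10^4 cm²/m² to get g/m². The raw * scale + offset form is an assumption (the CF convention); some HDF products, such as many MODIS HDF4 files, document scale * (raw - offset) instead, so check the metadata before reusing this on other data.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
M_NO2 = 46.0055            # molar mass of NO2, g/mol
CM2_PER_M2 = 1e4           # 1 m^2 = 10^4 cm^2


def clean_and_convert(raw, fill_value=-1.2676506e30, scale=1.0, offset=0.0,
                      upper=2e16, drop_negative=True):
    """Return one grid value in g/m^2, or NaN if it is a fill/missing value
    or falls outside the accepted range (thresholds mirror the script)."""
    # float32 fill values read back as slightly different doubles, so compare loosely
    if math.isclose(raw, fill_value, rel_tol=1e-6):
        return math.nan
    molec_cm2 = raw * scale + offset  # assumed CF-style scaling
    if molec_cm2 > upper or (drop_negative and molec_cm2 < 0):
        return math.nan
    return molec_cm2 / AVOGADRO * M_NO2 * CM2_PER_M2


print(clean_and_convert(1e15))  # a typical tropospheric column, in g/m^2
```

A column of 10^15 molecules/cm² comes out at roughly 7.6 × 10^-4 g/m²; multiplying by 10^7 instead of 10^4 gives mg/m², which keeps values near 1.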
The code is as follows:
"""
部分代码引用自Python GDAL/OGR Cookbook 1.0 documentation
新建一个geotiff单波段图像,长宽为he5数据的列数和行数(columns & rows)
GeoTransform坐标是从左上角开始到右下角结束
设置新的图像的spatial reference为地理坐标系(EPSG4326)
具体信息可以在“https://www.spatialreference.org/ref/epsg/”查询
"""
import os
import gdal
import osr
import numpy as np
def array2raster(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, array):
    cols = array.shape[1]  # number of columns
    rows = array.shape[0]  # number of rows
    originX = rasterOrigin[0]  # upper-left corner X
    originY = rasterOrigin[1]  # upper-left corner Y
    driver = gdal.GetDriverByName('GTiff')
    # create a single-band Float32 raster
    outRaster = driver.Create(newRasterfn, cols, rows, 1, gdal.GDT_Float32)
    # GeoTransform: (top-left X, pixel width, row rotation,
    #                top-left Y, column rotation, pixel height)
    outRaster.SetGeoTransform((originX, pixelWidth, 0, originY, 0, pixelHeight))
    # get band 1 and write the array into it
    outband = outRaster.GetRasterBand(1)
    outband.WriteArray(array)
    # EPSG:4326 (WGS 84 geographic coordinates)
    outRasterSRS = osr.SpatialReference()
    outRasterSRS.ImportFromEPSG(4326)
    outRaster.SetProjection(outRasterSRS.ExportToWkt())
    outband.FlushCache()
    # drop the references so GDAL closes the dataset and finishes writing the file
    outband = None
    outRaster = None
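def main(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, array):
    # flip the rows: the he5 grid is stored south-to-north, while a north-up
    # GeoTIFF (negative pixelHeight) expects the first row to be the northernmost
    reversed_arr = array[::-1]
    array2raster(newRasterfn, rasterOrigin, pixelWidth, pixelHeight, reversed_arr)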
# find .he5 files and process them
in_dir = r'G:\DATA\OMINO2_L3\he5'  # input dir
out_dir = r'G:\DATA\OMINO2_L3\tif'  # output dir

if __name__ == '__main__':
    os.makedirs(out_dir, exist_ok=True)
    he5_files = [f for f in os.listdir(in_dir) if f.endswith('.he5')]
    if not he5_files:
        print("No '.he5' file found ...")
    for file in he5_files:
        print('Processing >>> ' + file)
        src_ds = gdal.Open(os.path.join(in_dir, file))
        # list of (name, description) pairs, one per sub-dataset
        sub_ds = src_ds.GetSubDatasets()
        # # print some info
        # print('The number of sub-datasets is : {}'.format(len(sub_ds)))
        # for sd in sub_ds:
        #     print('Name: {0}\nDescription:{1}\n'.format(*sd))
        no2_ds = gdal.Open(sub_ds[3][0]).ReadAsArray()  # NO2 TropCloudScreened = 4th
        # so2_ds = gdal.Open(sub_ds[1][0]).ReadAsArray()  # SO2 = 2nd
        # data cleaning
        data = no2_ds.astype(np.float32)  # float copy so NaN can be stored
        # values above 2e+16 molecules/cm^2 are treated as outliers -> NaN
        data[data > 2e+16] = np.nan
        # Filling/Missing Value (-1.2676506e+30) and other negatives -> NaN
        data[data < 0] = np.nan
        # molecules/cm^2 -> g/m^2: / Avogadro * 46 g/mol (NO2) * 1e4 cm^2/m^2
        # (multiply by 1e7 instead of 1e4 to get mg/m^2)
        # data = data / 6.022e23 * 46 * 1e4
        # keep the date in the output file name: characters 19-27 of the base
        # name hold 'YYYYmMMDD', which becomes 'YYYY_MMDD'
        fn = os.path.splitext(file)[0][19:28]
        fn = fn.replace('m', '_')
        newRasterfn = os.path.join(out_dir, fn + '.tif')
        # upper-left corner and pixel size (negative Y size = north-up image)
        rasterOrigin = (-180, 90)
        x_size = 0.25
        y_size = -0.25
        print('Writing ... ' + newRasterfn)
        main(newRasterfn, rasterOrigin, x_size, y_size, data)
        src_ds = None  # close the he5 file
    print('... ... ... ... COMPLETED ... ... ... ...')
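The GeoTransform written by array2raster is GDAL's six-element affine transform: (top-left X, pixel width, row rotation, top-left Y, column rotation, pixel height). The sketch below, plain Python with no GDAL required, maps a pixel position to longitude/latitude for the (-180, 90) origin and ±0.25° pixel size used in the script, and checks that flipping the rows lines up with the north-up transform. It assumes, as the flip in main() implies, that the he5 array's first row is the southernmost latitude band; source_row_center_lat is a hypothetical helper for illustration only.

```python
def pixel_to_geo(geotransform, col, row):
    """Map a (col, row) pixel position to (x, y) with GDAL's affine GeoTransform."""
    gt0, gt1, gt2, gt3, gt4, gt5 = geotransform
    x = gt0 + col * gt1 + row * gt2
    y = gt3 + col * gt4 + row * gt5
    return x, y


def source_row_center_lat(row, res=0.25):
    """Hypothetical helper: centre latitude of a row in a grid stored
    south-to-north (assumed layout of the he5 array before flipping)."""
    return -90.0 + (row + 0.5) * res


# Same values as the script: origin (-180, 90), 0.25 degree pixels, north-up.
GT = (-180.0, 0.25, 0.0, 90.0, 0.0, -0.25)
COLS, ROWS = int(360 / 0.25), int(180 / 0.25)  # 1440 x 720 global grid

print(pixel_to_geo(GT, 0, 0))        # upper-left corner of the raster
print(pixel_to_geo(GT, COLS, ROWS))  # lower-right corner of the raster
# After array[::-1], output row 0 holds source row ROWS - 1 (the northernmost band).
print(pixel_to_geo(GT, 0.5, 0.5)[1], source_row_center_lat(ROWS - 1))
```

Applying the same formula to the tuple returned by ds.GetGeoTransform() on an output .tif is a quick way to sanity-check that the corners land on (-180, 90) and (180, -90).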