Would be helpful to port the automated zip file extraction code from the previous version of this repo:
from os import walk
from os.path import join, splitext
import zipfile


def extract_files_from_zipped_files(init_directory, extract_to_path, extension='.pdf'):
    """
    Function to extract files (.pdf by default) from zipped files
    :param init_directory: initial top-level directory to walk through
    :type init_directory: str
    :param extract_to_path: directory to extract files into
    :type extract_to_path: str
    :param extension: file extension of file type to extract, set to None to extract all files
    :type extension: str or None
    """
    for dirName, subdirList, fileList in walk(init_directory):  # iterate through files and all sub-directories
        for fileName in fileList:
            # .lower() must be called; comparing the bound method to a string is always False
            if splitext(fileName)[1].lower() == '.zip':
                zip_file_path = join(dirName, fileName)
                with zipfile.ZipFile(zip_file_path, 'r') as z:
                    for file_name in z.namelist():
                        # directory entries in an archive end with '/'; isdir() would test the local filesystem
                        is_dir = file_name.endswith('/')
                        if not is_dir and (extension is None or splitext(file_name)[1].lower() == extension):
                            z.extract(file_name, path=extract_to_path)
https://github.com/cutright/IMRT-QA-Data-Miner/blob/85abf9dc66a139c02574c386377f46f0944c5893/IQDM/utilities.py#L190-L208
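
For discussion, here is a minimal sketch of what a ported version might look like. It is not the code from the link above: the name `extract_from_zips`, the use of `pathlib`, and the returned list of extracted paths are all assumptions made for illustration. It walks `init_directory` recursively, opens every `.zip` (matched case-insensitively), and extracts the members whose extension matches.

```python
import zipfile
from pathlib import Path


def extract_from_zips(init_directory, extract_to_path, extension=".pdf"):
    """Extract matching members from every .zip found under init_directory.

    Hypothetical helper: set extension to None to extract all files.
    Returns the list of extracted file paths (as strings).
    """
    extracted = []
    # sorted() materializes the walk before anything is written to disk
    for zip_path in sorted(Path(init_directory).rglob("*")):
        # match the .zip suffix case-insensitively and skip directories
        if not zip_path.is_file() or zip_path.suffix.lower() != ".zip":
            continue
        with zipfile.ZipFile(zip_path, "r") as z:
            for info in z.infolist():
                if info.is_dir():
                    continue  # skip directory entries inside the archive
                suffix = Path(info.filename).suffix.lower()
                if extension is None or suffix == extension.lower():
                    # extract() keeps the member's internal folder structure
                    extracted.append(z.extract(info, path=extract_to_path))
    return extracted
```

Calling `extract_from_zips("reports", "extracted_pdfs")` would pull every PDF out of the archives under `reports`; both directory names here are placeholders. Returning the paths is optional, but it would make the function easier to test than the original, which returns nothing.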