2023-02-22

Merge PDF files using a CSV file list with PowerShell

I want to create multiple merged PDF files from 1400+ PDF files.

I have a data.csv file with two columns, shown below. The PDF files, whose names match the Filename column, are in the same folder as data.csv.

I need to create multiple merged PDF files, where each merged PDF contains the group of files whose filenames share the same first three characters.

For example, the files whose names start with EIN need to be merged into one PDF, in the same order as they appear in data.csv. The merged PDF should be named Y followed by those three characters, so in this example it should be YEIN.pdf.

This process needs to repeat until all the rows in data.csv have been processed.
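A minimal sketch of the grouping and naming rule described above, with no merging yet. `Get-MergePlan` is a hypothetical helper name, and the sample rows are written comma-delimited as an assumption about how data.csv is stored. `Group-Object` keeps the rows of each group in the order they were read, which is what preserves the data.csv order inside each merged file.

```powershell
# Hypothetical helper (not part of my existing scripts): builds one plan entry
# per three-character prefix, keeping the CSV row order inside each group.
function Get-MergePlan {
    param([Parameter(Mandatory)][object[]]$Rows)

    $Rows |
        Group-Object -Property { $_.Filename.Substring(0, 3) } |
        ForEach-Object {
            [pscustomobject]@{
                Output = "Y$($_.Name).pdf"                              # e.g. YEIN.pdf
                Files  = @($_.Group | ForEach-Object { $_.Filename })   # CSV order
            }
        }
}

# A few rows from the sample; the comma delimiter is an assumption.
$sample = @'
FilePath,Filename
$FilePath1,EINCO01-174.pdf
$FilePath5,EINCL01-174.pdf
$FilePath25,GINLG01-170.pdf
'@ | ConvertFrom-Csv

Get-MergePlan -Rows $sample | Format-List
```

Each entry pairs an output name such as YEIN.pdf with the ordered list of files that belong in it.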

Sample data.csv file:

FilePath     Filename
$FilePath1   EINCO01-174.pdf
$FilePath2   EINCO02-174.pdf
$FilePath3   EINCO03-174.pdf
$FilePath4   EINCO04-174.pdf
$FilePath5   EINCL01-174.pdf
$FilePath6   EINCL02-174.pdf
$FilePath7   EINCL03-174.pdf
$FilePath8   EINCL04-174.pdf
$FilePath9   EINCL05-174.pdf
$FilePath10  EINCL06-174.pdf
$FilePath11  EINCL07-174.pdf
$FilePath12  EINCL08-174.pdf
$FilePath13  EINCL09-174.pdf
$FilePath14  EINCL10-174.pdf
$FilePath15  EINCL11-174.pdf
$FilePath16  EINCL12-174.pdf
$FilePath17  EINCL13-174.pdf
$FilePath18  EINCL14-174.pdf
$FilePath19  EINCL15-174.pdf
$FilePath20  EINCL16-174.pdf
$FilePath21  EINCL17-174.pdf
$FilePath22  EINCL18-174.pdf
$FilePath23  EINCL19-174.pdf
$FilePath25  GINLG01-170.pdf
$FilePath26  GINLG02-166.pdf
$FilePath27  GINLG03-159.pdf
$FilePath28  GINLG04-159.pdf
$FilePath29  GINLG05-168.pdf
$FilePath30  GINLG06-152.pdf
$FilePath31  GINNO01-174.pdf
$FilePath32  GINNO02-131.pdf
$FilePath33  GINNO04-150.pdf
$FilePath34  GINNO05-174.pdf
$FilePath35  GINTA01-130.pdf
$FilePath36  GINTA02-139.pdf
$FilePath37  GINTA03-139.pdf

To tackle this, I wrote a script that splits data.csv into multiple CSV files, one per group of first three characters, as below.

# Add a Group column holding the first three characters of each filename.
$data = Import-Csv '.\data.csv' |
    Select-Object FilePath, Filename, @{ n = 'Group'; e = { $_.Filename.Substring(0, 3) } }

# Preview the rows by group on the console.
$data | Format-Table -GroupBy Group

# Write one tab-delimited, unquoted CSV per group (e.g. EIN.csv, GIN.csv).
# $group.Group holds that group's rows in their original data.csv order.
foreach ($group in $data | Group-Object -Property Group) {
    $group.Group |
        ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
        ForEach-Object { $_.Replace('"', '') } |
        Out-File "$($group.Name).csv"
}

From here, I developed another script, script.ps1, that reads one CSV file and creates a merged PDF from the list of filenames in it (e.g., EIN.csv), as below. This works fine.

However, I don't know how to repeat this code for each CSV file in that folder. Any help, please?

python.exe .\create_from_template.py template_source_list_files .\EIN.csv > part1script.ps1
python.exe .\create_from_template.py template_sort_merge_destination_files .\EIN.csv > part2script.ps1
Get-Content part1script.ps1, part2script.ps1 | Set-Content execute.ps1
.\execute.ps1
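One way the four commands above might be repeated for every group CSV is to wrap them in a loop over the folder. This is only a sketch: `Invoke-MergeForEachCsv` is a hypothetical name, and it assumes the current directory is the folder holding the CSV files, that data.csv is the only CSV there that is not a group file, and that create_from_template.py needs nothing beyond the CSV path to produce the right output.

```powershell
# Hypothetical wrapper: runs the same four steps once per group CSV
# (EIN.csv, GIN.csv, ...) found in the current directory.
function Invoke-MergeForEachCsv {
    Get-ChildItem -Path . -Filter '*.csv' |
        Where-Object { $_.Name -ne 'data.csv' } |   # skip the master list
        ForEach-Object {
            $csv = Join-Path '.' $_.Name            # same relative form as .\EIN.csv

            python.exe .\create_from_template.py template_source_list_files $csv > part1script.ps1
            python.exe .\create_from_template.py template_sort_merge_destination_files $csv > part2script.ps1

            # Rebuild and run execute.ps1 for this CSV before moving on to the next one.
            Get-Content part1script.ps1, part2script.ps1 | Set-Content execute.ps1
            .\execute.ps1
        }
}

# Usage, from the folder that holds the CSV and PDF files:
# Invoke-MergeForEachCsv
```

The intermediate files part1script.ps1, part2script.ps1 and execute.ps1 are overwritten on every pass, so only the scripts for the last CSV remain afterwards.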

PS: I have installed the PSWritePDF module on my machine.
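Since PSWritePDF is available, a possible alternative is to skip the intermediate CSV and generated scripts and call the module's `Merge-PDF` directly from data.csv. This is a sketch rather than a tested solution: `Merge-PdfByPrefix` is a hypothetical name, and it assumes data.csv is comma-delimited, that every listed PDF sits in the given folder, and that the merged files should be written to that same folder.

```powershell
# Requires the PSWritePDF module (provides Merge-PDF).
# Hypothetical function: one merged PDF per three-character filename prefix.
function Merge-PdfByPrefix {
    param(
        [string]$CsvPath = '.\data.csv',
        [string]$Folder  = '.'
    )

    # Use full paths so the merge does not depend on the process working directory.
    $Folder = (Resolve-Path -Path $Folder).Path

    Import-Csv -Path $CsvPath |
        Group-Object -Property { $_.Filename.Substring(0, 3) } |
        ForEach-Object {
            # Rows inside a group stay in data.csv order.
            $inputs = @($_.Group | ForEach-Object { Join-Path $Folder $_.Filename })
            $output = Join-Path $Folder "Y$($_.Name).pdf"   # e.g. YEIN.pdf

            Merge-PDF -InputFile $inputs -OutputFile $output
        }
}

# Usage, from the folder that holds data.csv and the PDF files:
# Merge-PdfByPrefix
```

This version does not check that each input file exists, so a missing PDF would surface as an error from `Merge-PDF` for that group.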


