2022-07-19

Delete lines shorter than a certain length and the one above it (remove short sequences in a FASTA file)

I have a file containing the following text:

>seq1
GAAAT
>seq2
CATCTCGGGA
>seq3
GAC
>seq4
ATTCCGTGCC

If a line that doesn't start with ">" is shorter than 5 characters, I want to delete it and the one right above it.

Expected output:

>seq2
CATCTCGGGA
>seq4
ATTCCGTGCC

I have tried sed -r '/^.{,5}$/d', but it also deletes the lines with ">".



No comments:

Post a Comment