After downloading data with wget, I found that every file name still contained the query string from the download link:
$ ls
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_clinical_August2016_v9.xlsx
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_subtype_cohort_list.xlsx
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_specimen_histology_August2016_v9.xlsx
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.annotated.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.icgc.public.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.tcga.public.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz
download?fn=%2FPCAWG%2Fconsensus_snv_indel%2Ffinal_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz
download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz
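So it's best to strip the leading part. %2F is the URL encoding of "/", so everything up to and including the last %2F is the server-side directory path, and only what follows it is the real file name. Because .* is greedy, the pattern .*%2F matches exactly that prefix. sed can rewrite text by pattern matching and mv can rename files, so let's combine the two. First, test the substitution on a single file (the name is quoted so the shell does not treat ? as a wildcard):
$ echo 'download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_specimen_histology_August2016_v9.xlsx' | sed -E 's/.*%2F(.*)/\1/'
pcawg_specimen_histology_August2016_v9.xlsx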
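If you would rather not start a sed process for every file, the shell's own parameter expansion can strip the same prefix. This is a different technique from the sed approach used in this post, shown here only as a sketch: ${f##*%2F} deletes the longest leading match of the pattern *%2F, which is everything up to and including the last %2F.

```shell
#!/usr/bin/env bash
set -eu

f='download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_specimen_histology_August2016_v9.xlsx'

# ${f##*%2F} removes the longest prefix matching *%2F, i.e. the whole
# URL-encoded directory path, leaving only the file name.
new=${f##*%2F}
echo "$new"
```

Inside the rename loop this would read mv -- "$f" "${f##*%2F}". The trade-off: POSIX parameter expansion can only remove prefixes and suffixes, while a sed expression can do arbitrary rewrites.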
Next, check that every file in the directory is handled the same way:
$ ls | sed -E 's/.*%2F(.*)/\1/'
pcawg_donor_clinical_August2016_v9.xlsx
pcawg_donor_subtype_cohort_list.xlsx
pcawg_specimen_histology_August2016_v9.xlsx
consensus.20170119.somatic.cna.annotated.tar.gz
consensus.20170119.somatic.cna.icgc.public.tar.gz
consensus.20170119.somatic.cna.tcga.public.tar.gz
consensus.20170217.purity.ploidy.txt.gz
final_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
final_consensus_sv_bedpe_passonly.icgc.public.tgz
final_consensus_sv_bedpe_passonly.tcga.public.tgz
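Because only the last path component is kept, two downloads with the same file name in different directories would map to the same target, and the second mv would silently overwrite the first. The ten names above happen to be unique, but a quick check before renaming costs little. The sketch below builds a throwaway directory with three of the names above plus one hypothetical extra (other_dir is made up) to show what a collision report looks like.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway directory. The "other_dir" file is a hypothetical extra that
# collides with a real name once the prefix is stripped.
cd "$(mktemp -d)"
touch 'download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz' \
      'download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz' \
      'download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz' \
      'download?fn=%2FPCAWG%2Fother_dir%2Fconsensus.20170217.purity.ploidy.txt.gz'

# Compute every target name, then keep only those that occur more than once.
# Empty output means no two files would be renamed to the same name.
dups=$(printf '%s\n' download* | sed -E 's/.*%2F(.*)/\1/' | sort | uniq -d)
if [ -n "$dups" ]; then
    echo "duplicate targets: $dups" >&2
else
    echo "no duplicate targets"
fi
```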
Renaming the files for real takes a for loop. The loop iterates over the glob download* rather than the output of ls, which would be split apart on names containing spaces. As a first pass, echo stands where mv will go, so the loop only prints the new names:
$ for f in download*; do echo "$(echo "$f" | sed -E 's/.*%2F(.*)/\1/')"; done
pcawg_donor_clinical_August2016_v9.xlsx
pcawg_donor_subtype_cohort_list.xlsx
pcawg_specimen_histology_August2016_v9.xlsx
consensus.20170119.somatic.cna.annotated.tar.gz
consensus.20170119.somatic.cna.icgc.public.tar.gz
consensus.20170119.somatic.cna.tcga.public.tar.gz
consensus.20170217.purity.ploidy.txt.gz
final_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
final_consensus_sv_bedpe_passonly.icgc.public.tgz
final_consensus_sv_bedpe_passonly.tcga.public.tgz
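The echo-first, mv-second workflow means editing the loop between the two runs. A common shell idiom keeps a single loop and puts a variable in front of the command: with run=echo the loop only prints the mv commands, and with run= (empty) it executes them. The variable name run and the sandbox files are choices made for this sketch, not part of the original workflow.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway directory holding two of the downloaded names.
cd "$(mktemp -d)"
touch 'download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz' \
      'download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz'

# run=echo : dry run, print each mv command.
# run=     : real run, execute mv with the same arguments.
run=echo
for f in download*; do
    # $run is deliberately unquoted so that an empty value disappears.
    $run mv -- "$f" "$(echo "$f" | sed -E 's/.*%2F(.*)/\1/')"
done
```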
Running the loop with echo before calling mv confirms that nothing will go wrong. Once the output looks right, replace echo with the actual rename (-- keeps mv from reading a name as an option, and the quotes protect names containing spaces):
$ for f in download*; do mv -- "$f" "$(echo "$f" | sed -E 's/.*%2F(.*)/\1/')"; done
$ ls
consensus.20170119.somatic.cna.annotated.tar.gz            final_consensus_sv_bedpe_passonly.icgc.public.tgz
consensus.20170119.somatic.cna.icgc.public.tar.gz          final_consensus_sv_bedpe_passonly.tcga.public.tgz
consensus.20170119.somatic.cna.tcga.public.tar.gz          pcawg_donor_clinical_August2016_v9.xlsx
consensus.20170217.purity.ploidy.txt.gz                    pcawg_donor_subtype_cohort_list.xlsx
final_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz  pcawg_specimen_histology_August2016_v9.xlsx
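mv replaces an existing file of the same name without asking, so re-running the loop, or having an older copy of a file already in the directory, can lose data. A minimal guard, sketched below under the assumption that skipping is the behavior you want, checks for the target first. The pre-existing "older copy" file is hypothetical, there only to exercise the guard.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway directory: two downloads plus a pre-existing file that already
# has one of the target names (hypothetical, to exercise the guard).
cd "$(mktemp -d)"
touch 'download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_subtype_cohort_list.xlsx' \
      'download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz'
echo "older copy" > consensus.20170217.purity.ploidy.txt.gz

for f in download*; do
    # If the glob matched nothing, it stays as the literal "download*"; skip it.
    [ -e "$f" ] || continue
    new=$(echo "$f" | sed -E 's/.*%2F(.*)/\1/')
    # Never overwrite: leave both files in place and report the conflict.
    if [ -e "$new" ]; then
        echo "skip: $new already exists" >&2
        continue
    fi
    mv -- "$f" "$new"
done
ls
```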
Finally, the general pattern this post boils down to is:
for f in *; do <cmd> "$f" "$(echo "$f" | sed -E '<operation>')"; done
The template works for any other task where you first transform a file name with sed and then run a command on the result.
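The template can be wrapped in a small shell function so that the preview and the real run differ only in the command passed in. The function name sed_apply and its argument order are hypothetical, made up for this sketch: it runs CMD FILE NEWNAME for each file, so passing echo reproduces the preview step and passing mv does the rename. It does not pass -- to CMD (echo would print it literally), so it assumes file names that do not start with a dash, which holds for the download?fn=... names here.

```shell
#!/usr/bin/env bash
set -eu

# sed_apply CMD SED_EXPR FILE...   (hypothetical helper, not a standard command)
# For each FILE, computes NEWNAME with `sed -E SED_EXPR` and runs: CMD FILE NEWNAME
sed_apply() {
    local cmd=$1 sed_expr=$2 f new
    shift 2
    for f in "$@"; do
        new=$(printf '%s\n' "$f" | sed -E "$sed_expr")
        "$cmd" "$f" "$new"
    done
}

# Throwaway directory with two of the downloaded names, so the helper can be tried safely.
cd "$(mktemp -d)"
touch 'download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_subtype_cohort_list.xlsx' \
      'download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz'

sed_apply echo 's/.*%2F(.*)/\1/' download*   # preview: print old and new names
sed_apply mv   's/.*%2F(.*)/\1/' download*   # real rename
ls
```

Any command that takes a source and a destination fits the same slot; for example, passing cp instead of mv would keep the original downloads and write copies under the new names.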