Issue
This Content is from Stack Overflow. Question asked by harshit
file content
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="61"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="62"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="63"/>
<layer/>
I am trying to replace e.g. pageno="62" with pageno="63", and likewise the subsequent page numbers, i.e. 63->64, 64->65.
I am using bash to do this. The file is very big, about 930 pages, so sed is slow. Is there a faster way to do this?
My script
total=$(grep pageno= "$1" | tail -n1 | cut -d'"' -f4)
from="${2}"
to="${3}"
for i in $(eval "echo {${from}..${total}}")
do
sed -i "s#pageno=\"${i}\"#pageno=\"${to}_new\"#g" "${1}"
((to += 1))
done
The _new suffix prevents two occurrences of the same page number; I will delete it later on.
Solution
Assumptions:
- all data is nicely formatted as in OP’s example (otherwise OP may want to look at a tool specifically designed for processing HTML/XML formatted fields)
- there is at most one instance of pageno="####" on any line of input
- awk is an acceptable solution
One awk idea:
awk -v pgno=62 '
sub("pageno=\"" pgno "\"","pageno=\"" pgno+1 "\"") { pgno++ } # attempt replacement on current line and if successful then increment pgno for the next search-n-replace
1 # print current line
' pages.dat
This generates:
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="61"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="63"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="64"/>
<layer/>
This should be relatively fast since it requires just a single OS-level call (awk) and a single pass through the input file.
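The same single-pass idea can be extended to an arbitrary shift, such as the from/to arguments in OP's script. The following is only a sketch under the assumptions listed earlier (at most one pageno="####" per line); the function name shift_pages and its argument order are hypothetical, not part of the original answer. Instead of searching for one expected page number at a time, it extracts the number with match() and adds a fixed offset to every page number at or above the starting page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer):
# shift every pageno >= from by (to - from) in a single awk pass.
# Usage: shift_pages FILE FROM TO   (result is written to stdout)
shift_pages() {
    local file=$1 from=$2 to=$3
    awk -v from="$from" -v delta="$((to - from))" '
        # match() sets RSTART/RLENGTH for the first pageno="N" on the line
        match($0, /pageno="[0-9]+"/) {
            # the digits start 8 chars in (after pageno=") and exclude the closing quote
            n = substr($0, RSTART + 8, RLENGTH - 9) + 0
            if (n >= from)
                $0 = substr($0, 1, RSTART - 1) "pageno=\"" (n + delta) "\"" substr($0, RSTART + RLENGTH)
        }
        1   # print the (possibly modified) line
    ' "$file"
}
```

For example, shift_pages pages.dat 62 63 would renumber 62->63, 63->64 and so on while leaving page 61 untouched. Unlike the one-liner, this variant does not depend on the page numbers appearing in consecutive order.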
If the results look good and you’re using GNU awk you can use the -i inplace option to update the file in place …
awk -i inplace -v pgno=62 '
sub("pageno=\"" pgno "\"","pageno=\"" pgno+1 "\"") { pgno++ }
1
' pages.dat
… otherwise you can write the output to a temp file and then rename/mv accordingly.
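The temp-file route might look something like the sketch below. The wrapper name renumber_in_place is hypothetical; the awk program is the one from the answer. The original file is replaced only if awk exits successfully, so a failed run leaves it untouched.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the original answer):
# write awk's output to a temp file, then move it over the original on success.
# Usage: renumber_in_place FILE START_PAGE
renumber_in_place() {
    local file=$1 start=$2 tmp
    # create the temp file next to the original so mv stays on one filesystem
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk -v pgno="$start" '
        sub("pageno=\"" pgno "\"", "pageno=\"" pgno+1 "\"") { pgno++ }
        1
    ' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Called as renumber_in_place pages.dat 62, this should leave pages.dat with the same content the one-liner prints. One caveat: because mv swaps in a new file, the result carries the temp file's permissions and ownership, which may differ from the original's.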
This question was asked on Stack Overflow by harshit and answered by markp-fuso. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.