[SOLVED] Trying to replace a set of nos in a file with another set

Issue

This Content is from Stack Overflow. Question asked by harshit

file content

<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="61"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="62"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="63"/>
<layer/>

I am trying to replace e.g. pageno="62" with pageno="63", and likewise the subsequent page numbers, i.e. 63->64, 64->65.
I am using bash to do this. The file is very big (about 930 pages), so sed is slow. Is there a faster way to do this?

My script

total=$(grep pageno= "$1" | tail -n1 | cut -d'"' -f4)

from="${2}"
to="${3}"

for i in $(eval "echo {${from}..${total}}")
do
    sed -i "s#pageno=\"${i}\"#pageno=\"${to}_new\"#g" "${1}"
    ((to += 1))
done

The _new suffix prevents two occurrences of the same page number; I will delete it later on.



Solution

Assumptions:

  • all data is nicely formatted as in OP’s example (otherwise OP may want to look at a tool specifically designed for processing HTML/XML formatted fields)
  • there is at most one instance of pageno="####" on any line of input
  • awk is an acceptable solution
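
The second assumption is easy to check before changing anything. A minimal sketch, assuming a POSIX awk; the function name count_multi_pageno_lines is made up for illustration and is not part of the original answer:

```shell
# Print how many input lines carry more than one pageno="N" attribute.
# gsub() returns the number of matches it replaced; the "&" replacement
# puts each match back unchanged, so the line itself is not altered.
count_multi_pageno_lines() {
    awk '{ if (gsub(/pageno="[0-9]+"/, "&") > 1) n++ } END { print n + 0 }' "$1"
}
```

Calling count_multi_pageno_lines pages.dat should print 0 when the assumption holds; any other value means some lines would need different handling.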

One awk idea:

awk -v pgno=62 '
# attempt the replacement on the current line; on success, increment pgno for the next search-and-replace
sub("pageno=\"" pgno "\"","pageno=\"" pgno+1 "\"") { pgno++ }
# print the current line
1
' pages.dat

This generates:

<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="61"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="63"/>
<layer/>
</page>
<page width="595.28000000" height="841.89000000">
<background type="pdf" pageno="64"/>
<layer/>

This should be relatively fast since it requires just a single process invocation (awk) and a single pass through the input file, whereas the sed loop in the question rewrites the whole file once per page number.
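
The command above always shifts by one. The script in the question takes a from and a to value, so here is a possible generalization that is still a single pass. It is only a sketch under the same assumptions as above; renumber_pages and its argument order are hypothetical names chosen for this example:

```shell
# Usage: renumber_pages FILE FROM TO
# Every pageno >= FROM is shifted by (TO - FROM); the result goes to stdout.
renumber_pages() {
    awk -v from="$2" -v to="$3" '
    match($0, /pageno="[0-9]+"/) {
        # The match is pageno="N": 8 characters before the digits, 1 after.
        n = substr($0, RSTART + 8, RLENGTH - 9) + 0
        if (n >= from)
            $0 = substr($0, 1, RSTART - 1) "pageno=\"" (n + to - from) "\"" substr($0, RSTART + RLENGTH)
    }
    1   # print the (possibly modified) line
    ' "$1"
}
```

For example, renumber_pages pages.dat 62 65 would turn 62 into 65, 63 into 66, and so on. Each line is rewritten once from its original number, so no temporary _new marker is needed to avoid collisions.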

If the results look good and you’re using GNU awk you can use the -i inplace option to update the file in place …

awk -i inplace -v pgno=62 '
sub("pageno=\"" pgno "\"","pageno=\"" pgno+1 "\"") { pgno++ }
1
' pages.dat

… otherwise you can write the output to a temp file and then rename/mv accordingly.
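
The temp-file route could look roughly like this. This is a sketch, not a definitive implementation: it assumes the mktemp utility is available (it is common but not required by POSIX), and shift_pages_in_place is a made-up helper name:

```shell
# Usage: shift_pages_in_place FILE FIRST_PAGE
# Runs the same awk program as above, then swaps the result into place.
shift_pages_in_place() {
    tmp=$(mktemp) || return 1
    if awk -v pgno="$2" '
        sub("pageno=\"" pgno "\"", "pageno=\"" pgno+1 "\"") { pgno++ }
        1
    ' "$1" > "$tmp"
    then
        mv "$tmp" "$1"    # replace the original only after awk succeeded
    else
        rm -f "$tmp"      # leave the original untouched on failure
        return 1
    fi
}
```

Called as shift_pages_in_place pages.dat 62, this updates the file only if awk exits successfully. Note that mktemp creates the file with owner-only permissions and mv carries those over, so the permissions may need to be restored afterwards.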


This question was asked on Stack Overflow by harshit and answered by markp-fuso. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
