Issue
This content is from Stack Overflow. Question asked by Mladen.
I’ve got two massive files with millions of lines.
In the first file, file1, one of the lines is:
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7
and the second file, file2, has many lines that contain it, e.g.:
Oz5,z!F,k"H,#$5,#%J,$&L,m'F,o(H,6X),c*7.X5t,&&***b,ccc
I want to search for the lines from file1 in file2 and I face two problems:
- The search itself clashes with special characters in every shell (sh, bash, csh, …):
!F,k"H,#$5,#%J,$: event not found
I also tried egrep, awk, ack, …, with the same result.
How can I get around that? The nature of the strings described above does not allow me to treat them in any obvious way. For example, I do not see how I could substitute something for, say, "!", because any character I introduce in its place also appears in the strings of file1 and file2. Note that all printable ASCII characters, in all combinations, appear in file1 and file2.
What I would apparently need is a shell (perhaps a virtual one) which has no special characters. Is there such a Unix shell?
- How can I take the lines of file1 one at a time, search for each of them in file2, and extract the matching lines from file2 into file3?
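Neither problem strictly requires typing the strings at a shell prompt: a program that opens file1 itself receives each line as plain data, so no shell ever interprets !, ", ', $ or *. A minimal sketch in Python, assuming file1 holds one search string per line and that "match" means plain substring containment; the function names and the script name filter_lines.py are hypothetical:

```python
import sys
from typing import Iterable, Iterator, List


def read_patterns(lines: Iterable[str]) -> List[str]:
    """Strip line terminators and drop blank lines.

    Dropping blank lines is a choice of this sketch: an empty pattern
    would be found in every line of file2.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    return [p for p in stripped if p]


def matching_lines(patterns: List[str], lines: Iterable[str]) -> Iterator[str]:
    """Yield each line (terminator removed) that contains any pattern.

    `p in text` is a plain substring test, so characters such as
    ! " ' $ ( ) * have no special meaning here.
    """
    for line in lines:
        text = line.rstrip("\r\n")
        if any(p in text for p in patterns):
            yield text


def filter_files(file1: str, file2: str, file3: str) -> None:
    """Write every line of file2 that contains a line of file1 to file3."""
    # latin-1 maps every byte to one character, so no line fails to decode.
    with open(file1, encoding="latin-1") as f1:
        patterns = read_patterns(f1)
    with open(file2, encoding="latin-1") as f2:
        with open(file3, "w", encoding="latin-1") as f3:
            for text in matching_lines(patterns, f2):
                f3.write(text + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python filter_lines.py file1 file2 file3
    filter_files(sys.argv[1], sys.argv[2], sys.argv[3])
```

Each line of file2 is compared with every pattern, so the running time grows with the product of the two file sizes. For millions of lines on both sides, `grep -F -f file1 file2 > file3` is the usual tool: `-F` treats the patterns as fixed strings and `-f` reads them from file1, so they never pass through the shell either.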
Solution
I solved the problem in the following way.
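All shells, the search tools run from them, and most editors (such as vi and vim) treat certain characters as special. Emacs does not in its ordinary incremental search (C-s), which is not a regexp search, so characters such as !, $ and * are matched as themselves.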
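Why pattern-based search tools trip over these strings can be shown with Python's `re` module, used here only as a stand-in for a regex engine (an assumption: egrep, awk and ack each have their own syntax, but share these metacharacters). In the string from file1, `$`, `(`, `)` and `*` are regex operators, so the raw string does not match the file2 line that contains it, while an escaped pattern or a plain substring test does:

```python
import re

# The line from file1 and a file2 line that contains it.
needle = 'Oz5,z!F,k"H,#$5,#%J,$&L,m\'F,o(H,6X),c*7'
line = needle + ".X5t,&&***b,ccc"

# Read as a regex, "$" is an end-of-string anchor, "(...)" a group and
# "*" a repetition, so the pattern no longer describes the literal text.
as_regex = re.search(needle, line)

# re.escape backslash-escapes the metacharacters, restoring a literal match.
escaped = re.search(re.escape(needle), line)

# A plain substring test has no metacharacters at all.
literal = needle in line

print(as_regex is None, escaped is not None, literal)  # prints: True True True
```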
I used an Emacs keyboard macro as follows:
Split the Emacs window into three windows stacked one above another: file1 in the top one, file2 in the middle one, and the output file (file3) in the bottom one. Then:
- With the cursor at the beginning of file1, start recording the macro: C-x (.
- Copy the line.
- Go to the beginning of the next line.
- Switch to file2: C-x o.
- Search for the copied line.
- Copy the first line found that contains the line from file1.
- Go back to the beginning of file2.
- Switch to file3.
- Paste the line from file2.
- Go to the next line.
- Switch back to file1.
- Stop recording the macro: C-x ).
Repeat the macro as many times as there are remaining lines in file1 (say n): M-n C-x e, where M = Esc, i.e. press Esc, type the number n, then C-x e.
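The macro's logic can be restated as a short Python sketch for readers who want the same result without Emacs. It assumes, like the macro, that only the first line of file2 containing each file1 line is wanted and that every search restarts at the top of file2; the function names are hypothetical, and skipping file1 lines that have no match is this sketch's own choice:

```python
from typing import Iterable, List, Optional


def first_match(pattern: str, lines: List[str]) -> Optional[str]:
    """Return the first line containing `pattern`, scanning from the top."""
    for text in lines:
        if pattern in text:  # literal substring test, no special characters
            return text
    return None


def first_matches(patterns: Iterable[str], lines: List[str]) -> List[str]:
    """One result per pattern, in file1 order, like one macro run per line.

    Patterns with no match are skipped (a choice of this sketch).
    """
    results = []
    for pattern in patterns:
        hit = first_match(pattern, lines)
        if hit is not None:
            results.append(hit)
    return results


def copy_first_matches(file1: str, file2: str, file3: str) -> None:
    # file2 is held in memory as a list of lines, much as Emacs holds it
    # in a buffer; latin-1 decodes any byte, so no line fails to load.
    with open(file2, encoding="latin-1") as f2:
        lines = [line.rstrip("\r\n") for line in f2]
    with open(file1, encoding="latin-1") as f1:
        patterns = [p for p in (line.rstrip("\r\n") for line in f1) if p]
    with open(file3, "w", encoding="latin-1") as f3:
        for hit in first_matches(patterns, lines):
            f3.write(hit + "\n")
```

Like the macro, this writes at most one line of file2 per line of file1; further lines of file2 containing the same string are not copied.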
This question was asked on Stack Overflow by Mladen and answered by Mladen. It is licensed under the terms of CC BY-SA 2.5 / CC BY-SA 3.0 / CC BY-SA 4.0.