Removing newline characters

Meng Lu, 2013-7-6

Suppose you want to remove newlines in between the Chinese characters:

南海少年遊俠客，    
詩成嘯傲凌滄州，  
曾因酒醉鞭名馬，
生怕情深累美人。

(Note that the 1st and 2nd Chinese commas ， are each followed by two or more spaces.) The goal is to join these lines into a single line:

南海少年遊俠客，詩成嘯傲凌滄州，曾因酒醉鞭名馬，生怕情深累美人。

One way to do this is using Emacs.

Use `query-replace-regexp`

Press M-x and type query-replace-regexp, or use the shortcut C-M-%.

Type regexp to match:

\([[:nonascii:]]\) *
 *\([[:nonascii:]]\)

Note that the line break in the regexp needs to be typed into the Emacs minibuffer with C-q C-j.

Type regexp to substitute:

\1\2

This means the white space character(s) (if any) and newline character between non-ASCII characters will be removed in the substituted version, so the result is the character on the first line followed by that on the second line.
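
For readers who want to try the same substitution outside Emacs, here is a minimal Python sketch of the idea. It is not part of the Emacs workflow above: Python's `re` module has no `[[:nonascii:]]` class, so the sketch assumes `[^\x00-\x7F]` is an acceptable stand-in, and the function name `join_cjk_lines` is made up for illustration.

```python
import re

# Assumption: "non-ASCII" is approximated as any character outside U+0000..U+007F,
# since Python's re module does not support Emacs's [[:nonascii:]] class.
NON_ASCII = r"[^\x00-\x7F]"

# Same shape as the Emacs regexp: a non-ASCII character, optional spaces,
# a newline, optional spaces, and another non-ASCII character.
PATTERN = re.compile(rf"({NON_ASCII}) *\n *({NON_ASCII})")


def join_cjk_lines(text: str) -> str:
    """Drop the newline (and adjacent spaces) between two non-ASCII characters."""
    return PATTERN.sub(r"\1\2", text)


poem = (
    "南海少年遊俠客，    \n"
    "詩成嘯傲凌滄州，  \n"
    "曾因酒醉鞭名馬，\n"
    "生怕情深累美人。"
)
print(join_cjk_lines(poem))
```

Because both neighbours of the line break must be non-ASCII, line breaks between ordinary ASCII text are left alone, just as with the Emacs regexp.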

Use `fill-paragraph`

Set the fill-column variable, which controls how wide a line of text can get before it is wrapped, to a very large value for the current buffer: C-x f, 10000000.
Select the paragraph you'd like to modify: move the cursor to its beginning, hold down Shift, and use the up and down arrow keys to extend or shrink the selection.
Press M-x and type fill-paragraph.

This should remove all newline characters in the text. Interestingly, if a line ends with several spaces before the newline character, fill-paragraph keeps one of them:

南海少年遊俠客， 詩成嘯傲凌滄州， 曾因酒醉鞭名馬，生怕情深累美人。

Note the additional space after the 1st and the 2nd ，.

That single space is still redundant; it can be removed with:

M-x query-replace-regexp
， *
，
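
The same cleanup can be sketched in Python as a rough analogue of the replacement above. This is only an illustration outside Emacs; the function name `strip_spaces_after_comma` is hypothetical, and the input string is the fill-paragraph result shown earlier.

```python
import re


def strip_spaces_after_comma(text: str) -> str:
    """Remove any run of spaces that directly follows a full-width comma."""
    return re.sub("， *", "，", text)


filled = "南海少年遊俠客， 詩成嘯傲凌滄州， 曾因酒醉鞭名馬，生怕情深累美人。"
print(strip_spaces_after_comma(filled))
```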
