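The ANSI C standard added a feature to the scanf function called the scanset. It can solve some awkward text-processing problems.
I first ran into it while processing CSV (comma-separated values) files. (This article ignores other CSV features such as quoting and escape characters.)
Take this code, for example: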
char a[100],b[100]; scanf("%s,%s",a,b);
Suppose it is used to read the following data:
Alice,Bob
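scanf stores the whole string "Alice,Bob" in a, because %s only stops at whitespace. The literal comma in the format then has nothing left to match, so b is never filled in (scanf returns 1 rather than 2).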
I searched online for a fix, and one of the simpler ones is to use a scanset.
The problem above can be solved with this code:
char a[100],b[100]; scanf("%[^,],%s",a,b);
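In the format string "%[^,],%s", "%[^,]" matches a run of characters other than a comma, stopping at the first comma; "," consumes that comma; and "%s" reads the rest up to the next whitespace character.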
This notation has some resemblance to regular expressions.
Here is the relevant passage, copied from the ANSI C standard:
[
Matches a nonempty sequence of characters from a set of expected
characters (the scanset). The corresponding argument shall be a
pointer to the initial character of an array large enough to accept
the sequence and a terminating null character, which will be added
automatically. The conversion specifier includes all subsequent
characters in the format string, up to and including the matching
right bracket (]). The characters between the brackets (the
scanlist) comprise the scanset, unless the character after the left
bracket is a circumflex (^), in which case the scanset contains all
characters that do not appear in the scanlist between the circumflex
and the right bracket. As a special case, if the conversion specifier
begins with [] or [^], the right bracket character is in the scanlist
and the next right bracket character is the matching right bracket
that ends the specification. If a - character is in the scanlist and
is not the first, nor the second where the first character is a ^,
nor the last character, the behavior is implementation-defined.
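Roughly, this says that in a format string, the characters between the pair of brackets right after the percent sign (the scanlist) define the scanset, and the conversion matches a nonempty run of input characters that all belong to the scanset. If the first character after [ is ^, the meaning is inverted: the conversion matches characters that do not appear between ^ and ].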
Here are a few examples:
Statement | Input | Result |
---|---|---|
scanf("%[123]",a); | 11223344 | a="112233" |
scanf("%[^123]",a); | 4321 | a="4" |
scanf("%[^d]",a); | abcdef | a="abc" |
scanf("%[0-9]%[a-z]%s",a,b,c); | 456ab789 | a="456" b="ab" c="789" |
scanf("%[^\n]",a); | hello world | a="hello world" |
scanf("%[^@]@%[^.].%s",a,b,c); | someone@example.com | a="someone" b="example" c="com" |
scanf("%[^(](%*[^)])%s",a,b); | abc(def)ghi | a="abc" b="ghi" |
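The same technique works with the fscanf and sscanf functions.
Scansets are a simple and practical tool for processing text, especially for splitting it on delimiter characters, and for checking whether input is valid, for example when an integer is required.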