r - Ignoring part of a string when splitting with a regular expression in R
现男友
2025-06-02 22:19:02
I'm trying to split a string in R (using strsplit) at specific points (dashes, -), but not where the dash is inside square brackets ([).
Example:
xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]","Total Internet-Time Spent Online-Past 7 Days")
xx
[1] "Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]"
[2] "Total Internet-Time Spent Online-Past 7 Days"
This should give me something like:
list(c("Radio Stations","Listened to Past Week","Toronto [FM-CFXJ-93.5 (93.5 The Move)]"), c("Total Internet","Time Spent Online","Past 7 Days"))
[[1]]
[1] "Radio Stations" "Listened to Past Week"
[3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]"
[[2]]
[1] "Total Internet" "Time Spent Online" "Past 7 Days"
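Is there a way to do this with a regular expression? The position and number of dashes vary from one element of the vector to the next, and there are not always brackets. However, when there are brackets, they are always at the end.
I have tried different approaches, but none of them work: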
## Trying to match "-" before "[" in Perl
strsplit(xx, split = "-(?=\\[)", perl=T)
# does nothing
## trying to first extract what follow "[" then splitting what is preceding that
temp <- strsplit(xx, "[", fixed = T)
temp <- lapply(temp, function(yy) strsplit(head(yy, -1), "-"))
# doesn't work as there are some elements with no brackets...
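For reference, a quick check on the example vector shows why each attempt falls short. This sketch only reuses base R functions (grepl, strsplit, lengths, head) and is not part of the original question.

```r
xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]",
        "Total Internet-Time Spent Online-Past 7 Days")

# Attempt 1: "-(?=\\[)" matches only a dash whose very next character is "[".
# In the first string the "[" is preceded by a space, not a dash, so nothing
# matches and strsplit() returns each string whole.
grepl("-(?=\\[)", xx, perl = TRUE)        # FALSE FALSE

# Attempt 2: splitting on "[" yields a single piece when there is no bracket,
# so head(yy, -1) leaves nothing at all for the second string.
temp <- strsplit(xx, "[", fixed = TRUE)
lengths(temp)                             # 2 1
lapply(temp, function(yy) head(yy, -1))   # second element is character(0)
```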
Any help would be appreciated.
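Please refer to the following approach:
Based on: Regex for matching a character, but not when it's enclosed in square bracket
You can use a negative lookahead that rejects any dash followed by a "]" with no "[" in between, that is, any dash that sits inside a bracketed block: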
strsplit(xx, "-(?![^\\[]*\\])", perl = TRUE)
[[1]]
[1] "Radio Stations" "Listened to Past Week"
[3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]"
[[2]]
[1] "Total Internet" "Time Spent Online" "Past 7 Days"
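If a regular expression is not required, the question's extract-then-split idea can also be made to work once elements without brackets are handled. The sketch below is not from the original answer: split_outside_brackets is a hypothetical helper name, and it relies on the question's statement that a bracketed block, when present, comes at the end of the string (only the first "[" is considered).

```r
# Hypothetical helper: split on "-" only in the text before the first "[".
# Assumes, as stated in the question, that any bracketed block is at the end.
split_outside_brackets <- function(x) {
  lapply(x, function(s) {
    pos <- as.integer(regexpr("[", s, fixed = TRUE))  # -1 when there is no "["
    if (pos == -1L) {
      # No brackets: an ordinary split on every dash
      return(strsplit(s, "-", fixed = TRUE)[[1]])
    }
    head_part <- substr(s, 1L, pos - 1L)   # text before the "["
    tail_part <- substr(s, pos, nchar(s))  # "[...]" through the end
    parts <- strsplit(head_part, "-", fixed = TRUE)[[1]]
    n <- length(parts)
    # If nothing precedes "[" or a dash sits right before it, the bracketed
    # block becomes its own piece; otherwise it is re-attached to the last one
    if (n == 0L || endsWith(head_part, "-")) {
      return(c(parts, tail_part))
    }
    parts[n] <- paste0(parts[n], tail_part)
    parts
  })
}

xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]",
        "Total Internet-Time Spent Online-Past 7 Days")

res <- split_outside_brackets(xx)
# On this input it agrees with the lookahead regex from the answer
identical(res, strsplit(xx, "-(?![^\\[]*\\])", perl = TRUE))  # TRUE
```

The regex version is shorter and does not depend on the brackets being at the end; the helper trades that generality for fixed-string splitting that is easy to read.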