R: Ignoring part of a string when splitting with a regular expression

Posted by 现男友 on 2025-06-02 22:19:02

I'm trying to split strings in R (using strsplit) at specific points (dashes, -), but not when the dash falls inside square brackets ([ ]).

Example:

xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]","Total Internet-Time Spent Online-Past 7 Days") 
xx 
  [1] "Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]" 
  [2] "Total Internet-Time Spent Online-Past 7 Days"  

This should give me something like:

list(c("Radio Stations","Listened to Past Week","Toronto [FM-CFXJ-93.5 (93.5 The Move)]"), c("Total Internet","Time Spent Online","Past 7 Days")) 
  [[1]] 
  [1] "Radio Stations"                         "Listened to Past Week"                  
  [3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]" 
 
  [[2]] 
  [1] "Total Internet"    "Time Spent Online" "Past 7 Days"   

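Is there a way to do this with a regular expression? The position and number of dashes vary from one element of the vector to the next, and brackets are not always present. When there are brackets, though, they are always at the end.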

I have tried several approaches, but none of them work:

## Trying to match "-" before "[" in Perl 
strsplit(xx, split = "-(?=\\[)", perl=T) 
# does nothing 
 
## trying to first extract what follow "[" then splitting what is preceding that 
temp <- strsplit(xx, "[", fixed = T) 
temp <- lapply(temp, function(yy) substr(head(yy, -1),"-")) 
# doesn't work as there are some elements with no brackets... 

Any help would be appreciated.

Please refer to the following approach:

Based on: Regex for matching a character, but not when it's enclosed in square bracket

You can use:

strsplit(xx, "-(?![^\\[]*\\])", perl = TRUE) 
[[1]] 
[1] "Radio Stations"                         "Listened to Past Week"                  
[3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]" 
 
[[2]] 
[1] "Total Internet"    "Time Spent Online" "Past 7 Days"  
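
To see why this pattern works, it may help to read it piece by piece. The sketch below reuses the xx vector from the question; the names pattern, parts and expected are introduced here only for illustration. It assumes brackets are balanced and not nested, which matches the sample data but is not something the pattern itself checks.

```r
# Sample input copied from the question.
xx <- c(
  "Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]",
  "Total Internet-Time Spent Online-Past 7 Days"
)

# Reading the pattern left to right:
#   -            a literal dash (the split point)
#   (?! ... )    negative lookahead: reject the dash if "..." matches right after it
#   [^\\[]*\\]   any run of characters other than "[", followed by "]"
# A dash that can reach a "]" without first crossing a "[" is treated as being
# inside brackets, so it is not used as a split point.
pattern <- "-(?![^\\[]*\\])"

# Lookaheads need the Perl-compatible engine, hence perl = TRUE.
parts <- strsplit(xx, pattern, perl = TRUE)

# Compare against the result the question asks for.
expected <- list(
  c("Radio Stations", "Listened to Past Week",
    "Toronto [FM-CFXJ-93.5 (93.5 The Move)]"),
  c("Total Internet", "Time Spent Online", "Past 7 Days")
)
stopifnot(identical(parts, expected))
```

Strings without brackets contain no "]", so the lookahead never matches and every dash is split on. Note that a stray "]" with no opening "[" would protect every dash before it, and nested brackets are not handled, so input with those shapes would likely need a different approach.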

