onefile_to_egor fragile to column ordering #15

Open
mbojan opened this issue Dec 6, 2017 · 1 comment

@mbojan
Collaborator

mbojan commented Dec 6, 2017

onefile_to_egor seems to be very sensitive to the column order of the input data frame. Consider the following input data frame, which contains two rows from GSS 2004:

dd <- read.csv(textConnection(  '"id","numgiven","sex","sex1","sex2","sex3","sex4","sex5","close12","close13","close14","close15","close23","close24","close25","close34","close35","close45"
10,6,1,2,2,1,2,2,3,1,2,2,2,2,2,1,1,1
36,6,2,2,2,2,1,1,2,2,2,1,2,2,2,1,2,2'
), as.is=TRUE)

I can create an egor object from it with

e <- with(
  dd,
  onefile_to_egor(
    egos = dd,
    pmin(numgiven,5), 
    ID.vars=list(ego="id"), 
    attr.start.col="sex1",
    attr.end.col="sex5",
    max.alters=5,
    aa.first.var="close12", 
    aa.regex="^(?<attr>[[:alpha:]]+)(?<src>[[:digit:]])(?<tgt>[[:digit:]])$"
  )
)

but the same call fails if I add a variable at the end:

# add a variable at the end
dd$foobar <- 1

e <- with(
  dd,
  onefile_to_egor(
    egos = dd,
    pmin(numgiven,5), 
    ID.vars=list(ego="id"), 
    attr.start.col="sex1",
    attr.end.col="sex5",
    max.alters=5,
    aa.first.var="close12", 
    aa.regex="^(?<attr>[[:alpha:]]+)(?<src>[[:digit:]])(?<tgt>[[:digit:]])$"
  )
)

which gives

Error in `[.data.frame`(e.wide, -nm) : object 'nm' not found

It seems that the function is interpreting the new foobar column as giving information on alter-alter ties.
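
A quick base-R check of that reading. It assumes, without having verified it against the egor source, that the alter-alter block is taken positionally from aa.first.var to the last column. The column names are the ones from the example above; `by_regex` and `by_position` are just illustrative names.

```r
# Column names from the example above, plus the extra trailing column.
cols <- c("id", "numgiven", "sex", paste0("sex", 1:5),
          "close12", "close13", "close14", "close15", "close23",
          "close24", "close25", "close34", "close35", "close45",
          "foobar")

aa_regex <- "^(?<attr>[[:alpha:]]+)(?<src>[[:digit:]])(?<tgt>[[:digit:]])$"

# Columns matched by the pattern itself (named groups need perl = TRUE).
by_regex <- cols[grepl(aa_regex, cols, perl = TRUE)]

# Assumed positional selection: aa.first.var through the last column.
by_position <- cols[match("close12", cols):length(cols)]

# Columns picked up by position but not by the pattern.
setdiff(by_position, by_regex)
```

If that assumption holds, the pattern alone would leave foobar out, and only the positional range pulls it in.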

I'm not yet sure what's the best way to address it, probably one of:

  • Write in the documentation that the input data frame has to have variables and blocks of variables in specific order (e.g. alter-alter variables last)
  • Modify the function so that the user has to specify variable names for all alter and alter-alter variable blocks explicitly, e.g. something like f( aattrs=list(sex = "sex[0-9]"), aaties = list(close = "close[0-9]{2}") ) ....
  • Give up on trying to support every possible variation in which people may store egocentric data and provide constructor functions only for, say, (1) long format and (2) separate data frames...
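
A rough sketch of how the second option could resolve columns, in base R only. `select_blocks` is a hypothetical helper and not part of egor. The patterns are anchored versions of the ones suggested above, and the small data frame is made up for illustration.

```r
# Hypothetical helper: map each named pattern to the columns it matches,
# regardless of where those columns sit in the data frame.
select_blocks <- function(df, patterns) {
  lapply(patterns, function(p) grep(p, names(df), value = TRUE))
}

# Toy data frame with an unrelated trailing column.
dd_small <- data.frame(id = 1, sex = 1, sex1 = 2, sex2 = 2,
                       close12 = 3, foobar = 1)

blocks <- select_blocks(
  dd_small,
  list(sex = "^sex[0-9]$", close = "^close[0-9]{2}$")
)
blocks$sex    # alter attribute columns
blocks$close  # alter-alter tie columns
```

Selecting by name this way would make the result independent of column order, at the cost of asking the user for one pattern per variable block.
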
@krivit
Collaborator

krivit commented Dec 7, 2017

Maybe just have aa.start.col and aa.end.col arguments, with aa.end.col defaulting to the last column in the data frame?
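
Roughly, in base R, and only as a sketch: `aa_block` is a hypothetical helper, not an existing egor function, and the toy data frame is made up.

```r
# Hypothetical helper: names of the alter-alter columns between two bounds.
# aa.end.col defaults to the last column of the data frame.
aa_block <- function(df, aa.start.col, aa.end.col = names(df)[ncol(df)]) {
  i <- match(aa.start.col, names(df))
  j <- match(aa.end.col, names(df))
  stopifnot(!is.na(i), !is.na(j), i <= j)
  names(df)[i:j]
}

dd_small <- data.frame(id = 1, sex1 = 2, close12 = 3, close13 = 1, foobar = 1)

explicit_end <- aa_block(dd_small, "close12", "close13")  # stops before foobar
default_end  <- aa_block(dd_small, "close12")             # runs to the last column
```

With the default, a trailing extra column is still included, so a user who appends variables would need to pass aa.end.col explicitly.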
