Maja Ramljak mramljak at edu.uwaterloo.ca
Tue Mar 26 00:30:35 EDT 2019

Here’s an example of some of the columns within my dataframe. imgSeqRoot refers to the morphing sequence presented in the middle of the screen. In this example, the participant saw the Anchor-Hat morph sequence.

-imgSeqNum refers to the iteration of the morphing sequence (i.e., imgSeqNum = 1 is the picture of the Anchor, imgSeqNum = 15 is the picture of the Hat).

-transition is a boolean that is TRUE when the participant has made a switch in their classification of the image in the center of the screen. For this example, the participant said that the center image was an Anchor up until the 7th iteration, when they classified it as a Hat.

-Outers.Type is the condition dictating the type of options participants clicked on when classifying the center image (either words or images). Here the participant was clicking on other images, which included an image of an anchor and an image of a hat.

userID  imgSeqRoot  imgSeqNum  transition  Outers.Type
3       AnHa        1          FALSE       Image
3       AnHa        2          FALSE       Image
3       AnHa        3          FALSE       Image
3       AnHa        4          FALSE       Image
3       AnHa        5          FALSE       Image
3       AnHa        6          FALSE       Image
3       AnHa        7          TRUE        Image
3       AnHa        8          FALSE       Image

My question relates to grouping these different variables/columns. I’ve looked into grouping and think I’ve gotten a basic example to work. For example, the line:

MeanClickRT <- dfall %>% group_by(Outers.Type) %>% summarize(AvgClickRT = mean(Click.RT))

would give me a two-column table that looks like:

Outers.Type  AvgClickRT
image        2.03
word         2.47

where it gives me the average click response time across all participants and across all blocks, separated by image and word condition.

What I would like is a table that looks something like:

Outers.Type  Root  AvgNumTransition
image        AnHa  7
image        BrCa  8
image        BbSi  6
(etc.)
word         AnHa  9
word         BrCa  8
word         BbSi  7
(etc.)

I’m trying to identify the point at which the transition occurs (i.e., transition is TRUE) by looking at the corresponding imgSeqNum. I’d like to find the average of the transition occurrences via imgSeqNums across all participants for each root. In other words, by looking at the table above, I should be able to see that in the image condition, on average, participants made a transition on the 7th iteration of the AnHa morph sequence. I’m stuck on this and any ideas/suggestions would be appreciated!

Thanks,

Maja


Peter Anthony Victor Diberardino pavdiberardino at edu.uwaterloo.ca
Tue Mar 26 10:38:58 EDT 2019

It seems like this problem has two steps.

  1. Get all the rows where transition = TRUE. This can be done with dfall[transition==TRUE]

  2. Now you can use the same logic you used for Average Click Rt, but use group_by with multiple columns. Something like:

MeanNumTransition <- dfall[transition==TRUE] %>% group_by(Outers.Type, Root) %>% summarize(AveNumTransition = mean(imgSeqNum))

Hopefully that works. And be sure to make dfall a data.table.
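
A minimal sketch of the two steps above, run on made-up rows rather than the real dfall. The `toy` table and its values are hypothetical, and the grouping column is assumed to be imgSeqRoot (the name shown in the sample columns) rather than Root. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(data.table)
library(dplyr)

# Hypothetical rows, for illustration only (not the real dfall)
toy <- data.frame(
  userID      = c(3, 3, 4, 4, 3, 5, 6),
  imgSeqRoot  = c("AnHa", "AnHa", "AnHa", "AnHa", "BrCa", "AnHa", "AnHa"),
  imgSeqNum   = c(6, 7, 8, 9, 8, 9, 11),
  transition  = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  Outers.Type = c("Image", "Image", "Image", "Image", "Image", "Word", "Word")
)

# Convert to a data.table so the bracket filter can see the column names
toy <- as.data.table(toy)

# Step 1: keep the transition rows; step 2: average imgSeqNum per group
MeanNumTransition <- toy[transition == TRUE] %>%
  group_by(Outers.Type, imgSeqRoot) %>%
  summarize(AvgNumTransition = mean(imgSeqNum), .groups = "drop")

print(MeanNumTransition)
# Image/AnHa -> 8, Image/BrCa -> 8, Word/AnHa -> 10
```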

Peter


Britt Anderson britt at uwaterloo.ca
Tue Mar 26 10:49:24 EDT 2019

If you push your git commits back to the repo we would have access to dfall, and we could test our comments on your data to make sure they work as expected.

I believe that group_by() will allow you to add more than one variable name. You could try: dfall %>% filter(transition == TRUE) %>% group_by(Outers.Type,Root) %>% summarise(mean(imgSeqNum))

The idea here is to first filter (Hanbin is your “filter” expert) your data.frame so that you only have the “true” transition rows. Then you group by Outers.Type first, and by root second. Look at the table that results to see if you are getting an organization that makes sense. If you are, then the summary should produce an output like the one you want.

Again, this is untested, because I don’t have access to the data.frame.
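
Since the real data.frame is not available here, the filter-then-group idea can be tried on a small stand-in. This is only a sketch: the `toy` rows are invented, the root column is assumed to be called imgSeqRoot as in the sample columns, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Invented rows standing in for dfall (a plain data.frame, no data.table needed)
toy <- data.frame(
  userID      = c(3, 3, 4, 4, 3, 5, 6),
  imgSeqRoot  = c("AnHa", "AnHa", "AnHa", "AnHa", "BrCa", "AnHa", "AnHa"),
  imgSeqNum   = c(6, 7, 8, 9, 8, 9, 11),
  transition  = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  Outers.Type = c("Image", "Image", "Image", "Image", "Image", "Word", "Word")
)

# Inspect the filtered rows first to check the organization makes sense
only_transitions <- toy %>% filter(transition == TRUE)
print(only_transitions)

# Then group by condition and root, and average the transition point
result <- only_transitions %>%
  group_by(Outers.Type, imgSeqRoot) %>%
  summarise(AvgNumTransition = mean(imgSeqNum), .groups = "drop")

print(result)
# Image/AnHa -> 8, Image/BrCa -> 8, Word/AnHa -> 10
```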


Maja Ramljak mramljak at edu.uwaterloo.ca
Tue Mar 26 15:08:05 EDT 2019

Both solutions worked equally well - thanks for the feedback. I’ll work on getting my commits to the repo updated so everyone has access to dfall.

Question related to Peter’s comment - why would I need to make dfall a data.table before running that line of code? dfall is currently a data.frame and worked the same with your line and Dr. Anderson’s line of code.

Thanks,

Maja


Peter Anthony Victor Diberardino pavdiberardino at edu.uwaterloo.ca
Tue Mar 26 20:54:50 EDT 2019

When I try ‘filtering’ using the syntax dfall[transition == TRUE] on a data.frame it does not work. This syntax should only work on a data.table.

Keep in mind that data.table ‘inherits’ from data.frame. Equivalently:

  • All data.tables are also data.frames, but not all data.frames are data.tables
  • Almost any solution that works for a data.frame will also work for a data.table (indexing with [ is the main exception), but not all solutions for data.tables will work for data.frames
  • data.tables are data.frames with extra features

So your dfall is certainly a data.frame, but I think it should also be a data.table if my suggestion worked for you.
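
One way to check which kind of object you have is to look at its class. The sketch below uses two tiny made-up objects, `df` and `dt`, which are not part of the original analysis, and assumes there is no separate variable named `transition` in the workspace.

```r
library(data.table)

# Tiny made-up objects, for illustration only
df <- data.frame(imgSeqNum = 6:8, transition = c(FALSE, TRUE, FALSE))
dt <- as.data.table(df)

class(df)                    # "data.frame"
class(dt)                    # "data.table" "data.frame"
inherits(dt, "data.frame")   # TRUE: every data.table is also a data.frame
is.data.table(df)            # FALSE: a plain data.frame is not a data.table

# The bracket filter works on the data.table...
dt_rows <- dt[transition == TRUE]

# ...but on the plain data.frame the column name is not visible inside [ ],
# so R looks for a variable called `transition` and stops with an error
bracket_failed <- inherits(try(df[transition == TRUE], silent = TRUE), "try-error")
```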

I personally prefer working with data.tables over plain data.frames for the reasons above. Sometimes data.table syntax can be confusing, but once you get familiar with it you can get some cleaner and faster solutions.

For example, a data.table could filter out your true transition entries using either

dfall %>% filter(transition == TRUE)

OR

dfall[transition == TRUE]

whereas with a plain data.frame, only dfall %>% filter(transition == TRUE) of these two would work.
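
As a rough illustration on invented rows (the `toy` table is hypothetical, and imgSeqRoot is assumed as the root column name), the two filters pick out the same rows, and data.table can also do the filter, grouping, and mean in a single bracket call:

```r
library(data.table)
library(dplyr)

# Invented rows, for illustration only
toy <- data.table(
  imgSeqRoot  = c("AnHa", "AnHa", "AnHa", "AnHa", "BrCa", "AnHa", "AnHa"),
  imgSeqNum   = c(6, 7, 8, 9, 8, 9, 11),
  transition  = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  Outers.Type = c("Image", "Image", "Image", "Image", "Image", "Word", "Word")
)

# Both filters keep the same rows
via_dplyr   <- toy %>% filter(transition == TRUE)
via_bracket <- toy[transition == TRUE]
both_match  <- identical(via_dplyr$imgSeqNum, via_bracket$imgSeqNum)

# data.table form: dt[rows to keep, what to compute, by = grouping columns]
native <- toy[transition == TRUE,
              .(AvgNumTransition = mean(imgSeqNum)),
              by = .(Outers.Type, imgSeqRoot)]
print(native)
# Image/AnHa -> 8, Image/BrCa -> 8, Word/AnHa -> 10
```

With `by`, groups come out in order of first appearance; `keyby` would sort them instead.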

Peter