In this session, we will look at basic functions in R that will help us in running some inferential statistics. These will help us to evaluate the relationship between one (or more) predictor(s) and an outcome. We will start with a basic example. Using the dataframe "cars".
# Loading packages
```{r warning=FALSE, message=FALSE, error=FALSE}
## Use the code below to check if you have all required packages installed. If some are not installed already, the code below will install these. If you have all packages installed, then you could load them with the second code.
requiredPackages = c('ggplot2','tidyverse','psycho','lme4','ordinal','lmerTest','PresenceAbsence')
for(p in requiredPackages){
if(!require(p,character.only = TRUE)) install.packages(p)
library(p,character.only = TRUE)
}
# library(ggplot2);library(tidyverse);library(psycho);library(lme4);library(ordinal);library(lmerTest)
```
# Starting with a linear model
```{r warning=FALSE, message=FALSE, error=FALSE}
head(cars)
str(cars)
summary(cars)
ggplot.cars1 <- ggplot(cars,aes(x=speed,y=dist))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Stopping distance (ft)", x = "Speed (mph)")
ggplot.cars1
```
## Basic correlations and plotting the linear relationship
Let us do some basic correlation analysis using base R and package psycho. And let's visualise this correlation using a linear regression line
```{r warning=FALSE, message=FALSE, error=FALSE}
cor(cars)
## correlation using "psycho"
cars %>%
correlation()
ggplot.cars2 <- ggplot(cars,aes(x=speed,y=dist))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Stopping distance (ft)", x = "Speed (mph)")+
geom_smooth(method = lm, se=F)
ggplot.cars2
```
## Our first statistical model
What about running our first statistical analysis of the data? We will use a Linear regression model. We can visualise the results using "summary". If you are new to linear models and prefer an ANOVA-style analysis, then use the function "ANOVA" on the linear model. We'll come back to the linear model later on.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lm <- lm(dist~speed, data=cars)
mdl.lm#also print(mdl.lm)
summary(mdl.lm)
anova(mdl.lm)
```
To interpret the model, we need look at the coefficients. The "Intercept" is -17.5791 and the coefficient for "speed" is 3.9324. This tells us that when "speed" = 0, the "distance" is at -17.5791; with one unit increase in "speed", the "distance" increases by 3.9324 (ft). There may be some issues with interpretation of this model and we need to center, standardise scores. We are not going to cover this here, but you can center and/or scale with this formula:
1. Center = scale(x, center = TRUE, scale = FALSE)
2. Standardise = scale(x, center = TRUE, scale = TRUE)
### Dissecting the model
Let us dissect the model. If you use "str", you will be able to see what is available under our linear model. To access some info from the model
### "str" and "coef"
```{r warning=FALSE, message=FALSE, error=FALSE}
str(mdl.lm)
coef(mdl.lm)
## same as
## mdl.lm$coefficients
```
#### "coef" and "coefficients"
What if I want to obtain the "Intercept"? Or the coefficient for distance? What if I want the full row for distance?
```{r warning=FALSE, message=FALSE, error=FALSE}
coef(mdl.lm)[1] # same as mdl.lm$coefficients[1]
coef(mdl.lm)[2] # same as mdl.lm$coefficients[2]
summary(mdl.lm)$coefficients[2, ] # full row
summary(mdl.lm)$coefficients[2, 4] #for p value
```
What about residuals (difference between the observed value and the estimated value of the quantity) and fitted values?
```{r warning=FALSE, message=FALSE, error=FALSE}
residuals(mdl.lm)
fitted(mdl.lm)
```
#### Goodness of fit?
```{r warning=FALSE, message=FALSE, error=FALSE}
AIC(mdl.lm) # Akaike's Information Criterion, lower values are better
BIC(mdl.lm) # Bayesian AIC
logLik(mdl.lm) # log likelihood
```
#### Significance testing
Are the above informative? of course not directly. If we want to test for overall significance of model. We run a null model (aka intercept only) and compare models.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lm.Null <- lm(dist~1, data=cars)
anova(mdl.lm,mdl.lm.Null)
```
#### Plotting fitted values
Let's plot our fitted values. Any problems with this?
```{r warning=FALSE, message=FALSE, error=FALSE}
ggplot.cars3 <- ggplot(cars,aes(x=speed,y=predict(mdl.lm)))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Stopping distance (ft)", x = "Speed (mph)")+
geom_smooth(method = lm, se=F)
ggplot.cars3
```
## Some cool correlations
Above, we had two predictors and we checked for correlations. What if we have more predictors and we want to have cool correlation tables and plots? (see here https://www.r-bloggers.com/beautiful-and-powerful-correlation-tables-in-r/amp/)
### With corrections
```{r warning=FALSE, message=FALSE, error=FALSE}
cor <- psycho::affective %>%
correlation()
summary(cor)
plot(cor)
print(cor)
```
### No corrections
What of we don't want to use corrections for multiple comparisons?
```{r warning=FALSE, message=FALSE, error=FALSE}
cor <- psycho::affective %>%
correlation(adjust="none")
summary(cor)
plot(cor)
```
### Issues with no corrections
Let's generate a dataframe with 11 predictors and 1000 rows each. Because these are simulated data, there is no underlying correlation between these.
```{r warning=FALSE, message=FALSE, error=FALSE}
df_with_11_vars <- data.frame(replicate(11, rnorm(1000)))
cor2 <- correlation(df_with_11_vars, adjust="none")
summary(cor2)
```
Deactivate this by using "i_am_cheating"!
```{r warning=FALSE, message=FALSE, error=FALSE}
cor3 <- correlation(df_with_11_vars, adjust="none",i_am_cheating=TRUE)
summary(cor3)
```
### Same model with corrections
```{r warning=FALSE, message=FALSE, error=FALSE}
cor4 <- correlation(df_with_11_vars)
summary(cor4)
```
# From Linear to Logistic models
Here we will look at an example when the outcome is a binary outcome. This simulated data is structured as follows. We asked one participant to listen to 165 sentences, and to judge whether these are "grammatical" or "ungrammatical". There were 105 sentences that were "grammatical" and 60 "ungrammatical". This fictitious example can apply in any other situation. Let's think Geography: 165 lands: 105 "flat" and 60 "non-flat", etc. This applies to any case where you need to "categorise" the outcome into two groups.
## Load and summaries
Let's load in the data and do some basic summaries
```{r warning=FALSE, message=FALSE, error=FALSE}
grammatical <- read.csv("grammatical.csv")
str(grammatical)
head(grammatical)
grammatical$response <- relevel(grammatical$response,ref="yes")
table(grammatical$grammaticality,grammatical$response)
```
## Accuracy and Signal Detection Theory
We are generally interested in performance, i.e., whether the we have "accurately" categorised the outcome or not. Let's do some stats on this
| | Yes | No | Total |
|----------------------------|--------------------|------------------|------------------|
| Grammatical (Yes Actual) | TP = 100 | FN = 5 | (Yes Actual) 105 |
| Ungrammatical (No Actual) | FP = 10 | TN = 50 | (No Actual) 60 |
| Total | (Yes Response) 110 | (No Response) 55 | 165 |
```{r warning=FALSE, message=FALSE, error=FALSE}
## TP = True Positive (Hit); FP = False Positive; FN = False Negative; TN = True Negative
TP = 100
FP = 10
FN = 5
TN = 50
Total = 165
(TP+TN)/Total # accuracy
(FP+FN)/Total # error, also 1-accuracy
# When stimulus = yes, how many times response = yes?
TP/(TP+FN) # also True Positive Rate:
# When stimulus = no, how many times response = yes?
FP/(FP+TN) # False Positive Rate,
# When stimulus = no, how many times response = no?
TN/(FP+TN) # specificity, also 1-sensetivity
# When subject responds "yes" how many times is (s)he correct?
TP/(TP+FP) # precision
# getting dprime (or the sensetivity index); beta (bias criterion, 0-1, lower=increase in "yes"); Aprime (estimate of discriminability, 0-1, 1=good discrimination; 0 at chance); bppd (b prime prime d, -1 to 1; 0 = no bias, negative = tendency to respond "yes", positive = tendency to respond "no"); c (index of bias, equals to SD)
#(see also https://www.r-bloggers.com/compute-signal-detection-theory-indices-with-r/amp/)
psycho::dprime(TP, FP, FN, TN)
```
## GLM
Let's run a first GLM (Generalised Linear Model). A GLM uses a special family "binomial" as it assumes the outcome has a binomial distribution. In general, results from a Logistic Regression are close to what we get from SDT (see above).
To run the results, we will change the reference level for both response and grammaticality.
### Results
The results below show the logodds for our model. The results show that for one unit increase in the response (i.e., from no to yes), the logodds of being "grammatical" is increased by 4.61 (the intercept shows that when the response is "no", the logodds are -1.61).
```{r warning=FALSE, message=FALSE, error=FALSE}
levels(grammatical$response)
levels(grammatical$grammaticality)
grammatical$response <- relevel(grammatical$response, ref="no")
grammatical$grammaticality <- relevel(grammatical$grammaticality, ref="ungrammatical")
mdl.glm <- glm(response~grammaticality,data=grammatical,family = binomial)
summary(mdl.glm)
```
### Getting "true" coefficients from GLM
Here we will run the same GLM model above but by supressing the "Intercept". The idea here is to get the "true" coefficient for a "grammatical" and a "yes" response.
```{r warning=FALSE, message=FALSE, error=FALSE}
# An intercept is always included in any regression, but you can specify it with "1"
## glm(response~1+grammaticality,data=grammatical,family = binomial)
mdl.glm2 <- glm(response~-1+grammaticality,data=grammatical,family = binomial)
summary(mdl.glm2)
```
### Logodds to Odd ratios
Logodds can be modified to talk about the odds of an event. For our model above, the odds of "grammatical" receiving a "yes" response increase by 100; whereas receiving a "no" is a mere 0.2. Using the second model (i.e., without an Intercept) allows us to get the odd rations for each of responses to "Ungrammatical" and "Grammatical"
```{r warning=FALSE, message=FALSE, error=FALSE}
exp(coef(mdl.glm))
exp(coef(mdl.glm2))
```
### LogOdds to proportions
If you want to talk about the percentage "accuracy" of our model, then we can transform our loggodds into proportions. This shows that the proportion of "grammatical" receiving a "yes" response increase by 99%
```{r warning=FALSE, message=FALSE, error=FALSE}
plogis(coef(mdl.glm))
plogis(coef(mdl.glm2))
```
### GLM as a classification tool
The code below demosntrates the links between our GLM model and what we had obtained above from SDT. The predictions' table shows that our GLM was successful at obtaining prediction that are identical to our initial data setup. Look at the table here and the table above. Once we have created our table of outcome, we can compute percent correct, the specificity, the sensetivity, the Kappa score, etc.. this yields the actual value with the SD that is related to variations in responses
```{r}
## predict(mdl.glm)>0.5 is identical to
## predict(glm(response~grammaticality,data=grammatical,family = binomial),type="response")
tbl.glm <- table(grammatical$response,predict(mdl.glm,type="response")>0.5)
tbl.glm
PresenceAbsence::pcc(tbl.glm)
PresenceAbsence::specificity(tbl.glm)
PresenceAbsence::sensitivity(tbl.glm)
PresenceAbsence::Kappa(tbl.glm)
###etc..
```
### Plotting
```{r warning=FALSE, message=FALSE, error=FALSE}
grammatical$prob <- predict(glm(response~grammaticality,data=grammatical,family = binomial),type="response")
ggplot(grammatical, aes(x=as.numeric(grammaticality), y=prob)) +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "binomial"),
se = T) + theme_bw(base_size = 20)+
labs(y = "Probability", x = "")+
coord_cartesian(ylim=c(0,1))+
scale_x_discrete(limits = c("Ungrammatical","Grammatical"))
```
## GLM: Other distributions
If your data does not fit a binomial distribution, and is a multinomial (i.e., three or more response categories) or poisson (count data), then you need to use the glm function with a specific family function.
```{r warning=FALSE, message=FALSE, error=FALSE, echo=FALSE}
## For a multinomial (3 or more response categories), see below and use the following specification
## https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
## mdl.multi <- nnet::multinom(outcome~predictor, data=data)
## For a poisson (count data), see below and use the following specification
## https://stats.idre.ucla.edu/r/dae/poisson-regression/
## mdl.poisson <- glm(outcome~predictor, data=data,family="poisson")
```
# Cumulative Link Models
These models work perfectly with rating data. Ratings are inherently ordered, 1, 2, ... n, and expect to observe an increase (or decrease) in overall ratings from 1 to n. To demonstrate this, we will use an example using the package "ordinal". Data were from a rating experiment where six participants rated the percept of nasality in the production of particular consonants in Arabic. The data came from nine producing subjects. The ratings were from 1 to 5. This example can apply to any study, e.g., rating grammaticality of sentences, rating how positive the sentiments are in a article, interview responses, etc.
## Importing and pre-processing
We start by importing the data and process it. We change the reference level in the predictor
```{r warning=FALSE, message=FALSE, error=FALSE}
rating <- read.csv("rating.csv")
str(rating)
## we need to convert "Response" to a factor
rating$Response <- as.factor(rating$Response)
rating$Context <- relevel(rating$Context, ref="isolation")
```
## Our first model
We run our first clm model as a simple, i.e., with no random effects
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.clm <- clm(Response~Context, data=rating)
summary(mdl.clm)
```
## Testing significance
We can evaluate whether "Context" improves the model fit, by comparing a null model with our model. Of course "Context" is improving the model fit.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.clm.Null <- clm(Response~1, data=rating)
anova(mdl.clm,mdl.clm.Null)
```
## Interpreting a cumulative model
As a way to interpret the model, we can look at the coefficients and make sense of the results. A CLM model is a Logistic model with a cumulative effect. The "Coefficients" are the estimates for each level of the fixed effect; the "Threshold coefficients" are those of the response. For the former, a negative coefficient indicates a negative association with the response; and a positive is positively associated with the response. The p values are indicating the significance of each level. For the "Threshold coefficients", we can see the cumulative effects of ratings 1|2, 2|3, 3|4 and 4|5 which indicate an overall increase in ht eratings from 1 to 5.
## Plotting
We use a modified version of a plotting function that allows us to visualise the effects. For this, we use the base R plotting functions
```{r warning=FALSE, message=FALSE, error=FALSE}
par(oma=c(1, 0, 0, 3),mgp=c(2, 1, 0))
xlimNas = c(min(mdl.clm$beta), max(mdl.clm$beta))
ylimNas = c(0,1)
plot(0,0,xlim=xlimNas, ylim=ylimNas, type="n", ylab=expression(Probability), xlab="", xaxt = "n",main="Predicted curves - Nasalisation",cex=2,cex.lab=1.5,cex.main=1.5,cex.axis=1.5)
axis(side = 1, at = c(0,mdl.clm$beta),labels = levels(rating$Context), las=2,cex=2,cex.lab=1.5,cex.axis=1.5)
xsNas = seq(xlimNas[1], xlimNas[2], length.out=100)
lines(xsNas, plogis(mdl.clm$Theta[1] - xsNas), col='black')
lines(xsNas, plogis(mdl.clm$Theta[2] - xsNas)-plogis(mdl.clm$Theta[1] - xsNas), col='red')
lines(xsNas, plogis(mdl.clm$Theta[3] - xsNas)-plogis(mdl.clm$Theta[2] - xsNas), col='green')
lines(xsNas, plogis(mdl.clm$Theta[4] - xsNas)-plogis(mdl.clm$Theta[3] - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clm$Theta[4] - xsNas)), col='blue')
abline(v=c(0,mdl.clm$beta),lty=3)
abline(h=0, lty="dashed")
abline(h=1, lty="dashed")
legend(par('usr')[2], par('usr')[4], bty='n', xpd=NA,lty=1, col=c("black", "red", "green", "orange", "blue"),
legend=c("Oral", "2", "3", "4", "Nasal"),cex=0.75)
```
# From Linear to Mixed models. Why random effects matter
Let's generate a new dataframe that we will use later on for our mixed models
```{r warning=FALSE, message=FALSE, error=FALSE}
set.seed(666)
#we create 6 subjects
subjects <- paste0('S', 1:6)
#here we ad repetitions within speakers
subjects <- rep(subjects, each = 20)
items <- paste0('Item', 1:20)
#below repeates
items <- rep(items, 6)
#below is to generate random numbers that are log values
logFreq <- round(rexp(20)*5,2)
#below we are repeating the logFreq 6 times to fit with the number of speakers and items
logFreq <- rep(logFreq,6)
xdf <- data.frame(subjects, items, logFreq)
#below removes the individual variables we had created because they are already in the dataframe
rm(subjects,items,logFreq)
xdf$Intercept <- 300
submeans <- rep(rnorm(6,sd = 40),20)
#sort make the means for each subject is the same...
submeans <- sort(submeans)
xdf$submeans <- submeans
#we create the same thing for items... we allow the items mean to vary between words...
itsmeans <- rep(rnorm(20,sd=20),6)
xdf$itsmeans <- itsmeans
xdf$error <- rnorm(120,sd=20)
#here we create an effect column,
#here for each logFreq, we have a decrease of -5 of that particular logFreq
xdf$effect <- -5*xdf$logFreq
xdf$dur <- xdf$Intercept +xdf$submeans +xdf$itsmeans +xdf$error + xdf$effect
#below is to subset the data and get only a few columns.. the -c(4:8) removes the columns 4 to 8..
xreal <- xdf[,-c(4:8)]
head(xreal)
rm(xdf,submeans,itsmeans)
```
## Correlations and plots
Let's start by doing a correlation test and plotting the data. Our results show that there is a negative correlation between duration and LogFrequency, and the plot shows this decrease.
```{r warning=FALSE, message=FALSE, error=FALSE}
cor5 <- xreal %>%
correlation()
summary(cor5)
ggplot.xreal <- ggplot(xreal,aes(x=logFreq,y=dur))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Duration", x = "Frequency (Log)")+
geom_smooth(method = lm, se=F)
ggplot.xreal
```
## Linear model
Let's run a simple linear model on the data. As we can see below, thereare some issues with the "simple" linear model: we had set our SD for subjects to be 40, but this was picked up as 120 (see histogram of residuals). The QQ Plot is not "normal".
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lm.xreal <- lm(dur~logFreq, data=xreal)
summary(mdl.lm.xreal)
hist(residuals(mdl.lm.xreal))
qqnorm(residuals(mdl.lm.xreal)); qqline(residuals(mdl.lm.xreal))
plot(fitted(mdl.lm.xreal),residuals(mdl.lm.xreal), cex = 4)
```
## Linear Mixed Model
Our Linear Mixed effects Model will take into account the random effects we added and also our model specifications. We use a Maximum Likelihood estimate (REML=FALSE) as this is what we will use for model comparison. The Linear Mixed Model is reflecting our model specifications The SD of our subjects is picked up correctly. The model results are "almost" the same as our linear model above. The coefficient for the "Intercept" is at 337.973 and the coefficient for LogFrequency is at -5.460. This indicates that for each unit of increase in the LogFrequency, there is a decrease by 5.460 (ms).
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lmer.xreal <- lmer(dur~logFreq+(1|subjects)+(1|items), data=xreal,REML=FALSE)
summary(mdl.lmer.xreal)
hist(residuals(mdl.lmer.xreal))
qqnorm(residuals(mdl.lmer.xreal)); qqline(residuals(mdl.lmer.xreal))
plot(fitted(mdl.lmer.xreal),residuals(mdl.lmer.xreal), cex = 4)
```
## Our second Mixed model
This second model add a by-subject random slope. Random slopes allow for the variation that exists in the random effects to be taken into account. An intercept only model provides an averaged values to our participants.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lmer.xreal.2 <- lmer(dur~logFreq+(logFreq|subjects)+(1|items), data=xreal,REML=FALSE)
summary(mdl.lmer.xreal.2)
hist(residuals(mdl.lmer.xreal.2))
qqnorm(residuals(mdl.lmer.xreal.2)); qqline(residuals(mdl.lmer.xreal.2))
plot(fitted(mdl.lmer.xreal.2),residuals(mdl.lmer.xreal.2), cex = 4)
```
## Model comparison
But where are our p values? The lme4 developers decided not to include p values due to various issues with estimating df. What we can do instead is to compare models. We need to create a null model to allow for significance testing. As expected our predictor is significantly contributing to the difference.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lmer.xreal.Null <- lmer(dur~1+(logFreq|subjects)+(1|items), data=xreal,REML=FALSE)
anova(mdl.lmer.xreal.Null,mdl.lmer.xreal.2)
```
Also, do we really need random slopes? From the result below, we don't seem to need random slopes at all, given that adding random slopes does not improve the model fit. I always recommend testing this. Most of the time I keep random slopes.
```{r warning=FALSE, message=FALSE, error=FALSE}
anova(mdl.lmer.xreal,mdl.lmer.xreal.2)
```
But if you are really (really!!!) obsessed by p values, then you can also use lmerTest. BUT use after comparing models to evaluate contribution of predictors
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lmer.xreal.lmerTest <- lmer(dur~logFreq+(logFreq|subjects)+(1|items), data=xreal,REML=TRUE)
summary(mdl.lmer.xreal.lmerTest)
detach("package:lmerTest", unload=TRUE)
```
## Our final Mixed model
Our final model uses REML (or Restricted Maximum Likelihood Estimate of Variance Component) to estimate the model.
```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lmer.xreal.Full <- lmer(dur~logFreq+(logFreq|subjects)+(1|items), data=xreal,REML=TRUE)
summary(mdl.lmer.xreal.Full)
anova(mdl.lmer.xreal.Full)
hist(residuals(mdl.lmer.xreal.Full))
qqnorm(residuals(mdl.lmer.xreal.Full)); qqline(residuals(mdl.lmer.xreal.Full))
plot(fitted(mdl.lmer.xreal.Full),residuals(mdl.lmer.xreal.Full), cex = 4)
```
## Dissecting the model
```{r warning=FALSE, message=FALSE, error=FALSE}
coef(mdl.lmer.xreal.Full)
fixef(mdl.lmer.xreal.Full)
fixef(mdl.lmer.xreal.Full)[1]
fixef(mdl.lmer.xreal.Full)[2]
coef(mdl.lmer.xreal.Full)$`subjects`[1]
coef(mdl.lmer.xreal.Full)$`subjects`[2]
coef(mdl.lmer.xreal.Full)$`items`[1]
coef(mdl.lmer.xreal.Full)$`items`[2]
```
## Using predictions from our model
In general, I use the prediction from my final model in any plots. To generate this, we can use the following
```{r warning=FALSE, message=FALSE, error=FALSE}
xreal$Pred_Dur <- predict(lmer(dur~logFreq+(logFreq|subjects)+(1|items), data=xreal,REML=TRUE))
ggplot.xreal.Pred <- ggplot(xreal,aes(x=logFreq,y=Pred_Dur))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Duration", x = "Frequency (Log)",title="Predicted")+
geom_smooth(method = lm, se=F)+coord_cartesian(ylim=c(200,450))
ggplot.xreal.Pred
## original plot
ggplot.xreal.Original <- ggplot(xreal,aes(x=logFreq,y=dur))+
geom_point()+theme_bw(base_size = 20)+
labs(y = "Duration", x = "Frequency (Log)",title="Original")+
geom_smooth(method = lm, se=F)+coord_cartesian(ylim=c(200,450))
ggplot.xreal.Original
```
## GLMM and CLMM
The code above was using a Linear Mixed Effects Modelling. The outcome was a numeric object. In some cases (as we have seen above), we may have:
1. Binary outcome (binomial)
2. Count data (poisson),
3. Multi-category outcome (multinomial)
4. Rating data (cumulative function)
The code below gives you an idea of how to specificy these models
```{r warning=FALSE, message=FALSE, error=FALSE}
## Binomial family
## lme4::glmer(outcome~predictor(s)+(1|subject)+(1|items)..., data=data, family=binomial)
## Poisson family
## lme4::glmer(outcome~predictor(s)+(1|subject)+(1|items)..., data=data, family=poisson)
## Multinomial family
## a bit complicated as tehre is a need to use Bayesian approaches, see e.g.,
## glmmADMB
## mixcat
## MCMCglmm
## see https://gist.github.com/casallas/8263818
## Rating data, use following
## ordinal::clmm(outcome~predictor(s)+(1|subject)+(1|items)..., data=data)
## Remember to test for random effects and whether slopes are needed.
```
# Running your code and automatically drawing quality tables for publications
See this website for how to run and automatically draw tables for your publications. I haven't tested this but it is in development:
7) https://www.r-bloggers.com/elegant-regression-results-tables-and-plots-in-r-the-finalfit-package/amp