Max or No Max? – Andy Field

There’s been a recent spat between the heavy metal bands Sepultura and Soulfly. For those unaware of the history, 50% of Sepulture used to be the Cavalera brothers (Max and Igor) until Max (the frontman and guitarist) left the band in 1996 and formed Soulfly. The full story is here. There’s a lot of bad blood even 20 years later, and according to a recent story on metal sucks, Soulfly’s manager (and Max’s wife) Gloria Cavalier recently posted a fairly pointed post on her Facebook page. This got picked up by my favourite podcast (the metal sucks podcast). What has this got to do with me, or statistics? Well, one of the presenters of the metal sucks podcasts asked me this over Twitter:

After a very brief comment about needing to operationalise ‘better’, I decided that rather than reading book proofs I’d do a tongue in cheek analysis of what is better max or no max and here it is.

First we need to operationalise ‘better’. I have done this by accepting subjective opinion as determining ‘better’ and specifically ratings of albums on amazon.com (although I am English metal sucks is US based, so I thought I’d pander to them and take ratings from the US site). Our questions then becomes ‘is max or no max rated higher by the sorts of people who leave reviews on Amazon’. We have operationalised our questions and turned it into a scientific statement, which we can test with data. [There are all sorts of problems with using these ratings, not least of which is that they tend to be positively biased, and they likely reflect a certain type of person who reviews, often reviews reflect things other than the music (e.g., arrived quickly 5*), and so on … but fuck it, this is not serious science, just a bit of a laugh.]

You can generate the data using this R code (data correct as of today):

Code

library(tidyverse)

create_tibble <- function(name, year, data){
  tibble::tibble(
    Album = rep(name, length(data)),
    Year = rep(year, length(data)),
    Rating = data
  )
}

morbid <- create_tibble("Morbid", "1986", c(rep(1,2), rep(2, 4), rep(3, 3), rep(4, 8), rep(5, 36)))
schizo <- create_tibble("Schizo", "1987",  c(rep(1,1), rep(2, 2), rep(3, 4), rep(4, 10), rep(5, 33)))
remains <- create_tibble("Remains", "1989",  c(2, rep(3, 5), rep(4, 9), rep(5, 104)))
arise <- create_tibble("Arise", "1991", c(rep(2, 2), rep(4, 16), rep(5, 89)))
chaos <- create_tibble("Chaos", "1993", c(rep(1,4), rep(2, 2), rep(3, 9), rep(4, 20), rep(5, 120)))
roots <- create_tibble("Roots", "1996", c(rep(1,9), rep(2, 8), rep(3, 17), rep(4, 24), rep(5, 94)))
against <- create_tibble("Against", "1998", c(rep(1,16), rep(2, 14), rep(3, 11), rep(4, 20), rep(5, 32)))
nation <- create_tibble("Nation", "2001", c(rep(1,3), rep(2, 7), rep(3, 6), rep(4, 22), rep(5, 19)))
roorback <- create_tibble("Roorback", "2003", c(rep(1,6), rep(2, 6), rep(3, 5), rep(4, 13), rep(5, 20)))
dante <- create_tibble("Dante", "2006", c(rep(1,1), rep(2, 3), rep(3, 4), rep(4, 8), rep(5, 30)))
alex <- create_tibble("Alex", "2009", c(rep(1,1), rep(2, 1), rep(3, 3), rep(4, 6), rep(5, 18)))
kairos <- create_tibble("Kairos", "2011", c(rep(1,3), rep(2, 2), rep(3, 2), rep(4, 6), rep(5, 33)))
mediator <- create_tibble("Mediator", "2013", c(rep(1,0), rep(2, 3), rep(3, 4), rep(4, 6), rep(5, 21)))
  
sepultura <- rbind(morbid, schizo, remains, arise, chaos, roots, against, nation, dante, alex, kairos, mediator) |> 
  dplyr::mutate(
    Band = ifelse(as.numeric(Year) < 1998, "Sepultura Max", "Sepultura No Max"),
  )

  
  soulfly <- create_tibble("Soulfly", "1998", c(rep(1,8), rep(2, 9), rep(3, 4), rep(4, 16), rep(5, 89)))
  primitive <- create_tibble("Primitive", "2000", c(rep(1,11), rep(2, 5), rep(3, 5), rep(4, 19), rep(5, 53)))
  three <- create_tibble("Three", "2002",c(rep(1,1), rep(2, 10), rep(3, 12), rep(4, 7), rep(5, 19)))
  prophecy <- create_tibble("Prophecy", "2004",c(rep(1,2), rep(2, 5), rep(3, 5), rep(4, 25), rep(5, 42)))
  darkages <- create_tibble("Dark Ages", "2005",c(rep(1,1), rep(2, 1), rep(3, 5), rep(4, 18), rep(5, 36)))
  conquer <- create_tibble("Conquer", "2008",c(rep(1,1), rep(2, 0), rep(3, 5), rep(4, 5), rep(5, 31)))
  omen <- create_tibble("Omen", "2010",c(rep(1,0), rep(2, 2), rep(3, 1), rep(4, 6), rep(5, 17)))
  enslaved <- create_tibble("Enslaved", "2012",c(rep(1,1), rep(2,1), rep(3, 4), rep(4, 2), rep(5, 30)))
  savages <- create_tibble("Savages", "2013",c(rep(1,0), rep(2, 2), rep(3, 3), rep(4, 10), rep(5, 27)))
  archangel <- create_tibble("Archangel", "2015",c(rep(1,3), rep(2, 2), rep(3, 4), rep(4, 7), rep(5, 21)))


  soulfly <- rbind(soulfly, primitive, three, prophecy, darkages, conquer, omen, enslaved, savages, archangel) |> 
    dplyr::mutate(
      Band = "Soulfly"
  )
  
maxvsnomax <- dplyr::bind_rows(sepultura, soulfly) |> 
    dplyr::mutate(
      Band = forcats::as_factor(Band),
      Album = forcats::as_factor(Album),
    )

# create data that excludes data fro Max's period in sepultura
postmax <- maxvsnomax |> 
  filter(Band != "Sepultura Max")

Post Sepultura: Max or No Max

The first question is whether post-max Sepultura or Soulfly are rated higher. Let’s create some plots.

Code

ggplot(postmax, aes(x = Rating, fill = Band, colour = Band)) +
  geom_histogram(binwidth = 1, alpha = 0.7) +
  facet_wrap(~Band) +
  theme_minimal()

Figure 1: Histograms of all ratings for Soulfly and (Non-Max Era) Sepultura

Code

ggplot(postmax, aes(x = Year, y = Rating, colour = Band)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
  stat_summary(fun = "mean", geom = "line", aes(group = Band)) +
  coord_cartesian(ylim = c(1, 5)) +
  theme_minimal()

Figure 2: Mean ratings of Soulfly and (Non-Max Era) Sepultura by year of album release

Figure 1 shows that the data are hideously skewed with people tending to give positive reviews and 4-5* ratings. @fig_means shows the mean ratings by year post Max’s departure (note they released albums in different years so the dots are out of synch, but it’s a useful timeline). @fig_means seems to suggest that after the first couple of albums, both bands are rated fairly similarly: the Soulfly line is higher but error bars overlap a lot for all but the first albums.

There are a lot of ways you could look at these data. The first thing is the skew. That messes up estimates of confidence intervals and significance tests … but our sample is likely big enough that we can rely on the central limit theorem to do its magic and let us assume that the sampling distribution is normal (beautifully explained in my new book!)

I’m going to fit three models. The first is an intercept only model (a baseline with no predictors), the second allows intercepts to vary across albums (which allows ratings to vary by album, which seems like a sensible thing to do because albums will vary in quality) the third predicts ratings from the band (Sepultura vs Soulfly).

Code

library(nlme)
max_mod <- gls(Rating ~ 1, data = postmax, method = "ML")
max_ri <- lme(Rating ~ 1, random = ~1|Album, data = postmax, method = "ML")
max_band <- update(max_ri, .~. + Band)
anova(max_mod, max_ri, max_band)

         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
max_mod      1  2 2889.412 2899.013 -1442.706                        
max_ri       2  3 2853.747 2868.148 -1423.873 1 vs 2 37.66536  <.0001
max_band     3  4 2854.309 2873.510 -1423.155 2 vs 3  1.43806  0.2305

By comparing models we can see that album ratings varied very significantly (not surprising), the p-value is < .0001, but that band did not significantly predict ratings overall (p = .231). If you like you can look at the summary of the model

Code

summary(max_band)

Linear mixed-effects model fit by maximum likelihood
  Data: postmax 
       AIC     BIC    logLik
  2854.309 2873.51 -1423.154

Random effects:
 Formula: ~1 | Album
        (Intercept) Residual
StdDev:   0.2705842 1.166457

Fixed effects:  Rating ~ Band 
               Value Std.Error  DF   t-value p-value
(Intercept) 4.078740 0.1311196 882 31.107015  0.0000
BandSoulfly 0.204237 0.1650047  14  1.237765  0.2362
 Correlation: 
            (Intr)
BandSoulfly -0.795

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.9717684 -0.3367275  0.4998698  0.6230082  1.2686186 

Number of Observations: 898
Number of Groups: 16

The difference in ratings between Sepultura and Soulfly was b = 0.20. Ratings for soulfully were higher, but not significantly so (if we allow ratings to vary over albums, if you take that random effect out you’ll get a very different picture because that variability will go into the fixed effect of ‘band’.).

Max or No Max

Just because this isn’t fun enough, we could also just look at whether either Sepultura (post 1996) or Soulfly can compete with the Max-era-Sepultura heyday. I’m going to fit three models but this time including the early Sepultura albums (with max). The models are the same as before except that the fixed effect of band now has three levels: Sepultura Max, Sepultura no-Max and Soulfly:

Code

max_mod <- gls(Rating ~ 1, data = maxvsnomax, method = "ML")
max_ri <- lme(Rating ~ 1, random = ~1|Album, data = maxvsnomax, method = "ML")
max_band <- update(max_ri, .~. + Band)
anova(max_mod, max_ri, max_band)

         Model df      AIC      BIC    logLik   Test   L.Ratio p-value
max_mod      1  2 4686.930 4697.601 -2341.465                         
max_ri       2  3 4583.966 4599.973 -2288.983 1 vs 2 104.96454  <.0001
max_band     3  5 4581.436 4608.114 -2285.718 2 vs 3   6.52947  0.0382

Album ratings varied very significantly (not surprising), the p-value is < .0001, and the band did significantly predict ratings overall (p = .038). If you like you can look at the summary of the model

Code

summary(max_band)

Linear mixed-effects model fit by maximum likelihood
  Data: maxvsnomax 
       AIC      BIC    logLik
  4581.436 4608.114 -2285.718

Random effects:
 Formula: ~1 | Album
        (Intercept) Residual
StdDev:     0.25458 1.062036

Fixed effects:  Rating ~ Band 
                         Value Std.Error   DF  t-value p-value
(Intercept)           4.545918 0.1136968 1512 39.98281  0.0000
BandSepultura No Max -0.465626 0.1669412   19 -2.78916  0.0117
BandSoulfly          -0.262609 0.1471749   19 -1.78433  0.0903
 Correlation: 
                     (Intr) BndSNM
BandSepultura No Max -0.681       
BandSoulfly          -0.773  0.526

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.3954974 -0.3147123  0.3708523  0.6268751  1.3987442 

Number of Observations: 1534
Number of Groups: 22

The difference in ratings between Sepultura without Max compared to with him was b = -0.47 and significant at p = .012 (ratings for post-max Sepultura are significantly worse than for the Max-era Sepultura). The difference in ratings between Soulfly compared two Max-era Sepultura was b = -0.26 and not significant (p = .09) (ratings for Soulfly are not significantly worse than for the Max-era Sepultura). A couple of points here, p-values are silly, so don’t read too much into them, but the parameter (the bs) which quantifies the effect is a bit smaller for Soulfly.

Confidence Intervals

Interestingly if you write yourself a little bootstrap routine to get some robust confidence intervals around the parameters:

Code

library(boot)

boot_lme <- function(data, indices){
    data <- data[indices,] # select obs. in bootstrap sample
    model <- lme(Rating ~ Band, random = ~1|Album, data = data, method = "ML")
    fixef(model) # return coefficient vector
}

max_boot <- boot(maxvsnomax, boot_lme, 1000)
max_boot


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = maxvsnomax, statistic = boot_lme, R = 1000)


Bootstrap Statistics :
      original       bias    std. error
t1*  4.5459183 -0.001065884  0.03882691
t2* -0.4656258  0.008973167  0.08060762
t3* -0.2626088  0.001218160  0.06088377

Code

boot.ci(max_boot, index = 1, type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = max_boot, type = "perc", index = 1)

Intervals : 
Level     Percentile     
95%   ( 4.467,  4.615 )  
Calculations and Intervals on Original Scale

Code

boot.ci(max_boot, index = 2, type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = max_boot, type = "perc", index = 2)

Intervals : 
Level     Percentile     
95%   (-0.6227, -0.2951 )  
Calculations and Intervals on Original Scale

Code

boot.ci(max_boot, index = 3, type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = max_boot, type = "perc", index = 3)

Intervals : 
Level     Percentile     
95%   (-0.3816, -0.1429 )  
Calculations and Intervals on Original Scale

The difference in ratings between Sepultura without Max compared to with him was b = -0.47 [-0.62, -0.31]. The difference in ratings between Soulfly compared to Max-era Sepultura was b = -0.26 [-0.39, -0.15].¹ This suggests that both soulfully and post-Max Sepultura yield negative parameters that reflect (to the degree that you believe that a confidence interval tells you about the population parameter…) a negative effect in the population. In other words, both bands are rated worse than Max-era Sepultura.

¹ These figures might differ slightly from the output because each time I generate the blog the bootstrap will re-run and results change slightly. I can’t be bothered to write inline code to update the values: they’ll be roughly the same.

Summary

Look, this is just a bit of fun and an excuse to show you how to use a bootstrap on a multilevel model, and how you can use data to try to answer pointless questions thrown at you on Twitter. Based on this hastily thrown together analysis that makes a lot of assumptions about a lot of things, my 120 character twitter response will be: Sepultura Max better than everything, but post 1996 Max is no better than No Max;-)

Citation

BibTeX citation:

@online{field2016,
  author = {Field, Andy},
  title = {Max or {No} {Max?}},
  date = {2016-03-29},
  url = {https://profandyfield.com/posts/2016_03_29_sepultura/},
  langid = {en}
}

For attribution, please cite this work as:

Field, Andy. 2016. “Max or No Max?” March 29, 2016. https://profandyfield.com/posts/2016_03_29_sepultura/.