We collected and standardized a large dataset of ≈ 14,000 Facebook posts (https://github.com/lyr-uam/CorpusReaccion) from 10 brands that have an important presence in Mexico. The collected corpus (Table 1), written in Spanish, can be used for training and evaluating automatic systems that aim to predict several customer engagement metrics, specifically Facebook reactions (i.e., Like, Love, Haha, Wow, Sad and Angry), the number of shares, and the number of comments generated by a post. We define the task of predicting consumer engagement as classifying whether a post will have a higher or lower impact volume than the average seen in the training data. Accordingly, our dataset defines six binary classification problems, namely: i) comments (|C|), ii) sharing (|S|), iii) total reactions (|R|), iv) positive reactions (|R+|), v) negative reactions (|R−|) and vi) neutral reactions (|R |). Each classification problem has two categories, high-impact and low-impact.

The category of each post was assigned as follows. For each classification problem, we compute the average value of metric $k$ over all the posts from the ten brands, referred to as $\bar{x}_k$. Once $\bar{x}_k$ is known, for each post $p_i$ contained in brand $i$ we check its value of metric $k$, denoted $p_{i,k}$: if $p_{i,k} > \bar{x}_k$ the post is assigned to high-impact, and to low-impact otherwise.
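The labelling rule above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `label_posts`, the field names (`comments`, `shares`) and the toy values are hypothetical, and the threshold is simply the mean over whatever posts are passed in.

```python
from statistics import mean

def label_posts(posts, metrics):
    """Label each post per metric, using the global mean as threshold."""
    # Average of each metric k over all posts from all brands
    thresholds = {k: mean(p[k] for p in posts) for k in metrics}
    # A post is high-impact only if it is strictly above the average
    labels = [
        {k: "high-impact" if p[k] > thresholds[k] else "low-impact"
         for k in metrics}
        for p in posts
    ]
    return thresholds, labels

# Hypothetical toy posts; field names and values are illustrative only
posts = [
    {"brand": "A", "comments": 10, "shares": 0},
    {"brand": "A", "comments": 2, "shares": 4},
    {"brand": "B", "comments": 0, "shares": 2},
]
thresholds, labels = label_posts(posts, ["comments", "shares"])
```

In this toy input the third post has exactly the average number of shares, so under the strict inequality it falls into low-impact.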
Table 1. Number of high- and low-impact instances for each problem.
| Brand name | R high | R low | R+ high | R+ low | R− high | R− low | R (neutral) high | R (neutral) low | C high | C low | S high | S low |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clash Royale ES | 189 | 369 | 165 | 393 | 209 | 349 | 217 | 341 | 264 | 294 | 39 | 519 |
| Canon Mexicana | 100 | 1014 | 94 | 1020 | 43 | 1071 | 90 | 1024 | 78 | 1036 | 96 | 1018 |
| Muy Interesante México | 775 | 1393 | 790 | 1378 | 136 | 2032 | 374 | 1794 | 153 | 2015 | 824 | 1344 |
| Cinépolis | 991 | 958 | 966 | 983 | 353 | 1596 | 729 | 1190 | 883 | 1067 | 745 | 1204 |
| Discovery Channel | 109 | 1566 | 112 | 1563 | 110 | 1565 | 85 | 1590 | 9 | 1666 | 60 | 1615 |
| National Geographic | 124 | 1620 | 132 | 1612 | 110 | 1634 | 53 | 1691 | 50 | 1694 | 115 | 1629 |
| Fisher-Price | 248 | 594 | 266 | 576 | 15 | 827 | 31 | 811 | 119 | 723 | 43 | 799 |
| Xbox México | 230 | 1424 | 230 | 1424 | 168 | 1486 | 155 | 1499 | 299 | 1355 | 92 | 1562 |
| Nikon | 24 | 1208 | 33 | 1199 | 16 | 1216 | 6 | 1226 | 12 | 1220 | 10 | 1222 |
| Lacoste | 46 | 669 | 57 | 658 | 0 | 715 | 2 | 713 | 2 | 713 | 4 | 711 |
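
Since the threshold is a single average over all ten brands, the share of high-impact posts differs considerably from brand to brand. The short sketch below computes that share for the total reactions problem, using the counts of three brands copied from Table 1; the helper name `high_share` is hypothetical and the snippet is only meant as a reading aid for the table.

```python
# (high, low) counts for total reactions, copied from Table 1
total_reactions = {
    "Cinépolis": (991, 958),
    "Nikon": (24, 1208),
    "Lacoste": (46, 669),
}

def high_share(high, low):
    """Fraction of a brand's posts labelled high-impact."""
    return high / (high + low)

shares = {brand: high_share(h, l) for brand, (h, l) in total_reactions.items()}
for brand, share in shares.items():
    print(f"{brand}: {share:.1%} high-impact")
```

For these three brands, Cinépolis is close to balanced while Nikon and Lacoste are dominated by low-impact posts.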