
Social Media Behavior Analysis Report

Q1. Cross-Table Consistency Check

Platform       User mean   Platform-table value   Difference
Bluesky         142.4854                 8.0000     134.4854
Discord         146.5087                40.0000     106.5087
Facebook        140.2221                33.0000     107.2221
Instagram       140.0627                35.0000     105.0627
LinkedIn        140.3275                11.0000     129.3275
Pinterest       141.9667                14.0000     127.9667
RedNote         141.2082                25.0000     116.2082
Reddit          141.7874                24.0000     117.7874
Snapchat        141.2541                30.0000     111.2541
Telegram        142.3301                18.0000     124.3301
Threads         137.7528                12.0000     125.7528
TikTok          142.5701                58.0000      84.5701
WhatsApp        143.4046                28.0000     115.4046
X (Twitter)     140.6729                22.0000     118.6729
YouTube         141.1732                48.0000      93.1732

Pearson correlation coefficient: 0.3143

Top 3 platforms by absolute difference:

  1. Bluesky: absolute difference = 134.4854
  2. LinkedIn: absolute difference = 129.3275
  3. Pinterest: absolute difference = 127.9667
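
The figures in this section can be reproduced from the table alone. The following is a minimal sketch, assuming the "difference" column is the user mean minus the platform-table value and that the correlation is taken across the 15 platform rows; the names `rows`, `pearson`, `abs_diff`, and `top3` are illustrative and do not come from the original analysis.

```python
import math

# (platform, mean of per-user values, value from the platform table),
# copied from the Q1 table.
rows = [
    ("Bluesky", 142.4854, 8.0),
    ("Discord", 146.5087, 40.0),
    ("Facebook", 140.2221, 33.0),
    ("Instagram", 140.0627, 35.0),
    ("LinkedIn", 140.3275, 11.0),
    ("Pinterest", 141.9667, 14.0),
    ("RedNote", 141.2082, 25.0),
    ("Reddit", 141.7874, 24.0),
    ("Snapchat", 141.2541, 30.0),
    ("Telegram", 142.3301, 18.0),
    ("Threads", 137.7528, 12.0),
    ("TikTok", 142.5701, 58.0),
    ("WhatsApp", 143.4046, 28.0),
    ("X (Twitter)", 140.6729, 22.0),
    ("YouTube", 141.1732, 48.0),
]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Absolute gap between the two sources, per platform.
abs_diff = {name: abs(user - table) for name, user, table in rows}
top3 = sorted(abs_diff, key=abs_diff.get, reverse=True)[:3]

r = pearson([user for _, user, _ in rows], [table for _, _, table in rows])

print(top3)
print(round(r, 4))
```

Every difference exceeds 84, so the two sources appear to sit on different scales or to measure different quantities. The correlation describes only how the two columns move together across platforms, not whether their levels agree.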

Q2. Independence Test

2×2 contingency table:

                           has_purchased_via_social=False   has_purchased_via_social=True
is_content_creator=False                            13048                            8160
is_content_creator=True                              2340                            1452

chi2 = 0.0390, dof = 1, p_value = 0.8435

Conclusion: at the α = 0.05 level, the hypothesis of independence cannot be rejected.
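
The test statistic can be checked by hand from the four cell counts. The sketch below assumes a chi-square test of independence with Yates' continuity correction, which is a common default for 2×2 tables. The report does not say which variant was used, so this is an inference from the reported value. The function name `yates_chi2_2x2` is illustrative.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Chi-square statistic with Yates' correction and p-value (dof = 1)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Cell counts copied from the contingency table above.
chi2, p_value = yates_chi2_2x2(13048, 8160, 2340, 1452)
print(round(chi2, 4), round(p_value, 4))
```

Under this assumption the result agrees with the reported chi2 and p_value to the precision shown, which suggests the continuity-corrected form was the one used.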

Q3. Logistic Regression: Predicting Whether a User Has Purchased via Social Media

Train AUC: 0.5220
Test AUC: 0.5006
Test Accuracy: 0.6156

Top 5 features by absolute coefficient value:

Feature                     Coefficient
primary_platform_Threads         0.1585
income_bracket_$150K+           -0.1203
primary_platform_WhatsApp        0.1056
primary_platform_TikTok          0.1035
primary_platform_Snapchat        0.1025
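
To put the metrics in context, the sketch below defines AUC in its rank-based form and computes the accuracy that a classifier would reach by always predicting the majority class. It assumes the class balance of the test split is close to that of the full Q2 contingency table, which the report does not confirm. The `roc_auc` helper and the toy labels and scores are illustrative only.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): a model that separates classes partially,
# and one that gives every user the same score.
toy_labels = [0, 0, 1, 1]
auc_partial = roc_auc(toy_labels, [0.1, 0.4, 0.35, 0.8])
auc_constant = roc_auc(toy_labels, [0.5, 0.5, 0.5, 0.5])

# Majority-class baseline from the Q2 column totals.
n_not_purchased = 13048 + 2340
n_purchased = 8160 + 1452
majority_baseline = max(n_not_purchased, n_purchased) / (n_not_purchased + n_purchased)

print(auc_partial, auc_constant, round(majority_baseline, 4))
```

A constant score gives an AUC of 0.5, which is where the reported test AUC sits. The reported test accuracy is also within rounding distance of the majority-class baseline, so under the stated assumption the model adds little beyond predicting "not purchased" for everyone.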

Q4. Simpson's Paradox Risk Check

r_all (overall): -0.3698
r_creator (creators): -0.3615
r_non_creator (non-creators): -0.3713

Check for sign reversal or a drop in magnitude of more than half:
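
The check named above can be expressed as a small rule applied to the three coefficients. This is a sketch only: it assumes "magnitude halved" means a subgroup |r| below half of the overall |r|, and the function name `simpson_flags` is illustrative rather than taken from the original analysis.

```python
def simpson_flags(r_all, subgroup_rs):
    """Return (subgroup, reason) pairs where a subgroup correlation
    flips sign or falls below half the overall magnitude."""
    flags = []
    for name, r in subgroup_rs.items():
        if r * r_all < 0:
            flags.append((name, "sign reversal"))
        elif abs(r) < 0.5 * abs(r_all):
            flags.append((name, "magnitude halved"))
    return flags


# Coefficients copied from the report.
flags = simpson_flags(
    -0.3698,
    {"creator": -0.3615, "non_creator": -0.3713},
)
print(flags)
```

With the three reported coefficients, neither condition is triggered: both subgroup correlations are negative and close in size to the overall value.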

Q5. K-Means Behavioral Clustering

Cluster sizes:

Cluster   Users   Share (%)
0          6585     26.3400
1          7136     28.5440
2          2658     10.6320
3          8621     34.4840

Mean of the raw features per cluster:

Cluster   daily_screen_time_minutes   engagement_rate_pct   posts_per_week   addiction_level_1_to_10
0                          116.6691                2.9098           2.1001                    2.1072
1                          207.9937                1.7783           2.2190                    4.8065
2                          141.5662                1.8286          10.5779                    3.0899
3                          104.3116                1.1635           2.0220                    1.8397
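
As a reading aid for the two tables above, the sketch below recomputes each cluster's share of users and finds which cluster has the highest mean on each feature. It only rearranges the reported numbers and does not rerun the clustering. The variable names are illustrative.

```python
# Cluster sizes and per-cluster feature means, copied from the tables above.
sizes = {0: 6585, 1: 7136, 2: 2658, 3: 8621}
features = [
    "daily_screen_time_minutes",
    "engagement_rate_pct",
    "posts_per_week",
    "addiction_level_1_to_10",
]
means = {
    0: [116.6691, 2.9098, 2.1001, 2.1072],
    1: [207.9937, 1.7783, 2.2190, 4.8065],
    2: [141.5662, 1.8286, 10.5779, 3.0899],
    3: [104.3116, 1.1635, 2.0220, 1.8397],
}

total_users = sum(sizes.values())
shares = {k: 100 * n / total_users for k, n in sizes.items()}

# For each feature, the cluster whose mean is largest.
top_cluster = {
    name: max(means, key=lambda k: means[k][i])
    for i, name in enumerate(features)
}

print(total_users)
for k in sizes:
    print(k, round(shares[k], 4))
print(top_cluster)
```

The sizes sum to 25,000 users, matching the Q2 contingency table total. Cluster 1 has the highest screen time and addiction level, cluster 0 the highest engagement rate, and cluster 2 the highest posting frequency.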

Business labels:

Q6. Users with Anomalous Engagement

Top 5 users by absolute residual:

user_id      followers_count   engagement_rate_pct   predicted   residual
USR-020194              1025                6.1200      1.8388     4.2812
USR-020577              1733                5.9400      1.8455     4.0945
USR-010877              1581                5.7100      1.8443     3.8657
USR-005491              1324                5.6000      1.8420     3.7580
USR-001540              5706                5.5300      1.8608     3.6692
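
The residual column can be verified directly from the table. This sketch assumes the residual is defined as the observed engagement_rate_pct minus the predicted value. The model that produced `predicted` is not described in the report and is not reproduced here.

```python
# (user_id, followers_count, engagement_rate_pct, predicted, residual),
# copied from the table above.
rows = [
    ("USR-020194", 1025, 6.12, 1.8388, 4.2812),
    ("USR-020577", 1733, 5.94, 1.8455, 4.0945),
    ("USR-010877", 1581, 5.71, 1.8443, 3.8657),
    ("USR-005491", 1324, 5.60, 1.8420, 3.7580),
    ("USR-001540", 5706, 5.53, 1.8608, 3.6692),
]

# Largest gap between the reported residual and observed - predicted.
max_gap = max(
    abs((actual - predicted) - residual)
    for _, _, actual, predicted, residual in rows
)

# User IDs ordered by absolute residual, largest first.
ranked = [
    user_id
    for user_id, *_ in sorted(rows, key=lambda row: abs(row[4]), reverse=True)
]

print(max_gap < 5e-5)
print(ranked)
```

All five residuals are positive: each of these users shows an engagement rate roughly three times the value predicted for them.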