
05 · Data Analysis on CSVs

The most load-bearing case: 25 000 rows of user-level survey data and 18 platform aggregates, four non-trivial analytical questions, one markdown report. DSCC has to pick a tool (pandas, polars, awk — its call), drive it via bash, and produce numerically correct tables.

This case reuses the existing demo assets under demo/social_media_analysis/ so it runs end-to-end against real data.

Capability demonstrated

End-to-end quantitative analysis over real CSVs: the model picks its own tool (pandas, polars, or awk), drives it through bash, and writes a numerically correct markdown report.

Setup

Both datasets (the user-level survey CSV and the platform-aggregates CSV from demo/social_media_analysis/) must be copied to /tmp/social_analysis/ before the run.
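For reference, the copy step can be sketched in Python. This is a minimal sketch, assuming the CSVs ship next to PROMPT.md under demo/social_media_analysis/; the copy_csvs helper is hypothetical, not part of DSCC:

```python
import glob
import os
import shutil

def copy_csvs(src_dir: str, dst_dir: str) -> list[str]:
    # Copy every CSV from src_dir into dst_dir, creating dst_dir if needed.
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in glob.glob(os.path.join(src_dir, "*.csv")):
        shutil.copy(path, dst_dir)
        copied.append(os.path.basename(path))
    return sorted(copied)

# e.g. copy_csvs("demo/social_media_analysis", "/tmp/social_analysis")
```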

The prompt lives at demo/social_media_analysis/PROMPT.md — it is the literal prompt used in the shipped demo, so we link rather than duplicate.

Python 3 with pandas installed is the easiest dependency set for the model to use; a working awk + sort stack also suffices if you want to exercise that path.

Run command

dscc --model claude-opus-4-6 \
  --permission-mode danger-full-access \
  prompt "$(cat demo/social_media_analysis/PROMPT.md)"

A live run is preserved at report_dscc.md (the literal markdown the model wrote to /tmp/social_analysis/report_dscc.md) and dscc_run.log. Both are mirrors of the authoritative demo artifacts under demo/social_media_analysis/.

Expected behavior

The model should typically:

  1. bash ls /tmp/social_analysis/ and a quick head on both CSVs to learn the column names.
  2. bash a Python (or awk) pipeline that:
    • bins addiction_level_1_to_10 into Low/Medium/High;
    • ranks platforms by user count;
    • computes cyberbullying rate per age_group.
  3. write_file (or bash-redirect) the markdown report to /tmp/social_analysis/report_dscc.md.
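The analytical core of step 2 can be sketched in pandas. A minimal sketch, assuming hypothetical cut points for the Low/Medium/High bins and assuming column names "platform", "age_group", and "cyberbullied" (only addiction_level_1_to_10 is named in this page; check the real headers with head first):

```python
import pandas as pd

def bin_addiction(level: int) -> str:
    # Hypothetical cut points: 1-3 Low, 4-6 Medium, 7-10 High.
    if level <= 3:
        return "Low"
    if level <= 6:
        return "Medium"
    return "High"

def analyze(df: pd.DataFrame) -> dict:
    # Bin addiction scores, rank platforms by user count, and
    # compute the cyberbullying rate per age group.
    return {
        "addiction_bands": df["addiction_level_1_to_10"].map(bin_addiction).value_counts(),
        "platform_ranking": df["platform"].value_counts(),
        "bully_rate_by_age": df.groupby("age_group")["cyberbullied"].mean(),
    }
```

Each value in the returned dict is a Series that renders directly into a markdown table via to_markdown().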

Verification

Diff against the reference report in demo/social_media_analysis/report_dscc.md — numbers should match within rounding.
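A plain diff is brittle against rounding, so the "match within rounding" check can be automated with a rough numeric comparison. A minimal sketch, assuming both reports list their numbers in the same order; the 0.05 tolerance and both helper names are arbitrary choices, not DSCC tooling:

```python
import re

def numbers(text: str) -> list[float]:
    # Pull every numeric token out of a markdown report, in order.
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

def matches(ref: str, got: str, tol: float = 0.05) -> bool:
    # Reports agree if they contain the same count of numbers and
    # each pair differs by at most tol.
    a, b = numbers(ref), numbers(got)
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))
```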