
05 · Data Analysis on CSVs

The most load-bearing case: 25 000 rows of user-level survey data and 18 platform aggregates, four non-trivial analytical questions, one markdown report. DSCC has to pick a tool (pandas, polars, awk — its call), drive it via bash, and produce numerically correct tables.

This case reuses the existing demo assets under demo/social_media_analysis/ so it runs end-to-end against real data.

Capability demonstrated

End-to-end quantitative analysis over real CSVs: the model picks its own tool (pandas, polars, or awk), drives it through bash, and writes a numerically correct markdown report.

Setup

Both datasets (the user-level survey CSV and the platform-aggregates CSV from demo/social_media_analysis/) must be copied to /tmp/social_analysis/ before the run.
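For reference, the copy step can be sketched in Python. This is a minimal sketch, assuming the CSVs ship next to PROMPT.md under demo/social_media_analysis/; the copy_csvs helper is hypothetical, not part of DSCC:

```python
import glob
import os
import shutil

def copy_csvs(src_dir: str, dst_dir: str) -> list[str]:
    # Copy every CSV from src_dir into dst_dir, creating dst_dir if needed.
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in glob.glob(os.path.join(src_dir, "*.csv")):
        shutil.copy(path, dst_dir)
        copied.append(os.path.basename(path))
    return sorted(copied)

# e.g. copy_csvs("demo/social_media_analysis", "/tmp/social_analysis")
```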

The prompt lives at demo/social_media_analysis/PROMPT.md — it is the literal prompt used in the shipped demo, so we link rather than duplicate.

Python 3 with pandas installed is the easiest dependency set for the model to use; a working awk + sort stack also suffices if you want to exercise that path.

Run command

dscc --model claude-opus-4-6 \
  --permission-mode danger-full-access \
  prompt "$(cat demo/social_media_analysis/PROMPT.md)"

A live run is preserved at report_dscc.md (the literal markdown the model wrote to /tmp/social_analysis/report_dscc.md) and dscc_run.log. Both are mirrors of the authoritative demo artifacts under demo/social_media_analysis/.

Expected behavior

The model should typically:

  1. bash ls /tmp/social_analysis/ and a quick head on both CSVs to learn the column names.
  2. bash a Python (or awk) pipeline that:
    • bins addiction_level_1_to_10 into Low/Medium/High;
    • ranks platforms by user count;
    • computes cyberbullying rate per age_group.
  3. write_file (or bash-redirect) the markdown report to /tmp/social_analysis/report_dscc.md.
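The analytical core of step 2 can be sketched in pandas. A minimal sketch, assuming hypothetical cut points for the Low/Medium/High bins and assuming column names "platform", "age_group", and "cyberbullied" (only addiction_level_1_to_10 is named in this page; check the real headers with head first):

```python
import pandas as pd

def bin_addiction(level: int) -> str:
    # Hypothetical cut points: 1-3 Low, 4-6 Medium, 7-10 High.
    if level <= 3:
        return "Low"
    if level <= 6:
        return "Medium"
    return "High"

def analyze(df: pd.DataFrame) -> dict:
    # Bin addiction scores, rank platforms by user count, and
    # compute the cyberbullying rate per age group.
    return {
        "addiction_bands": df["addiction_level_1_to_10"].map(bin_addiction).value_counts(),
        "platform_ranking": df["platform"].value_counts(),
        "bully_rate_by_age": df.groupby("age_group")["cyberbullied"].mean(),
    }
```

Each value in the returned dict is a Series that renders directly into a markdown table via to_markdown().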

Verification

Diff against the reference report in demo/social_media_analysis/report_dscc.md — numbers should match within rounding.
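A plain diff is brittle against rounding, so the "match within rounding" check can be automated with a rough numeric comparison. A minimal sketch, assuming both reports list their numbers in the same order; the 0.05 tolerance and both helper names are arbitrary choices, not DSCC tooling:

```python
import re

def numbers(text: str) -> list[float]:
    # Pull every numeric token out of a markdown report, in order.
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

def matches(ref: str, got: str, tol: float = 0.05) -> bool:
    # Reports agree if they contain the same count of numbers and
    # each pair differs by at most tol.
    a, b = numbers(ref), numbers(got)
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))
```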