Seeking open-ended, CTF-style threat hunting datasets for Microsoft Sentinel (similar to BOTSv3, under 10GB/day)
I'm looking for recommendations on CTF-style threat hunting datasets that integrate well with Microsoft Sentinel. I recently finished a massive investigative threat hunt project using the Splunk BOTSv3 dataset and absolutely loved it. Even though I only uncovered about 60% of the adversary's full execution tree, the sheer scope, deep technical challenge, and open-ended nature of the hunt made it an incredibly rewarding project.
I published my investigative logs and Splunk detection playbooks from that project to my GitHub, put it on my resume, got a Splunk cert, and now I want to do the exact same thing, but with Sentinel. My initial plan was to use BOTSv2, but I've recently discovered how much work it would take to normalize the Splunk logs to Sentinel's table schemas so they can be queried with KQL, so I'm looking for a backup option.
This upcoming project is designed to serve three distinct goals:
- Portfolio & Resume Evidence: Documenting the end-to-end hunt, ingestion engineering, and playbook creation.
- SC-200 Prep: Gaining proficiency with KQL syntax to prepare for the SC-200 exam.
- Methodology Refinement: Sharpening vendor-agnostic threat hunting and detection engineering methodologies that easily transfer across SIEM platforms.
What I am specifically looking for in a dataset:
- Open-Ended/Full Scope: I want to avoid datasets that are hand-holding or strictly oriented around a single, pre-mapped MITRE ATT&CK technique with no deviations. I want a true "needle in a haystack" investigative challenge. Ideally I'd like a fully scoped attack starting from the reconnaissance/initial access phases and ending with exfiltration.
- Realistic White Noise: It needs to contain benign baseline background traffic so I encounter realistic false positives, forcing me to actively tune my KQL detections just like in a real-world environment.
- Data Cap Friendly: Because this is for a cloud home lab, I would like to respect a 10GB daily data ingestion limit to keep my Azure workspace under the free trial allocation. I am open to drip-feeding a larger dataset across multiple days or spending a small amount of money, but ingesting a full 300GB dataset like BOTSv2 in one go isn't an option.
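To make the drip-feed idea concrete, here is a rough Python sketch of how a larger dataset could be split into per-day batches that each stay under the cap. It assumes the dataset ships as a set of files whose on-disk sizes are a reasonable proxy for billed ingestion volume (which may not hold once the data is parsed into tables), and the file names and the `plan_daily_batches` helper are hypothetical. Files are kept in their original order so the attack timeline is not shuffled.

```python
from typing import Iterable

GB = 1024 ** 3  # bytes per gibibyte; an approximation of how the cap is counted


def plan_daily_batches(
    files: Iterable[tuple[str, int]],
    daily_cap_bytes: int = 10 * GB,
) -> list[list[str]]:
    """Greedily group (name, size_in_bytes) pairs into ordered per-day batches.

    Each batch's total size stays at or below daily_cap_bytes. Order is
    preserved, so the plan is not guaranteed to use the fewest possible days.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in files:
        if size > daily_cap_bytes:
            # A single file over the cap would need to be split before upload.
            raise ValueError(f"{name} alone exceeds the daily cap")
        if current and used + size > daily_cap_bytes:
            # Adding this file would break the cap, so close out the day.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Hypothetical file names and sizes, purely for illustration.
example_files = [
    ("part1.json", 6 * GB),
    ("part2.json", 5 * GB),
    ("part3.json", 4 * GB),
    ("part4.json", 3 * GB),
]
plan = plan_daily_batches(example_files)
for day, batch in enumerate(plan, start=1):
    print(f"Day {day}: {batch}")
```

With the example sizes above, this schedules `part1.json` on day 1, `part2.json` and `part3.json` on day 2, and `part4.json` on day 3. By the same arithmetic, a 300GB dataset at 10GB/day would take at least 30 days to ingest in full.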
Every Sentinel dataset I've stumbled across so far seems incredibly limited in scope or feels too "on rails" (e.g., executing one isolated script and immediately querying the single resulting alert).
Does anyone have recommendations for datasets that fit these open-ended criteria while respecting the 10GB daily ingestion cap? Are there any viable options outside of Mordor? Because of how modular Mordor is, I'm concerned it'll lack the broader, interconnected scope I'm looking for.