Key Features
- Automated Directory Scanning and Indexing:
  - Recursive scanning of the ISMN root directory (organized by network → station → file).
  - Automatic caching of site metadata to accelerate future processing (./tmp/info.csv).
- Advanced Multi-dimensional Filtering:
  - Filter sites by continent or network (e.g., USCRN).
  - Depth-based filtering supporting three thresholds (0.05, 0.1, 0.2 m, …).
- Robust Data Quality Control:
  - Observations outside the range [0, 1] are flagged as missing (NaN).
  - Aggregates hourly data to daily averages.
- Flexible Data Export:
  - Annual CSV outputs (soil_moisture.csv) organized by year (rows: sites, columns: daily values).
  - Detailed site coordinates (crd.csv) and metadata (info.csv) included.
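The quality-control steps listed above (range check, then daily averaging) can be illustrated with a short pandas sketch. This is not the pipeline's own code: the helper name `hourly_to_daily` and the sample values are hypothetical, and it assumes hourly observations are held in a pandas Series with a DatetimeIndex.

```python
import numpy as np
import pandas as pd


def hourly_to_daily(series: pd.Series) -> pd.Series:
    """Hypothetical helper: mask out-of-range values, then average per calendar day."""
    # Keep values inside [0, 1]; everything else becomes NaN.
    cleaned = series.where((series >= 0.0) & (series <= 1.0))
    # Daily mean; NaN entries are skipped, and a day with no valid data stays NaN.
    return cleaned.resample("D").mean()


if __name__ == "__main__":
    # Two days of made-up hourly values.
    index = pd.date_range("2015-01-01", periods=48, freq=pd.Timedelta(hours=1))
    values = np.full(48, 0.25)
    values[3] = 1.7     # single out-of-range reading on day 1
    values[24:] = -0.5  # day 2 is entirely out of range
    print(hourly_to_daily(pd.Series(values, index=index)))
```

In this sketch a day with at least one valid reading still gets an average; whether the actual script requires a minimum number of hourly values per day is not stated here.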
Soil Depth Threshold Selection
The pipeline supports selecting soil moisture observations at three standard depth thresholds:
- 0.05 m: selects data measured from the surface to a depth of approximately 0.075 m.
- 0.1 m: selects data measured at depths between approximately 0.075 m and 0.150 m.
- 0.2 m: selects data measured at depths between approximately 0.150 m and 0.250 m.
These standard thresholds are set as:
if depth_threshold == 0.05:
    depth_bool = (depth_end <= 0.075) & (depth_str >= 0)
elif depth_threshold == 0.1:
    depth_bool = (depth_end <= 0.150) & (depth_str >= 0.075 + eps)
elif depth_threshold == 0.2:
    depth_bool = (depth_end <= 0.250) & (depth_str >= 0.15 + eps)
else:
    raise ValueError(f"You can add your custom depth_threshold {depth_threshold}")
To use different depth layers, modify this logic according to your requirements. Note that the current implementation does not interpolate between different depth layers. If interpolation between depth layers is needed, you can add the corresponding functionality based on your analysis needs.
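As a starting point for custom layers, the selection logic can be restated as a small table-driven function. This is a sketch, not the script's code: the function name `select_depth_layer` and the `EPS` value are assumptions (the script's own `eps` is not shown in this README), while `depth_str` and `depth_end` follow the names used in the snippet above.

```python
import numpy as np

EPS = 1e-9  # assumed tolerance; the script's own `eps` value is not shown here

# (lower bound for depth_str, upper bound for depth_end) per threshold
LAYERS = {
    0.05: (0.0, 0.075),
    0.1: (0.075 + EPS, 0.150),
    0.2: (0.150 + EPS, 0.250),
}


def select_depth_layer(depth_str, depth_end, depth_threshold):
    """Hypothetical helper: boolean mask of sensors lying fully inside the layer."""
    depth_str = np.asarray(depth_str, dtype=float)
    depth_end = np.asarray(depth_end, dtype=float)
    if depth_threshold not in LAYERS:
        raise ValueError(f"Unsupported depth_threshold {depth_threshold}")
    lower, upper = LAYERS[depth_threshold]
    return (depth_end <= upper) & (depth_str >= lower)


if __name__ == "__main__":
    # Made-up sensors: three point sensors and one spanning 0.0 to 0.10 m.
    starts = [0.05, 0.10, 0.20, 0.0]
    ends = [0.05, 0.10, 0.20, 0.10]
    for threshold in (0.05, 0.1, 0.2):
        print(threshold, select_depth_layer(starts, ends, threshold))
```

Adding a layer then means adding one entry to `LAYERS`. Note that under this logic a sensor whose depth range straddles two layers (the last made-up sensor) is selected by none of them.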
Environment Setup
Python
Python ≥ 3.9 recommended (3.12 preferred).
Dependencies
Create and activate an environment, then install dependencies:
conda create -n rainflow-env python=3.12 -y
conda activate rainflow-env
pip install numpy pandas tqdm
How to Download ISMN Data
- Log in at https://ismn.earth/en/
- Go to Data Access
- Select the networks (or choose all)
- Select the desired date range (e.g., 2015-01-01 to 2025-08-31)
- Click Download
- Choose “Variables stored in separate files (Header+values formatted) (zipped)” and select Gap filling
- Click Download
- Unzip into a directory, e.g.:
path_directory/ISMN_20150101_20250831
Expected Input Layout (example)
ISMN_20150101_20250831/
  <NETWORK_A>/
    <STATION_ID_1>/
      *.txt ...
    <STATION_ID_2>/
      *.txt ...
  <NETWORK_B>/
    ...
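To check that an unzipped download matches this layout before running the extraction, a directory walk along the following lines may help. It is a sketch only: `index_ismn_files` is a hypothetical helper, not part of the script, and it assumes data files end in `.txt` and sit exactly two levels below the root.

```python
import tempfile
from pathlib import Path


def index_ismn_files(root):
    """Hypothetical helper: list (network, station, filename) for each .txt file."""
    records = []
    # Pattern mirrors the layout above: <root>/<network>/<station>/<file>.txt
    for path in sorted(Path(root).glob("*/*/*.txt")):
        records.append((path.parent.parent.name, path.parent.name, path.name))
    return records


if __name__ == "__main__":
    # Build a tiny fake tree so the sketch runs without real ISMN data.
    with tempfile.TemporaryDirectory() as tmp:
        for network, station in [("NETWORK_A", "STATION_1"), ("NETWORK_B", "STATION_2")]:
            station_dir = Path(tmp) / network / station
            station_dir.mkdir(parents=True)
            (station_dir / "example.txt").write_text("placeholder")
        for record in index_ismn_files(tmp):
            print(record)
```

An empty result for a real download usually means the root path points one level too high or too low.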
Quickstart (Command Line)
Single network (USCRN):
python extract_international_soil_moisture_network.py \
--input-dir path_directory/ISMN_20150101_20250831 \
--output-dir ./formatted_ISMN \
--start-date 2015-01-01 --end-date 2020-12-31 \
--depth-threshold 0.05 \
--network USCRN
Multiple networks (USCRN, COSMOS):
python extract_international_soil_moisture_network.py \
--input-dir path_directory/ISMN_20150101_20250831 \
--output-dir ./formatted_ISMN \
--start-date 2015-01-01 --end-date 2020-12-31 \
--depth-threshold 0.05 \
--network USCRN COSMOS
One or more continents (Europe and Asia):
python extract_international_soil_moisture_network.py \
--input-dir path_directory/ISMN_20150101_20250831 \
--output-dir ./formatted_ISMN \
--start-date 2015-01-01 --end-date 2020-12-31 \
--depth-threshold 0.05 \
--continent Europe Asia
All continents (processed separately in one run):
python extract_international_soil_moisture_network.py \
--input-dir path_directory/ISMN_20150101_20250831 \
--output-dir ./formatted_ISMN \
--start-date 2015-01-01 --end-date 2020-12-31 \
--depth-threshold 0.05 \
--continent ALL
Global extraction (all continents merged):
python extract_international_soil_moisture_network.py \
--input-dir path_directory/ISMN_20150101_20250831 \
--output-dir ./formatted_ISMN \
--start-date 2015-01-01 --end-date 2020-03-31 \
--depth-threshold 0.05 \
--global
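Each command above extracts a single depth threshold. To repeat a run for all three thresholds, the command line can be assembled from Python, as in the sketch below. The helpers `build_command` and `run_for_depths` are hypothetical; only flags shown in this Quickstart are used, and writing each depth to its own `depth_<value>` folder is an assumption made here to avoid any overwriting. By default the sketch only prints the commands.

```python
import subprocess
import sys

SCRIPT = "extract_international_soil_moisture_network.py"


def build_command(input_dir, output_dir, start_date, end_date, depth_threshold, networks):
    """Hypothetical helper: argument list for one run, using flags from the Quickstart."""
    return [
        sys.executable, SCRIPT,
        "--input-dir", input_dir,
        "--output-dir", output_dir,
        "--start-date", start_date,
        "--end-date", end_date,
        "--depth-threshold", str(depth_threshold),
        "--network", *networks,
    ]


def run_for_depths(depths, output_root, dry_run=True, **common):
    """Print (and, if dry_run is False, execute) one command per depth threshold."""
    commands = []
    for depth in depths:
        # One output folder per depth: an assumption, not documented script behavior.
        cmd = build_command(output_dir=f"{output_root}/depth_{depth}",
                            depth_threshold=depth, **common)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    run_for_depths(
        (0.05, 0.1, 0.2),
        output_root="./formatted_ISMN",
        input_dir="path_directory/ISMN_20150101_20250831",
        start_date="2015-01-01",
        end_date="2020-12-31",
        networks=["USCRN"],
    )
```

Passing `dry_run=False` runs each command in turn and stops at the first failure, because `check=True` raises on a non-zero exit code.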
Continent Keys
Use the following exact keys with --continent:
Africa, Asia, Australia, Europe, North_America, Oceania, South_America, or ALL.
Output Structure
For each selection (per network, per continent, or GLOBAL), the script writes outputs under the specified --output-dir in a dedicated subfolder.
Typical contents:
- Annual data: one CSV per year named soil_moisture.csv with
  - Rows: stations
  - Columns: daily aggregated soil moisture values
- Coordinates: crd.csv (station coordinates and IDs)
- Metadata: info.csv (site-level attributes)
Missing values: -9999
Example tree (for a network run):
formatted_ISMN/
  USCRN/
    2015/soil_moisture.csv
    2016/soil_moisture.csv
    ...
    crd.csv
    info.csv
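When loading these outputs for analysis, the -9999 sentinel usually needs to be turned back into NaN before computing statistics. The sketch below shows one way to locate the annual files and mask the sentinel. Both helpers are hypothetical, and the README does not state whether the CSVs carry a header row or an index column, so inspect a file before choosing `pandas.read_csv` options.

```python
from pathlib import Path

import pandas as pd

MISSING = -9999  # missing-value sentinel stated in this README


def annual_files(selection_dir):
    """Hypothetical helper: map each year folder name to its soil_moisture.csv path."""
    paths = sorted(Path(selection_dir).glob("*/soil_moisture.csv"))
    return {path.parent.name: path for path in paths}


def mask_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every -9999 entry is replaced by NaN."""
    return frame.mask(frame == MISSING)


if __name__ == "__main__":
    # Made-up values standing in for two stations and three days.
    demo = pd.DataFrame([[0.21, -9999, 0.19], [-9999, 0.30, 0.31]])
    print(mask_missing(demo))
    # Lists year folders if the example output tree exists, else an empty dict.
    print(annual_files("./formatted_ISMN/USCRN"))
```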
Citation
If you use this pipeline in your research, please cite:
Liu, J., Rahmani, F., Lawson, K., & Shen, C. (2022). A multiscale deep learning model for soil moisture integrating satellite and in situ data. Geophysical Research Letters, 49(7), e2021GL096847. https://doi.org/10.1029/2021GL096847
Liu, J., Hughes, D., Rahmani, F., Lawson, K., & Shen, C. (2023). Evaluating a global soil moisture dataset from a multitask model (GSM3 v1.0) with potential applications for crop threats. Geoscientific Model Development, 16(5), 1553–1567. https://doi.org/10.5194/gmd-16-1553-2023
Acknowledgments
- International Soil Moisture Network (ISMN)
- ChatGPT for assistance in refining documentation.
License
This software is licensed under CC BY-NC 4.0. Academic and research use is explicitly permitted.