Azure Green Product Scraping

URL Structure

Azure Green uses two URL patterns:

  • /departments/{id}/ - Category navigation (nested hierarchy)
  • /products/{id}/ - Actual product listings

Important: Use /products/ URLs for scraping actual items.
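
To make the distinction between the two patterns concrete, here is a minimal sketch of a URL classifier. It assumes the id is always numeric and is the last path segment; the function and type names are hypothetical and are not part of the existing scraper.

```typescript
// Hypothetical helper, not part of scripts/scrape-dept.ts.
// Assumption: ids are numeric and form the last path segment, as in
// /departments/{id}/ and /products/{id}/.
type AzureGreenUrlKind = "department" | "product";

interface ParsedAzureGreenUrl {
  kind: AzureGreenUrlKind;
  id: number;
}

// Returns the URL kind and id, or null when the URL matches neither pattern.
function parseAzureGreenUrl(url: string): ParsedAzureGreenUrl | null {
  const match = /\/(departments|products)\/(\d+)\/?(?:[?#].*)?$/.exec(url);
  if (match === null) return null;
  return {
    kind: match[1] === "products" ? "product" : "department",
    id: Number(match[2]),
  };
}

// Only /products/ URLs list actual items, so only those are worth scraping.
function isScrapableUrl(url: string): boolean {
  return parseAzureGreenUrl(url)?.kind === "product";
}
```

A check like isScrapableUrl("https://www.azuregreen.net/Candles/products/111/") could run before a URL is handed to the scraper, so that a /departments/ URL is rejected early.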

Working Product IDs

Candles

  ID    Description                       Count
  111   Ritual candles                    ~20
  112   Household candles                 ~20
  113   Pillar candles                    ~17
  114   Figure candles (cat, knob, etc)   ~20
  143   Chime candles                     ~20
  262   Votive candles                    ~20
  268   Crystal Journey pillars           ~20

Incense

  ID     Description            Count
  35     Burners & holders      ~20
  36     Cone incense           ~19
  37     Powder incense         ~20
  549    Palo Santo & smudge    ~20
  609    Brass burners          ~20
  379    Charcoal discs         ~14
  608    Wood burners           ~27
  1052   Backflow burners       ~33
  1074   Waterfall incense      ~34

Crystals & Stones

  ID    Description        Count
  193   Raw/bulk stones    ~42
  394   Tumbled stones     ~40

Other Categories

  ID    Description     Count
  2     Tarot decks     ~20
  250   Best sellers    ~20

Scraper Usage

# Scrape a specific category
npx tsx scripts/scrape-dept.ts \
  "https://www.azuregreen.net/Candles/products/111/" \
  --output ./data/candles.json \
  --check-images
 
# Options:
#   --output <file>   Output JSON file
#   --limit <n>       Max products to scrape
#   --check-images    Verify image URLs exist

Image URL Pattern

Standard pattern: https://www.azuregreen.net/images/{SKU}.jpg

Some products have missing images. Current workarounds:

  1. Check alternate patterns (lowercase, underscores)
  2. Use category placeholder images
  3. Source manually
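
The first workaround can be sketched as a function that generates candidate image URLs for a SKU. The exact alternate patterns are not specified here, so the variants below (lowercased SKU, hyphens or spaces replaced by underscores) are assumptions, and the function name is hypothetical.

```typescript
// Hypothetical helper. Assumption: "alternate patterns" means a lowercased
// SKU and/or hyphens and spaces replaced by underscores.
const IMAGE_BASE = "https://www.azuregreen.net/images";

// Returns candidate image URLs for a SKU, standard pattern first,
// with duplicate variants removed.
function candidateImageUrls(sku: string): string[] {
  const trimmed = sku.trim();
  const underscored = trimmed.replace(/[-\s]+/g, "_");
  const variants = [
    trimmed,
    trimmed.toLowerCase(),
    underscored,
    underscored.toLowerCase(),
  ];
  // A Set keeps insertion order, so the standard pattern stays first.
  return Array.from(new Set(variants)).map(
    (variant) => `${IMAGE_BASE}/${variant}.jpg`,
  );
}
```

Each candidate could then be tried in order, falling back to a category placeholder or manual sourcing only when none of them resolves.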

Import Pipeline

# 1. Scrape products
npx tsx scripts/scrape-dept.ts <url> --output ./data/<category>.json
 
# 2. Combine all scraped data
node -e "..." # (see all-scraped.json generation script)
 
# 3. Import to database
npx tsx scripts/import-azure-green.ts --input ./data/all-scraped.json
 
# 4. Auto-tag with correspondences
npx tsx scripts/auto-tag-products.ts
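
For step 2, the sketch below shows one way per-category results could be merged into a list of unique products. It is not the generation script referenced above. The record shape, and the use of a sku field as the unique key, are assumptions, since the scraper's output format is not documented here.

```typescript
// Hypothetical record shape; the real scraper output may differ.
interface ScrapedProduct {
  sku: string;
  name: string;
}

// Merges per-category batches into one list, keeping the first
// occurrence of each SKU and dropping later duplicates.
function mergeUnique(batches: ScrapedProduct[][]): ScrapedProduct[] {
  const bySku = new Map<string, ScrapedProduct>();
  for (const batch of batches) {
    for (const product of batch) {
      if (!bySku.has(product.sku)) {
        bySku.set(product.sku, product);
      }
    }
  }
  // Map iteration follows insertion order, so output order is stable.
  return Array.from(bySku.values());
}
```

Keeping the first occurrence means a product that appears in both a regular category and Best sellers is counted once, which matters when reporting a unique-product total.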

Current Stats (2026-01-31)

  • Total unique products scraped: 444
  • With images: 444 (100%)
  • Categories covered: Candles, Incense, Tarot, Crystals, Amulets, Athames, Scrying, Voodoo, Salts, Statues

Known Issues

  1. Misclassification - Some products are tagged incorrectly (e.g., “Goddess athame” → Statue)

    • Fix: Review the auto-tagging logic or apply manual corrections
  2. Salt/Tool overlap - Some salts tagged as tools

    • 13 products in Tools category have missing images (not on AzureGreen CDN)
  3. Pagination - Scraper handles basic pagination but may miss some pages

    • Workaround: Manually verify counts against site
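
The manual count check can be partly automated once the expected counts have been read off the site by hand. The sketch below compares those against scraped counts and reports categories that came up short. All names are hypothetical, and the expected counts are assumed to be entered manually rather than fetched.

```typescript
// Hypothetical types: expected counts are entered by hand from the site.
interface CategoryCount {
  id: number;
  count: number;
}

interface CountShortfall {
  id: number;
  expected: number;
  scraped: number;
}

// Reports every category whose scraped count is below the expected count.
// A category missing from the scraped map is treated as zero products.
function findShortfalls(
  expected: CategoryCount[],
  scraped: Map<number, number>,
): CountShortfall[] {
  const shortfalls: CountShortfall[] = [];
  for (const category of expected) {
    const got = scraped.get(category.id) ?? 0;
    if (got < category.count) {
      shortfalls.push({ id: category.id, expected: category.count, scraped: got });
    }
  }
  return shortfalls;
}
```

A non-empty result would point at the product IDs worth re-scraping or checking for missed pages.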