Extract detailed property information from individual Zillow property pages using ZenRows’ Universal Scraper API. This tutorial shows you how to scrape specific data fields from single properties and scale up to process multiple listings efficiently.
Real estate professionals need detailed property information for market analysis, investment decisions, and lead generation. Extracting structured data from property listings makes all three possible at scale, turning individual pages into consistent, machine-readable records.
Start by creating a scraping function that handles Zillow’s anti-bot measures. You’ll need JavaScript Rendering to process dynamic content and Premium Proxies to avoid IP blocking.
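Here’s a minimal sketch of that function. The parameter values mirror the complete script at the end of this tutorial, and YOUR_ZENROWS_API_KEY is a placeholder for your own key:

Python
# pip install requests
import requests

def scraper(url):
    apikey = "YOUR_ZENROWS_API_KEY"
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",      # render JavaScript before returning the page
        "premium_proxy": "true",  # route the request through residential IPs
        "proxy_country": "us",    # optional: pin the proxy location
        "wait": "2000",           # give dynamic content 2 seconds to load
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.text  # raw HTML of the rendered page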
This function returns the website’s HTML content. The js_render parameter enables JavaScript processing (essential for dynamic websites like Zillow), while premium_proxy provides residential IP addresses that appear as regular user traffic.
The proxy_country parameter is optional. If you don’t specify it, ZenRows will use a random IP address from anywhere in the world. See the ZenRows geolocation documentation for details.
Before scaling to multiple properties, test your scraper with a single Zillow property page. Use ZenRows’ css_extractor feature to automatically extract specific data fields. Create CSS selectors to target the property data you need:
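The selectors below (the same ones used in the complete script at the end of this tutorial) target the price, location, dimensions, description, listing attribution, and price history fields. They rely on Zillow’s data-testid attributes as they appeared at the time of writing:

Python
import json

# CSS selectors for the property fields, serialized as JSON for the API
property_css_extractor = json.dumps(
    {
        "price": "span[data-testid='price']",
        "location": "ul.footer-breadcrumbs a, ul.footer-breadcrumbs strong",
        "dimension": "div[data-testid='bed-bath-sqft-facts']",
        "description": "div[data-testid='description']",
        "listed_by": "div[data-testid='seller-attribution']",
        "price_change_dates": "span[data-testid='date-info']",
        "price_changes": "td[data-testid='price-money-cell']",
    }
)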
CSS selectors can change when websites update their code. To maintain a reliable scraper, monitor your selectors regularly and update them as needed. See the ZenRows documentation on CSS selectors for more background.
Update your scraper function to handle CSS extraction and return JSON data:
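The updated function, matching the complete script below, forwards the extractor to the API and parses the JSON response:

Python
import requests

def scraper(url, css_extractor=None):
    apikey = "YOUR_ZENROWS_API_KEY"
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "wait": "2000",
        # requests drops this parameter when it is None
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.json()  # extracted fields as a dictionary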
Once your single property extraction works reliably, scale up to process multiple property URLs. Here are some example Zillow property URLs you can use for testing:
Python
# Example property URLs from Zillow
property_urls = [
    "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    "https://www.zillow.com/homedetails/882-Highpoint-Dr-Coal-Center-PA-15423/2087529083_zpid/",
    "https://www.zillow.com/homedetails/508-5th-St-California-PA-15419/49745256_zpid/",
    "https://www.zillow.com/homedetails/721-Spring-St-Roscoe-PA-15477/49794732_zpid/",
]

all_property_data = []

# Extract data from each property
for property_url in property_urls:
    try:
        properties = scraper(
            property_url,
            css_extractor=property_css_extractor,
        )

        # Clean and structure the data
        cleaned_properties = clean_property_data(properties)

        # Add the URL for reference
        cleaned_properties["source_url"] = property_url

        all_property_data.append(cleaned_properties)
        print(f"Successfully extracted data from: {property_url}")
    except Exception as e:
        print(f"Error extracting data from {property_url}: {str(e)}")
        continue

print(f"Successfully extracted data for {len(all_property_data)} properties")
Save the property data in JSON format to preserve the nested structure:
Python
# Store the data as JSON
with open("properties.json", "w", encoding="utf-8") as f:
    json.dump(all_property_data, f, ensure_ascii=False, indent=2)

print("Data saved to properties.json")
For flat data structures, you can also export to CSV:
Python
import pandas as pd

# Convert to DataFrame and save as CSV (flattens nested data)
df = pd.json_normalize(all_property_data)
df.to_csv("properties.csv", index=False)

print("Data saved to properties.csv")
Here’s the complete code that extracts property data from multiple URLs:
Python
# pip install requests pandas
import requests
import json
import pandas as pd


def scraper(url, css_extractor=None):
    apikey = "YOUR_ZENROWS_API_KEY"
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "wait": "2000",
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.json()


def clean_property_data(properties):
    """Clean and structure property data"""
    # Clean the "listed_by" field
    listed_by = properties.get("listed_by")
    if listed_by:
        properties["listed_by"] = listed_by.replace("Listed by:", "").strip()

    # Process and merge price history data
    price_history_dates = properties.get("price_change_dates")
    price_changes = properties.get("price_changes")

    if price_history_dates and price_changes:
        combined_price_history = []
        for i in range(0, len(price_history_dates), 2):
            date = price_history_dates[i]
            event = (
                price_history_dates[i + 1]
                if i + 1 < len(price_history_dates)
                else ""
            )
            price = price_changes[i // 2] if i // 2 < len(price_changes) else ""
            combined_price_history.append({
                "Date": date,
                "Event": event,
                "Price": price
            })

        properties["price_history"] = combined_price_history

        # Remove the original fields after combining
        for key in ["price_change_dates", "price_changes"]:
            if key in properties:
                del properties[key]

    return properties


# CSS extractor for property data
property_css_extractor = json.dumps(
    {
        "price": "span[data-testid='price']",
        "location": "ul.footer-breadcrumbs a, ul.footer-breadcrumbs strong",
        "dimension": "div[data-testid='bed-bath-sqft-facts']",
        "description": "div[data-testid='description']",
        "listed_by": "div[data-testid='seller-attribution']",
        "price_change_dates": "span[data-testid='date-info']",
        "price_changes": "td[data-testid='price-money-cell']",
    }
)

# Example property URLs
property_urls = [
    "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    "https://www.zillow.com/homedetails/882-Highpoint-Dr-Coal-Center-PA-15423/2087529083_zpid/",
    "https://www.zillow.com/homedetails/508-5th-St-California-PA-15419/49745256_zpid/",
    "https://www.zillow.com/homedetails/721-Spring-St-Roscoe-PA-15477/49794732_zpid/",
]

all_property_data = []

# Extract data from each property
for property_url in property_urls:
    try:
        properties = scraper(
            property_url,
            css_extractor=property_css_extractor,
        )

        # Clean and structure the data
        cleaned_properties = clean_property_data(properties)

        # Add the URL for reference
        cleaned_properties["source_url"] = property_url

        all_property_data.append(cleaned_properties)
        print(f"✓ Successfully extracted data from: {property_url}")
    except Exception as e:
        print(f"✗ Error extracting data from {property_url}: {str(e)}")
        continue

# Save the data
with open("properties.json", "w", encoding="utf-8") as f:
    json.dump(all_property_data, f, ensure_ascii=False, indent=2)

# Also save as CSV for easy analysis
df = pd.json_normalize(all_property_data)
df.to_csv("properties.csv", index=False)

print(f"\n🎉 Successfully extracted data for {len(all_property_data)} properties")
print("Data saved to properties.json and properties.csv")
Congratulations! 🎉 You’ve successfully extracted detailed property data from Zillow using ZenRows’ web scraping capabilities.
Design your data structure to match your specific business needs rather than using a generic approach. For example, if you need to analyze price trends, structure price history as date/price pairs, not raw HTML. Map scraped fields directly to your business entities (property, agent, transaction) and use consistent, clear field names that work for your team.
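For example, the clean_property_data function in this tutorial turns parallel lists of raw selector matches into self-describing records with exactly this shape (the values here are placeholders, not real listing data):

Python
# Target shape for price history, as produced by clean_property_data
price_history = [
    {"Date": "...", "Event": "...", "Price": "..."},
]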
Check for missing or malformed fields, unexpected data types, duplicate entries, and values outside the expected range. Use validation scripts or schema checks (such as Python’s pydantic or JSON Schema). Automate this process to maintain consistency throughout your data extraction workflow.
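Here’s a minimal pydantic sketch of that idea. It assumes pydantic v2, validates only a subset of the field names this tutorial extracts, and assumes prices arrive as dollar-prefixed strings; adapt the schema and checks to your own data:

Python
# pip install pydantic
from pydantic import BaseModel, field_validator

class PropertyRecord(BaseModel):
    """Schema for a subset of the fields extracted in this tutorial."""
    price: str
    description: str
    source_url: str

    @field_validator("price")
    @classmethod
    def price_looks_like_money(cls, v: str) -> str:
        # Assumption: Zillow prices render as "$..." strings
        if not v.startswith("$"):
            raise ValueError(f"unexpected price format: {v!r}")
        return v

# Validate each scraped record, collecting failures instead of crashing
valid, invalid = [], []
for record in all_property_data:  # the list built earlier in this tutorial
    try:
        valid.append(PropertyRecord(**record))
    except Exception as exc:
        invalid.append((record.get("source_url"), str(exc)))

print(f"{len(valid)} valid records, {len(invalid)} invalid records")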
Solution 1: Use adequate delay strategies to allow dynamic content to load. Increase the wait parameter value or add specific waits for elements that load asynchronously (see the sketch below).
Solution 2: Verify you’re using the correct CSS selectors. Test each selector using the ZenRows Request Playground before integrating it into your code. Create a monitoring strategy to spot site structural changes, and isolate selectors from your codebase for easy debugging.
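For example, here’s a variant of this tutorial’s scraper with a longer wait. It also uses a wait_for-style parameter to block until a specific selector appears; confirm that parameter name against the current ZenRows documentation before relying on it:

Python
import requests

def scraper_with_longer_wait(url, css_extractor=None):
    params = {
        "url": url,
        "apikey": "YOUR_ZENROWS_API_KEY",
        "js_render": "true",
        "premium_proxy": "true",
        "wait": "5000",  # raised from 2000 ms for slow-loading pages
        # Block until the price element is in the DOM; verify this
        # parameter name in the current ZenRows docs
        "wait_for": "span[data-testid='price']",
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.json()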
Solution 1: Ensure you’re using anti-bot bypass parameters like js_render and premium_proxy.
Solution 2: If you continue encountering CAPTCHAs, integrate a CAPTCHA-solving service like 2Captcha through our solver integration options. Check our 2Captcha integration guide for implementation details.
Solution 3: Use fallback mechanisms or alternative pathways to prevent scraping failures (see the sketch below).
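One simple fallback pattern is to try a lighter request first and escalate to the full anti-bot setup on failure. This is a sketch with a hypothetical helper name, and the escalation order is an assumption to tune for your target:

Python
import requests

def scrape_with_fallback(url):
    """Try a lighter request first, then escalate the anti-bot parameters."""
    base = {"url": url, "apikey": "YOUR_ZENROWS_API_KEY"}
    attempts = [
        {**base, "js_render": "true"},                           # rendering only
        {**base, "js_render": "true", "premium_proxy": "true"},  # add premium proxies
    ]
    for params in attempts:
        response = requests.get("https://api.zenrows.com/v1/", params=params)
        if response.ok:
            return response.text
    response.raise_for_status()  # surface the final failure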
Solution 1: Use the premium_proxy parameter to automatically rotate IP addresses.
Solution 2: Implement request retry mechanisms with exponential backoff delays between failed requests (sketched below).
Solution 3: Add delays between requests to avoid overwhelming the target website.
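Here’s a minimal retry sketch with exponential backoff, using a hypothetical helper name and the same request parameters as this tutorial’s scraper:

Python
import time
import requests

def scrape_with_retries(url, max_retries=3):
    """Retry failed ZenRows requests with exponential backoff."""
    params = {
        "url": url,
        "apikey": "YOUR_ZENROWS_API_KEY",
        "js_render": "true",
        "premium_proxy": "true",  # a fresh IP is used for each attempt
    }
    for attempt in range(max_retries):
        response = requests.get("https://api.zenrows.com/v1/", params=params)
        if response.ok:
            return response.text
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
    response.raise_for_status()  # give up after max_retries attempts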