PHP - Website Scraping

Load a webpage as a string and extract specific information from it.

Getting Started

Call PHP's built-in file_get_contents function to fetch the page. We'll use Home Hardware as an example. The SKU or item number is read from a GET parameter, which makes it easy to test several SKUs.

$txt = file_get_contents("https://www.homehardware.ca/en/p/".rawurlencode($_GET['sku'] ?? ''));
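file_get_contents returns false when the request fails, and the SKU comes straight from the query string, so a small guard around the call can help. The following is a minimal sketch, not part of the original script: the helper names build_product_url and fetch_product_page are hypothetical, and the SKU pattern (digits and hyphens only) is an assumption that may need adjusting.

```php
<?php
// Hypothetical helper: builds the product URL used above from a SKU.
// Assumption: SKUs consist only of digits and hyphens, e.g. "1234-567".
function build_product_url(string $sku): ?string
{
    if (!preg_match('/\A[0-9-]+\z/', $sku)) {
        return null; // reject anything that does not look like a SKU
    }
    return "https://www.homehardware.ca/en/p/" . rawurlencode($sku);
}

// Hypothetical helper: fetches the page and returns null on any failure.
function fetch_product_page(string $sku): ?string
{
    $url = build_product_url($sku);
    if ($url === null) {
        return null;
    }
    // file_get_contents() returns false (and raises a warning) on failure;
    // the @ operator suppresses the warning so the caller only sees null.
    $txt = @file_get_contents($url);
    return $txt === false ? null : $txt;
}
```

With these helpers, the call becomes $txt = fetch_product_page($_GET['sku'] ?? ''); followed by a check for null before any parsing.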

Load the URL above in a browser with a SKU or item number to view the page source, or print_r the result to find the exact markup you need.
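When the result is printed in a browser, the fetched HTML is rendered rather than shown as text. One way to see the raw markup is to escape it first. This is a small sketch; the helper name html_as_text is hypothetical.

```php
<?php
// Hypothetical helper: escapes raw HTML and wraps it in <pre>
// so the browser displays the markup as text instead of rendering it.
function html_as_text(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

For example, echo html_as_text($txt); prints the page source so the markers around each value can be read off directly.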

In this case, I already know which HTML markers surround each piece of information.

$productName    = get_string_between($txt, '<h1 class="productdetails-title">','</h1>');
$productBrand   = get_string_between($txt, 'class="product-brand">','</a>');
$productNumber  = get_string_between($txt, '<li>Item: # ','</li>');
$productModel   = get_string_between($txt, '<li>Model: # ','</li>');
$productImage   = get_string_between($txt, 'data-image="', '"');

Strip the String

I did not write this function; many variations of it exist online. It is very simple: it searches the page contents (here, held in a string) and returns the text between the start and end markers passed in above. If the start marker is not found, it returns an empty string.

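function get_string_between($string, $start, $end){
    // Find the start marker; strict comparison separates "not found" from position 0.
    $ini = strpos($string, $start);
    if ($ini === false)
        return "";
    $ini += strlen($start);
    // Find the end marker after the start marker; give up if it is missing.
    $stop = strpos($string, $end, $ini);
    if ($stop === false)
        return "";
    return substr($string, $ini, $stop - $ini);
}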
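To see the extraction in action without hitting the live site, the function can be run against a small piece of sample markup. The sketch below is illustrative only: the product name, item number, and model are made-up values, and the function body is a stand-in for the one above, included only so the snippet runs on its own.

```php
<?php
// Stand-in for get_string_between() so this snippet is self-contained.
if (!function_exists('get_string_between')) {
    function get_string_between($string, $start, $end)
    {
        $ini = strpos($string, $start);
        if ($ini === false) {
            return "";
        }
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) {
            return "";
        }
        return substr($string, $ini, $stop - $ini);
    }
}

// Made-up sample markup using the same markers as the real page.
$sample = '<h1 class="productdetails-title">Sample Hammer</h1>'
        . '<ul><li>Item: # 0000-000</li><li>Model: # ABC123</li></ul>';

$name    = get_string_between($sample, '<h1 class="productdetails-title">', '</h1>');
$number  = get_string_between($sample, '<li>Item: # ', '</li>');
$model   = get_string_between($sample, '<li>Model: # ', '</li>');
$missing = get_string_between($sample, '<span class="price">', '</span>');

echo $name, "\n";   // Sample Hammer
echo $number, "\n"; // 0000-000
echo $model, "\n";  // ABC123
var_dump($missing === ""); // bool(true): the start marker is absent
```

Because the match is purely textual, any change to the site's markup (a renamed class, extra whitespace) makes the marker miss and the result come back empty.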

Echo / Print Results

echo '<h1>'.$productName.'</h1>';
echo '<div>'.$productBrand.'</div>';
echo '<div>'.$productNumber.'</div>';
echo '<div>'.$productModel.'</div>';
echo '<div><img src="'.$productImage.'"/></div>';
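The scraped values come from a third-party page, so it may be worth escaping them before echoing them into your own HTML. The following is a minimal sketch, assuming UTF-8 output; the helper name e is hypothetical. It passes false for the double_encode argument of htmlspecialchars so that entities already present in the scraped text (such as &amp;) are not encoded a second time.

```php
<?php
// Hypothetical helper: escapes a scraped value for safe output in HTML.
// The fourth argument (double_encode = false) leaves existing entities alone.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8', false);
}

// Made-up value standing in for a scraped product name.
$productName = 'Hammer & "Nail" Set';
echo '<h1>' . e($productName) . '</h1>', "\n";
```

The same wrapper applies to attribute values, for example '<img src="' . e($productImage) . '"/>'.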
