Monthly Archives: February 2014

How to make a WooCommerce Product Importer

Okay, I've had to do this for two clients this month, so I thought I'd write a post on how to make a product scraper that works with WooCommerce and WordPress.

2 Ways to Make a Product Importer


Before you begin: the scraping method described in this article uses the Simple HTML DOM library for PHP (simple_html_dom.php), and the Excel method uses the SimpleXLSX class (simplexlsx.class.php). If a library downloads as a .txt file, rename it to .php and include it at the top of your script, as you'll see below. If the browser displays the code instead of downloading it, copy and paste it into a blank PHP file.

The two most common ways to import products from an external source are probably to scrape the website you want to sell products from, or to get an Excel or CSV file directly from your product supplier. I have done both, so I will cover the basics of each here. If you need this done and don't want to do it yourself (it is rather complex), hire me and I'll do it for you at a very reasonable price, since I have experience doing it now.

Using an Excel file from the Product Vendor

If you get an Excel file from a vendor, you'll need to write a PHP script that converts it into a CSV file compatible with the CSV Importer plugin for WooCommerce.


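You probably have an Excel file from your supplier with one row per product and columns such as carat, shape, dimensions, color, clarity, cut, certificate type, fluorescence, certificate number and price.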


To work with the CSV Importer plugin for WooCommerce and WordPress, the data must be in CSV format, laid out to provide specific product information for your shopping cart. The CSV file needs to look something like this one I made for a client recently:

2156635698;2156635698;3.25 carat round brilliant loose certified diamond;Color: F,Cut: EX,Clarity: SI2,Shape: round brilliant,Loose GIA certified diamond with certification number 2156635698;Color: F,Cut: EX,Clarity: SI2,Shape: round brilliant,Loose GIA certified diamond with certification number 2156635698;loose certified diamonds;GIA certified|colorless|Slightly Included|Peter Michaelson Jewellery;1;61261;
2131664334;2131664334;2.35 carat round brilliant loose certified diamond;Color: E,Cut: EX,Clarity: VS1,Shape: round brilliant,Loose GIA certified diamond with certification number 2131664334;Color: E,Cut: EX,Clarity: VS1,Shape: round brilliant,Loose GIA certified diamond with certification number 2131664334;loose certified diamonds;GIA certified|colorless|Very Slightly Included|Peter Michaelson Jewellery;1;63694;

Above is what the .csv file looks like in a text editor; opened in Excel, it is a little easier to read.
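Those sample rows are plain fields glued together with semicolons, which only works while no field contains a semicolon or a line break. Below is a minimal sketch of a safer way to write each line, using PHP's built-in fputcsv() with a ';' delimiter. `product_csv_line()` is a hypothetical helper name, and the sketch assumes the CSV Importer plugin accepts standard double-quoted CSV fields (most CSV parsers do, but run a small test import to confirm):

```php
<?php
// Hypothetical helper (not part of the CSV Importer plugin): format one product
// as a semicolon-separated CSV line. fputcsv() wraps a field in double quotes when
// it contains the delimiter, a quote, a space, a tab or a line break, so a stray
// semicolon in a description cannot shift the columns.
function product_csv_line(array $fields)
{
    // write to an in-memory stream so the line can be returned as a string
    $fh = fopen('php://memory', 'r+');
    fputcsv($fh, $fields, ';', '"', '\\');
    rewind($fh);
    $line = stream_get_contents($fh);
    fclose($fh);
    return $line;
}

// Example: a product whose title happens to contain a semicolon.
echo product_csv_line(array('2131664334', '2131664334', '2.35 carat round; GIA certified', '1', '63694'));
```

Because fputcsv() also quotes fields that contain spaces, the file will look slightly different in a text editor from the rows above, even though it holds the same data.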


So the goal here is to convert the supplier's Excel file into a CSV file like the one above, for use with CSV Importer.



Here is how to convert the Excel file to a CSV file you can upload as a product list into your WordPress WooCommerce website:

Instead of providing a very long explanation about how to make this Excel xlsx to csv file converter, I'm just going to get you started with some code that you can modify to fit your unique situation:

<!DOCTYPE html>
<title>CSV Maker</title>
<table id="results-table" border="1" cellpadding="3" style="border-collapse: collapse">
<tr><th>Carat</th><th>Shape</th><th>Dimensions</th><th>Color</th><th>Clarity</th><th>Cut</th><th>Cert. Type</th><th style="font-size:10px">Fluorescence</th><th>Cert. No.</th><th>Price</th></tr>
<?php
//read the supplier's spreadsheet with the SimpleXLSX class
//(newer releases of that library also offer SimpleXLSX::parse()):
require_once "simplexlsx.class.php";
$xlsx = new SimpleXLSX('products.xlsx');

//start variable to hold CSV output for woocommerce importer:
$CSV = 'sku;post_name;post_title;post_content;post_excerpt;category;tags;stock;price;featured_image'.PHP_EOL;

//clarity>>> IF=Internally Flawless | VVS1/VVS2=Very, Very Slightly Included | VS1/VS2=Very Slightly Included | SI1/SI2=Slightly Included | P1=Included
$claritytags = array(
    'IF' => 'Internally Flawless',
    'VVS1' => 'Very, Very Slightly Included', 'VVS2' => 'Very, Very Slightly Included',
    'VS1' => 'Very Slightly Included', 'VS2' => 'Very Slightly Included',
    'SI1' => 'Slightly Included', 'SI2' => 'Slightly Included',
    'P1' => 'Included',
);

foreach( $xlsx->rows() as $k => $r) {
    if ($k == 0) continue; // skip first row (column headings)

    //read ea. column of the current row:
    $car = $r[0];                    //carat
    $shp = strtolower(trim($r[1]));  //shape
    $dim = $r[2];                    //dimensions
    $col = strtoupper(trim($r[3]));  //color
    $cla = strtoupper(trim($r[4]));  //clarity
    $cut = $r[5];                    //cut
    $cer = $r[6];                    //certificate type ie: GIA
    $flo = $r[7];                    //fluorescence
    $certno = $r[8];                 //certificate no.
    $pri = $r[9];                    //price

    //show the row so you can check the data:
    echo "<tr><td>$car</td><td>$shp</td><td>$dim</td><td>$col</td><td>$cla</td><td>$cut</td><td>$cer</td><td>$flo</td><td>$certno</td><td>\$$pri</td></tr>";

    //build sku and post name using cert no.:
    $sku = $certno;
    $post_name = $certno;
    //make the post title be the carat and the shape:
    $post_title = $car." carat ".$shp." loose certified diamond";
    //make post content be color, cut, clarity, shape, certificate type and no.:
    $post_content = "Color: $col,Cut: $cut,Clarity: $cla,Shape: $shp,";
    $post_content .= "Loose $cer certified diamond with certification number $certno";
    //make post excerpt the same as post content:
    $post_excerpt = $post_content;
    //make category be loose certified diamonds:
    $category = 'loose certified diamonds';

    //make tags be cert type, color, clarity and Peter Michaelson Jewellery.
    //if color is D, E or F it is colorless, otherwise (G, H, I...) near colorless:
    $coltag = in_array($col, array('D','E','F')) ? 'colorless' : 'near colorless';
    $claritytag = isset($claritytags[$cla]) ? $claritytags[$cla] : '';
    $tags = $cer." certified|".$coltag."|".$claritytag."|Peter Michaelson Jewellery";

    //one of each diamond in stock:
    $stock = 1;
    //mark the price up 15% (strip any $ sign or commas first):
    $price = round((float) str_replace(array('$', ','), '', $pri) * 1.15, 2);

    //featured image by shape; put your own image URLs here:
    $fimg = '';
    if ($shp == 'round brilliant') {
        $fimg = '';//URL of a round brilliant photo
    }

    //add this product as one line of the csv file:
    $CSV .= implode(';', array($sku, $post_name, $post_title, $post_content, $post_excerpt, $category, $tags, $stock, $price, $fimg)).PHP_EOL;
}//end foreach loop for each row.
?>
</table>
<?php
//test and debug show csv file content:
echo "<hr />CSV contents:<br />".nl2br($CSV)."<hr />";

//write to csv file after foreach loop:
$myFile = "products.csv";
if (file_put_contents($myFile, $CSV) === false) { die("can't write products.csv file"); }
//provide a download link to the CSV file:
echo "<p><a href='$myFile' target='_blank'>Download .CSV/Excel File!</a></p>";
?>
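The tag and price rules buried in the loop above are easier to check when pulled out into small functions. Below is a sketch, assuming the spreadsheet uses the clarity grades listed in the code comment (IF, VVS1/VVS2, VS1/VS2, SI1/SI2, P1). `diamond_tags()` and `marked_up_price()` are hypothetical names, not part of WooCommerce or SimpleXLSX:

```php
<?php
// Hypothetical helper: build the pipe-separated tag list
// (cert type, color grade, clarity grade, store name).
function diamond_tags($certType, $color, $clarity)
{
    $color = strtoupper(trim($color));
    $clarity = strtoupper(trim($clarity));
    // D, E and F are colorless; anything else is treated as near colorless.
    $colorTag = in_array($color, array('D', 'E', 'F'), true) ? 'colorless' : 'near colorless';
    $clarityTags = array(
        'IF' => 'Internally Flawless',
        'VVS1' => 'Very, Very Slightly Included',
        'VVS2' => 'Very, Very Slightly Included',
        'VS1' => 'Very Slightly Included',
        'VS2' => 'Very Slightly Included',
        'SI1' => 'Slightly Included',
        'SI2' => 'Slightly Included',
        'P1' => 'Included',
    );
    // unknown grades get an empty tag rather than a wrong one
    $clarityTag = isset($clarityTags[$clarity]) ? $clarityTags[$clarity] : '';
    return $certType . ' certified|' . $colorTag . '|' . $clarityTag . '|Peter Michaelson Jewellery';
}

// Hypothetical helper: add a percentage mark-up (15% by default) and round to cents.
function marked_up_price($cost, $markup = 0.15)
{
    return round((float) $cost * (1 + $markup), 2);
}

echo diamond_tags('GIA', 'F', 'SI2'); // GIA certified|colorless|Slightly Included|Peter Michaelson Jewellery
```

That output matches the tags in the first sample row near the top of this post, which makes it a quick check that the rules are wired correctly.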


Okay, there you have it! Now on to the next way of getting products into WooCommerce.

Scraping Products from another Website

If you can't get the supplier to provide an Excel sheet or product list of some sort, you may have to resort to scraping their website. Here is a script I wrote to do just that:

First, the HTML:

<h2>Product Scraper for</h2>
Created By: Ian L. of <a href=""></a>

<hr />

<form action="apmex-scraper5.php" method="post">
<table><tr>
<td>Page no.:<select id="pgn" name="pgn"><option>1</option><option>2</option><option>3</option><option>4</option><option>5</option></select></td>
<td>Keyword:<input id="kwrd" type="text" name="kwrd" value="gold" /></td>
<td>Per/Pg:<select id="per" name="per"><option>60</option><option>5</option><option>10</option><option>20</option><option>30</option><option>40</option><option>50</option></select></td>
<td><input id="sbt" type="submit" name="sbt" value="Scrape Now" /></td>
</tr></table>
</form>

Then the PHP main page:

<?php
//Exports products in given category from the target site into a csv file for importing into woocommerce.
$pgno = (int) $_POST['pgn'];//the page no. for get var in the URL to scrape
$keywrd = trim($_POST['kwrd']);//keyword to scrape search results of scraped site.
$howmany = (int) $_POST['per'];//how many to scrape per page.
$dbugHTML = false;//set to true to echo debugging output from the helper functions.
$d = array(); $l = array(); $p = array(); $pics = array();//filled by the helper functions

//URL String for searching target site (built from $keywrd, $pgno and $howmany):
$u = "";

include 'simple_html_dom.php';
include 'helper_functions.php';

$dom = file_get_html($u);//DOM object for the gallery pg
if(!$dom){die("Could not load the search results page.");}
$html = $dom->save();//string of HTML for gallery pg

//get the short descriptions into $d:
$s = '<div class="products-item-description">';
$e = '</div>';
get_short_desc($s,$e,$html);
//get the links to product pages into $l:
$s = '<h3 class="products-item-title"><a href="';
$e = '">';
get_stuff($s,$e,$html);
//get the prices, marked up 15%, into $p:
$s = '<h3 class="heading-blue">Volume Pricing</h3>';
$e = '<div class="products-item clearfix">';
get_prices($s,$e,$html);

//now short-desc, link & price all are in arrays, $d, $l and $p
$icnt = count($l);
echo "Processing $icnt items";
//start the csv file contents variable:
$csv = "post_name,featured_image,post_excerpt,sku,stock,post_title,price,category,post_content,product_gallery,product_id".PHP_EOL;

for($i=0;$i<$icnt;$i++){
    $ii = $i+1;//item no. for display
    $de = isset($d[$i]) ? $d[$i] : '';//short desc
    $pr = isset($p[$i]) ? $p[$i] : '';//price
    $li = $l[$i];//link
    echo "<h2>Item $ii: link text we get title from is: $li</h2>";
    //get the title from the link; for a full URL, key 5 is the slug:
    $tia = explode("/",$li);
    $ti = isset($tia[5]) ? $tia[5] : end($tia);
    //replace - with space in title and capitalize it:
    $ti = strtoupper(str_replace('-',' ',$ti));
    echo "Title: $ti<br />";

    //load the product page itself:
    $pdom = file_get_html($li);
    if(!$pdom){echo "Could not load $li<br />"; continue;}
    $html2 = $pdom->save();

    //build cats var for csv file from the breadcrumbs:
    $capkey = ucwords($keywrd);//capitalize first letter of ea. word in keyword
    $cats = $capkey;//adds keyword as first category.
    $html3arr = explode('<nav class="breadcrumbs">',$html2);//key 1 will have breadcrumbs/cats
    if(isset($html3arr[1])){
        $bccarr = explode('</nav>',$html3arr[1]);//key 0 is the html we need for cats!
        $catarr = explode('<a href="/category/',$bccarr[0]);//keys 1 and up each hold one cat link
        for($k=1;$k<count($catarr);$k++){
            $catcodearr = explode('">',$catarr[$k]);//key 1 will have cat followed by extra txt
            if(!isset($catcodearr[1])){continue;}
            $catplusarr = explode('</a>',$catcodearr[1]);//key 0 is just cat!
            $cat = trim($catplusarr[0]);
            //add it with a pipe unless it is the keyword added first (case insensitive):
            if(strcasecmp($cat,$capkey) != 0){$cats .= '|'.$cat;}
        }//end for loop going over cats.
    }
    $cats = str_replace(',','',$cats);//no commas inside a csv value

    //get the product description for main content:
    $desc = '';
    $darr = explode('<h2>Product Description</h2>',$html2);//key 1 will have description plus some
    if(isset($darr[1])){
        //if it uses a table, get til </table>; if it uses a ul list, get til </ul>:
        $endtag = (strpos($darr[1],'<table') !== false) ? '</table>' : '</ul>';
        $desarr = explode($endtag,$darr[1]);
        $desc = $desarr[0];//key 0 is the desc
    }
    //mark line breaks with @@@, strip HTML tags, then turn @@@ back into br tags:
    $desc = str_replace(array('<br>','<br />','</p>','</li>','</tr>'),'@@@',$desc);
    $desc = trim(strip_tags($desc));
    $desc = str_replace('@@@','<br />',$desc);
    //clear desc of any commas and line breaks that would confuse the csv file:
    $desc = str_replace(array(',',"\r","\n"),'',$desc);

    //product images, SKU and product ID also come from $html2; set the markers for
    //your target site, e.g. get_stuff($imgStart,'" title="',$html2) fills $pics:
    $pics = array();
    $sk = '';//sku
    $pidno = '';//product ID
    $im = isset($pics[0]) ? $pics[0] : '';//first pic is the featured image

    //put the data into the CSV file in the same order as the header row:
    $csv .= $ti.",";//post_name
    $csv .= $im.",";//featured_image
    $csv .= $de.",";//post_excerpt (short desc)
    $csv .= $sk.",";//sku
    $csv .= "99,";//stock
    $csv .= $ti.",";//post_title
    $csv .= $pr.",";//price
    $csv .= $cats.",";//category
    $csv .= $desc.",";//post_content
    $csv .= implode('|',array_slice($pics,0,5)).",";//product_gallery: up to 5 pics separated by pipes
    $csv .= $pidno.PHP_EOL;//product_id, then eol last
}//end for loop

//write CSV file (keyword cleaned so it is safe in a file name):
$safekey = preg_replace('/[^A-Za-z0-9_-]+/','-',$keywrd);
$myFile = "$safekey-products-r$pgno.csv";
if(file_put_contents($myFile,$csv) === false){die("can't write $myFile file");}
//provide a link to the file to download it:
echo "<p>Done Creating CSV file!</p>";
echo "<p><a href='$myFile' target='_blank'>Download $myFile CSV File!</a></p>";
echo "done!";
?>
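The breadcrumb step in the main page splits the HTML on literal strings, which breaks as soon as the target site adds an attribute or changes its quoting. Below is a sketch of the same idea using PHP's bundled DOM extension (DOMDocument and DOMXPath) instead. It assumes, as the code above does, that the breadcrumbs sit in a `<nav>` with the class `breadcrumbs` and that category links contain `/category/`. `breadcrumb_categories()` is a hypothetical helper:

```php
<?php
// Hypothetical alternative to the explode() chain that builds $cats:
// parse the breadcrumb <nav> with DOMDocument and read its /category/ links.
function breadcrumb_categories($pageHtml, $keyword)
{
    $doc = new DOMDocument();
    // real-world pages are rarely valid HTML, so collect parser warnings quietly
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($pageHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//nav[contains(@class, "breadcrumbs")]//a[contains(@href, "/category/")]');

    $first = ucwords($keyword);
    $cats = array($first); // the search keyword is always the first category
    foreach ($links as $a) {
        $cat = trim($a->textContent);
        // skip empty links and repeats of the keyword (case-insensitive)
        if ($cat !== '' && strcasecmp($cat, $first) !== 0) {
            $cats[] = $cat;
        }
    }
    return implode('|', $cats);
}
```

Using XPath means the match no longer depends on attribute order or whitespace inside the tags, only on the class name and link pattern.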

Finally, the helper_functions.php file, which the main page includes:

<?php
//written for apmex-scraper5.php

//function to retrieve HTML objects by Ian L. of
//fills the link array ($l) or the product image array ($pics) depending on the $end marker.
function get_stuff($start,$end,$htm){
    $extras=false;//make true to see extra debugging data specific to this function.
    global $dbugHTML;
    global $t;//title
    global $d;//desc
    global $l;//link
    global $pics;//product image urls
    // split at start to get chunks we want; key 0 is everything before the first match:
    $html_array = explode($start,$htm);
    $hcnt = count($html_array);
    for($i=1;$i<$hcnt;$i++){
        $o_a = explode($end,$html_array[$i]);//key 0 should be part we need.
        $o = $o_a[0];
        if($dbugHTML && $extras){echo "Object $i: $o<br />";}

        //put the link to the product pg in l array (prefix it with the target site's domain):
        if($end == '">'){$l[]=''.$o;}

        //put the product images in the pics array:
        //when running run no. 18, it had 26 pics with functions.php.
        //when running with functions18.php it had 45 pics. using curl to check imgs.
        //the above is not relevant with the new dgrundel csv scraper because it will not get
        //images that have ashx in the url, so I fixed it by changing the thumb name instead.
        //Made the thumb change in woo-product-importer-ajax.php (dgrundel csv importer plugin file)
        if($end == '" title="'){
            //get img try one: take out handlers/ThumbJpeg.ashx?VFilePath=~/
            $otry1 = str_replace("handlers/ThumbJpeg.ashx?VFilePath=~/","",$o);
            $otry1 = str_replace(" ","%20",$otry1);
            $oar = explode("&",$otry1);//now key 0 is just the img url!
            $otry1 = $oar[0];

            //test to see if image works:
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL,$otry1);
            curl_setopt($ch, CURLOPT_NOBODY, 1);// don't download content
            curl_setopt($ch, CURLOPT_FAILONERROR, 1);//HTTP errors (400+) count as failure
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            if(curl_exec($ch) !== false){
                //The img is good, so use it:
                echo "<h2>GOT IMG!</h2>";
                $pics[]=$otry1;//fill pics urls array
            }else{//else if image doesn't work, try different method of getting img:
                echo "<h2>TRY TWO TO GET IMG!</h2>";
                echo "Failed try one: $otry1<br />";
                //keep the handler path this time:
                $otry2 = str_replace(" ","%20",$o);
                $oar = explode("&",$otry2);//now key 0 is just the img url!
                $otry2 = $oar[0];
                $pics[]=$otry2;//fill pics urls array
                if($extras){echo "Try two img: $otry2<br />";}
            }//end else img didn't work so tried another way.
        }//end if img...
    }//end for loop over html_array
    if($dbugHTML){echo "<hr />";}
}//end get_stuff function.

//function to retrieve price and mark up 15% - by Ian L. of
function get_prices($start,$end,$htm){
    global $dbugHTML;
    global $p;//prices
    global $howmany;//how many items per page were requested
    $extras=false;//set to true to see extra debugging info specific to this function.
    // split at start to get chunks we want; key 0 is everything before the first match:
    $html_array = explode($start,$htm);
    $hcnt = count($html_array);
    $hcntactual = $hcnt-1;
    if($extras){echo "<h2>Prices code blocks found: $hcntactual (not all may have actual price in them)</h2>";}
    if($hcntactual < $howmany){echo "<h3 style='color: red;'>ERROR! only $hcntactual of $howmany price blocks found...</h3>";}
    for($i=1;$i<$hcnt;$i++){
        $o_a = explode($end,$html_array[$i]);//key 0 should be part we need.
        $o = $o_a[0];
        //take out price:
        $o_a2 = explode('$',$o);//key 2 should have orig price plus extra txt
        $o2 = isset($o_a2[2]) ? $o_a2[2] : '0';//price with extra text and commas that need removed
        //remove all after the first space:
        $o2a = explode(' ',trim($o2));//key 0 is just price w/commas
        //remove commas from price text and make a number:
        $o3 = (float) str_replace(",","",$o2a[0]);
        if($dbugHTML){echo "$i - Orig. price: $o3<br />";}
        //mark up 15% and round to cents:
        $price = round($o3 * 1.15,2);
        printf("Mark-up price: %.2f<br />", $price);
        $p[] = sprintf("%.2f", $price);//fill price array
    }//end for loop over html_array
    if($dbugHTML){echo "<hr />";}
}//end get_prices function.

//function to retrieve woocommerce short desc by Ian L. of
function get_short_desc($start,$end,$htm){
    global $dbugHTML;
    global $d;//short desc
    // split at start to get chunks we want; key 0 is everything before the first match:
    $html_array = explode($start,$htm);
    $hcnt = count($html_array);
    for($i=1;$i<$hcnt;$i++){
        $o_a = explode($end,$html_array[$i]);//key 0 should be part we need.
        $o = $o_a[0];
        //take commas out to not confuse the csv values:
        $o = str_replace(",","",$o);
        //if the desc starts with a part you don't want, put the text that ends it in $findstr:
        $findstr = '';
        if($findstr !== '' && strpos($o,$findstr) !== false){
            $farr = explode($findstr,$o,2);//key 1 will be all after findstr.
            $o = $farr[1];
        }//end if had findstr
        //remove line breaks and trim short desc:
        $o = trim(str_replace(array("\r","\n"),' ',$o));
        //replace any instances of apmex (any capitalization) with Goldecom:
        $o = str_ireplace('apmex','Goldecom',$o);
        if($dbugHTML){echo "Short Desc $i: $o<br />";}
        $d[]=$o;//fill description array
    }//end for loop going over html_array
    if($dbugHTML){echo "<hr />";}
}//end get_short_desc function.
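All three helpers follow the same pattern: explode on a start marker, then keep whatever comes before the end marker. Below is a global-free sketch of that pattern, as a hypothetical `get_between()` function that returns an array instead of filling `$l`, `$p` or `$d`. Unlike the functions above, it skips chunks that have no end marker rather than keeping the whole chunk:

```php
<?php
// Hypothetical global-free version of the pattern shared by get_stuff(),
// get_prices() and get_short_desc(): return every substring found between
// a $start marker and the next $end marker.
function get_between($start, $end, $htm)
{
    $found = array();
    $chunks = explode($start, $htm);
    // chunk 0 is everything before the first $start marker, so skip it
    for ($i = 1; $i < count($chunks); $i++) {
        $pos = strpos($chunks[$i], $end);
        if ($pos !== false) {
            $found[] = substr($chunks[$i], 0, $pos);
        }
    }
    return $found;
}
```

Returning the matches makes each call easy to test on a saved copy of the page, and the caller decides which array to put them in.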

Important Note:

If you copy and paste code from almost any WordPress blog, including this one, be aware that WordPress normally converts straight single and double quotes (' and ") into curly "smart" quotes. You can easily do a find and replace in any text editor to change them back, but if you do not, the script will not work as expected, or at all.
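If you'd rather not do the find and replace by hand, a small PHP function can do it. Below is a sketch, assuming the pasted code only contains the four common curly quote characters (U+2018, U+2019, U+201C, U+201D). `straighten_quotes()` is a hypothetical name, and the `\u{...}` escapes need PHP 7 or later:

```php
<?php
// Hypothetical helper: turn WordPress curly quotes back into the straight
// quotes PHP needs. Run it over pasted code before saving the file.
function straighten_quotes($code)
{
    $curly = array("\u{2018}", "\u{2019}", "\u{201C}", "\u{201D}");
    $plain = array("'", "'", '"', '"');
    return str_replace($curly, $plain, $code);
}
```

For example, `file_put_contents('scraper.php', straighten_quotes(file_get_contents('pasted.php')));` would fix a whole pasted file (both file names are placeholders). It also changes curly quotes inside strings you meant to keep, so review the result.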


Scraping with Curl using Cookies

Okay, I got a much-needed lesson in scraping today using curl and cookies. I quickly discovered that you cannot get the contents of some webpages without sending cookies, because some sites use cookies to validate requests for pages or other data.

My mission was to scrape the following URL from

If you knew anything about, you might know that the URL above is for members only, so the goal of this exercise is to make the web server at think that you are a logged in user. Here is how:

So, first I tried this:

$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);

echo "HTML:<br>$html<hr>";

That returned nothing but a redirect to the login page of So then I read online how to use curl while sending a cookie with the page request and found that this worked:

$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// cookies to be sent
curl_setopt($ch, CURLOPT_COOKIE, "PUT_COOKIE_HERE");
$html = curl_exec($ch);

echo "HTML:<br>$html<hr>";

That worked! So I was home free. More importantly, here are the steps I took to get the value to replace PUT_COOKIE_HERE:

  1. First, I signed up for a free membership at
  2. Then I went to the page I wanted to scrape and made a note of the exact URL which was:
  3. I copied the URL from step two to my clipboard and pasted it into the Firefox address bar. Don't hit Enter yet.
  4. Then I opened Firebug and let the URL load while Firebug was open, so I could monitor the HTTP headers sent.
  5. Look for the request header named "Cookie", copy its value and paste it in place of PUT_COOKIE_HERE in the code above. Make sure the value stays inside the double quotes.
  6. Now run your code again and you'll get results, as long as the cookie is all the webpage is checking for. Some sites also want a referrer set, which you can do with curl's CURLOPT_REFERER option.
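Putting the cookie and referrer steps together, here is a sketch of a reusable fetch function. `fetch_with_cookie()` and `curl_options_with_cookie()` are hypothetical helpers, and the cookie string is whatever you copied from the Cookie header in step 5. CURLOPT_COOKIE and CURLOPT_REFERER are standard options of PHP's curl extension:

```php
<?php
// Hypothetical helper: fetch a members-only page by sending the Cookie header
// copied from the browser, plus an optional Referer header.
// Requires PHP's curl extension.
function fetch_with_cookie($url, $cookie, $referer = '')
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_options_with_cookie($url, $cookie, $referer));
    $html = curl_exec($ch); // page HTML, or false on failure
    return $html;
}

// Build the option array separately so it can be inspected or logged.
function curl_options_with_cookie($url, $cookie, $referer = '')
{
    $opts = array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIE => $cookie,       // e.g. "name1=value1; name2=value2"
        CURLOPT_FOLLOWLOCATION => false, // a redirect to the login page means the cookie was rejected
    );
    if ($referer !== '') {
        $opts[CURLOPT_REFERER] = $referer;
    }
    return $opts;
}
```

Leaving redirects off means a rejected cookie shows up as an empty or redirect response instead of silently returning the login page's HTML.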



Learn PHP

Errors with file_get_contents PHP Function

If you get an error on your server that says something like:

http:// wrapper is disabled in the server configuration by allow_url_fopen=0

or otherwise cannot get the PHP file_get_contents function to work, this tutorial is for you.


There are two good solutions:

  1. In your server's php.ini, set allow_url_fopen to On (or 1).
  2. Use cURL instead.

Which solution you use depends on your situation. If you have access to your server's configuration via the shell or cPanel, try number one first. If you don't have such access and you're on a shared server that doesn't allow file_get_contents or similar functions to open URLs, then you have to replace those function calls in the app or webpage that is throwing the error. Here is a typical replacement for file_get_contents using curl:

//instead of this:
$html = file_get_contents('');

echo $html;

//use this:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, '');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$html= curl_exec($ch);

// display file
echo $html;
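If the same code has to run on servers both with and without allow_url_fopen, one option is a small wrapper that picks the method at run time. Below is a sketch under that assumption. `fetch_url()` is a hypothetical name, and local `file://` paths always go through file_get_contents() because allow_url_fopen only restricts remote URL wrappers, not local files:

```php
<?php
// Hypothetical drop-in helper: use file_get_contents() when allow_url_fopen is on
// (or the path is local), otherwise fall back to curl.
// Returns false if neither method can fetch the URL.
function fetch_url($url)
{
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    $isLocal = stripos($url, 'file://') === 0;
    if ($urlFopen || $isLocal) {
        return @file_get_contents($url);
    }
    if (!function_exists('curl_init')) {
        return false; // no way to fetch remote URLs on this server
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects like file_get_contents() does
    return curl_exec($ch);
}
```

Replacing each file_get_contents('...') call with fetch_url('...') then keeps the script working whichever way the server is configured.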

Simple if you are a PHP developer, and workable if you are not, I guess. Good luck!



Google PageRank

What is Google PageRank

PageRank (AKA PR) is one of the methods Google uses to determine a page's importance, and therefore its position in search engine results and, in part, the value of its domain name. The PR of a page can change each time Google re-indexes the web and updates its PageRank values.

How is PageRank determined by Google?
PageRank is determined solely by incoming links, so to increase your PageRank, increase the number of links to your site. The quality of those links is also a big factor in your site's PR. For example, a link to your site from a page with a PR of 1 is not as valuable as a link from a page with a PR of 6. Keep this in mind when going after incoming links, and check the PR of the sites where you place links.
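The "links from higher-PR pages are worth more" rule falls out of the PageRank formula published by Brin and Page, usually written in normalized form as PR(p) = (1 - d)/N + d * sum over linking pages q of PR(q)/L(q), where d is a damping factor (commonly 0.85), N is the number of pages and L(q) is the number of outgoing links on page q. Below is a toy power-iteration sketch of that textbook formula in PHP. It is not Google's production ranking, and it assumes every page has at least one outgoing link:

```php
<?php
// Simplified textbook PageRank by power iteration (not Google's production ranking).
// $links maps each page to the pages it links to; every page must link somewhere.
function pagerank(array $links, $d = 0.85, $iterations = 100)
{
    $pages = array_keys($links);
    $n = count($pages);
    $pr = array_fill_keys($pages, 1.0 / $n); // start with equal rank everywhere
    for ($it = 0; $it < $iterations; $it++) {
        $next = array_fill_keys($pages, (1 - $d) / $n);
        foreach ($links as $from => $targets) {
            // each page splits its current rank evenly among the pages it links to
            $share = $pr[$from] / count($targets);
            foreach ($targets as $to) {
                $next[$to] += $d * $share;
            }
        }
        $pr = $next;
    }
    return $pr;
}

// A links to B and C, B links to C, C links back to A.
$pr = pagerank(array('A' => array('B', 'C'), 'B' => array('C'), 'C' => array('A')));
print_r($pr);
```

In this three-page example, A and B each have a single incoming link, but A's comes from C, the highest-ranked page, while B only gets half of A's vote. So A ends up scoring far higher than B.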

Is it better to have a high or a low PageRank?
I used to wonder whether it was better to have a zero or a five. Well now that I understand a little about how it works, it's definitely better to have a higher PR than a lower one.
