WordPress < 3.6.1 PHP Object InjectionUpdate WordPress to avoid Remote Code Execution attacks

11 September 2013

After reading a blog post about a “PHP object injection” vulnerability in Joomla, I dug a bit deeper and found Stefan Esser’s slides of the 2010 BlackHat conference, which showed that PHP’s unserialize() function can give rise to vulnerabilities when supplied user-generated content.

So basically, the unserialize() function takes a string that represents a serialized value, and unserializes (hence the name) it to a PHP value. This value can be any type, except the resource type (i.e. integer, double, string, array, boolean, object, NULL). When the function is given a user-generated string, this may result in memory leak vulnerabilities in some (older) PHP versions. However, this will not be the focus of this blog post. If you want to learn more about this, you can refer to the aforementioned BlackHat slides.

Another type of vulnerability that an attacker can exploit when his data is run through the unserialize() function, is “PHP Object Injection”. In this case, object-types are unserialized, allowing the attacker to set all the properties of the object to his choice. When the object’s methods are called, this could have some effect (e.g. removing some file), and as the attacker is able to choose the properties of the object, he might be able to remove a file of his choice.

Let’s examplify this to make it more clear: Imagine that the following class is loaded at the time user-generated content is passed to unserialize().

<?php
class Foo {
    private $bar; 
    public $file;
    
    public function __construct($fileName) {
        $this->bar = 'foobar';
        $this->file = $fileName;
    }
    
    // Some more code here…
    
    public function __toString() {
        return file_get_contents($this->file);
    }
}
?>

If the victim’s code would contain the following line echo unserialize($_GET['in']);, the attacker will be able to read arbitrary files. The attacker could construct his payload with the following code:

<?php
class Foo {
    public $file;
}
$foo = new Foo();
$foo->file = '/etc/passwd';
echo serialize($foo);
?>

Which results in O:3:"Foo":1:{s:4:"file";s:11:"/etc/passwd";}. All the attacker has to do now is to send a GET request to the vulnerable page with his payload. This page will then output the contents of /etc/passwd. Although reading arbitrary files is quite bad, imagine what would happen if file_get_contents would be eval in the above example…

I hope this section has shed some light on the possible dangers of supplying user-generated content to the unserialize() function. Even PHP’s reference manual clearly states that one should not pass user-generated content to the unserialize() function:

Warning

Do not pass untrusted user input to unserialize(). Unserialization can result in code being loaded and executed due to object instantiation and autoloading, and a malicious user may be able to exploit this. Use a safe, standard data interchange format such as JSON (via json_decode() and json_encode()) if you need to pass serialized data to the user.

You shall not pass user-content to userialize()

Now let’s move on to how this affects WordPress.

WordPress vulnerability

In Stefan Esser’s BlackHat presentation, he mentioned that WordPress is a well-known example of an application that makes use of serialize() and unserialize(). In the example of his slides, unserialize() is used on content received from the WordPress website. So when an attacker is able to perform a MitM-attack on the victim’s website, he can modify the response from the WordPress website to include his payload. Interestingly, at the time of writing, even the latest version of WordPress (3.6) contains this vulnerability (aprox. 3 years after the presentation). Imagine what could happen if an attacker were able to hijack the WordPress.org DNS…

However… this is not the only occurrence where WordPress uses unserialize(). It is also used to store certain information in the database. For example, some user metadata is stored serialized in the database. This metadata is retrieved in the wp-includes/meta.php file by the get_metadata()-function defined on line 267. Here’s a little abstract from this function (lines 292-297):

if ( isset($meta_cache[$meta_key]) ) {
    if ( $single )
        return maybe_unserialize( $meta_cache[$meta_key][0] );
    else
        return array_map('maybe_unserialize', $meta_cache[$meta_key]);
}

So basically, what this function does is retrieve metadata (either from posts or users) from the database (respectively the wp_postmeta and wp_usermeta tables). As some content should be serialized while other content should not, the maybe_unserialize() function is called instead of unserialize(). This function is defined in wp-includes/functions.php on lines 230-234.

function maybe_unserialize( $original ) {
    if ( is_serialized( $original ) ) // don't attempt to unserialize data that wasn't serialized going in
        return @unserialize( $original );
    return $original;
}

So what this function does, is check whether the given value is a serialized string and if it is, it is unserialized. The is_serialized() function is defined in the same file, on lines 247-276:

function is_serialized( $data ) {
    // if it isn't a string, it isn't serialized
    if ( ! is_string( $data ) )
        return false;
    $data = trim( $data );
     if ( 'N;' == $data )
        return true;
    $length = strlen( $data );
    if ( $length < 4 )
        return false;
    if ( ':' !== $data[1] )
        return false;
    $lastc = $data[$length-1];
    if ( ';' !== $lastc && '}' !== $lastc )
        return false;
    $token = $data[0];
    switch ( $token ) {
        case 's' :
            if ( '"' !== $data[$length-2] )
                return false;
        case 'a' :
        case 'O' :
            return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
        case 'b' :
        case 'i' :
        case 'd' :
            return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/", $data );
    }
    return false;
}

The reason why it is important to note how WordPress checks if a value is a serialized string will become clear soon. First, let’s look at how an attacker could make content he supplies end up in this metadata table. For every user the first name, last name, Yahoo IM, … are stored in the wp_usermeta table. So let’s just add the payload there and pwn WordPress, right?! You can check by setting i:1; as your name, if this is unserialized, it will result in the integer 1. However, if you test this, you will see that the content isn’t unserialized and just returns i:1;, as was entered.

Darn, it’ll take some more to pwn WordPress aparently… Let’s dig deeper why the content isn’t unserialized… In wp-includes/meta.php, the update_metadata() function is defined on lines 101-164. Here’s an abstract of this function:

// …
    $meta_value = wp_unslash($meta_value);
    $meta_value = sanitize_meta( $meta_key, $meta_value, $meta_type );
// …
    $meta_value = maybe_serialize( $meta_value );
    
    $data  = compact( 'meta_value' );
// …
    $wpdb->update( $table, $data, $where );
// …
    

The maybe_serialize() function might explain why our payload didn’t work… Let’s take a closer look at the function defined in wp-includes/functions.php on lines 314-324.

function maybe_serialize( $data ) {
    if ( is_array( $data ) || is_object( $data ) )
        return serialize( $data );

    // Double serialization is required for backward compatibility.
    // See http://core.trac.wordpress.org/ticket/12930
    if ( is_serialized( $data ) )
        return serialize( $data );

    return $data;
}

So when the given value is a serialized string, it will be serialized again. That is indeed what happens. As you can see in the database, i:1; is turned into s:4:"i:1;";, which is deserialized as a string when it is displayed. So what now?

As you might have noticed, this post was also tagged MySQL. Now it’ll become clear why. In order to successfully insert a serialized object, we need the is_serialized() function to return false when a string is insterted, and it should return true after it is retrieved from the database.

As you might know, a MySQL database, table and even the separate columns have their own charset/collation. For WordPress, the default charset is utf8. Contrastinly to the name, this charset actually does not support the full Unicode character set. For more information about this, please refer to following post by Mathias Bynens. This taught me that tables with utf8 as charset can not store astral symbols (whose code points range from U+010000 to U+10FFFF). So what happens if we try to store one of these symbols nonetheless? Apparently, everything after such a symbol is just discarded. So for example, when trying to insert foo𝌆bar, MySQL will discard 𝌆bar and just store foo.

This was the last piece of the puzzle that was needed to inject serialized values which will be unserialized later on. To test this, you can insert i:1;𝌆 as your first name. As you will see, this results in just 1 as value, meaning that the value you supplied was unserialized. If you don’t yet believe me, try entering a serialized empty array with an astral symbol appended: a:0:{}𝌆. This will result in Array.

Let’s recap: maybe_serialized('i:1;𝌆') is inserted to the database. As WordPress does not see this as a serialized string (because it doesn’t end in ; or }), this will result in i:1;𝌆. When inserted, MySQL doesn’t know how to store it properly, and removes the astral symbol 𝌆. Later on, when the value i:1; is retrieved, it will be unserialized as it now has ; as last character which will make is_serialized() return true. Boom. Vulnerability.

WordPress exploit

Now we’ve shown that WordPress contains a PHP Object Injection vulnerability, let’s try to exploit it… So in order to exploit this vulnerability (by injecting objects), we need to find a class that (i) contains a “useful” method that is called, and (ii) is included at the time the object is created.

When an object is unserialized, the __wakeup() function is called. This function is one of PHP’s “magic-methods”. This is one method we are sure of that is called, there could be some more though. I made the following class which logs all function calls to /tmp/func.log.

<?php
class Foo {
    public static function logFuncCall($funcName) {
        $fh = fopen('/tmp/func.log', 'a');
        fwrite($fh, $funcName."\n");
        fclose($fh);
    }
    public function __construct() { Foo::logFuncCall('__construct('.json_encode(func_get_args()).')');}
    public function __destruct() { Foo::logFuncCall('__destruct()');}
    public function __get($name) { Foo::logFuncCall("__get($name)"); return "Foo";}
    public function __set($name, $value) { Foo::logFuncCall("__set($name, value)");} 
    public function __isset($name) { Foo::logFuncCall("__isset($name)"); return true;} 
    public function __unset($name) { Foo::logFuncCall("__unset($name)");} 
    public function __sleep() { Foo::logFuncCall("__sleep()"); return array();} 
    public function __wakeup() { Foo::logFuncCall("__wakeup()");} 
    public function __toString() { Foo::logFuncCall("__toString()"); return "Foo";} 
    public function __invoke($a) { Foo::logFuncCall("__invoke(". json_encode(func_get_args()).")");}
    public function __call($a, $b) { Foo::logFuncCall("__call(". json_encode(func_get_args()).")");}
    public static function __callStatic($a, $b) { Foo::logFuncCall("__callStatic(". json_encode(func_get_args()).")");}
    public static function __set_state($a) { Foo::logFuncCall("__set_state(". json_encode(func_get_args()).")"); return null;}
    public function __clone() { Foo::logFuncCall("__clone()");} 
}
?>

In order to list all the functions that are called, first make sure that the class is included at the time the unserialization happens. You can do this by adding require_once('foo.php') to the top of functions.php. Next, try exploiting the PHP Object Injection by setting your first name to O:3:"Foo":0:{}𝌆. When you save this, and the page is refreshed, you will see that your first name now is Foo, which is exactly what is returned by the __toString() function of the Foo class. Now let’s look at the functions that were called:

$ sort -u /tmp/func.log
__destruct()
__toString()
__wakeup()

That gives us three functions we can work with: __wakeup(), __destruct() and __toString(). “Unfortunately” I was unable to find an occurrence of a WordPress class that was loaded at the time the unserialization happens which could lead to a severe exploitation. Please note that this is not due to the “security” of WordPress, but rather by chance.

So does this mean that WordPress is just vulnerable, but no exploit is possible? Not quite… If you are familiar with WordPress, you might be aware that there is an enormous amount of plugins available. These plugins come with their own classes and thus may introduce what is needed for successfully exploiting this vulnerability. I looked into this, and found that there exists a popular plugin which (when enabled) elevates this vulnerability to Remote Command Execution.

Due to ethical considerations, I will not disclose a PoC of this exploit at this time, as there are too many vulnerable WordPress installations out there.

WordPress fix

The fix by WordPress is in the is_serialized() function, I’ll briefly discuss it here.

function is_serialized( $data, $strict = true ) {
     // if it isn't a string, it isn't serialized
     if ( ! is_string( $data ) )
         return false;
     if ( ':' !== $data[1] )
         return false;
    if ( $strict ) {
        $lastc = $data[ $length - 1 ];
        if ( ';' !== $lastc && '}' !== $lastc )
            return false;
    } else {
        // ensures ; or } exists but is not in the first X chars
        if ( strpos( $data, ';' ) < 3 && strpos( $data, '}' ) < 4 )
            return false;
    }
     $token = $data[0];
     switch ( $token ) {
         case 's' :
            if ( $strict ) {
                if ( '"' !== $data[ $length - 2 ] )
                    return false;
            } elseif ( false === strpos( $data, '"' ) ) {
                 return false;
            }
         case 'a' :
         case 'O' :
             return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
         case 'b' :
         case 'i' :
         case 'd' :
            $end = $strict ? '$' : '';
            return (bool) preg_match( "/^{$token}:[0-9.E-]+;$end/", $data );
     }
     return false;
 }

The main difference is that when the $strict parameter is set to false, there are fewer constraints a string needs to be marked serialized. For example, the last character no longer needs to be ; or {, which makes that this fix patches the vulnerability I reported. Now are there any similar issues that could lead to have the same consequences?

As WordPress is still using the unsafe unserialize() function instead of the safer json_decode(), it is now dependant on the regularity of MySQL’s irregular behaviour. The vulnerability I disclosed above made use of the fact that MySQL’s utf8 charset removes all characters that come after an astral symbol. Now what would happen if in a future version, MySQL would remove everything before this character? WordPress would be vulnerable again. Another option that is not unlikely, is that there exists a character that is removed by MySQL when INSERTed. In this case, is_serialized() will return false when trying to insert a string with this character prepended as meta-data. When this string is then retrieved again, it will no longer have the character, and is_serialized() will now return true, which will cause the user-generated string to be unserialized.

Ofcourse, this is pure speculation (as I am not that familiar with MySQL). I shared these concerns with WordPress, and they consulted their MySQL expert, and assured that above scenarios will not happen. The first scenario (where characters are removed before a certain character), will not happen because:

I see no way of this happening. ‘After’ can happen, if you manage to define a partial multi-byte character, as in the original report. ‘Before’ can’t, because MySQL only runs forward through a string when converting it to a character set, never backwards.

As for the second scenario (where a character “disappears”) will not happen because:

MySQL replaces characters it doesn’t recognize (for the given character set), with a placeholder. MySQL will sometimes replace byte sequences with “?” or “�” (U+FFFD). Such replacements would not be harmful.

Timeline

April 3rd: Vulnerability discovered
April 4th: WordPress notified
June 18th: First WordPress fix
June 21st: WordPress 3.5.2 released (fix not included)
August 1st: WordPress 3.6 released (fix not included)
September 6th: Second WordPress fix
September 11th: WordPress 3.6.1 released (fix included)
September 11th: Public disclosure through this blog post

Conclusion

Even though this vulnerability was caused by a single Unicode character, it did have an impact on the core functionality of WordPress (presumably this is why it took them 5 months to fix). As abandoning the use of unserialize() was not an option for them (presumably because of legacy issues), they had to come up with a good algorithm to prevent this vulnerability while remaining compatible with a plethora of plugins and other systems. All in all, I feel a bit more safe hosting my friend’s WP blog, though I did alter the meta-tables to support all Unicode characters:

ALTER TABLE wp_commentmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_postmeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE wp_usermeta CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Finally, I would like to thank the WordPress Security Team for our collaboration on fixing this vulnerability, and for taking my feedback on both fixes into account.