Monday, July 5, 2010

Fetch the full post content of FeedWordPress feeds

Hi :)

I use WordPress with the FeedWordPress plugin to run a planet. It's a great plugin, but some bloggers don't publish the full post content in their feeds. If you want the full content, you can either contact the blogger and ask him or her to enable full content in the feed, or keep reading this article.
I wrote some functions that fetch the full content of such posts.

Requirements

  • PHP with cURL support (Client URL Library)

  • Permissions to modify theme files

  • Text editor

  • Basic PHP programming skills


Step 1 - Where is the post content?
This is easy: open the web page and view its source.
For example, open http://zebardast.ir/en/linux-and-unix-bash-shell-aliases/ (a single post with full content) and view the page source.
In the page source, the post content starts with the following code:
<div  class="postBody">

and ends with:
			</div> 

<div class="postFooter">

* The end code is not just </div>, because the post content itself contains other divs. So I include the HTML that follows the closing </div>, which makes the end code unique.
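
To see why a bare </div> is not enough as the end code, here is a minimal sketch with a made-up page fragment. The `$page` string and the `extract_between()` helper are assumptions for illustration only; they are not part of FeedWordPress or of the theme code in this post.

```php
<?php
// Hypothetical helper: return the text between two markers,
// or false if either marker is missing.
function extract_between($html, $start, $end) {
    $from = strpos($html, $start);
    if ($from === false) {
        return false;
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return false;
    }
    return substr($html, $from, $to - $from);
}

// Made-up page fragment: the post body contains a nested <div>.
$page = '<div class="postBody"><p>Intro</p><div class="code">ls -l</div><p>Outro</p></div>'
      . '<div class="postFooter">Posted by someone</div>';

$start = '<div class="postBody">';

// Ending at the first </div> cuts the post off at the nested div.
$short = extract_between($page, $start, '</div>');

// Ending at </div> followed by the footer's opening tag captures the whole body.
$full = extract_between($page, $start, '</div><div class="postFooter">');

echo $short, "\n";
echo $full, "\n";
```

The match is exact, including spaces, tabs, and line breaks, so it is safest to copy both codes straight from the page source rather than retyping them.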

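Step 2 - Add the start and end code to `Custom Feed Settings`
Open the WordPress administration panel and go to the `Feed and Update Settings` page. Select the feed from the drop-down menu (here, `Saeid Zebardast's Blog`).
Under `Custom Feed Settings`, add two settings: `start_content`, holding the start code from Step 1, and `end_content`, holding the end code. These are the names that the code in Step 3 reads with `get_feed_meta()`.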


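Step 3 - Fetch the full content from the source and update the post in WordPress
Open functions.php in a text editor and add the code below to the end of it. If your functions.php does not end with a closing `?>`, leave out the opening `<?php` and closing `?>` lines: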

<?php
// Return true if $link looks like an absolute http(s) URL.
function validLink($link) {
    return (bool) preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $link);
}


/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return the
 * array from curl_getinfo() with three extra keys: 'errno', 'errmsg',
 * and 'content' (the response body, or false on failure).
 */
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return the web page as a string
        CURLOPT_HEADER => false, // don't include headers in the output
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // accept all supported encodings
        CURLOPT_USERAGENT => "ayy.ir spider", // who am I
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect (seconds)
        CURLOPT_TIMEOUT => 120, // timeout on response (seconds)
        CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}


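// Note: $this cannot be used as a parameter name in current PHP versions,
// so the parameters are named $needle, $start and $end here.

// Return the part of $inthat before the first occurrence of $needle.
function before($needle, $inthat)
{
    $pos = strpos($inthat, $needle);
    return ($pos === false) ? '' : substr($inthat, 0, $pos);
}

// Return the part of $inthat after the first occurrence of $needle,
// or null if $needle is not found.
function after($needle, $inthat)
{
    $pos = strpos($inthat, $needle);
    if ($pos === false) {
        return null;
    }
    return substr($inthat, $pos + strlen($needle));
}

// Collect every piece of $inthat that lies between $start and $end.
function multi_between($start, $end, $inthat)
{
    $elements = array();
    $counter = 0;
    while ($inthat)
    {
        $counter++;
        $elements[$counter] = before($end, $inthat);
        $elements[$counter] = after($start, $elements[$counter]);
        $inthat = after($end, $inthat);
    }
    return $elements;
}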

// Return the text between the first $delimiterLeft and the next
// $delimiterRight, or false if either delimiter is not found.
function strbet($inputStr, $delimiterLeft, $delimiterRight, $debug=false) {
    $posLeft = strpos($inputStr, $delimiterLeft);

    if ( $debug ) {
        echo $posLeft;
    }

    if ( $posLeft === false ) {
        if ( $debug ) {
            echo "Warning: left delimiter '{$delimiterLeft}' not found";
        }
        return false;
    }
    $posLeft += strlen($delimiterLeft);
    $posRight = strpos($inputStr, $delimiterRight, $posLeft);
    if ( $posRight === false ) {
        if ( $debug ) {
            echo "Warning: right delimiter '{$delimiterRight}' not found";
        }
        return false;
    }

    if ( $debug ) {
        echo $posLeft;
        echo $posRight;
    }

    return substr($inputStr, $posLeft, $posRight - $posLeft);
}

?>
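
`get_web_page()` returns the cURL info array with `errno`, `errmsg`, and `content` added, so the caller can decide whether a fetch is usable before parsing it. The following is a minimal sketch of such a check. `fetch_is_usable()` is a hypothetical helper name, and the sample arrays only mimic the shape of the returned array (`http_code` is one of the keys that `curl_getinfo()` provides).

```php
<?php
// Hypothetical helper: decide whether a get_web_page() result is worth parsing.
function fetch_is_usable(array $result) {
    // A non-zero errno means cURL itself failed (DNS error, timeout, ...).
    if (!isset($result['errno']) || $result['errno'] !== 0) {
        return false;
    }
    // Anything other than HTTP 200 is treated as a failure in this sketch.
    if (!isset($result['http_code']) || (int) $result['http_code'] !== 200) {
        return false;
    }
    // An empty or false body has nothing to extract.
    $content = isset($result['content']) ? $result['content'] : null;
    return is_string($content) && $content !== '';
}

// Sample arrays shaped like the return value of get_web_page().
$ok       = array('errno' => 0,  'errmsg' => '', 'http_code' => 200, 'content' => '<html>...</html>');
$missing  = array('errno' => 0,  'errmsg' => '', 'http_code' => 404, 'content' => 'Not found');
$timedOut = array('errno' => 28, 'errmsg' => 'Operation timed out', 'http_code' => 0, 'content' => false);

var_dump(fetch_is_usable($ok));
var_dump(fetch_is_usable($missing));
var_dump(fetch_is_usable($timedOut));
```

Treating only HTTP 200 as success is a design choice for this sketch; it keeps error pages (a 404 page, for example) from being stored as post content.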

Close functions.php and open single.php in a text editor. Add the code below after `<?php if (have_posts()) : while (have_posts()) : the_post(); ?>`:

<?php
$my_content = get_the_content();
if (is_syndicated()) :

    $syndication_permalink = get_post_meta(get_the_ID(), "syndication_permalink", true);
    $syndication_source_uri = get_post_meta(get_the_ID(), "syndication_source_uri", true);

    // If the permalink is relative, prepend the source site's URI.
    if (!validLink($syndication_permalink) && validLink($syndication_source_uri)) {
        $syndication_permalink = rtrim($syndication_source_uri, "/") . "/" . ltrim($syndication_permalink, "/");
    }

    // Fetch the full content only once per post.
    $post_updated = get_post_meta(get_the_ID(), "post_updated", true);
    if (empty($post_updated)) {

        // Start and end code from `Custom Feed Settings` (Step 2).
        $start_content = get_feed_meta('start_content');
        $end_content = get_feed_meta('end_content');

        if (!empty($start_content) && !empty($end_content)) {
            $result = get_web_page($syndication_permalink);
            $my_page = $result['content'];

            // Skip the update if cURL reported an error or the page is empty.
            if ($result['errno'] == 0 && !empty($my_page)) {
                // strbet() returns the text between the two codes, or false.
                $valid_texts = strbet($my_page, $start_content, $end_content);

                if (!empty($valid_texts)) {
                    $my_post = array();
                    $my_post['ID'] = get_the_ID();
                    $my_post['post_content'] = $valid_texts;
                    $my_content = $valid_texts;
                    wp_update_post($my_post);
                    update_post_meta(get_the_ID(), 'post_updated', true);
                }
            }
        }
    }

endif; // is_syndicated()
?>

After that, replace `the_content()` with:
 echo $my_content; 

Close the text editor and upload functions.php and single.php to your theme folder. Now open a single post and you should see the full content.
Just try it!
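
In the single.php code above, a relative `syndication_permalink` is joined onto `syndication_source_uri` with a slash. That join can be tried in isolation with a small sketch like the one below. `join_feed_url()` is a hypothetical helper, the example URLs are made up, and the sketch only handles simple relative paths, not `../` segments or query-only links.

```php
<?php
// Hypothetical helper: join a base URL and a relative path with exactly one slash.
function join_feed_url($base, $path) {
    // Absolute http(s) links are returned unchanged.
    if (preg_match('|^https?://|i', $path)) {
        return $path;
    }
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

$a = join_feed_url('http://example.com/blog/', '/2010/07/my-post/');
$b = join_feed_url('http://example.com/blog', '2010/07/my-post/');
$c = join_feed_url('http://example.com/blog', 'http://other.example/post/');

echo $a, "\n";
echo $b, "\n";
echo $c, "\n";
```

Trimming the slashes on both sides means the result is the same whether or not the feed's source URI ends with a slash.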

See also
How do I get FeedWordPress to include the full content of posts, instead of just a short summary or excerpt of the text?

External links
WordPress
FeedWordPress (Homepage)
FeedWordPress (WordPress plugin directory)
Client URL Library

Good luck :)