Full Version: Page Data Grab Help, Please
Anachronos.
Hi everyone,

I help run a help site for the popular online Java game RuneScape (www.runescape.com). What brings me here today is trouble looking up data from the highscores for a feature called stats signatures: look up the user's stats, put them on a dynamic image, and save it so they can use the image in their forum signature. Currently, to look up data I have to open a new connection to their highscores page (using fsockopen()), POST the name of the user I want to look up, grab the data I want, and close the connection. However, this is for just ONE skill; there are around 21, plus a few other items I grab, such as total level/experience. Aside from massive CPU usage every time it has to do this, the loading time is around a minute, well over the 8 seconds people are willing to wait for a page load. Generating the image takes half a second, so I don't need help with that; I'm quite comfortable with image generation.

Any assistance would be greatly appreciated. The following is my lookup code and an example of how I fill the fields.
CODE
<?php
function hsc($finalrsname, $skill, $item)
{
    // Tell browsers not to cache the generated output.
    header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
    header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
    header("Cache-Control: no-store, no-cache, must-revalidate");
    header("Cache-Control: post-check=0, pre-check=0", false);
    header("Pragma: no-cache");
    $skill = strtolower($skill);

    // Open a raw socket to the hiscore server and POST the player name.
    $fp = fsockopen("hiscore-web.runescape.com", 80, $errno, $errstr, 30);
    if (!$fp) {
        die("$errstr ($errno)<br />\n");
    } else {
        stream_set_timeout($fp, 1);
        $out = "POST /lang/en/aff/runescape/hiscorepersonal.ws HTTP/1.1\r\n";
        $out .= "Host: hiscore-web.runescape.com\r\n";
        $out .= "From: admin@runemasters.net\r\n";
        // 21 is the length of "submit=Compare&user1=".
        $out .= "Content-length: ".(strlen(rawurlencode($finalrsname))+21)."\r\n";
        $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
        $out .= "User-Agent: RuneSignatures/1.0\r\n";
        $out .= "Connection: Close\r\n\r\n";
        $out .= "submit=Compare&user1=".rawurlencode($finalrsname);

        fwrite($fp, $out);
        $page = "";
        $page .= stream_get_contents($fp);
        fclose($fp);
    }

    // Keep only the table rows between the header row and </body>, as plain text.
    $page = explode('<tr><td align="right"><b>Skill</b></td><td>&nbsp;</td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td> </tr>',$page);
    $page = explode('</body>',$page[1]);
    $page = strip_tags($page[0]);
    $page = str_replace(array(",","This webpage and its contents are copyright 1999 - 2006 Jagex Ltd. To use our service you must agree to our Terms+Conditions + Privacy policy"),"",$page);
    $page = explode(" ",$page);

    // Every four tokens form one row: skill name, rank, level, XP.
    for ($a = 0; $a <= 88; $a = $a + 4) {
        $hiscores[strtolower($page[$a])] = array("rank" => $page[$a+1], "level" => $page[$a+2], "xp" => $page[$a+3]);
    }

    return($hiscores[$skill][$item]);
}
?>


And this is an example of code I use for filling out fields:

CODE
// Load the GetXP script.
require("getxp.php");

// SKILL ONE: Hitpoints
$skill = 'Hitpoints';
$item = 'level';
$HP = hsc($finalrsname, $skill, $item);
if (empty($HP)) {
    // Nothing came back: leave the field editable.
    print "<tr><td width='50%'>HP: </td><td width='50%'><input name='hp' type='text' size='2' maxlength='2' /></td>
</tr>";
}
else
{
    if ($HP == 'Ranked') {
        // The lookup returned the text "Ranked" rather than a number: leave the field editable.
        print "<tr><td width='50%'>HP: </td><td width='50%'><input name='hp' type='text' size='2' maxlength='2' /></td>
</tr>";
    } else {
        // A level was found: show it read-only.
        print "<tr>
<td width='50%'>HP: </td><td width='50%'><input name='hp' type='text' size='2' maxlength='2' READONLY value='$HP' /></td>
</tr>";
    }
}
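
Since the block above is repeated for every skill, the row-building part could be pulled into one small function. The following is only a sketch under assumptions: skillRow() and the sample data are hypothetical names and values, not part of the original script, and it needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the original script: builds one form row.
function skillRow(string $label, string $field, ?string $level): string
{
    $attrs = "name='" . htmlspecialchars($field, ENT_QUOTES) . "' type='text' size='2' maxlength='2'";
    // Lock the field only when the lookup produced a plain number.
    if ($level !== null && preg_match('/^\d+\z/', $level) === 1) {
        $attrs .= " readonly value='$level'";
    }
    return "<tr><td width='50%'>" . htmlspecialchars($label, ENT_QUOTES) . ": </td>"
        . "<td width='50%'><input $attrs /></td></tr>\n";
}

// Sample data standing in for real lookups: label, field name, level.
$rows = [
    ['HP', 'hp', '85'],
    ['Attack', 'attack', 'Ranked'],
    ['Magic', 'magic', null],
];
foreach ($rows as [$label, $field, $level]) {
    echo skillRow($label, $field, $level);
}
```

Empty, null, and non-numeric results all fall through to the editable field, which matches the two editable branches in the code above.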
Anachronos.
As you can see, I made the whole script a function, so I use the function for every skill, and the function includes all the connect/grab/disconnect code. I've tried separating them so that it fetches all of the skills in one connection and outputs only the one I want, but it doesn't work. It won't connect.
Digi
If some of it is fake how do you expect us to completely understand what you are doing? If you are that worried about losing your source don't post it.
Anachronos.
Alright, here it is, with the fake parts removed. I still won't post the entirety of my other script because it's a tad massive.

CODE
<?php
function hsc($finalrsname, $skill, $item)
{
    // Tell browsers not to cache the generated output.
    header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
    header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
    header("Cache-Control: no-store, no-cache, must-revalidate");
    header("Cache-Control: post-check=0, pre-check=0", false);
    header("Pragma: no-cache");
    $skill = strtolower($skill);

    // Open a raw socket to the hiscore server and POST the player name.
    $fp = fsockopen("hiscore-web.runescape.com", 80, $errno, $errstr, 30);
    if (!$fp) {
        die("$errstr ($errno)<br />\n");
    } else {
        stream_set_timeout($fp, 1);
        $out = "POST /lang/en/aff/runescape/hiscorepersonal.ws HTTP/1.1\r\n";
        $out .= "Host: hiscore-web.runescape.com\r\n";
        $out .= "From: admin@runemasters.net\r\n";
        // 21 is the length of "submit=Compare&user1=".
        $out .= "Content-length: ".(strlen(rawurlencode($finalrsname))+21)."\r\n";
        $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
        $out .= "User-Agent: RuneSignatures/1.0\r\n";
        $out .= "Connection: Close\r\n\r\n";
        $out .= "submit=Compare&user1=".rawurlencode($finalrsname);

        fwrite($fp, $out);
        $page = "";
        $page .= stream_get_contents($fp);
        fclose($fp);
    }

    // Keep only the table rows between the header row and </body>, as plain text.
    $page = explode('<tr><td align="right"><b>Skill</b></td><td>&nbsp;</td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td> </tr>',$page);
    $page = explode('</body>',$page[1]);
    $page = strip_tags($page[0]);
    $page = str_replace(array(",","This webpage and its contents are copyright 1999 - 2006 Jagex Ltd. To use our service you must agree to our Terms+Conditions + Privacy policy"),"",$page);
    $page = explode(" ",$page);

    // Every four tokens form one row: skill name, rank, level, XP.
    for ($a = 0; $a <= 88; $a = $a + 4) {
        $hiscores[strtolower($page[$a])] = array("rank" => $page[$a+1], "level" => $page[$a+2], "xp" => $page[$a+3]);
    }

    return($hiscores[$skill][$item]);
}
?>


It also makes use of a variable that is later defined/redefined. That variable is $skill, which is defined as any of the skills I wish to look up.

Basically, my question is, how am I to get this to work without opening a connection and closing it twenty-one times?
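
One possible direction, sketched under assumptions: the loop in the lookup function above already builds an entry for every skill before returning just one, so the page could be fetched once per player and the whole table reused. In the sketch below, parseHiscoreTokens(), hiscoresFor(), and the $fetch callable are hypothetical names, the sample text is made up, and every row is assumed to have exactly four whitespace-separated tokens.

```php
<?php
// Hypothetical helper: turns flattened "skill rank level xp ..." text
// (what the original code has after strip_tags/str_replace) into a table.
function parseHiscoreTokens(string $text): array
{
    // Split on any run of whitespace and drop empty tokens.
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $table = [];
    // Walk the tokens four at a time; stop when a full row is not available.
    for ($i = 0; $i + 3 < count($tokens); $i += 4) {
        $table[strtolower($tokens[$i])] = [
            'rank'  => $tokens[$i + 1],
            'level' => $tokens[$i + 2],
            'xp'    => $tokens[$i + 3],
        ];
    }
    return $table;
}

// Fetches once per player name and reuses the parsed table afterwards.
// $fetch is any callable that returns the flattened text for a name.
function hiscoresFor(string $name, callable $fetch): array
{
    static $cache = [];
    if (!isset($cache[$name])) {
        $cache[$name] = parseHiscoreTokens($fetch($name));
    }
    return $cache[$name];
}

// Usage with a stand-in fetcher; a real one would do the network request.
$sampleFetch = function (string $name): string {
    return "Overall 100 1500 2000000 Attack 200 70 737627";
};
$stats = hiscoresFor('example', $sampleFetch);
echo $stats['attack']['level'], "\n";
```

With this shape, each skill's form field reads from the same array, so only the first call for a given player triggers a request.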
Digi
Does your server have cURL installed? cURL is much faster than what you are doing, and you could simply use the full URL rather than building and sending all the header data like you are now. The page would be saved the same way (kinda). I would have to go look at that site later to get a definitive idea of how to improve on it, though.
Anachronos.
I'm not entirely sure if it does. I'll check and get back to you.

Aye, it is installed. I'll be reading up on their documents on the cURL site, but thanks for the pointer!
Digi
Yeah np. I'll look @ Runescape later tonight and see what I can think up as far as making it faster...
Anachronos.
I don't know if I'm just slow today or what, but I'm beside myself trying to get it to work. Anyone who knows how to use cURL and would help me would be much appreciated :S
Digi
From what I saw on that site (I was just able to see it), you are just going to have to request each page and get the data. cURL should still be a lot faster than fsockopen(), though.

As far as using cURL goes, it's something like this:
CODE
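<?php
   function vGetPageInfo( $sHTMLpage ) {
     // Create a cURL handle for the given URL and run the request.
     // Without CURLOPT_RETURNTRANSFER, curl_exec() prints the page directly.
     $sh = curl_init( $sHTMLpage );
     curl_exec  ( $sh );
     // Ask cURL for individual transfer statistics.
     $sAverageSpeedDownload = curl_getinfo( $sh, CURLINFO_SPEED_DOWNLOAD );
     $sAverageSpeedUpload  = curl_getinfo( $sh, CURLINFO_SPEED_UPLOAD );
     echo '<pre>';
     echo 'Average speed download == ' . $sAverageSpeedDownload . '<br>';
     echo 'Average Speed upload    == ' . $sAverageSpeedUpload  . '<br>';
     echo '<br>';
     // With no second argument, curl_getinfo() returns all statistics as an array.
     $aCURLinfo = curl_getinfo( $sh );
     print_r( $aCURLinfo );
     echo '</pre>';
     curl_close(  $sh );
   }
?>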


As you can see, simply pass the URL of the page you wish to fetch to curl_init() and run curl_exec(). curl_getinfo() returns details about the transfer (speeds, HTTP status code, and so on), not the page itself; to get the page body as a string, set CURLOPT_RETURNTRANSFER with curl_setopt() so that curl_exec() returns it instead of printing it. You can begin your parsing as before once you have that. Additionally, there are some other options at the link below that you may need.

http://us3.php.net/manual/en/function.curl-setopt.php
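
For the POST lookup discussed in this thread, the request might look roughly like the sketch below. The URL is assembled from the host and path in the fsockopen() code earlier in the thread; hiscorePostBody() and fetchHiscorePage() are hypothetical names, and the timeout values are illustrative assumptions.

```php
<?php
// Builds the same form body the original fsockopen() code sends.
function hiscorePostBody(string $name): string
{
    // PHP_QUERY_RFC3986 encodes spaces as %20, matching rawurlencode().
    return http_build_query(
        ['submit' => 'Compare', 'user1' => $name],
        '',
        '&',
        PHP_QUERY_RFC3986
    );
}

// Fetches the hiscore page and returns its HTML, or null on failure.
function fetchHiscorePage(string $name): ?string
{
    $ch = curl_init('http://hiscore-web.runescape.com/lang/en/aff/runescape/hiscorepersonal.ws');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => hiscorePostBody($name),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_USERAGENT      => 'RuneSignatures/1.0',
        CURLOPT_CONNECTTIMEOUT => 10,   // illustrative values, in seconds
        CURLOPT_TIMEOUT        => 15,
    ]);
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}
```

cURL sets the Content-Length and Content-Type headers for the POST body itself, so the manual length arithmetic from the socket version is not needed here.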
nitr021
Here is my fix:
http://forum.icefuzion.net/index.php?showtopic=9373
Digi
Go you! :D
nitr021
QUOTE
Go you!


Sarcasm?
Digi
no :D
nitr021
QUOTE(Digital-NW @ Apr 19 2006, 03:44 PM) *
no :D

OK, thanks then. Sorry, I get that a lot nowadays.