


PHP :: using php to open a web page and read data 
Felix Rex

Joined: Fri Mar 28, 2003 6:01 pm
Posts: 16646
Location: On a slope
OK, I can actually get a PHP page to open a web page and pull data. The problem is that this web page forces a redirect and sets a session cookie, and that's where I'm stuck. Here's my code so far (minus some incriminating evidence):

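[php]
<?php
// Send a raw HTTP/1.0 request over a socket and return the response as an array of lines.
function sendToHost($host, $method, $path, $data) {
    // Supply a default method of GET if the one passed was empty
    if (empty($method)) { $method = 'GET'; }
    $method = strtoupper($method);

    $fp = fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        die("Could not connect to $host: $errstr ($errno)");
    }

    // For GET, the data travels in the query string rather than in a body
    if ($method == 'GET' && $data !== '') { $path .= '?' . $data; }

    $header  = "$method $path HTTP/1.0\r\n";
    $header .= "Host: $host\r\n";
    if ($method == 'POST') {
        // Only a POST carries a body, so only a POST announces one
        $header .= "Content-type: application/x-www-form-urlencoded\r\n";
        $header .= "Content-length: " . strlen($data) . "\r\n";
    }
    $header .= "Connection: close\r\n\r\n";

    // Debug: show the request headers being sent
    echo '<pre>' . htmlspecialchars($header) . '</pre>';

    fputs($fp, $header);
    if ($method == 'POST') { fputs($fp, $data); }

    // Read the whole response (status line, headers, body) in chunks of up to 127 bytes
    $buf = array();
    while (!feof($fp)) {
        $buf[] = fgets($fp, 128);
    }

    fclose($fp);
    return $buf;
}
$host = ''; // host omitted to protect the guilty
$path = ''; // path omitted to protect the guilty
$data = ''; // username and password, url-encoded as the login form would post them

$contents = sendToHost($host, "POST", $path, $data);

echo '<pre>'; print_r($contents); echo '</pre>';
?>
[/php]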

OK, that part works. Here's the output so far:

First, the header that I send:
[php]
POST /Default.asp HTTP/1.0
Host: [edited]
Content-type: application/x-www-form-urlencoded
Content-length: 35
Connection: close
[/php]

And the response, as shown via print_r (for ease of display):
[php]
Array
(
[0] => HTTP/1.1 302 Object moved
[1] => Connection: close
[2] => Date: Wed, 08 Nov 2006 21:37:43 GMT
[3] => Server: Microsoft-IIS/6.0
[4] => X-Powered-By: ASP.NET
[5] => Location: [omitted, but includes a GET var]
[6] => Content-Length: 144
[7] => Content-Type: text/html
[8] => Set-Cookie: ASPSESSIONIDSQASTCAT=EHCBDGCBFNOAINNDGABGIGGP; path=/
[9] => Cache-control: private
[10] =>
[11] =>
[12] =>
Object Moved
This object may be found here.
)
[/php]

So what I need to do is request the new path and somehow pass along that ASP session cookie. How? The Location header includes a GET var, so I have to submit a GET request instead of a POST. But I don't know how to send a session cookie. If I try to send anything via GET to that page, my crap just hangs. If I send an empty GET var to the page, it comes back but doesn't think I'm logged in.

Any ideas? Sorry if I'm being obtuse, but this is for my job.
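Two notes on the GET problem, offered as a hedged sketch rather than a tested fix. First, a likely cause of the hang: the original sendToHost() sends `Content-length: strlen($data)` even for GET, where `$data` goes into the query string and no body is ever written, so the server can sit waiting for bytes that never arrive. Second, a session cookie is passed back by copying the `name=value` part of each `Set-Cookie` response header into a `Cookie` request header. The helper below, `buildFollowUpRequest()` (a hypothetical name, not from the thread), builds that follow-up request from the array sendToHost() returns. It assumes each header line arrived as a single array entry, which only holds while headers fit in the 127-byte chunks that `fgets($fp, 128)` reads.

```php
<?php
// Hypothetical helper: turn the raw response lines of a 302 redirect into the
// follow-up GET request, replaying any session cookies the server set.
function buildFollowUpRequest(array $responseLines, $host)
{
    $location = null;
    $cookies  = array();

    foreach ($responseLines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break; // a blank line ends the response headers
        }
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        } elseif (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; attributes such as "path=/" are not sent back
            $parts = explode(';', trim(substr($line, strlen('Set-Cookie:'))));
            $cookies[] = trim($parts[0]);
        }
    }

    if ($location === null) {
        return null; // not a redirect
    }

    // The Location header may be an absolute URL; the request line needs only path + query
    if (preg_match('#^https?://#i', $location)) {
        $url = parse_url($location);
        $location = (isset($url['path']) ? $url['path'] : '/')
                  . (isset($url['query']) ? '?' . $url['query'] : '');
    }

    $request  = "GET $location HTTP/1.0\r\n";
    $request .= "Host: $host\r\n";
    if ($cookies) {
        $request .= 'Cookie: ' . implode('; ', $cookies) . "\r\n";
    }
    // No Content-length: a GET has no body, so announcing one would make the server wait
    $request .= "Connection: close\r\n\r\n";
    return $request;
}
```

The returned string would then be written to a fresh fsockopen() connection with fputs(), and the response read the same way as before.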

_________________
They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.


Last edited by Satis on Thu Nov 09, 2006 8:38 am, edited 3 times in total.



Wed Nov 08, 2006 3:43 pm
Felix Rex
Damn... I'm going to try to fix the &nbsp; problem.



Wed Nov 08, 2006 3:48 pm
Emperor

Joined: Wed Apr 16, 2003 1:25 am
Posts: 2560
Back, without a Master's degree in English.

I'm a n00b at this, but you could try setting the cookie in your PHP script (with setcookie) and doing the redirect after that, without the cookie.

Another thing that comes to mind: if the next page expects that cookie in GET, you've got to put it in GET somehow. It's possible that you messed up this or some other variable.

_________________
++


Thu Nov 09, 2006 6:06 am
Duke

Joined: Mon Mar 31, 2003 8:59 am
Posts: 1358
Location: right behind you
You should be able to follow the redirect to the final destination using cURL. cURL is pretty easy to use. There's also a class called Snoopy that wraps a nice interface around cURL. Let me know if you need more detail than that.


Thu Nov 09, 2006 10:22 pm
Felix Rex
OK... I'll check it out. What I'm basically trying to do is fake a login to a web page that shows a lot of data, parse it for "last updated"-type fields, and then validate that it is, indeed, updating when it should. Without having to do anything. :)

I went as far as sniffing all the packet traffic... now I wish I could use PHP to manually control ports and input/output crap. Something tells me that when I finally get to the point of building real apps with C# (or whatever) I'm going to dig being able to plunk around with system-level crap. If I could manually build and read packets with PHP, this thing would be fixed.



Fri Nov 10, 2006 7:15 am
Duke
If the only thing stopping you is the redirect, Snoopy will probably do what you need. It's probably a bit of overkill, but faster than coding a class yourself.


Fri Nov 10, 2006 10:49 am
Felix Rex
Nah, got it. cURL did the trick. Basically my problem was handling the session cookie and following the redirect, and cURL took care of both for me. I'm now able to pull all the data and parse it. I've gotten to the point of breaking the page into an array, finding an example of what I need, and processing it. I now have management and development blessing to expand it...

What I'm basically doing, for anyone who cares: we provide a stock ticker feed to our customers, but there's no process in place to monitor the feed to make sure it's working. While they build something more robust (a Windows service), my script is a stopgap. I basically get to parse out the times, check that the stock market is open (excluding hours, days, and holidays when the market is closed), and then verify that the last updated field is within an acceptable window. :) On failure it'll shoot off an email.

It's a stopgap and not entirely necessary, but it's fun, and it also puts me on management's radar in a positive way. :P
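The "is the market open?" gate and the staleness test described above can be sketched with PHP's DateTime class (PHP 5.2 or later). This is a hedged sketch, assuming regular NYSE hours of 9:30am to 4:00pm Eastern and a hand-maintained holiday list in `Y-m-d` form; it ignores early-close days. Converting to the `America/New_York` zone means the result does not depend on which time zone the server's clock is set to. `isMarketOpen()` and `isStale()` are hypothetical names; `isStale()` mirrors the test in the finished script below, where an entry is fine only while its last update plus the allowed interval is still later than now.

```php
<?php
// Hypothetical market-hours gate: weekday, not a listed holiday, and between
// 9:30am and 4:00pm New York time (early closes are not handled).
function isMarketOpen(DateTime $now, array $holidays)
{
    // Evaluate in New York time regardless of the server's own time zone
    $ny = clone $now;
    $ny->setTimezone(new DateTimeZone('America/New_York'));

    // 'N' gives 1 (Monday) through 7 (Sunday); 6 and 7 are the weekend
    if ((int) $ny->format('N') >= 6) {
        return false;
    }
    if (in_array($ny->format('Y-m-d'), $holidays, true)) {
        return false;
    }
    // Minutes since midnight, compared against 9:30 (570) and 16:00 (960)
    $minutes = (int) $ny->format('G') * 60 + (int) $ny->format('i');
    return $minutes >= 9 * 60 + 30 && $minutes <= 16 * 60;
}

// An entry is stale once its last update plus the allowed interval is no longer in the future
function isStale($lastUpdate, $now, $maxAgeSeconds)
{
    return $lastUpdate + $maxAgeSeconds <= $now;
}
```

A caller would check `isMarketOpen(new DateTime(), $holidays)` first and only then run `isStale($row['lastupdate'], time(), 60 * 60)` over the parsed rows.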



Fri Nov 10, 2006 4:40 pm
Duke
LOL, that sounds a lot like the work I was doing before moving to the Times. IT couldn't do it quickly or cheaply, or even show that it would be what we needed, so I would pound out a bridge solution or a proof of concept in 1/10 the time for virtually nothing.

Good for you man. Even if your entire application is thrown away, you are showing them that you GSD. Those types of people are in short supply.


Mon Nov 13, 2006 11:09 am
Felix Rex
hehe.... indeed. :P Now I just want more raises. Anyway, since this is the programming forum, I'm going to post my finished source. This is currently live. I'm cutting out some parts that might be considered sensitive.

[php]
<?php
define("StockUpdateFrequency", 60*60);
define("NewsUpdateFrequency", 60*60);
define("WeatherUpdateFrequency", 60*60);
class checkMyWebsite {
private $host = "http://theurl.com";
private $username = 'username';
private $password = 'password';
private $mydataarray = array();
private $bad_entries = 0;
private $error;
private $emailTo = 'monitoringemail@mycompany.com';

public function __construct(){
$this->checkDate();
$this->checkTime();
$this->checkHolidays();
$this->executeCheck();
}
private function checkDate(){
if(date("l") == 'Sunday' || date("l") == 'Saturday'){
//it's a weekend, exit
die('weekend');
}
}
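private function checkTime(){
//8:30am-3:00pm server time; assumes the server clock is on US Central time,
//which corresponds to the NYSE session of 9:30am-4:00pm Eastern
$opening = mktime(8,30,0);
$closing = mktime(15,0,0);
if(time() < $opening || time() > $closing){
//market is closed: before 9:30am or after 4:00pm Eastern
die("too early or late");
}
}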
private function checkHolidays(){
$holidays[] = '2006 November 23';
$holidays[] = '2006 December 25';
$holidays[] = '2007 January 1';
$holidays[] = '2007 January 15';
$holidays[] = '2007 February 19';
$holidays[] = '2007 April 6';
$holidays[] = '2007 May 28';
$holidays[] = '2007 July 4';
$holidays[] = '2007 September 3';
$holidays[] = '2007 November 22';
$holidays[] = '2007 December 25';
foreach($holidays AS $holiday){
if($holiday == date('Y F j')){
//today is a NYSE holiday
die("holiday");
}
}
}
private function executeCheck(){
$data = $this->connect();
$this->populateArray($data);
$errors = $this->getErrors();
if($this->bad_entries){
$this->send_error_report($errors);
}
}
private function connect(){
//set up url and posting data
$post_data = 'username=' .urlencode($this->username) .'&password=' .urlencode($this->password);

//initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $this->host);
curl_setopt($ch, CURLOPT_POST, 1 );
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
//curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

//execute curl
$store = curl_exec ($ch);

if($store === false){
//catch any errors (curl_exec() returns false on failure)
$this->error = curl_error($ch);
}
//close connection
curl_close ($ch);
return $store;
}
private function populateArray($store){
//explode results based on tr tags
$InStock = false;
$InNews = false;
$InWeather = false;
$exploded = explode("<tr>",$store);
for($i=0;$i<count($exploded);$i++){
if($InStock){
$tempExploded = explode("\r\n",strip_tags($exploded[$i]));
$this->mydataarray[$i]['type'] = 'Stock';
$this->mydataarray[$i]['symbol'] = $tempExploded[1];
$this->mydataarray[$i]['company'] = $tempExploded[2];
$this->mydataarray[$i]['lastupdate'] = strtotime($tempExploded[3]);
}
if($InNews){
$tempExploded = explode("\r\n",strip_tags($exploded[$i]));
$this->mydataarray[$i]['type'] = 'News';
$this->mydataarray[$i]['name'] = $tempExploded[1];
$this->mydataarray[$i]['lastupdate'] = strtotime($tempExploded[2]);
}
if($InWeather){
$tempExploded = explode("\r\n",strip_tags($exploded[$i]));
$this->mydataarray[$i]['type'] = 'Weather';
$this->mydataarray[$i]['zip'] = $tempExploded[1];
$this->mydataarray[$i]['location'] = $tempExploded[2];
$this->mydataarray[$i]['lastupdate'] = strtotime($tempExploded[3]);
}

//are we in the stock section?
//(strpos() returns 0 for a match at offset 0, so compare against false explicitly)
if(strpos($exploded[$i],'Stock Symbol') !== false){
$InStock = true;
}
if($InStock && strpos($exploded[$i],'</table>') !== false){
$InStock = false;
}

//are we in the news section?
if(strpos($exploded[$i],'Description') !== false){
$InNews = true;
}
if($InNews && strpos($exploded[$i],'</table>') !== false){
$InNews = false;
}

//are we in the weather section?
if(strpos($exploded[$i],'Location') !== false){
$InWeather = true;
}
if($InWeather && strpos($exploded[$i],'</table>') !== false){
$InWeather = false;
}
}
}
private function getErrors(){
$stock = false;
$news = false;
$weather = false;
$return = '<table style="width: 100%;">';
foreach($this->mydataarray AS $row){
switch($row['type']){
case 'Stock':
if($row['lastupdate'] + StockUpdateFrequency > time()){ break; }
$this->bad_entries += 1;
if(!$stock){
$return .= '<tr><th style="text-align: left;">Symbol</th><th style="text-align: left;">Name</th><th style="text-align: left;">Last update</th></tr>';
}
$return .= '<tr><td>' .$row['symbol'] .'</td>';
$return .= '<td>' .$row['company'] .'</td>';
$return .= '<td>' .date("g:ia M j",$row['lastupdate']) .'</td></tr>';
$stock = true;
break;
//due to unavoidable update delays, I'm not checking the news any more
// case 'News':
// if($row['lastupdate'] + NewsUpdateFrequency > time()){ break; }
// $this->bad_entries += 1;
// if(!$news){
// $return .= '<tr><th>News Feed</th><th>Last update</th></tr>';
// }
// $return .= '<tr><td colspan=2>' .$row['name'] .'</td>';
// $return .= '<td>' .date("g:ia M j",$row['lastupdate']) .'</td></tr>';
// $news = true;
// break;
case 'Weather':
if($row['lastupdate'] + WeatherUpdateFrequency > time()){ break; }
$this->bad_entries += 1;
if(!$weather){
$return .= '<tr><th style="text-align: left;">Zip</th><th style="text-align: left;">Location</th><th style="text-align: left;">Last update</th></tr>';
}
$return .= '<tr><td>' .$row['zip'] .'</td>';
$return .= '<td>' .$row['location'] .'</td>';
$return .= '<td>' .date("g:ia M j",$row['lastupdate']) .'</td></tr>';
$weather = true;
break;
}
}
$return .= '</table>';
return $return;
}
private function send_error_report($errors){
$body = 'The following item';
if($this->bad_entries > 1){
$body .= 's have';
}
else{
$body .= ' has';
}
$body .= ' not updated within the last ' .(StockUpdateFrequency / 60) .' minutes.<br><br>' .$errors;

//set up smtp settings
ini_set('SMTP','myexchangeserver.mycompany.com');
ini_set('smtp_port','25');

$subject = 'Error With mypage.com Updates (' .$this->bad_entries .')';
$headers = 'from: my.page@company.com' . "\r\n";
$headers .= 'MIME-Version: 1.0' . "\r\n";
$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
//send mail
mail($this->emailTo, $subject, $body, $headers);
}
}
$checkMypage = new checkMyWebsite();
?>
[/php]

This requires the cURL extension, but installing it is easy as hell. Just download it and basically uncomment the curl extension line in php.ini (on Windows, `;extension=php_curl.dll`). The cURL install docs in the PHP manual are misleading... they talk about recompiling PHP and all that, but I didn't need any of it.

Also, I know the code is a bit dirty and could've been handled better, but it's a quick thing I threw together. Given another week of dev time I could probably make it a lot prettier. I probably shouldn't have instantiated the class either, but bleh.
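For comparison, here is a sketch of a different technique from the script's `explode("<tr>", ...)` approach: letting PHP's DOM extension parse the page and reading cell text through XPath. It assumes the data sits in tables whose first row contains a known label such as 'Stock Symbol', and that those tables contain no nested tables (an outer layout table is skipped for that reason). `rowsOfTableWithHeader()` is a hypothetical helper, not part of the posted script.

```php
<?php
// Alternative sketch: parse the HTML with PHP's DOM extension instead of
// exploding on "<tr>". Returns the trimmed cell text of every data row in
// each innermost table whose first row contains $headerText.
function rowsOfTableWithHeader($html, $headerText)
{
    $rows = array();
    if (trim($html) === '') {
        return $rows; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Only tables with no nested tables, so an outer layout table is skipped
    foreach ($xpath->query('//table[not(.//table)]') as $table) {
        $trs = $xpath->query('./tr | ./tbody/tr', $table);
        if ($trs->length === 0
            || strpos($trs->item(0)->textContent, $headerText) === false) {
            continue;
        }
        // Row 0 is the header; collect the <td> text of the rows after it
        for ($i = 1; $i < $trs->length; $i++) {
            $cells = array();
            foreach ($xpath->query('./td', $trs->item($i)) as $td) {
                $cells[] = trim($td->textContent);
            }
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Each returned row is a plain list of cell strings, so, assuming the stock table's columns are symbol, company and last update in that order, `strtotime($row[2])` would give the timestamp the script compares against.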



Mon Nov 13, 2006 3:00 pm
Duke
GG. Behold the power of your geek. This might be helpful. It's a bare-bones sample class. It assumes $db is a PEAR DB connection. It also assumes you have some kind of error reporting class. The class I wrote takes 3 arguments: public message, debugging message, and an error code (unique to each possible error). It displays errors, but also logs them to a physical file, etc. (I'll be adding to it: error types for system errors vs. user errors, plus email notification or custom system commands for certain error types.) Anyway...

When you instantiate the object, you immediately check if it came out OK. Then if you call any method that relies on the object being properly instantiated, it checks to make sure it is OK. Every time you do something, you should check to see if it failed for any reason. This allows you to catch every possible error in the application, and if you are good about using unique numbers for your error codes, you can see exactly where the problem occurred (even in the log file). Any error that occurs is available in the $error object.

[php]
<?php

class foo
{
private $valid;
private $bar;

public function __construct($id)
{
global $db;
global $error;

$this -> valid = true;

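$result = $db -> query('SELECT * FROM table WHERE id = ?', array($id));
if( DB :: isError($result) )
{
$error -> add('Your selection was not valid', 'Query for foo failed: ' . $result -> getMessage(), 1);
$this -> valid = false;
}
else
{
//PEAR DB results are read with fetchRow(); it returns null when there are no more rows
$row = $result -> fetchRow(DB_FETCHMODE_ASSOC);
if( !$row )
{
$error -> add('Your selection was not valid', 'foo ID was not found in database', 3);
$this -> valid = false;
}
else
{
$this -> bar = $row['bar'];
}
}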
}

public function is_valid()
{
return $this -> valid;
}

public function get_bar()
{
global $db;
global $error;

if( !$this -> is_valid() )
{
$error -> add('Some public message', 'Method called for an invalid object', 2);
return false;
}

return $this -> bar;
}

//static method: can be called without an instance, as in foo :: do_something()
public static function do_something()
{
return 'boo';
}
}


$foo = new foo($_GET['id']);
if( !$foo -> is_valid() )
{
echo( $error -> get_public_message() );
}
else
{
if( !$bar = $foo -> get_bar() )
{
echo($error -> get_public_message() );
}
else
{
echo($bar);
}
}

echo foo :: do_something();

?>
[/php]
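Duke's example relies on a global `$error` object with `add($public, $debug, $code)` and `get_public_message()`, but that class isn't shown. A minimal hypothetical stand-in, assuming errors only need to be collected in memory and optionally appended to a log file, might look like this (`ErrorCollector`, `has_errors()` and `get_last_code()` are invented names):

```php
<?php
// Hypothetical error collector matching the calls in the example above:
// add($publicMessage, $debugMessage, $code) and get_public_message().
class ErrorCollector
{
    private $errors = array();
    private $logFile;

    public function __construct($logFile = null)
    {
        $this->logFile = $logFile;
    }

    public function add($publicMessage, $debugMessage, $code)
    {
        $this->errors[] = array(
            'public' => $publicMessage,
            'debug'  => $debugMessage,
            'code'   => (int) $code,
        );
        if ($this->logFile !== null) {
            // error_log() message type 3 appends the string to the named file
            error_log(date('c') . " [$code] $debugMessage\n", 3, $this->logFile);
        }
    }

    public function has_errors()
    {
        return count($this->errors) > 0;
    }

    // Most recent user-facing message, or '' if nothing has gone wrong
    public function get_public_message()
    {
        if (!$this->errors) {
            return '';
        }
        $last = end($this->errors);
        return $last['public'];
    }

    // Code of the most recent error, which points at the exact failing spot
    public function get_last_code()
    {
        if (!$this->errors) {
            return null;
        }
        $last = end($this->errors);
        return $last['code'];
    }
}
```

Creating it once, e.g. `$error = new ErrorCollector('/path/to/app.log');` with a log path of your choosing, before the `foo` code runs would satisfy the `global $error` lookups.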


Mon Nov 13, 2006 9:22 pm
Felix Rex
Hmm... that's not a bad idea. All my playing with C# lately has given me a lot of respect for extending classes, too. You can do some really neat stuff. I already have a generic user authentication class I built... I'm thinking in future apps I may just use it to extend my existing class (or vice versa).



Tue Nov 14, 2006 7:55 am