Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-03-29 21:44:12

possibly implement some sort of "page request frequency limiting" (like "flood control" in the forums), so a page can only be requested every 5-10 seconds (or longer), if requested/refreshed before that, show message explaining why the page wasn't reloaded
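A minimal sketch of the frequency limiting suggested above, written in Python for illustration (the site itself runs on PHP). The class name `RequestThrottle`, the `client_id` key, and the 5-second default are assumptions made for this example, not anything BOINCstats is known to use. How a "client" is identified (IP address, account ID, session) is left to the caller.

```python
import time


class RequestThrottle:
    """Refuse a repeat request for the same page from the same client
    if it arrives less than `min_interval` seconds after the last served one."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_served = {}  # (client_id, page) -> time the page was last served

    def allow(self, client_id, page):
        now = self.clock()
        key = (client_id, page)
        last = self._last_served.get(key)
        if last is not None and now - last < self.min_interval:
            # Too soon: the caller should show the explanatory message instead.
            return False
        self._last_served[key] = now
        return True
```

Because the key combines client and page, two different visitors asking for the same page are both served; only a quick refresh by the same client is refused.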
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9440
Credits: 350,105,499
World-rank: 4,782

2006-03-30 05:05:47
last modified: 2006-03-30 05:06:30

It's a good idea with one flaw: what if two different people try to access the same page (like the first page of BOINC user stats)?

Last week I implemented caching. When a page is first requested, it is stored as a file. When it's requested again, the data doesn't have to be retrieved from the database; the server can just serve the stored file.

When a table is updated, its corresponding files are deleted.
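The file cache described above could look roughly like the following Python sketch. The `PageCache` class, the `table__page.html` file naming, and the `render` callback are all hypothetical; the actual BOINCstats implementation is not shown in this thread.

```python
import os
import tempfile


class PageCache:
    """Store rendered pages as files; drop them when their table changes."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, table, page_key):
        return os.path.join(self.cache_dir, f"{table}__{page_key}.html")

    def get_page(self, table, page_key, render):
        """Serve the stored file if present, else render (hit the database) and store."""
        path = self._path(table, page_key)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        html = render()
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        return html

    def invalidate_table(self, table):
        """Delete every cached page that was built from `table`."""
        prefix = f"{table}__"
        for name in os.listdir(self.cache_dir):
            if name.startswith(prefix):
                os.remove(os.path.join(self.cache_dir, name))


calls = []

def render_user_stats():
    calls.append(1)  # stands in for the expensive database query
    return "<html>user stats</html>"

cache = PageCache(tempfile.mkdtemp())
cache.get_page("users", "page1", render_user_stats)  # rendered and stored
cache.get_page("users", "page1", render_user_stats)  # served from the file
cache.invalidate_table("users")                      # run after a table update
```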

This has resulted in lower load on the server, especially during updates.
Please do not PM, IM or email me for support (they will go unread/ignored). Use the forum for support.
Guest

2006-04-01 15:02:40

On a related note, would browser caching help even more? I noticed the html code has the line:

<meta http-equiv="CACHE-CONTROL" content="NO-CACHE">

Since the information for the pages is updated at semi-regular times, you could replace the no-cache with an expiration date like in the example from the W3C page which suggests:

<META http-equiv="Expires" content="Tue, 20 Aug 1996 14:25:27 GMT">

For some of the stats addicts, like myself, who check frequently because we don't remember when the stats are updated, it may cut down on the number of requests the server has to handle and therefore the number of pages it has to retrieve data for. Plus it may make page loads a bit faster for the users.

smith

[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9440
Credits: 350,105,499
World-rank: 4,782

2006-04-01 15:17:11

I will do a test with this.

The reason I once added the no-cache to the pages was to prevent visitors from viewing old stats.

Also, the problem (at the moment) is not available bandwidth, BOINCstats 'only' uses about 150GB of the 250GB allowed (per month). The main problem is the load on the database.
Guest

2006-04-01 16:23:47

On second thought, you may not want to implement it. I forgot about the shout box which also would not get updated if the browser caches everything. I don't know of any way to have the browser cache only the stats and not the shout box .... other than what you already did.

My main thinking behind the browser cache was that unless all the pages are kept in RAM, then each request will have some disk access and other overhead. I gather that processor/calculation speed is the main limitation, so it probably won't have much effect, but with all the page hits you get, I was reasoning it might do something?

smith
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-05 20:04:29
last modified: 2006-04-05 20:05:32

It's a good idea with one flaw: what if two different people try to access the same page (like the first page of BOINC user stats)?
if it's 2 different people then the page should be served to each, it should only be denied if it's the same person requesting the same content within a set time limit


as for client side caching, it's best to use http headers rather than META tags (so that caches will work better, especially proxies)
also using a set expires date (in the past) is a bad idea
the best thing is to handle conditional GET requests properly, so if the content hasn't changed the server only replies with "HTTP 304 Not Modified" with just the headers, but if the content has changed then it'll return a normal "HTTP 200 OK" with the headers and content

this is the best of both worlds because you always make sure clients have the most recent content, but without the extra cost of always returning a full HTTP 200, which reduces the load on the server (apache can serve a few hundred 304s simultaneously without a problem)
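The conditional GET handling described above can be sketched as follows, in Python for illustration. The function name `respond`, its arguments, and the choice of sending `Cache-Control: no-cache` are assumptions for this example; `If-Modified-Since` and `Last-Modified` are the standard HTTP headers involved.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_modified, build_body):
    """Return (status, headers, body) for a GET request.

    last_modified: timezone-aware UTC datetime of the last content change.
    build_body: callable that produces the full page (the expensive part).
    """
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": "no-cache",  # keep a copy, but revalidate before each use
    }
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            client_copy = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            client_copy = None  # unparseable date: fall through to a full response
        if (client_copy is not None and client_copy.tzinfo is not None
                and last_modified <= client_copy):
            return 304, headers, b""  # not modified: headers only, no content
    return 200, headers, build_body()
```

The body is only built on the 200 path, so an unchanged page costs the server a date comparison rather than a database query.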

i can offer advice about how to configure the server to do this
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9440
Credits: 350,105,499
World-rank: 4,782

2006-04-05 20:09:00

I still see a problem with client side caching: A page on BOINCstats is never the same. There is a clock in the menubar, scheduler status, shoutbox.

But I might be short-sighted on this
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-06 15:48:15
last modified: 2006-04-06 15:59:48

I still see a problem with client side caching: A page on BOINCstats is never the same. There is a clock in the menubar, scheduler status, shoutbox.

But I might be short-sighted on this
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-06 16:21:16
last modified: 2006-04-06 16:25:11

On second thought, you may not want to implement it. I forgot about the shout box which also would not get updated if the browser caches everything. I don't know of any way to have the browser cache only the stats and not the shout box .... other than what you already did.
erm, it's not that clear cut, you can have individual settings for each file/page, or type of content (such as telling a browser it doesn't have to check freshness of images for a long time (because they're not likely to change) by applying settings to the /images/ folder and having them inherited by all the child objects of the /images/ folder)
but it also depends on the method, there are various types of caching, there are 2 main ones
in the first, all content is labeled with values (using http headers) so that clients can determine when it was last modified, and in short, once a client has a local copy (and as long as the server is configured correctly), the client only checks if the content has changed (by sending "conditional" GET requests, ie, telling the server to only resend the content if it has changed from the browser's local copy, hence the server needs to be set up properly, otherwise it won't work, and all content will just be resent all the time), if it has been changed it's resent, if it hasn't, then the server should only send an HTTP 304 code which means "not modified" and the http headers, without any content
because the client is always checking that content is fresh, the user always sees the most recent content (or stats in this case, as pages will change most often compared to images and such)

the other method actively tells the client that the specified content (depending which content this method is applied to) is "fresh" for an amount of time that you specify (there are various ways to do this, some are better than others). this means the client doesn't have to keep checking content for freshness (such as images which are unlikely to change very often at all) and will further reduce load and bandwidth. once one of the "expiration" conditions is met, the next time the content is requested by the user (who wants to view that page, or a page with that content (an image for example)) the browser will send a conditional GET request to check if it's still fresh or not. this goes back to the first method, if it's still fresh then the server should send back an HTTP 304 and just the http headers, if not, it'll send a full (normal) HTTP 200 along with the content (page, image, whatever)

the first is best used for content that changes often, or the time it'll change is unknown (so that the client always checks)
the 2nd method is better for more static content, and/or large content, images are a prime example of both, where it's reasonably safe to have clients cache it without having to recheck, but if it does need to be changed, there are ways around the problem, which will make clients download a changed image, so if an image needs to be changed (because it's important, rather than just an improvement, like a new logo or something) then it's still possible to "force" it out, and not have to wait a month or so
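As a rough illustration of choosing between the two methods per type of content, here is a small Python sketch. The `/images/` prefix comes from the example above; the function name and the 30-day lifetime are assumptions picked for the example, not recommended values.

```python
def cache_control_for(path):
    """Pick a Cache-Control header value from the request path."""
    if path.startswith("/images/"):
        # Second method: declare the content fresh for a fixed period,
        # so the client does not recheck it until that period has passed.
        return "max-age=" + str(30 * 24 * 60 * 60)
    # First method: the client may keep a copy but must revalidate it
    # (with a conditional GET) every time it is needed.
    return "no-cache"
```

For example, `cache_control_for("/images/logo.png")` gives `max-age=2592000`, while any stats page gets `no-cache`.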

My main thinking behind the browser cache was that unless all the pages are kept in RAM, then each request will have some disk access and other overhead. I gather that processor/calculation speed is the main limitation, so it probably won't have much effect, but with all the page hits you get, I was reasoning it might do something?

absolutely agree on this one, i imagine the site is quite busy most of the time, and the reduction in load can be significant, especially as willy has said that CPU performance is the main problem
so this will help squeeze every spare cycle out (from the web front at least)

and in case i've put you off willy, don't worry, most of the calculation for what the values should be can be handled quite happily by the server, you just have to specify "parameters" sort of, so that it knows the rules for how to calculate
(such as the "expires" header, it uses an absolute date/time, but if you want your content to be "fresh" for a week, you don't need to change it every week, as long as the server is configured correctly the date value it'll send will be a week (or however long you want) from the time it served the content, but "expires" is old, and there are better methods available these days)
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-06 16:37:03
last modified: 2006-04-06 16:38:34

On a related note, would browser caching help even more? I noticed the html code has the line:

<meta http-equiv="CACHE-CONTROL" content="NO-CACHE">

Since the information for the pages is updated at semi-regular times, you could replace the no-cache with an expiration date like in the example from the W3C page which suggests:

<META http-equiv="Expires" content="Tue, 20 Aug 1996 14:25:27 GMT">

For some of the stats addicts, like myself, who check frequently because we don't remember when the stats are updated, it may cut down on the number of requests the server has to handle and therefore the number of pages it has to retrieve data for. Plus it may make page loads a bit faster for the users.

just to clarify, "cache-control" is the better option to use instead of "expires", the spec for http/1.1 (which includes the newer headers like cache-control) is more specific/explicit about meanings of each header than http/1.0 is (which uses the older methods like "expires")

also there are conflicts when using an expires date that's in the past, because it means that the content should be considered stale, and redownloaded, and this will happen every time, which is not what you want, you want the client to check if its local copy is ok, rather than just blindly redownload all the time (always using a full "HTTP 200 OK" + sending the content)
by correctly using cache-control, as willy has done (although it would be better sent as a HTTP header, rather than HTML meta tag to help proxies, which is easy to do with PHP) then to the client it means "you can save a local copy of this, but you must always check that it's fresh every time it's needed"
which is the ideal solution if i'm correctly assuming the requirements due to the use of setting cache-control to "no-cache"
to summarise in case i wasn't clear before: using an expires date in the past to prevent caching will cause a conflict with any headers for checking freshness ("last-modified", "ETag" and such) and generally cause the content to always be redownloaded (preventing any caching at all)

on a side note, i wouldn't take HTTP advice from the W3C, they're focused on HTML and related standards (web content) rather than internetworking and HTTP, which is an IETF standard, keep in mind that the web and the internet are 2 separate and very different things
Guest

2006-04-07 01:03:59

Wow! I don't know how interested Willy is in all this stuff, but I certainly am. I dabble a little with a personal webpage and some php coding on the side, and am always interested in learning more. I know a little about sending raw HTTP codes, but have never run across information with the wealth of knowledge you seem to have about which is best for a particular application. Do you have any good references you can point to?

smith
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-09 02:34:14
last modified: 2006-04-13 15:37:33

Wow! I don't know how interested Willy is in all this stuff, but I certainly am.

well, if he's interested in making a good, efficient site then he should be


I dabble a little with a personal webpage and some php coding on the side, and am always interested in learning more. I know a little about sending raw HTTP codes, but have never run across information with the wealth of knowledge you seem to have about which is best for a particular application. Do you have any good references you can point to?

why thankyou, i spent quite a few hours reading about the subject when i was getting to grips with my own site (i believe in doing things properly, and caching falls into that category), but as with most things related to the web, there's no "ultimate guide" it's all bits and pieces of info scattered about

firstly i'll say learn about HTTP headers with static files (HTML, CSS, JavaScript, images) then apply that knowledge to dynamic content (like PHP) because you need to know the basics and ideas behind it all first, to know how to make it work with PHP

the first place i'll point you to is the well renowned caching article, referred to by many reputable sites

the second is the rather in depth discussion about ideas to improve the BOINC web code over at SETI (which was specifically about using web caching) there's plenty of info in that thread and you may need to read it a couple of times (i'd be happy to offer advice though)

i strongly suggest starting with static files first (because the server can do all the calculations for you), but for PHP specifically, i'll admit my knowledge of PHP isn't vast, i've yet to get into server-side scripting, but there are quite a few places with documentation, help and forums available for implementing caching in PHP content
Guest

2006-04-12 22:30:07

Thanks Lee! There is some good reading in those links you suggested.

smith
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-13 15:41:31
last modified: 2006-04-15 23:09:31

Thanks Lee! There is some good reading in those links you suggested.

you're welcome, always glad to help someone who wants to do things the "right way"

there are some handy tools available which you might find useful (i sure did)

HTTP response header viewer - to see what your server returns for a specific request, if you're only interested in the headers then choose a HEAD request (tells the server you only want the headers)

Ethereal - a great (and free, open-source) network packet analyser, so you can see exactly what's going on
if you need help setting it up/tweaking it then just shout, but for the web if you use the capture filter: "port 80" (without the quotes, this is assuming you're not using a proxy, if you are then replace "80" with the port used by your proxy (usually 8080 or 3128 but it's customisable (so can be anything) and the default is dependent on the software package used)) and a display filter of "http.request or http.response" (again, without the quotes) that should cut out a lot of un-needed data by only capturing web traffic and only displaying the request and response packets

there are other things that will help caching work better, such as following best practices, if you need/want advice about how things "should" be done (from my own experience) then i'd be happy to help with that too, but the best place to start is learning good content coding (such as writing valid HTML preferably using the "4.01 strict" version (and i'd stick with HTML until various issues related to XHTML are sorted out))
Guest

2006-04-14 03:14:58

Thanks again Lee!

I made my own little PHP script to read the headers so I can at least do that now. Also, based on your email I downloaded ethereal and just got it working. I have to admit I don't understand 90%+ of the stuff in that program, but I have figured out how to filter the output down to just the http data. I'll have to play with it some more, but so far, so good.

As for implementing some of this stuff, I already have a basic understanding of how to send HTTP commands from PHP, and if I gather from the tutorial link you sent, then I can do similar things for static html documents using the .htaccess files. I have run into those briefly already, but guess I'll have to look into them a bit more. Before I do however, is the .htaccess the best way to send the HTTP headers for a static html file, or should I look elsewhere first?

Again, I want to thank you for your help. The links you've given have been great!

smith
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-04-19 16:46:19
last modified: 2006-04-19 16:48:34

Thanks again Lee!

I made my own little PHP script to read the headers so I can at least do that now. Also, based on your email I downloaded ethereal and just got it working. I have to admit I don't understand 90%+ of the stuff in that program, but I have figured out how to filter the output down to just the http data. I'll have to play with it some more, but so far, so good.
you're welcome, if you've got that level of skill you should do fine, it's a doddle once you get your head around the concepts

as for Ethereal, reading the manual/documentation is a good place to start, but feel free to ask, i've been using it for a while now so should be able to explain most things (the manual is a bit lacking when it comes to explaining more technical stuff)


I already have a basic understanding of how to send HTTP commands from PHP
good, however, caching isn't as simple as sending an arbitrary HTTP header, it requires knowledge about when the document was last modified, (which you'll have to get from somewhere) otherwise nasty things can happen (generally it won't work, and the server will always send HTTP 200 when it should be sending HTTP 304)


if I gather from the tutorial link you sent, then I can do similar things for static html documents using the .htaccess files. I have run into those briefly already, but guess I'll have to look into them a bit more. Before I do however, is the .htaccess the best way to send the HTTP headers for a static html file, or should I look elsewhere first?
ok, another confession here, my experience so far is with MS IIS, not apache, so i can't say for certain, however if that's how you configure the server to do things differently for different files (something separate to the general "rules"), then yes, that's probably where to do it, but again, i honestly don't know

getting the server itself to work things out is generally the best option with static files (however that's done depending on the server) it's only for things like PHP where it needs to be part of the file

but even with static content, in HTML you can specify "http-equiv" META tags in the <head> of the document, this isn't as good as sending that data as actual HTTP headers (so that proxies read them etc.)

the reason it's better to have the server do things, is because it's more efficient at them than a general PHP parser, the server is designed to be able to perform the calculations needed (although you may need additional modules for apache to get the additional functionality)


Again, I want to thank you for your help. The links you've given have been great!
you're most welcome, glad to help

if there's anything you'd like me to inspect/review to check it's working correctly then i'd be happy to do so
because as much as i'd like to, i can't offer a lot of help with PHP, because i haven't learnt it yet, so i don't know how to actually write the code, and get it to interface with a database to get the date info to decide which response to send (200 or 304)
I can only offer advice about the end result, and best-practices/ideals etc.
such as that you need to configure PHP to be able to get the date from somewhere and use that in its decision for example, but as for how to actually do that, i'm not sure

I worked thru a similar problem Chris Malton was having with his RSS feed (it was always returning a 200) so i could ask him for his PHP source if it would help, and i'm sure he could provide a small explanation of its basic function (the logic of the code)

however i make no promises with that, as he seems to either be away, or very busy lately (he's most likely revising for his exams at present) so it may be a while before he replies
Guest

2006-04-20 01:04:28

Hi Lee,


if there's anything you'd like me to inspect/review to check it's working correctly then i'd be happy to do so...

I might take you up on this at some point, although it may be a while as I still have a good bit of reading to do first.

In many ways, it sounds like we are coming from opposite sides of the spectrum. I have a working knowledge of PHP, but had never heard of caching before reading some of your posts. It is good to be able to bounce ideas off others. Thanks.

By the way, if you do ever get into PHP programming, you may find the 'filemtime' and 'header' commands useful. The first allows you to determine the last-modified time for a file, and the latter allows you to send raw HTTP headers. I won't claim this is the best way to implement things, but it should at least provide one way to handle dynamic files.
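For readers not using PHP, the same idea can be sketched in Python: `os.path.getmtime` plays the role of 'filemtime', and the returned name/value pair is what would be handed to whatever sends the response headers. The function name `last_modified_header` is made up for this example.

```python
import os
from email.utils import formatdate


def last_modified_header(path):
    """Build a Last-Modified header from a file's modification time."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    # usegmt=True yields the HTTP date form, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    return ("Last-Modified", formatdate(mtime, usegmt=True))
```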

Also, since I am working with an Apache server, I guess I'll have to do a bit more research on my own regarding static files. I think the .htaccess file might be able to take care of them (or at least some of them), but I need to check into it more. Thank you for the general tips and advice. It is all very useful.

smith


wizzszz
BAM!ID: 86
Joined: 2006-05-10
Posts: 205
Credits: 34,259
World-rank: 1,060,539

2006-06-16 09:55:00

Hi all,

the problem with caching is much easier.

I have already written my second http-server, and already posted my thoughts to the shoutbox, where it vanished w/o any comment after a while...

The server sends a Last-Modified header, which should be the time of the last database update (for BOINCstats).
This information will be stored in the client's web cache, and the next time the client tries to load the page, it will send an If-Modified-Since header with exactly the same date!

All the server has to do now is check whether there was another database update in the meantime...

If an update occurred, it will gather the info from the database again, and send a fresh page with a '200 OK' response...

But now the interesting part:
If the If-Modified-Since header matches the last update time, it will reply with a '304 Not Modified' response code, and NO DATA AT ALL!!!
Only the protocol header has to be sent, awfully useful!!!

No need to cache it on the server side, because the client already did this!

For the periodically updated information like shoutbox, server time, and so on, a frame could be used which can always be sent...
So reloading the page often for checking shoutbox info or last minor update time will keep up-to-date with time critical information, and the stats database won't be affected until a new update is available!

@guest: if you have questions, please let me know...

@Willy: If you need help with this, just drop me a line, or try it via icq.
Guest

2006-06-19 07:54:48
last modified: 2006-06-19 07:56:45

if it's 2 different people then the page should be served to each, it should only be denied if it's the same person requesting the same content within a set time limit


Hmm, just noticed this. The potential problem could be how one would determine a given user, when the use of the same IP address might not mean the same individual computer. I s'pose now, a user could be associated with a BAM account number, which is now needed for posting (though it wasn't back in April), albeit assuming they are registered. People aren't required to have a BAM account ID to browse the site however, only to post. There are 3 issues to consider:

- Corporate networks which exist on a private network. The local computers have IPs that exist on a private network which is non-routable (such as in the 192.168.0.0 range). In this case they get out through NAT or PAT (over-loaded NAT) and several individuals could be getting out with the same IP address, though internal to the router, mapped to a different port number on that IP address.

If 2 employees happen to be checking up on something from their work computers (perhaps a company related BOINC team), they could be in 2 separate rooms, totally unaware of what the other is doing, but still appear to come through the same IP.

- The same sorta thing could end up happening in a home environment (once again with NAT on a cable modem or DSL router), or else through the use of "Internet sharing" in Windows, where a cable modem subscriber might not want to get additional IP addresses for each computer in their house. With multiple computers, it isn't like family members will always keep tabs on one another to see "honey, are you going to that page?" Between husbands and wives, any sorta checking up on each other might not even be appreciated

They are still 2 separate computers and users however (as in the corporate environment), even if their computers are abstracted behind some form of address translation...

- Anyone who might be coming through a web proxy for whatever reason. The IP seen will be that of the proxy server.
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 397,049

2006-06-20 04:31:00
last modified: 2006-06-20 04:35:42

the problem with caching is much easier.....
erm, i already said all that

For the periodically updated information like shoutbox, server time, and so on, a frame could be used which can always be sent...
frames are really bad for accessibility. Good efficient code has a significant impact on how useful caching is. First things first!

Index :: Comments and suggestions :: Potential "reloading Is Bad" Solution