  SQUID Frequently Asked Questions
  Duane Wessels, wessels@ircache.net

  Frequently Asked Questions (with answers!) about the Squid Internet
  Object Cache software.
  ______________________________________________________________________

  Table of Contents:

  1. About Squid, this FAQ, and other Squid information resources
     1.1 What is Squid?
     1.2 What is Internet object caching?
     1.3 Why is it called Squid?
     1.4 What is the latest version of Squid?
     1.5 Who is responsible for Squid?
     1.6 Where can I get Squid?
     1.7 What Operating Systems does Squid support?
     1.8 Does Squid run on Windows NT?
     1.9 What Squid mailing lists are available?
     1.10 I can't figure out how to unsubscribe from your mailing list.
     1.11 What Squid web pages are available?
     1.12 Does Squid support SSL?
     1.13 What's the legal status of Squid?
     1.14 Is Squid year-2000 compliant?
     1.15 Can I pay someone for Squid support?
     1.16 Squid FAQ contributors
     1.17 About This Document
        1.17.1 Want to contribute? Please write in SGML...

  2. Getting and Compiling Squid
     2.1 Which file do I download to get Squid?
     2.2 How do I compile Squid?
     2.3 What kind of compiler do I need?
     2.4 Do you have pre-compiled binaries available?
     2.5 How do I apply a patch or a diff?
     2.6 configure options
     2.7 undefined reference to __inet_ntoa
     2.8 How can I get true DNS TTL info into Squid's IP cache?
     2.9 My platform is BSD/OS or BSDI and I can't compile Squid
     2.10 Problems compiling libmiscutil.a on Solaris
     2.11 I have problems compiling Squid on Platform Foo.
     2.12 I see a lot of warnings while compiling Squid.
     2.13 Building Squid on OS/2

  3. Installing and Running Squid
     3.1 How big of a system do I need to run Squid?
     3.2 How do I install Squid?
     3.3 What does the squid.conf file do?
     3.4 Do you have a squid.conf example?
     3.5 How do I start Squid?
        3.5.1 Squid version 2.X
        3.5.2 Squid version 1.1.X
     3.6 How do I start Squid automatically when the system boots?
        3.6.1 Squid Version 2.X
        3.6.2 Squid Version 1.1.X
           3.6.2.1 From inittab
           3.6.2.2 From rc.local
           3.6.2.3 From init.d
     3.7 How do I tell if Squid is running?
     3.8 squid command line options
     3.9 How do I see how Squid works?

  4. Configuration issues
     4.1 How do I join a cache hierarchy?
     4.2 How do I join NLANR's cache hierarchy?
     4.3 Why should I want to join NLANR's cache hierarchy?
     4.4 How do I register my cache with NLANR's registration service?
     4.5 How do I find other caches close to me and arrange
         parent/child/sibling relationships with them?
     4.6 My cache registration is not appearing in the Tracker database.
     4.7 What is the httpd-accelerator mode?
     4.8 How do I configure Squid to work behind a firewall?
     4.9 How do I configure Squid to forward all requests to another proxy?
     4.10 I have dnsserver processes that aren't being used, should I
          lower the number in squid.conf?
     4.11 My dnsserver average/median service time seems high, how can
          I reduce it?
     4.12 How can I easily change the default HTTP port?
     4.13 Is it possible to control how big each cache_dir is?
     4.14 What cache_dir size should I use?
     4.15 I'm adding a new cache_dir.  Will I lose my cache?
     4.16 Squid and http-gw from the TIS toolkit.
        4.16.1 Firewall configuration:
        4.16.2 Squid configuration:
     4.17 What is ``HTTP_X_FORWARDED_FOR''?  Why does squid provide it
          to WWW servers, and how can I stop it?
     4.18 Can Squid anonymize HTTP requests?
        4.18.1 Squid 2.2
        4.18.2 Squid 2.1 and Earlier

  5. Communication between browsers and Squid
     5.1 Netscape manual configuration
     5.2 Netscape automatic configuration
     5.3 Lynx and Mosaic configuration
     5.4 Redundant Auto-Proxy-Configuration
     5.5 Microsoft Internet Explorer configuration
     5.6 Netmanage Internet Chameleon WebSurfer configuration
     5.7 Opera 2.12 proxy configuration
     5.8 How do I tell Squid to use a specific username for FTP urls?
     5.9 Configuring Browsers for WPAD

  6. Squid Log Files
     6.1 squid.out
     6.2 cache.log
     6.3 useragent.log
     6.4 store.log
     6.5 hierarchy.log
     6.6 access.log
        6.6.1 The common log file format
        6.6.2 The native log file format
        6.6.3 access.log native format in detail
     6.7 Squid result codes
     6.8 HTTP status codes
     6.9 Request methods
     6.10 Hierarchy Codes
     6.11 cache/log (Squid-1.x)
     6.12 swap.state (Squid-2.x)
     6.13 Which log files can I delete safely?
     6.14 How can I disable Squid's log files?
     6.15 My log files get very big!
     6.16 Managing log files
     6.17 Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
     6.18 What does ERR_LIFETIME_EXP mean?
     6.19 Retrieving ``lost'' files from the cache

  7. Operational issues
     7.1 How do I see system level Squid statistics?
     7.2 How can I find the biggest objects in my cache?
     7.3 I want to restart Squid with a clean cache
     7.4 How can I proxy/cache Real Audio?
     7.5 How can I purge an object from my cache?
     7.6 Using ICMP to Measure the Network
        7.6.1 Supporting ICMP in your Squid cache
        7.6.2 Utilizing your parents database
        7.6.3 Inspecting the database
     7.7 Why are so few requests logged as TCP_IMS_MISS?
     7.8 How can I make Squid NOT cache some servers or URLs?
     7.9 How can I delete and recreate a cache directory?
     7.10 Why can't I run Squid as root?
     7.11 Can you tell me a good way to upgrade Squid with minimal downtime?
     7.12 Can Squid listen on more than one HTTP port?

  8. Memory
     8.1 Why does Squid use so much memory!?
     8.2 How can I tell how much memory my Squid process is using?
     8.3 My Squid process grows without bounds.
     8.4 I set cache_mem to XX, but the process grows beyond that!
     8.5 How do I analyze memory usage from the cache manager output?
     8.6 The ``Total memory accounted'' value is less than the size of
         my Squid process.
     8.7 xmalloc: Unable to allocate 4096 bytes!
        8.7.1 BSD/OS
        8.7.2 FreeBSD (2.2.X)
        8.7.3 OSF, Digital Unix
     8.8 fork: (12) Cannot allocate memory
     8.9 What can I do to reduce Squid's memory usage?
     8.10 Using an alternate malloc library.
        8.10.1 Using GNU malloc
        8.10.2 dlmalloc

  9. The Cache Manager
     9.1 What is the cache manager?
     9.2 How do you set it up?
     9.3 Cache manager configuration for CERN httpd 3.0
     9.4 Cache manager configuration for Apache
     9.5 Cache manager ACLs in squid.conf
     9.6 Why does it say I need a password and a URL?
     9.7 I want to shutdown the cache remotely.  What's the password?
     9.8 How do I make the cache host default to my cache?
     9.9 What's the difference between Squid TCP connections and Squid
         UDP connections?
     9.10 It says the storage expiration will happen in 1970!
     9.11 What do the Meta Data entries mean?
     9.12 In the utilization section, what is Other?
     9.13 In the utilization section, why is the Transfer KB/sec column
          always zero?
     9.14 In the utilization section, what is the Object Count?
     9.15 In the utilization section, what is the Max/Current/Min KB?
     9.16 What is the I/O section about?
     9.17 What is the Objects section for?
     9.18 What is the VM Objects section for?
     9.19 What does AVG RTT mean?
     9.20 In the IP cache section, what's the difference between a hit,
          a negative hit and a miss?
     9.21 What do the IP cache contents mean anyway?
     9.22 What is the fqdncache and how is it different from the ipcache?
     9.23 What does ``Page faults with physical i/o: 4897'' mean?
        9.23.1 Ok, so what is unusually high?
     9.24 What does the IGNORED field mean in the 'cache server list'?

  10. Access Controls
      10.1 How do I implement an ACL ban list?
      10.2 How do I block specific users or groups from accessing my cache?
         10.2.1 Ident
         10.2.2 Proxy Authentication
      10.3 Do you have a CGI program which lets users change their own
           proxy passwords?
      10.4 Is there a way to do ident lookups only for a certain host
           and compare the result with a userlist in squid.conf?
      10.5 Common Mistakes
         10.5.1 And/Or logic
         10.5.2 allow/deny mixups
         10.5.3 Differences between src and srcdomain ACL types.
      10.6 I set up my access controls, but they don't work!  why?
      10.7 Proxy-authentication and neighbor caches
      10.8 Is there an easy way of banning all Destination addresses
           except one?
      10.9 Does anyone have a ban list of porn sites and such?
      10.10 Squid doesn't match my subdomains
      10.11 Why does Squid deny some port numbers?
      10.12 Does Squid support the use of a database such as mySQL for
            storing the ACL list?
      10.13 How can I allow a single address to access a specific URL?
      10.14 How can I allow some clients to use the cache at specific times?
      10.15 Problems with IP ACL's that have complicated netmasks

  11. Troubleshooting
      11.1 Why am I getting ``Proxy Access Denied?''
      11.2 I can't get local_domain to work; Squid is caching the
           objects from my local servers.
      11.3 I get Connection Refused when the cache tries to retrieve an
           object located on a sibling, even though the sibling thinks
           it delivered the object to my cache.
      11.4 Running out of filedescriptors
         11.4.1 Linux
         11.4.2 Solaris
         11.4.3 IRIX
         11.4.4 FreeBSD
         11.4.5 General BSD
         11.4.6 Reconfigure afterwards
      11.5 What are these strange lines about removing objects?
      11.6 Can I change a Windows NT FTP server to list directories in
           Unix format?
      11.7 Why am I getting ``Ignoring MISS from non-peer x.x.x.x?''
      11.8 DNS lookups for domain names with underscores (_) always fail.
      11.9 Why does Squid say: ``Illegal character in hostname;
           underscores are not allowed?''
      11.10 Why am I getting access denied from a sibling cache?
      11.11 Cannot bind socket FD NN to *:8080 (125) Address already in use
      11.12 icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
      11.13 icpDetectClientClose: FD 135, 255 unexpected bytes
      11.14 How come Squid doesn't work with NTLM Authorization.
      11.15 The default parent option isn't working!
      11.16 ``Hot Mail'' complains about: Intrusion Logged.  Access denied.
      11.17 My Squid becomes very slow after it has been running for
            some time.
      11.18 WARNING: Failed to start 'dnsserver'
      11.19 Sending in Squid bug reports
         11.19.1 crashes and core dumps
      11.20 Debugging Squid
      11.21 FATAL: ipcache_init: DNS name lookup tests failed
      11.22 FATAL: Failed to make swap directory /var/spool/cache: (13)
            Permission denied
      11.23 FATAL: Cannot open HTTP Port
      11.24 FATAL: All redirectors have exited!
      11.25 FATAL: file_map_allocate: Exceeded filemap limit
      11.26 FATAL: You've run out of swap file numbers.
      11.27 I am using up over 95% of the filemap bits?!!
      11.28 FATAL: Cannot open /usr/local/squid/logs/access.log: (13)
            Permission denied
      11.29 When using a username and password, I can not access some files.
      11.30 pingerOpen: icmp_sock: (13) Permission denied
      11.31 What is a forwarding loop?
      11.32 accept failure: (71) Protocol error
      11.33 storeSwapInFileOpened: ... Size mismatch
      11.34 Why do I get fwdDispatch: Cannot retrieve
            'https://www.buy.com/corp/ordertracking.asp'
      11.35 Squid can't access URLs like
            http://3626046468/ab2/cybercards/moreinfo.html
      11.36 I get a lot of ``URI has whitespace'' error messages in my
            cache log, what should I do?
      11.37 commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49)
            Can't assign requested address
      11.38 Unknown cache_dir type '/var/squid/cache'
      11.39 unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'

  12. How does Squid work?
      12.1 What are cachable objects?
      12.2 What is the ICP protocol?
      12.3 What is the dnsserver?
      12.4 What is the ftpget program for?
      12.5 FTP PUT's don't work!
      12.6 What is a cache hierarchy?  What are parents and siblings?
      12.7 What is the Squid cache resolution algorithm?
      12.8 What features are Squid developers currently working on?
      12.9 Tell me more about Internet traffic workloads
      12.10 What are the tradeoffs of caching with the NLANR cache system?
      12.11 Where can I find out more about firewalls?
      12.12 What is the ``Storage LRU Expiration Age?''
      12.13 What is ``Failure Ratio at 1.01; Going into hit-only-mode
            for 5 minutes''?
      12.14 Does squid periodically re-read its configuration file?
      12.15 How does unlinkd work?
      12.16 What is an icon URL?
      12.17 Can I make my regular FTP clients use a Squid cache?
      12.18 Why is the select loop average time so high?
      12.19 How does Squid deal with Cookies?
      12.20 How does Squid decide when to refresh a cached object?
         12.20.1 Squid-1.1 and Squid-1.NOVM algorithm
         12.20.2 Squid-2 algorithm
      12.21 What exactly is a deferred read?
      12.22 Why is my cache's inbound traffic equal to the outbound traffic?
      12.23 How come some objects do not get cached?
      12.24 What does keep-alive ratio mean?
      12.25 How does Squid's cache replacement algorithm work?
         12.25.1 Squid 1.1
         12.25.2 Squid 2
      12.26 What are private and public keys?
      12.27 What is FORW_VIA_DB for?
      12.28 Does Squid send packets to port 7 (echo)?  If so, why?
      12.29 What does ``WARNING: Reply from unknown nameserver
            [a.b.c.d]'' mean?
      12.30 How does Squid distribute cache files among the available
            directories?
      12.31 Why do I see negative byte hit ratio?
      12.32 What does ``Disabling use of private keys'' mean?
      12.33 What is a half-closed filedescriptor?
      12.34 What does --enable-heap-replacement do?
      12.35 Why is actual filesystem space used greater than what Squid
            thinks?
      12.36 How do positive_dns_ttl and negative_dns_ttl work?

  13. Multicast
      13.1 What is Multicast?
      13.2 How do I know if I'm on the Mbone?
      13.3 Should I be using Multicast ICP?
      13.4 How do I configure Squid to send Multicast ICP queries?
      13.5 How do I know what Multicast TTL to use?
      13.6 How do I configure Squid to receive and respond to Multicast ICP?

  14. System-Dependent Weirdnesses
      14.1 Solaris
         14.1.1 select()
         14.1.2 malloc
         14.1.3 DNS lookups and nscd
         14.1.4 DNS lookups and /etc/nsswitch.conf
         14.1.5 DNS lookups and NIS
         14.1.6 Tuning
         14.1.7 disk write error: (28) No space left on device
         14.1.8 Solaris X86 and IPFilter
         14.1.9 Changing the directory lookup cache size
         14.1.10 The priority_paging algorithm
      14.2 FreeBSD
         14.2.1 T/TCP bugs
         14.2.2 mbuf size
         14.2.3 Dealing with NIS
         14.2.4 FreeBSD 3.3: The lo0 (loop-back) device is not
                configured on startup
         14.2.5 FreeBSD 3.x or newer: Speed up disk writes using
                Softupdates
      14.3 OSF1/3.2
      14.4 BSD/OS
         14.4.1 gcc/yacc
         14.4.2 process priority
      14.5 Linux
         14.5.1 Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't
                assign requested address
         14.5.2 FATAL: Don't run Squid as root, set 'cache_effective_user'!
         14.5.3 Large ACL lists make Squid slow
         14.5.4 gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.
         14.5.5 assertion failed: StatHist.c:91: `statHistBin(H, max) ==
                H->capacity - 1' on Alpha system.
         14.5.6 tools.c:605: storage size of `rl' isn't known
      14.6 HP-UX
         14.6.1 StatHist.c:74: failed assertion `statHistBin(H, min) == 0'
      14.7 IRIX
         14.7.1 dnsserver always returns 255.255.255.255
      14.8 SCO-UNIX

  15. Redirectors
      15.1 What is a redirector?
      15.2 Why use a redirector?
      15.3 How does it work?
      15.4 Can I use the redirector to return HTTP redirect messages?
      15.5 FATAL: All redirectors have exited!

  16. Cache Digests
      16.1 What is a Cache Digest?
      16.2 How and why are they used?
      16.3 What is the theory behind Cache Digests?
         16.3.1 Adding a Key
         16.3.2 Querying a Key
         16.3.3 Deleting a Key
      16.4 How is the size of the Cache Digest in Squid determined?
      16.5 What hash functions (and how many of them) does Squid use?
      16.6 How are objects added to the Cache Digest in Squid?
      16.7 Does Squid support deletions in Cache Digests?  What are
           diffs/deltas?
      16.8 When and how often is the local digest built?
      16.9 How are Cache Digests transferred between peers?
      16.10 How and where are Cache Digests stored?
         16.10.1 Cache Digest built locally
         16.10.2 Cache Digest fetched from peer
      16.11 How are the Cache Digest statistics in the Cache Manager to
            be interpreted?
      16.12 What are False Hits and how should they be handled?
      16.13 How can Cache Digest related activity be traced/debugged?
         16.13.1 Enabling Cache Digests
         16.13.2 What do the access.log entries look like?
         16.13.3 What does a False Hit look like?
         16.13.4 How is the cause of a False Hit determined?
         16.13.5 Use The Source
      16.14 What about ICP?
      16.15 Is there a Cache Digest Specification?
      16.16 Would it be possible to stagger the timings when
            cache_digests are retrieved from peers?

  17. Transparent Caching/Proxying
      17.1 Transparent caching for Solaris, SunOS, and BSD systems
         17.1.1 Install IP Filter
         17.1.2 Configure ipnat
         17.1.3 Configure Squid
            17.1.3.1 Squid-2
            17.1.3.2 Squid-1.1
      17.2 Transparent caching with Linux
      17.3 Transparent caching with Cisco routers
         17.3.1 possible bugs
      17.4 Transparent caching with LINUX 2.0.29 and CISCO IOS 11.1
      17.5 The cache is trying to connect to itself...
      17.6 Transparent caching with FreeBSD
      17.7 Transparent caching with Linux and ipchains
      17.8 Transparent caching with ACC Tigris digital access server
      17.9 ``Connection reset by peer'' and Cisco policy routing
      17.10 WCCP - Web Cache Coordination Protocol
         17.10.1 Does Squid support WCCP?
         17.10.2 Configuring your Router
            17.10.2.1 IOS Version 11.x
            17.10.2.2 IOS Version 12.x
         17.10.3 Configuring FreeBSD
         17.10.4 Configuring Linux 2.2
            17.10.4.1 Standard Linux GRE Tunnel
            17.10.4.2 WCCP Specific Module
            17.10.4.3 Common Steps
         17.10.5 Configuring Others
      17.11 Can someone tell me what version of cisco IOS WCCP is added in?
      17.12 Transparent caching with Foundry L4 switches

  18. SNMP
      18.1 Does Squid support SNMP?
      18.2 Enabling SNMP in Squid
      18.3 Configuring Squid 2.2
      18.4 Configuring Squid 2.1
      18.5 How can I query the Squid SNMP Agent
      18.6 What can I use SNMP and Squid for?
      18.7 How can I use SNMP with Squid?
         18.7.1 MRTG
      18.8 Where can I get more information/discussion about Squid and SNMP?

  19. Squid version 2
      19.1 What are the new features?
      19.2 How do I configure 'ssl_proxy' now?
      19.3 Logfile rotation doesn't work with Async I/O
      19.4 Adding a new cache disk
      19.5 Squid 2 performs badly on Linux
      19.6 How do I configure proxy authentication with Squid-2?
      19.7 Why does proxy-auth reject all users with Squid-2.2?
      19.8 Delay Pools
         19.8.1 How can I limit Squid's total bandwidth to, say, 512 Kbps?
         19.8.2 How to limit a single connection to 128 Kbps?
         19.8.3 How do you personally use delay pools?
         19.8.4 Where else can I find out about delay pools?
      19.9 Can I preserve my cache when upgrading from 1.1 to 2?
      19.10 Customizable Error Messages
      19.11 My squid.conf from version 1.1 doesn't work!

  20. httpd-accelerator mode
      20.1 What is the httpd-accelerator mode?
      20.2 How do I set it up?
      20.3 When using an httpd-accelerator, the port number for
           redirects is wrong

  21. Related Software
      21.1 Clients
         21.1.1 Wget
         21.1.2 echoping
      21.2 Logfile Analysis
         21.2.1 SQUIDTIMES
      21.3 Configuration Tools
         21.3.1 3Dhierarchy.pl
      21.4 Squid add-ons
         21.4.1 transproxy
         21.4.2 Iain's redirector package
         21.4.3 Junkbusters
         21.4.4 Squirm
         21.4.5 chpasswd.cgi
         21.4.6 jesred
         21.4.7 squidGuard
         21.4.8 Central Squid Server
      21.5 Ident Servers

  22. DISKD
      22.1 What is DISKD?
      22.2 Does it perform better?
      22.3 What do I need to use it?
      22.4 If I use DISKD, do I have to wipe out my current cache?
      22.5 How do I configure message queues?
         22.5.1 FreeBSD
         22.5.2 Digital Unix
         22.5.3 Linux
         22.5.4 Solaris
      22.6 How do I configure shared memory?
         22.6.1 FreeBSD
         22.6.2 Digital Unix
         22.6.3 Linux
         22.6.4 Solaris

  23. Terms and Definitions
      23.1 Neighbor
  ______________________________________________________________________

  1.  About Squid, this FAQ, and other Squid information resources

  1.1.  What is Squid?

  Squid is a high-performance proxy caching server for web clients,
  supporting FTP, gopher, and HTTP data objects.  Unlike traditional
  caching software, Squid handles all requests in a single,
  non-blocking, I/O-driven process.

  Squid keeps meta data and especially hot objects cached in RAM,
  caches DNS lookups, supports non-blocking DNS lookups, and implements
  negative caching of failed requests.

  Squid supports SSL, extensive access controls, and full request
  logging.  By using the lightweight Internet Cache Protocol, Squid
  caches can be arranged in a hierarchy or mesh for additional
  bandwidth savings.

  Squid consists of a main server program squid, a Domain Name System
  lookup program dnsserver, some optional programs for rewriting
  requests and performing authentication, and some management and
  client tools.  When squid starts up, it spawns a configurable number
  of dnsserver processes, each of which can perform a single, blocking
  Domain Name System (DNS) lookup.  This reduces the amount of time the
  cache waits for DNS lookups.

  Squid is derived from the ARPA-funded Harvest project.

  1.2.  What is Internet object caching?

  Internet object caching is a way to store requested Internet objects
  (i.e., data available via the HTTP, FTP, and gopher protocols) on a
  system closer to the requesting site than to the source.  Web
  browsers can then use the local Squid cache as a proxy HTTP server,
  reducing access time as well as bandwidth consumption.

  1.3.  Why is it called Squid?

  Harris' Lament says, ``All the good ones are taken.''

  We needed to distinguish this new version from the Harvest cache
  software.  Squid was the code name for initial development, and it
  stuck.

  1.4.  What is the latest version of Squid?

  Squid is updated often; please see the Squid home page for the most
  recent versions.

  1.5.  Who is responsible for Squid?

  Squid is the result of efforts by numerous individuals from the
  Internet community.  Duane Wessels of the National Laboratory for
  Applied Network Research (funded by the National Science Foundation)
  leads code development.  Please see the CONTRIBUTORS file for a list
  of our excellent contributors.

  1.6.  Where can I get Squid?

  You can download Squid via FTP from the primary FTP site or one of
  the many worldwide mirror sites.

  Many sushi bars also have Squid.

  1.7.  What Operating Systems does Squid support?

  The software is designed to operate on any modern Unix system, and is
  known to work on at least the following platforms:

  o  Linux

  o  FreeBSD

  o  NetBSD

  o  BSDI

  o  OSF and Digital Unix

  o  IRIX

  o  SunOS/Solaris

  o  NeXTStep

  o  SCO Unix

  o  AIX

  o  HP-UX

  o  ``OS/2''

  For more specific information, please see platforms.html.  If you
  encounter any platform-specific problems, please let us know by
  sending email to squid-bugs.

  1.8.  Does Squid run on Windows NT?

  Recent versions of Squid will compile and run on Windows/NT with the
  GNU-Win32 package.  However, Squid does not yet perform well on
  Windows/NT.

  1.9.  What Squid mailing lists are available?

  o  squid-users@ircache.net: general discussions about the Squid cache
     software.  Subscribe via squid-users-request@ircache.net.
     Previous messages are available for browsing at the Squid Users
     Archive, and also at theaimsgroup.com.

  o  squid-users-digest: digested (daily) version of above.  Subscribe
     via squid-users-digest-request@ircache.net.

  o  squid-announce@ircache.net: a receive-only list for announcements
     of new versions.  Subscribe via squid-announce-request@ircache.net.

  o  squid-bugs@ircache.net: a closed list for sending us bug reports.
     Bug reports received here are given priority over those mentioned
     on squid-users.

  o  squid@ircache.net: a closed list for sending us feed-back and
     ideas.

  o  squid-faq@ircache.net: a closed list for sending us feed-back,
     updates, and additions to the Squid FAQ.

  We also have a few other mailing lists which are not strictly
  Squid-related.

  o  cache-snmp@ircache.net: a public list for discussion of Web
     Caching and SNMP issues and developments.  Eventually we hope to
     put forth a standard Web Caching MIB.

  o  icp-wg@ircache.net: mostly-idle mailing list for the nonexistent
     ICP Working Group within the IETF.  It may be resurrected some
     day, you never know!

  1.10.  I can't figure out how to unsubscribe from your mailing list.

  All of our mailing lists have ``-request'' addresses that you must
  use for subscribe and unsubscribe requests.  To unsubscribe from the
  squid-users list, you send a message to
  squid-users-request@ircache.net and in the subject and/or body of
  your message, you put the magic word ``unsubscribe.''

  1.11.  What Squid web pages are available?

  Several Squid and Caching-related web pages are available:

  o  The Squid home page for information on the Squid software

  o  The IRCache Mesh gives information on our operational mesh of
     caches.

  o  The Squid FAQ (uh, you're reading it).

  o  Oskar's Squid Users Guide.

  o  The Information Resource Caching FAQ

  o  Squid Programmers Guide.  Yeah, it's extremely incomplete.  I
     assure you this is the most recent version.

  o  Web Caching Reading list

  o  Squid-1.0 Release Notes

  o  Squid-1.1 Release Notes

  o  Tutorial on Configuring Hierarchical Squid Caches

  o  RFC 2186 ICPv2 -- Protocol

  o  RFC 2187 ICPv2 -- Application

  o  RFC 1016

  1.12.  Does Squid support SSL?

  Squid can proxy SSL requests.  By default, Squid will forward all SSL
  requests directly to their origin servers.  In firewall
  configurations, Squid will forward all SSL requests to one other
  proxy, defined with the ssl_proxy directive.

  1.13.  What's the legal status of Squid?

  Squid is copyrighted by the University of California San Diego.
  Squid uses some code developed by others.

  Squid is Free Software.  Squid is licensed under the terms of the GNU
  General Public License.

  1.14.  Is Squid year-2000 compliant?

  We think so.  Squid uses the Unix time format for all internal time
  representations.  Potential problem areas are in printing and parsing
  other time representations.  We have made the following fixes to
  address the year 2000:

  o  cache.log timestamps use 4-digit years instead of just 2 digits.

  o  parse_rfc1123() assumes years less than "70" are after 2000.

  o  parse_iso3307_time() checks all four year digits.

  Year-2000 fixes were applied to the following Squid versions:

  o  squid-2.1: year parsing bug fixed for dates in the "Wed Jun 9
     01:29:59 1993 GMT" format (Richard Kettlewell).

  o  squid-1.1.22: fixed likely year-2000 bug in ftpget's timestamp
     parsing (Henrik Nordstrom).

  o  squid-1.1.20: misc fixes (Arjan de Vet).

  Patches:

  o  Richard's lib/rfc1123.c patch <../Y2K/patch3>.  If you are still
     running 1.1.X, then you should apply this patch to your source and
     recompile.

  o  Henrik's src/ftpget.c patch <../Y2K/patch2>.

  o  Arjan's lib/rfc1123.c patch <../Y2K/patch1>.

  Squid-2.2 and earlier versions have a New Year bug.  This is not
  strictly a Year-2000 bug; it would happen on the first day of any
  year.

  1.15.  Can I pay someone for Squid support?

  Yep.  The following companies will support Squid for you:

     FSC Internet Corp
        We provide commercial Squid support; we frequently deploy
        squids in caching proxy, transparent caching proxy, httpd
        accelerator, and hierarchical modes, for a wide variety of
        corporate and public sector clients, including members of the
        Fortune 500.  Inquiries can be directed to FSC Information.

     Infomatik Inc., Brazil
        We support the complete SA territory and speak Portuguese,
        Spanish, English and German.  We are experienced in compiling
        Squid for SCO and FreeBSD.  We do custom configurations, OS and
        cache fine tuning, maintenance and remote administration.  We
        can also help place your server into existing hierarchies,
        giving you the best performance.  Contact us on our Web site or
        send some e-mail to our Infomatik Information address.

     netwing, Austria
        netwing provides commercial Squid support.  We specialize in
        installing Squid proxy systems on Linux machines.  Please
        contact us by sending an e-mail to squid-admin@netwing.at.

     pushcache.com, Inc., Austin, Texas
        pushcache.com provides a version of Squid and an accompanying
        library modified to support push as well as the traditional
        pull.  We support our software and traditional Squid.  Contact
        us at info@pushcache.com.

     INTERNET ONLINE AG, Cologne, Germany
        We support squid, apache, linux and other public license
        software for professional use in Germany.  Email us.

     Plugged In Software, Australia
        Plugged In Software provides commercial support for Squid,
        Apache, sendmail, Samba, RedHat Linux and other Open Source
        (TM) software.  For further information, please see our home
        page or email our sales staff.

     NEC Australia
        NEC Australia provides managed firewall solutions and support
        for Squid proxies running under Linux-based firewalls.  For
        further information, contact: Garry Thorpe, Chief Engineer, NEC
        Australia, Ph +61 2 6250 8749, Email: garry.thorpe@nec.com.au.

     Linux Manages!
        Linux Manages! also provides support for squid caches on Linux.
        We do installations, maintenance, linking squids into existing
        hierarchies, remote support and tuning of the cache to the
        customer's needs.
        Contact address is linux@thing.at.

     Wizards Computers, Lahore - Pakistan
        We provide commercial support for Squid, SCO UNIX, Cisco,
        Apache, Sendmail and RedHat Linux, especially Squid and Linux
        based transparent caches.  We also build and provide support
        for LANs.  For further information, please email us at:
        wizards@brain.net.pk

     EDP S.r.l - Italy
        We provide commercial support for Squid, SCO UNIX, Cisco,
        Apache, Sendmail and any kind of Linux.  Find us at:
        http://www.edpweb.com  E-mail us at: info@edpweb.com.

     ICL Industrial Code and Logic
        Located in Cambridge, Ontario, Canada.

     Snerpa, Iceland
        Snerpa has provided commercial Squid support since 1995.  We
        specialize in Linux solutions and we provide the INfilter Web
        content control database for Squid.  You can contact us at
        support@snerpa.is.

  If you know someone who takes money for supporting Squid, let us know
  and we will add their information here.

  1.16.  Squid FAQ contributors

  The following people have made contributions to this document:

  o  Jonathan Larmour
  o  Cord Beermann
  o  Tony Sterrett
  o  Gerard Hynes
  o  Katayama, Takeo
  o  Duane Wessels
  o  K Claffy
  o  Paul Southworth
  o  Oskar Pearson
  o  Ong Beng Hui
  o  Torsten Sturm
  o  James R Grinter
  o  Rodney van den Oever
  o  Kolics Bertold
  o  Carson Gaspar
  o  Michael O'Reilly
  o  Hume Smith
  o  Richard Ayres
  o  John Saunders
  o  Miquel van Smoorenburg
  o  David J N Begley
  o  Kevin Sartorelli
  o  Andreas Doering
  o  Mark Visser
  o  tom minchin
  o  Jens-S. Voeckler
  o  Andre Albsmeier
  o  Doug Nazar
  o  Henrik Nordstrom
  o  Mark Reynolds
  o  Arjan de Vet
  o  Peter Wemm
  o  John Line
  o  Jason Armistead
  o  Chris Tilbury
  o  Jeff Madison
  o  Mike Batchelor
  o  Bill Bogstad
  o  Radu Greab
  o  F.J. Bosscha
  o  Brian Feeny
  o  Martin Lyons
  o  David Luyer
  o  Chris Foote
  o  Jens Elkner

  Please send corrections, updates, and comments to:
  squid-faq@ircache.net.

  1.17.  About This Document

  This document is copyrighted (2000) by Duane Wessels.

  This document was written in SGML and converted with the SGML-Tools
  package.

  This document is available in HTML, compressed Postscript, and plain
  text.

  1.17.1.  Want to contribute?  Please write in SGML...

  It is easier for us if you send us text which is close to "correct"
  SGML.  The Squid FAQ currently uses the LINUXDOC DTD.  It's probably
  easiest to follow the examples in this file.  Here are the basics:

  Use the <url> tag for links, instead of HTML anchors.

  Use <em> for emphasis, config options, and pathnames:

          <em>usr/local/squid/etc/squid.conf</em>
          <em/foo bar/

  Use <verb>, just like HTML's <pre>, to show unformatted text.

  2.  Getting and Compiling Squid

  2.1.  Which file do I download to get Squid?

  You must download a source archive file of the form
  squid-x.y.z-src.tar.gz (e.g., squid-1.1.6-src.tar.gz) from the Squid
  home page or the Squid FTP site.  Context diffs are available for
  upgrading to new versions.  These can be applied with the patch
  program (available from the GNU FTP site).

  2.2.  How do I compile Squid?

  For Squid-1.0 and Squid-1.1 versions, you can just type make from the
  top-level directory after unpacking the source files.  For example:

          % tar xzf squid-1.1.21-src.tar.gz
          % cd squid-1.1.21
          % make

  For Squid-2 you must run the configure script yourself before running
  make:

          % tar xzf squid-2.0.RELEASE-src.tar.gz
          % cd squid-2.0.RELEASE
          % ./configure
          % make

  2.3.  What kind of compiler do I need?

  To compile Squid, you will need an ANSI C compiler.  Almost all modern
  Unix systems come with pre-installed compilers which work just fine.
  The old SunOS compilers do not have support for ANSI C, and the Sun
  compiler for Solaris is a product which must be purchased separately.

  If you are uncertain about your system's C compiler, the GNU C
  compiler is available at the GNU FTP site.  In addition to gcc, you
  may also want or need to install the binutils package.
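
  If you are unsure whether gcc is already installed, and which version
  it is, you can ask it directly (the same check appears in section
  2.9):

          % gcc -v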

  2.4.  Do you have pre-compiled binaries available?

  The developers do not have the resources to make pre-compiled binaries
  available.  Instead, we invest effort into making the source code very
  portable.  Some people have made binary packages available.  Please
  see our Platforms Page .

  The SGI Freeware  site has pre-compiled
  packages for SGI IRIX.

  Squid binaries for FreeBSD on Alpha and Intel.

  Squid binaries for NetBSD on everything.

  2.5.  How do I apply a patch or a diff?

  You need the patch program.  You should probably duplicate the entire
  directory structure before applying the patch.  For example, if you
  are upgrading from squid-1.1.10 to 1.1.11, you would run these
  commands:

          cd squid-1.1.10
          mkdir ../squid-1.1.11
          find . -depth -print | cpio -pdv ../squid-1.1.11
          cd ../squid-1.1.11
          patch < /tmp/diff-1.1.10-1.1.11

  After the patch has been applied, you must rebuild Squid from the very
  beginning, i.e.:

          make realclean
          ./configure
          make
          make install

  Note: in later distributions (Squid 2), 'realclean' has been changed
  to 'distclean'.

  If patch keeps asking for a file name, try adding ``-p0'':

          patch -p0 < filename

  If your patch program seems to complain or refuses to work, you should
  get a more recent version, from the GNU FTP site
  , for example.

  2.6.  configure  options

  The configure script can take numerous options.  The most useful is
  --prefix to install it in a different directory.  The default
  installation directory is /usr/local/squid/.  To change the default,
  you could do:

          % cd squid-x.y.z
          % ./configure --prefix=/some/other/directory/squid

  Type

          % ./configure --help

  to see all available options.  You will need to specify some of these
  options to enable or disable certain features.  Some options which are
  used often include:

    --prefix=PREFIX         install architecture-independent files in PREFIX
                            [/usr/local/squid]
    --enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
    --enable-gnuregex       Compile GNUregex
    --enable-splaytree      Use SPLAY trees to store ACL lists
    --enable-xmalloc-debug  Do some simple malloc debugging
    --enable-xmalloc-debug-trace
                            Detailed trace of memory allocations
    --enable-xmalloc-statistics
                            Show malloc statistics in status page
    --enable-carp           Enable CARP support
    --enable-async-io       Do ASYNC disk I/O using threads
    --enable-icmp           Enable ICMP pinging
    --enable-delay-pools    Enable delay pools to limit bandwidth usage
    --enable-mem-gen-trace  Do trace of memory stuff
    --enable-useragent-log  Enable logging of User-Agent header
    --enable-kill-parent-hack
                            Kill parent on shutdown
    --enable-snmp           Enable SNMP monitoring
    --enable-time-hack      Update internal timestamp only once per second
    --enable-cachemgr-hostname[=hostname]
                            Make cachemgr.cgi default to this host
    --enable-arp-acl        Enable use of ARP ACL lists (ether address)
    --enable-htcp           Enable HTCP protocol
    --enable-forw-via-db    Enable Forw/Via database
    --enable-cache-digests  Use Cache Digests
                            see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
    --enable-err-language=lang
                            Select language for Error pages (see errors dir)
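
  For example, to enable several of these features in a single build
  (an illustrative combination, not a recommendation; pick only the
  features you need):

          % ./configure --prefix=/usr/local/squid \
                  --enable-icmp \
                  --enable-snmp \
                  --enable-err-language=English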

  2.7.  undefined reference to __inet_ntoa

  by Kevin Sartorelli  and Andreas Doering
  .

  Probably you've recently installed bind 8.x.  There is a mismatch
  between the header files and DNS library that Squid has found.  There
  are a couple of things you can try.

  First, try adding -lbind to XTRA_LIBS  in src/Makefile.  If -lresolv
  is already there, remove it.
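
  The result in src/Makefile would then look something like this (a
  sketch; any other libraries already on that line should stay):

          XTRA_LIBS       = -lbind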

  If that doesn't seem to work, edit your arpa/inet.h file and comment
  out the following:

          #define inet_addr               __inet_addr
          #define inet_aton               __inet_aton
          #define inet_lnaof              __inet_lnaof
          #define inet_makeaddr           __inet_makeaddr
          #define inet_neta               __inet_neta
          #define inet_netof              __inet_netof
          #define inet_network            __inet_network
          #define inet_net_ntop           __inet_net_ntop
          #define inet_net_pton           __inet_net_pton
          #define inet_ntoa               __inet_ntoa
          #define inet_pton               __inet_pton
          #define inet_ntop               __inet_ntop
          #define inet_nsap_addr          __inet_nsap_addr
          #define inet_nsap_ntoa          __inet_nsap_ntoa

  2.8.  How can I get true DNS TTL info into Squid's IP cache?

  If you have source for BIND, you can modify it as indicated in the
  diff below.  It causes the global variable _dns_ttl_ to be set with
  the TTL of the most recent lookup.  Then, when you compile Squid, the
  configure script will look for the _dns_ttl_ symbol in libresolv.a.
  If found, dnsserver will return the TTL value for every lookup.

  This hack was contributed by Endre Balint Nagy
  .

  diff -ru bind-4.9.4-orig/res/gethnamaddr.c bind-4.9.4/res/gethnamaddr.c
  --- bind-4.9.4-orig/res/gethnamaddr.c   Mon Aug  5 02:31:35 1996
  +++ bind-4.9.4/res/gethnamaddr.c        Tue Aug 27 15:33:11 1996
  @@ -133,6 +133,7 @@
   } align;

   extern int h_errno;
  +int _dns_ttl_;

   #ifdef DEBUG
   static void
  @@ -223,6 +224,7 @@
          host.h_addr_list = h_addr_ptrs;
          haveanswer = 0;
          had_error = 0;
  +       _dns_ttl_ = -1;
          while (ancount-- > 0 && cp < eom && !had_error) {
                  n = dn_expand(answer->buf, eom, cp, bp, buflen);
                  if ((n < 0) || !(*name_ok)(bp)) {
  @@ -232,8 +234,11 @@
                  cp += n;                        /* name */
                  type = _getshort(cp);
                  cp += INT16SZ;                  /* type */
  -               class = _getshort(cp);
  -               cp += INT16SZ + INT32SZ;        /* class, TTL */
  +               class = _getshort(cp);
  +               cp += INT16SZ;                  /* class */
  +               if (qtype == T_A  && type == T_A)
  +                       _dns_ttl_ = _getlong(cp);
  +               cp += INT32SZ;                  /* TTL */
                  n = _getshort(cp);
                  cp += INT16SZ;                  /* len */
                  if (class != C_IN) {

  And here is a patch for BIND-8:

  *** src/lib/irs/dns_ho.c.orig   Tue May 26 21:55:51 1998
  --- src/lib/irs/dns_ho.c        Tue May 26 21:59:57 1998
  ***************
  *** 87,92 ****
  --- 87,93 ----
    #endif

    extern int h_errno;
  + int _dns_ttl_;

    /* Definitions. */

  ***************
  *** 395,400 ****
  --- 396,402 ----
          pvt->host.h_addr_list = pvt->h_addr_ptrs;
          haveanswer = 0;
          had_error = 0;
  +       _dns_ttl_ = -1;
          while (ancount-- > 0 && cp < eom && !had_error) {
                  n = dn_expand(ansbuf, eom, cp, bp, buflen);
                  if ((n < 0) || !(*name_ok)(bp)) {
  ***************
  *** 404,411 ****
                  cp += n;                        /* name */
                  type = ns_get16(cp);
                  cp += INT16SZ;                  /* type */
  !               class = ns_get16(cp);
  !               cp += INT16SZ + INT32SZ;        /* class, TTL */
                  n = ns_get16(cp);
                  cp += INT16SZ;                  /* len */
                  if (class != C_IN) {
  --- 406,416 ----
                  cp += n;                        /* name */
                  type = ns_get16(cp);
                  cp += INT16SZ;                  /* type */
  !               class = _getshort(cp);
  !               cp += INT16SZ;                  /* class */
  !               if (qtype == T_A  && type == T_A)
  !                       _dns_ttl_ = _getlong(cp);
  !               cp += INT32SZ;                  /* TTL */
                  n = ns_get16(cp);
                  cp += INT16SZ;                  /* len */
                  if (class != C_IN) {

  2.9.  My platform is BSD/OS or BSDI and I can't compile Squid

          cache_cf.c: In function `parseConfigFile':
          cache_cf.c:1353: yacc stack overflow before `token'
          ...

  You may need to upgrade your gcc installation to a more recent
  version.  Check your gcc version with

          gcc -v

  If it is earlier than 2.7.2, you might consider upgrading.

  Alternatively, you can get pre-compiled Squid binaries for BSD/OS 2.1
  at the BSD patches FTP site, patch U210-019.

  2.10.  Problems compiling libmiscutil.a  on Solaris

  The following error occurs on Solaris systems using gcc when the
  Solaris C compiler is not installed:

          /usr/bin/rm -f libmiscutil.a
          /usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
          make[1]: *** [libmiscutil.a] Error 255
          make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
          make: *** [all] Error 1

  Note on the second line the /usr/bin/false.  This is supposed to be a
  path to the ar program.  If configure cannot find ar on your system,
  then it substitutes false.

  To fix this you either need to:

  o  Add /usr/ccs/bin to your PATH.  This is where the ar command should
     be.  You need to install SUNWbtool if ar is not there.  (See the
     example after this list.)  Otherwise,

  o  Install the binutils package from the GNU FTP site
     .  This package includes programs
     such as ar, as, and ld.
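
  For the first option, the PATH change looks like this in a Bourne
  shell (which ar should then print /usr/ccs/bin/ar):

          $ PATH=/usr/ccs/bin:$PATH; export PATH
          $ which ar

  After that, re-run configure so that it finds the real ar program.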

  2.11.  I have problems compiling Squid on Platform Foo.

  Please check the page of platforms  on which Squid is
  known to compile.  Your problem might be listed there together with a
  solution.  If it isn't listed there, mail us what you are trying, your
  Squid version, and the problems you encounter.

  2.12.  I see a lot of warnings while compiling Squid.

  Warnings are usually not a big concern, and can be common with
  software designed to operate on multiple platforms.  If you feel like
  fixing compile-time warnings, please do so and send us the patches.

  2.13.  Building Squid on OS/2

  by Doug Nazar 

  In order to compile squid, you need to have a reasonable facsimile of
  a Unix system installed.  This includes bash, make, sed, emx, various
  file utilities and a few more.  I've set up a TVFS drive that matches
  a Unix file system, but this probably isn't strictly necessary.

  I made a few modifications to the pristine EMX 0.9d install.

  1. added defines for strcasecmp() & strncasecmp() to string.h

  2. changed all occurrences of time_t to signed long instead of
     unsigned long
  3. hacked ld.exe

     a. to search for both xxxx.a and libxxxx.a

     b. to produce the correct filename when using the -Zexe option

  You will need to run scripts/convert.configure.to.os2 (in the Squid
  source distribution) to modify the configure script so that it can
  search for the various programs.

  Next, you need to set a few environment variables (see EMX docs for
  meaning):

          export EMXOPT="-h256 -c"
          export LDFLAGS="-Zexe -Zbin -s"

  Now you are ready to configure squid:

          ./configure

  Compile everything:

          make

  and finally, install:

          make install

  This will, by default, install into /usr/local/squid.  If you wish to
  install somewhere else, see the --prefix option for configure.

  Now, don't forget to set EMXOPT before running squid each time. I
  recommend using the -Y and -N options.

  3.  Installing and Running Squid

  3.1.  How big of a system do I need to run Squid?

  There are no hard-and-fast rules.  The most important resource for
  Squid is physical memory.  Your processor does not need to be ultra-
  fast.  Your disk system will be the major bottleneck, so fast disks
  are important for high-volume caches.  Do not use IDE disks if you can
  help it.

  In late 1998, if you are buying a new machine for a cache, I would
  recommend the following configuration:

  o  300 MHz Pentium II CPU

  o  512 MB RAM

  o  Five 9 GB UW-SCSI disks

     Your system disk, and logfile disk can probably be IDE without
     losing any cache performance.

  Also, see Squid Sizing for Intel Platforms by Martin Hamilton.  This
  is a very nice page summarizing system configurations people are
  using for large Squid caches.

  3.2.  How do I install Squid?

  After ``compiling Squid'', you can install it with this simple
  command:

          % make install

  If you have enabled the ``ICMP features'' then you will also want to
  type

          % su
          # make install-pinger

  After installing, you will want to edit and customize the squid.conf
  file.  By default, this file is located at
  /usr/local/squid/etc/squid.conf.

  Also, a QUICKSTART guide has been included with the source
  distribution.  Please see the directory where you unpacked the source
  archive.

  3.3.  What does the squid.conf  file do?

  The squid.conf file defines the configuration for squid.  The
  configuration includes (but is not limited to) the HTTP port number,
  the ICP request port number, incoming and outgoing requests,
  information about firewall access, and various timeout information.
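
  For illustration, a minimal squid.conf might contain directives like
  these (a sketch, not a recommendation; recent Squid-2 syntax is
  shown, and older versions omit the ``ufs'' type argument on
  cache_dir):

          http_port 3128
          icp_port 3130
          cache_mem 8 MB
          cache_dir ufs /usr/local/squid/cache 100 16 256
          connect_timeout 120 seconds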

  3.4.  Do you have a squid.conf  example?

  Yes, after you make install, a sample squid.conf file will exist in
  the ``etc'' directory under the Squid installation directory.

  The sample squid.conf file contains comments explaining each option.

  3.5.  How do I start Squid?

  After you've finished editing the configuration file, you can start
  Squid for the first time.  The procedure depends a little bit on which
  version you are using.

  3.5.1.  Squid version 2.X

  First, you must create the swap directories.  Do this by running Squid
  with the -z option:

          % /usr/local/squid/bin/squid -z

  Once that completes, you can start Squid and try it out.  Probably the
  best thing to do is run it from your terminal and watch the debugging
  output.  Use this command:

          % /usr/local/squid/bin/squid -NCd1

  If everything is working okay, you will see the line:

          Ready to serve requests.

  If you want to run squid in the background, as a daemon process, just
  leave off all options:

          % /usr/local/squid/bin/squid

  NOTE: depending on your configuration, you may need to start squid as
  root.

  3.5.2.  Squid version 1.1.X

  With version 1.1.16 and later, you must first run Squid with the -z
  option to create the cache swap directories.

          % /usr/local/squid/bin/squid -z

  Squid will exit when it finishes creating all of the directories.
  Next you can start RunCache:

          % /usr/local/squid/bin/RunCache &

  For versions before 1.1.16 you should just start RunCache
  immediately, instead of running squid -z first.

  3.6.  How do I start Squid automatically when the system boots?

  3.6.1.  Squid Version 2.X

  Squid-2 has a restart feature built in.  This greatly simplifies
  starting Squid and means that you don't need to use RunCache or
  inittab.  At the minimum, you only need to enter the pathname to the
  Squid executable.  For example:

          /usr/local/squid/bin/squid
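
  On a BSD-ish system, for example, a single entry in /etc/rc.local is
  enough (a sketch modeled on the Squid-1.1 rc.local example in section
  3.6.2.2):

          if [ -x /usr/local/squid/bin/squid ]; then
                  echo -n ' Squid'
                  /usr/local/squid/bin/squid
          fi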

  Squid will automatically background itself and then spawn a child
  process.  In your syslog messages file, you should see something like
  this:

          Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started

  That means that process ID 14616 is the parent process which monitors
  the child process (pid 14617).  The child process is the one that does
  all of the work.  The parent process just waits for the child process
  to exit.  If the child process exits unexpectedly, the parent will
  automatically start another child process.  In that case, syslog
  shows:

          Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
          Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started

  If there is some problem, and Squid can not start, the parent process
  will give up after a while.  Your syslog will show:

          Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures

  When this happens you should check your syslog messages and cache.log
  file for error messages.

  When you look at a process (ps command) listing, you'll see two
  squid processes:

          24353  ??  Ss     0:00.00 /usr/local/squid/bin/squid
          24354  ??  R      0:03.39 (squid) (squid)

  The first is the parent process, and the child process is the one
  called ``(squid)''.  Note that if you accidentally kill the parent
  process, the child process will not notice.

  If you want to run Squid from your terminal and prevent it from
  backgrounding and spawning a child process, use the -N command line
  option.

          /usr/local/squid/bin/squid -N

  3.6.2.  Squid Version 1.1.X

  3.6.2.1.  From inittab

  On systems which have an /etc/inittab file (Digital Unix, Solaris,
  IRIX, HP-UX, Linux), you can add a line like this:

          sq:3:respawn:/usr/local/squid/bin/squid.sh < /dev/null >> /tmp/squid.log 2>&1

  We recommend using a squid.sh shell script, but you could instead
  call Squid directly.  A sample squid.sh script is shown below:

          #!/bin/sh
          C=/usr/local/squid
          PATH=/usr/bin:$C/bin
          TZ=PST8PDT
          export PATH TZ

          notify="root"
          cd $C
          umask 022
          sleep 10
          while [ -f /tmp/nosquid ]; do
                  sleep 1
          done
          /usr/bin/tail -20 $C/logs/cache.log \
                  | Mail -s "Squid restart on `hostname` at `date`" $notify
          exec bin/squid -CYs

  3.6.2.2.  From rc.local

  On BSD-ish systems, you will need to start Squid from the ``rc''
  files, usually /etc/rc.local.  For example:

          if [ -f /usr/local/squid/bin/RunCache ]; then
                  echo -n ' Squid'
                  (/usr/local/squid/bin/RunCache &)
          fi

  3.6.2.3.  From init.d

  Some people may want to use the ``init.d'' startup system.  If you
  start Squid (or RunCache) from an ``init.d'' script, then you should
  probably use nohup, e.g.:

          nohup squid -sY $conf >> $logdir/squid.out 2>&1

  Also, you may need to add a line to trap certain signals and prevent
  them from being sent to the Squid process.  Add this line at the top
  of your script:

          trap '' 1 2 3 18
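
  Putting the two pieces together, a minimal init.d script might look
  like this (a sketch; the conf and logdir values are assumptions you
  should adjust for your installation):

          #!/bin/sh
          # ignore signals 1 (HUP), 2 (INT), 3 (QUIT) and 18, as
          # described above, so they are not passed on to Squid
          trap '' 1 2 3 18
          conf=/usr/local/squid/etc/squid.conf
          logdir=/usr/local/squid/logs
          nohup squid -sY $conf >> $logdir/squid.out 2>&1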

  3.7.  How do I tell if Squid is running?

  You can use the client program:

          % client http://www.netscape.com/ > test

  There are other command-line HTTP client programs available as well.
  Two that you may find useful are wget
   and echoping
  .

  Another way is to use Squid itself to see if it can signal a running
  Squid process:

          % squid -k check

  And then check the shell's exit status variable.
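
  For example (Bourne-style shells report the exit status in $?; csh
  uses $status):

          % squid -k check
          % echo $?
          0

  An exit status of zero means a running Squid process was found; a
  non-zero status means it could not be signaled.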

  Also, check the log files, most importantly the access.log and
  cache.log files.

  3.8.  squid  command line options

  These are the command line options for Squid-2:

     -a Specify an alternate port number for incoming HTTP requests.
        Useful for testing a configuration file on a non-standard port.

     -d Debugging level for ``stderr'' messages.  If you use this
        option, then debugging messages up to the specified level will
        also be written to stderr.

     -f Specify an alternate squid.conf file instead of the pathname
        compiled into the executable.

     -h Prints the usage and help message.

     -k reconfigure
        Sends a HUP signal, which causes Squid to re-read its
        configuration files.

     -k rotate
        Sends an USR1 signal, which causes Squid to rotate its log
        files.  Note, if logfile_rotate is set to zero, Squid still
        closes and re-opens all log files.

     -k shutdown
        Sends a TERM signal, which causes Squid to wait briefly for
        current connections to finish and then exit.  The amount of time
        to wait is specified with shutdown_lifetime.

     -k interrupt
        Sends an INT signal, which causes Squid to shutdown immediately,
        without waiting for current connections.

     -k kill
        Sends a KILL signal, which causes the Squid process to exit
        immediately, without closing any connections or log files.  Use
        this only as a last resort.

     -k debug
        Sends an USR2 signal, which causes Squid to generate full
        debugging messages until the next USR2 signal is received.
        Obviously very useful for debugging problems.

     -k check
        Sends a ``ZERO'' signal to the Squid process.  This simply
        checks whether or not the process is actually running.

     -s Send debugging (level 0 only) message to syslog.

     -u Specify an alternate port number for ICP messages.  Useful for
        testing a configuration file on a non-standard port.

     -v Prints the Squid version.

     -z Creates disk swap directories.  You must use this option when
        installing Squid for the first time, or when you add or modify
        the cache_dir configuration.

     -D Do not make initial DNS tests.  Normally, Squid looks up some
        well-known DNS hostnames to ensure that your DNS name resolution
        service is working properly.

     -F If the swap.state logs are clean, then the cache is rebuilt in
        the ``foreground'' before any requests are served.  This will
        decrease the time required to rebuild the cache, but HTTP
        requests will not be satisfied during this time.

     -N Do not automatically become a background daemon process.

     -R Do not set the SO_REUSEADDR option on sockets.

     -V Enable virtual host support for the httpd-accelerator mode.
        This is identical to writing httpd_accel_host virtual in the
        config file.

     -X Enable full debugging while parsing the config file.

     -Y Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while the
        swap.state file is being read.  If your cache has mostly child
        caches which use ICP, this will allow your cache to rebuild
        faster.
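
  For example, to test a new configuration file in the foreground with
  debugging output, using alternate HTTP and ICP ports (the config file
  pathname here is just an example):

          % squid -N -d1 -a 8080 -u 8130 -f /tmp/test-squid.conf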

  3.9.  How do I see how Squid works?

  o  Check the cache.log file in your logs directory.  It logs
     interesting (and boring) things as a part of its normal operation.

  o  Install and use the ``Cache Manager''.

  4.  Configuration issues

  4.1.  How do I join a cache hierarchy?

  To place your cache in a hierarchy, use the cache_host directive in
  squid.conf to specify the parent and sibling nodes.

  For example, the following squid.conf file on childcache.example.com
  configures its cache to retrieve data from one parent cache and two
  sibling caches:

          #  squid.conf - On the host: childcache.example.com
          #
          #  Format is: hostname  type  http_port  udp_port
          #
          cache_host parentcache.example.com   parent  3128 3130
          cache_host childcache2.example.com   sibling 3128 3130
          cache_host childcache3.example.com   sibling 3128 3130

  The cache_host_domain directive allows you to specify that certain
  caches are siblings or parents for certain domains:

          #  squid.conf - On the host: sv.cache.nlanr.net
          #
          #  Format is: hostname  type  http_port  udp_port
          #

          cache_host electraglide.geog.unsw.edu.au parent 3128 3130
          cache_host cache1.nzgate.net.nz          parent 3128 3130
          cache_host pb.cache.nlanr.net   parent 3128 3130
          cache_host it.cache.nlanr.net   parent 3128 3130
          cache_host sd.cache.nlanr.net   parent 3128 3130
          cache_host uc.cache.nlanr.net   sibling 3128 3130
          cache_host bo.cache.nlanr.net   sibling 3128 3130
          cache_host_domain electraglide.geog.unsw.edu.au .au
          cache_host_domain cache1.nzgate.net.nz   .au .aq .fj .nz
          cache_host_domain pb.cache.nlanr.net     .uk .de .fr .no .se .it
          cache_host_domain it.cache.nlanr.net     .uk .de .fr .no .se .it
          cache_host_domain sd.cache.nlanr.net     .mx .za .mu .zm

  The configuration above indicates that the cache will use
  pb.cache.nlanr.net and it.cache.nlanr.net for domains uk, de, fr, no,
  se and it, sd.cache.nlanr.net for domains mx, za, mu and zm, and
  cache1.nzgate.net.nz for domains au, aq, fj, and nz.

  4.2.  How do I join NLANR's cache hierarchy?

  We have a simple set of guidelines for joining
   the NLANR cache hierarchy.

  4.3.  Why should I want to join NLANR's cache hierarchy?

  The NLANR hierarchy can provide you with an initial source for parent
  or sibling caches.  Joining the NLANR global cache system will
  frequently improve the performance of your caching service.

  4.4.  How do I register my cache with NLANR's registration service?

  Just enable these options in your squid.conf and you'll be registered:

          cache_announce 24
          announce_to sd.cache.nlanr.net:3131

  NOTE: announcing your cache is not the same thing as joining the NLANR
  cache hierarchy.  You can join the NLANR cache hierarchy without
  registering, and you can register without joining the NLANR cache
  hierarchy.

  4.5.  How do I find other caches close to me and arrange
  parent/child/sibling relationships with them?

  Visit the NLANR cache registration database to discover other caches
  near you.  Keep in mind that just because a cache is registered in
  the database does not mean its operators are willing to be your
  parent/sibling/child.  But it can't hurt to ask...

  4.6.  My cache registration is not appearing in the Tracker database.

  o  Your site will not be listed if your cache IP address does not have
     a DNS PTR record. If we can't map the IP address back to a domain
     name, it will be listed as ``Unknown.''

  o  The registration messages are sent with UDP. We may not be
     receiving your announcement message due to firewalls which block
     UDP, or dropped packets due to congestion.

  4.7.  What is the httpd-accelerator mode?

  This entry has been moved to ``a different section''.

  4.8.  How do I configure Squid to work behind a firewall?

  Note: The information here is current for version 2.2.

  If you are behind a firewall then you can't make direct connections to
  the outside world, so you must use a parent cache.  Squid doesn't use
  ICP queries for a request if it's behind a firewall or if there is
  only one parent.

  You can use the never_direct access list in squid.conf to specify
  which requests must be forwarded to your parent cache outside the
  firewall.  For example, if Squid can connect directly to all servers
  that end with mydomain.com, but must use the parent for all others,
  you would write:

          acl INSIDE dstdomain mydomain.com
          never_direct deny INSIDE

  Note that the outside domains will not match the INSIDE acl.  When
  there are no matches, the default action is the opposite of the last
  action.  It's as if there is an implicit never_direct allow all as
  the final rule.

  You could also specify internal servers by IP address:

          acl INSIDE_IP dst 1.2.3.4/24
          never_direct deny INSIDE_IP

  Note, however, that when you use IP addresses, Squid must perform a
  DNS lookup to convert URL hostnames to an address.  Your internal DNS
  servers may not be able to look up external domains.

  If you use never_direct and you have multiple parent caches, then you
  probably will want to mark one of them as a default choice in case
  Squid can't decide which one to use.  That is done with the default
  keyword on a cache_peer line.  For example:

          cache_peer xyz.mydomain.com parent 3128 0 default

  4.9.  How do I configure Squid to forward all requests to another
  proxy?

  Note: The information here is current for version 2.2.

  First, you need to give Squid a parent cache.  Second, you need to
  tell Squid it can not connect directly to origin servers.  This is
  done with three configuration file lines:

          cache_peer parentcache.foo.com parent 3128 0 no-query default
          acl all src 0.0.0.0/0.0.0.0
          never_direct allow all

  Note, with this configuration, if the parent cache fails or becomes
  unreachable, then every request will result in an error message.

  In case you want to be able to use direct connections when all the
  parents go down you should use a different approach:

          cache_peer parentcache.foo.com parent 3128 0 no-query
          prefer_direct off

  The default behaviour of Squid in the absence of positive ICP, HTCP,
  etc. replies is to connect to the origin server instead of using
  parents.  The prefer_direct off directive tells Squid to try parents
  first.

  4.10.  I have dnsserver  processes that aren't being used, should I
  lower the number in squid.conf ?

  The dnsserver processes are used by squid because the gethostbyname(3)
  library routines used to convert web site names to their internet
  addresses block until the function returns (i.e., the process that
  calls them has to wait for a reply). Since there is only one squid
  process, everyone who uses the cache would have to wait each time the
  routine was called.  This is why the dnsserver is a separate process,
  so that these processes can block without causing blocking in squid.

  It's very important that there are enough dnsserver processes to cope
  with every access you will need; otherwise squid will occasionally
  stall.  A good rule of thumb is to make sure you have at least the
  maximum number of dnsservers squid has ever needed on your system,
  and probably add two to be on the safe side. In other words, if you
  have only ever seen at most three dnsserver processes in use, make at
  least five.  Remember that a dnsserver is small and, if unused, will
  be swapped out.

  4.11.  My dnsserver  average/median service time seems high, how can I
  reduce it?

  First, find out if you have enough dnsserver processes running by
  looking at the Cachemanager dns output.  Ideally, you should see that
  the first dnsserver handles a lot of requests, the second one fewer
  than the first, etc.  The last dnsserver should have serviced
  relatively few requests.  If there is not an obvious decreasing trend,
  then you need to increase the number of dns_children in the
  configuration file.  If the last dnsserver has zero requests, then you
  definitely have enough.
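
  For example, to raise the number of dnsserver processes to ten (a
  hypothetical value; tune it to your own peak load), set this in
  squid.conf:

          dns_children 10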

  Another factor which affects the dnsserver service time is the
  proximity of your DNS resolver.  Normally we do not recommend running
  Squid and named on the same host.  Instead you should try to use a
  DNS resolver (named) on a different host, but on the same LAN.  If
  your DNS traffic must pass through one or more routers, this could be
  causing unnecessary delays.

  4.12.  How can I easily change the default HTTP port?

  Before you run the configure script, simply set the CACHE_HTTP_PORT
  environment variable.

          setenv CACHE_HTTP_PORT 8080
          ./configure
          make
          make install
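
  If your shell is Bourne-compatible (e.g. sh or bash) rather than csh,
  the equivalent commands are:

          CACHE_HTTP_PORT=8080; export CACHE_HTTP_PORT
          ./configure
          make
          make install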

  4.13.  Is it possible to control how big each cache_dir  is?

  With Squid-1.1 it is NOT possible.  Each cache_dir is assumed to be
  the same size.  The cache_swap setting defines the size of all
  cache_dir's taken together.  If you have N cache_dir's then each one
  will hold cache_swap/N Megabytes.
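
  For example (hypothetical values), with a total cache_swap of 8000 MB
  spread over four cache directories, each directory will hold 2000 MB:

          cache_swap 8000
          cache_dir /cache1
          cache_dir /cache2
          cache_dir /cache3
          cache_dir /cache4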

  4.14.  What cache_dir  size should I use?

  Most people have a disk partition dedicated to the Squid cache.  You
  don't want to use the entire partition size.  You have to leave some
  extra room.  Currently, Squid is not very tolerant of running out of
  disk space.

  Let's say you have a 9GB disk.  Remember that disk manufacturers lie
  about the space available.  A so-called 9GB disk usually results in
  about 8.5GB of raw, usable space.  First, put a filesystem on it, and
  mount it.  Then check the ``available space'' with your df program.
  Note that you lose some disk space to filesystem overheads, like
  superblocks, inodes, and directory entries.  Also note that Unix
  normally keeps 10% free for itself.  So with a 9GB disk, you're
  probably down to about 8GB after formatting.

  Next, I suggest taking off another 10% or so for Squid overheads, and
  a "safe buffer."  Squid normally puts its swap.state files in each
  cache directory.  These grow in size until you rotate the logs, or
  restart squid.  Also note that Squid performs better when there is
  more free space.  So if performance is important to you, then take off
  even more space.  Typically, for a 9GB disk, I recommend a cache_dir
  setting of 6000 to 7500 Megabytes:

  cache_dir ... 7000 16 256

  It's better to start out conservative.  After the cache becomes full,
  look at the disk usage.  If you think there is plenty of unused space,
  then increase the cache_dir setting a little.

  If you're getting ``disk full'' write errors, then you definitely
  need to decrease your cache size.
  4.15.  I'm adding a new cache_dir .  Will I lose my cache?

  With Squid-1.1, yes, you will lose your cache.  This is because
  version 1.1 uses a simplistic algorithm to distribute files between
  cache directories.

  With Squid-2, you will not lose your existing cache.  You can add and
  delete cache_dir's without affecting any of the others.

  4.16.  Squid and http-gw  from the TIS toolkit.

  Several people on both the fwtk-users and the squid-users mailing
  lists have asked about using Squid in combination with http-gw from
  the TIS toolkit.  The most elegant way in my opinion is to run an
  internal Squid caching proxyserver which handles client requests and
  let this server forward its requests to the http-gw running on the
  firewall. Cache hits won't need to be handled by the firewall.

  In this example Squid runs on the same server as the http-gw; Squid
  uses port 8000 and http-gw uses port 8080 (web).  The local domain is
  home.nl.

  4.16.1.  Firewall configuration:

  Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux
  Slackware):

          exec /usr/local/fwtk/http-gw -daemon 8080

  or run it from inetd like this:

          web stream      tcp      nowait.100  root /usr/local/fwtk/http-gw http-gw

  I increased the watermark to 100 because a lot of people run into
  problems with the default value.

  Make sure you have at least the following line in
  /usr/local/etc/netperm-table:

          http-gw:        hosts 127.0.0.1

  You could add the IP-address of your own workstation to this rule and
  make sure the http-gw by itself works, like:

          http-gw:                hosts 127.0.0.1 10.0.0.1

  4.16.2.  Squid configuration:

  The following settings are important:

          http_port       8000
          icp_port        0

          cache_host      localhost.home.nl parent 8080 0 default
          acl HOME        dstdomain home.nl
          never_direct    deny HOME

  This tells Squid to use the parent for all domains other than home.nl.
  Below, access.log entries show what happens if you do a reload on the
  Squid-homepage:

  872739961.631   1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
  872739962.976   1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
  872739963.007   1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
  872739963.061   1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl

  http-gw entries in syslog:

  Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
  Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
  Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
  Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
  Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
  Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
  Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
  Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
  Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
  Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
  Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
  Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3

  To summarize:

  Advantages:

  o  http-gw allows you to selectively block ActiveX and Java, and its
     primary design goal is security.

  o  The firewall doesn't need to run large applications like Squid.

  o  The internal Squid-server still gives you the benefit of caching.

  Disadvantages:

  o  The internal Squid proxyserver can't (and shouldn't) work with
     other parent or neighbor caches.

  o  Initial requests are slower because they go through http-gw, which
     also does reverse lookups. Run a nameserver on the firewall or use
     an internal nameserver.

  --Rodney van den Oever 

  4.17.  What is ``HTTP_X_FORWARDED_FOR''?  Why does squid provide it to
  WWW servers, and how can I stop it?

  When a proxy-cache is used, a server does not see the connection
  coming from the originating client.  Many people like to implement
  access controls based on the client address.  To accommodate these
  people, Squid adds its own request header called "X-Forwarded-For"
  which looks like this:

          X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30

  Entries are always IP addresses, or the word unknown if the address
  could not be determined or if it has been disabled with the
  forwarded_for configuration option.

  We must note that access controls based on this header are extremely
  weak and simple to fake.  Anyone may hand-enter a request with any IP
  address whatsoever.  This is perhaps the reason why client IP
  addresses have been omitted from the HTTP/1.1 specification.
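
  To keep Squid from disclosing the client address, turn the option off
  in squid.conf (the header will then contain the word unknown):

          forwarded_for off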

  4.18.  Can Squid anonymize HTTP requests?

  Yes it can, however the way of doing it has changed from earlier
  versions of squid. As of squid-2.2 a more customisable method has been
  introduced.  Please follow the instructions for the version of squid
  that you are using.  As a default, no anonymizing is done.

  If you choose to use the anonymizer you might wish to investigate the
  forwarded_for option to prevent the client address being disclosed.
  Failure to turn off the forwarded_for option will reduce the
  effectiveness of the anonymizer.  Finally, if you filter the
  User-Agent header, the fake_user_agent option can prevent problems
  for some users, as some sites require the User-Agent header.
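
  As a minimal sketch of the two options just mentioned (the User-Agent
  string shown is only an example value):

          forwarded_for off
          fake_user_agent Nutscrape/1.0 (CP/M; 8-bit)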

  4.18.1.  Squid 2.2

  With the introduction of squid 2.2 the anonymizer has become more
  customisable.  It now allows specification of exactly which headers
  will be allowed to pass.

  The new anonymizer uses the 'anonymize_headers' tag, which has two
  modes: deny all headers except those specified, or allow all headers
  except those specified. The following example simulates the old
  paranoid mode.

                 anonymize_headers allow Allow Authorization Cache-Control
                 anonymize_headers allow Content-Encoding Content-Length
                 anonymize_headers allow Content-Type Date Expires Host
                 anonymize_headers allow If-Modified-Since Last-Modified
                 anonymize_headers allow Location Pragma Accept
                 anonymize_headers allow Accept-Charset Accept-Encoding
                 anonymize_headers allow Accept-Language
                 anonymize_headers allow Content-Language Mime-Version
                 anonymize_headers allow Retry-After Title Connection
                 anonymize_headers allow Proxy-Connection

  This will prevent any headers other than those listed from being
  passed by the proxy.

  The second mode is 'allow' all and deny the specified headers. The
  following example replicates the old standard mode.

                 anonymize_headers deny From Referer Server
                 anonymize_headers deny User-Agent WWW-Authenticate Link

  It allows all headers to pass unless they are listed.

  You cannot mix allow and deny in a squid configuration; it is either
  one or the other!

  4.18.2.  Squid 2.1 and Earlier

  There are three modes: none, standard, and paranoid.  The mode is set
  with the http_anonymizer configuration option.

  With no anonymizing (the default), Squid forwards all request headers
  as received from the client, to the origin server (subject to the
  regular rules of HTTP).

  In the standard mode, Squid filters out the following specific request
  headers:

  o  From:

  o  Referer:

  o  Server:

  o  User-Agent:

  o  WWW-Authenticate:

  o  Link:

  In the paranoid mode, Squid allows only the following specific request
  headers:

  o  Allow:

  o  Authorization:

  o  Cache-Control:

  o  Content-Encoding:

  o  Content-Length:

  o  Content-Type:

  o  Date:

  o  Expires:

  o  Host:

  o  If-Modified-Since:

  o  Last-Modified:

  o  Location:

  o  Pragma:

  o  Accept:

  o  Accept-Charset:

  o  Accept-Encoding:

  o  Accept-Language:

  o  Content-Language:

  o  Mime-Version:

  o  Retry-After:

  o  Title:

  o  Connection:

  o  Proxy-Connection:

  References: Anonymous WWW 

  5.  Communication between browsers and Squid

  Most web browsers available today support proxying and are easily
  configured to use a Squid server as a proxy.  Some browsers support
  advanced features such as lists of domains or URL patterns that
  shouldn't be fetched through the proxy, or JavaScript automatic proxy
  configuration.

  5.1.  Netscape manual configuration

  Select Network Preferences from the Options menu.  On the Proxies
  page, click the radio button next to Manual Proxy Configuration and
  then click on the View button.  For each protocol that your Squid
  server supports (by default, HTTP, FTP, and gopher) enter the Squid
  server's hostname or IP address and put the HTTP port number for the
  Squid server (by default, 3128) in the Port column.  For any protocols
  that your Squid does not support, leave the fields blank.

  Here is a screen shot  of the Netscape
  Navigator manual proxy configuration screen.

  5.2.  Netscape automatic configuration

  Netscape Navigator's proxy configuration can be automated with
  JavaScript (for Navigator versions 2.0 or higher).  Select Network
  Preferences from the Options menu.  On the Proxies page, click the
  radio button next to Automatic Proxy Configuration and then fill in
  the URL for your JavaScript proxy configuration file in the text box.
  The box is too small, but the text will scroll to the right as you go.

  Here is a screen shot  of the Netscape
  Navigator automatic proxy configuration screen.

  You may also wish to consult Netscape's documentation for the
  Navigator JavaScript proxy configuration.

  Here is a sample auto configuration JavaScript from Oskar Pearson:

  ______________________________________________________________________
  //We (www.is.co.za) run a central cache for our customers that they
  //access through a firewall - thus if they want to connect to their intranet
  //system (or anything in their domain at all) they have to connect
  //directly - hence all the "fiddling" to see if they are trying to connect
  //to their local domain.

  //Replace each occurrence of company.com with your domain name
  //and if you have some kind of intranet system, make sure
  //that you put its name in place of "internal" below.

  //We also assume that your cache is called "cache.company.com", and
  //that it runs on port 8080. Change it down at the bottom.

  //(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)

      function FindProxyForURL(url, host)
          {
              //If they have only specified a hostname, go directly.
              if (isPlainHostName(host))
                      return "DIRECT";

              //These connect directly if the machine they are trying to
              //connect to starts with "intranet" - ie http://intranet
              //Connect  directly if it is intranet.*
              //If you have another machine that you want them to
              //access directly, replace "internal*" with that
              //machine's name
              if (shExpMatch( host, "intranet*")||
                              shExpMatch(host, "internal*"))
                  return "DIRECT";

              //Connect directly to our domains (NB for Important News)
              if (dnsDomainIs( host,"company.com")||
              //If you have another domain that you wish to connect to
              //directly, put it in here
                              dnsDomainIs(host,"sistercompany.com"))
                  return "DIRECT";

              //So the error message "no such host" will appear through the
              //normal Netscape box - less support queries :)
              if (!isResolvable(host))
                      return "DIRECT";

              //We only cache http, ftp and gopher
              if (url.substring(0, 5) == "http:" ||
                              url.substring(0, 4) == "ftp:"||
                              url.substring(0, 7) == "gopher:")

              //Change the ":8080" to the port that your cache
              //runs on, and "cache.company.com" to the machine that
              //you run the cache on
                      return "PROXY cache.company.com:8080; DIRECT";

              //We don't cache WAIS
              if (url.substring(0, 5) == "wais:")
                      return "DIRECT";

              else
                      return "DIRECT";
          }
  ______________________________________________________________________

  5.3.  Lynx and Mosaic configuration

  For Mosaic and Lynx, you can set environment variables before starting
  the application.  For example (assuming csh or tcsh):

          % setenv http_proxy http://mycache.example.com:3128/
          % setenv gopher_proxy http://mycache.example.com:3128/
          % setenv ftp_proxy http://mycache.example.com:3128/

  For Lynx you can also edit the lynx.cfg file to configure proxy usage.
  This has the added benefit of causing all Lynx users on a system to
  access the proxy without making environment variable changes for each
  user.  For example:

          http_proxy:http://mycache.example.com:3128/
          ftp_proxy:http://mycache.example.com:3128/
          gopher_proxy:http://mycache.example.com:3128/

  5.4.  Redundant Auto-Proxy-Configuration

  There's one nasty side-effect to using auto-proxy scripts: whenever
  you start the web browser, it will try to load the auto-proxy script.

  If your script isn't available either because the web server hosting
  the script is down or your workstation can't reach the web server
  (e.g.  because you're working off-line with your notebook and just
  want to read a previously saved HTML-file) you'll get different errors
  depending on the browser you use.

  The Netscape browser will just return an error after a timeout (after
  that it tries to find the site 'www.proxy.com' if the script you use
  is called 'proxy.pac').

  The Microsoft Internet Explorer, on the other hand, won't even start:
  no window displays, and only after about a minute will it display a
  window asking whether to go on with or without proxy configuration.

  The point is that your workstations always need to be able to locate
  the proxy script. I created some extra redundancy by hosting the
  script on two web servers (actually Apache web servers on the proxy
  servers themselves) and adding the following records to my primary
  nameserver:

          proxy   CNAME           proxy1
                  CNAME           proxy2

  The clients just refer to 'http://proxy/proxy.pac'.  This script looks
  like this:

  function FindProxyForURL(url,host)
  {
          // Hostname without domainname or host within our own domain?
          // Try them directly:
          // http://www.domain.com actually lives before the firewall, so
          // make an exception:
          if ((isPlainHostName(host)||dnsDomainIs( host,".domain.com")) &&
                  !localHostOrDomainIs(host, "www.domain.com"))
                  return "DIRECT";

          // First try proxy1 then proxy2. One server mostly caches '.com'
          // to make sure both servers are not
          // caching the same data in the normal situation. The other
          // server caches the other domains normally.
        // If one of them is down the client will try the other server.
          else if (shExpMatch(host, "*.com"))
                  return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
          return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
  }

  I made sure every client domain has the appropriate 'proxy' entry.
  The clients are automatically configured with two nameservers using
  DHCP.

       --Rodney van den Oever 

  5.5.  Microsoft Internet Explorer configuration

  Select Options from the View menu.  Click on the Connection tab.  Tick
  the Connect through Proxy Server option and hit the Proxy Settings
  button.  For each protocol that your Squid server supports (by
  default, HTTP, FTP, and gopher) enter the Squid server's hostname or
  IP address and put the HTTP port number for the Squid server (by
  default, 3128) in the Port column.  For any protocols that your Squid
  does not support, leave the fields blank.

  Here is a screen shot  of the Internet Explorer
  proxy configuration screen.

  Microsoft is also starting to support Netscape-style JavaScript
  automated proxy configuration.  As of now, only MSIE version 3.0a for
  Windows 3.1 and Windows NT 3.51 supports this feature (i.e., as of
  version 3.01 build 1225 for Windows 95 and NT 4.0, the feature was not
  included).

  If you have a version of MSIE that does have this feature, select
  Options from the View menu.  Click on the Advanced tab.  In the lower
  left-hand corner, click on the Automatic Configuration button.  Fill
  in the URL for your JavaScript file in the dialog box it presents you.
  Then exit MSIE and restart it for the changes to take effect.  MSIE
  will reload the JavaScript file every time it starts.

  5.6.  Netmanage Internet Chameleon WebSurfer configuration

  Netmanage WebSurfer supports manual proxy configuration and exclusion
  lists for hosts or domains that should not be fetched via proxy (this
  information is current as of WebSurfer 5.0).  Select Preferences from
  the Settings menu.  Click on the Proxies tab.  Select the Use Proxy
  options for HTTP, FTP, and gopher.  For each protocol, enter the
  Squid server's hostname or IP address and put the HTTP port number
  for the Squid server (by default, 3128) in the Port boxes.  For any
  protocols that your Squid does not support, leave the fields blank.

  Take a look at this screen shot  if the
  instructions confused you.

  On the same configuration window, you'll find a button to bring up the
  exclusion list dialog box, which will let you enter some hosts or
  domains that you don't want fetched via proxy.  It should be self-
  explanatory, but you might look at this screen shot
   just for fun anyway.

  5.7.  Opera 2.12 proxy configuration

  Select Proxy Servers... from the Preferences menu.  Check each
  protocol that your Squid server supports (by default, HTTP, FTP, and
  Gopher) and enter the Squid server's address as hostname:port (e.g.
  mycache.example.com:3128 or 123.45.67.89:3128).  Click on Okay to
  accept the setup.

  Notes:

  o  Opera 2.12 doesn't support gopher on its own, but requires a proxy;
     therefore Squid's gopher proxying can extend the utility of your
     Opera immensely.

  o  Unfortunately, Opera 2.12 chokes on some HTTP requests, for example
     abuse.net .  At the moment I think it
     has something to do with cookies.  If you have trouble with a site,
     try disabling the HTTP proxying by unchecking that protocol in the
     Preferences|Proxy Servers... dialogue.  Opera will remember the
     address, so reenabling is easy.

       --Hume Smith 

  5.8.  How do I tell Squid to use a specific username for FTP urls?

  Insert your username in the host part of the URL, for example:

          ftp://joecool@ftp.foo.org/

  Squid should then prompt you for your account password.
  Alternatively, you can specify both your username and password in the
  URL itself:

          ftp://joecool:secret@ftp.foo.org/

  However, we certainly do not recommend this, as it could be very easy
  for someone to see or grab your password.

  5.9.  Configuring Browsers for WPAD

  by Mark Reynolds 

  You may like to start by reading the Internet-Draft that describes
  WPAD.

  After reading the 8 steps below, if you don't understand any of the
  terms or methods mentioned, you probably shouldn't be doing this.
  Implementing wpad requires you to fully understand:

  1. web server installations and modifications.

  2. squid proxy server (or others) installation etc.

  3. Domain Name System maintenance etc.

     Please don't bombard the squid list with web server or dns
     questions. See your system administrator, or do some more research
     on those topics.

  This is not a recommendation for any product or version. As far as I
  know IE5 is the only browser out now implementing wpad. I think wpad
  is an excellent feature that will return several hours of life per
  month.  Hopefully, all browser clients will implement it as well, but
  it will take years for all the older browsers to fade away.

  I have only focused on the domain name method, to the exclusion of the
  DHCP method. I think the dns method might be easier for most people.
  I don't currently, and may never, fully understand wpad and IE5, but
  this method worked for me. It may work for you.

  But if you'd rather just have a go ...

  1. Create a standard ``netscape auto proxy config file''.  The sample
     provided there is more than adequate to get you going.  No doubt
     all the other load balancing and backup scripts will be fine also.

  2. Store the resultant file in the document root directory of a handy
     web server as wpad.dat (Not proxy.pac as you may have previously
     done.)

  3. If you do nothing more, a url like
     http://www.your.domain.name/wpad.dat should bring up the script
     text in your browser window.

  4. Insert the following entry into your web server's mime.types file,
     perhaps in addition to your existing pac file type, if you've done
     this before.

             application/x-ns-proxy-autoconfig       dat

  And then restart your web server so that the new mime type takes
  effect.

  5. Assuming Internet Explorer 5, under Tools, Internet Options,
     Connections, Settings or Lan Settings, set ONLY Use Automatic
     Configuration Script to be the URL for where your new wpad.dat
     file can be found, i.e. http://www.your.domain.name/wpad.dat
     Test that it all works as per your script and network.  There's no
     point continuing until this works ...

  6. Create/install/implement a DNS record so that wpad.your.domain.name
     resolves to the host above where you have a functioning auto config
     script running (a sample zone file entry appears after this list).
     You should now be able to use
     http://wpad.your.domain.name/wpad.dat as the Auto Config Script
     location in step 5 above.
  7. And finally, go back to the setup screen detailed in 5 above, and
     choose nothing but the Automatically Detect Settings option,
     turning everything else off. Best to restart IE5, as you normally
     do with any Microsoft product... And it should all work. Did for me
     anyway.

  8. One final question might be 'Which domain name does the client
     (IE5) use for the wpad... lookup?' I think it is determined after
     doing a reverse DNS lookup on the allocated IP number. Else it
     might use the default domain name... Not sure. Might require
     reverse lookup tables for all dialup IP number ranges, which is a
     good thing anyways.
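
  As a sketch of step 6 above: assuming the working auto config script
  is served by the host www.your.domain.name, a BIND-style zone file
  entry for your domain could look like this:

          wpad    IN  CNAME   www.your.domain.name.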

  Anybody using these steps to install and test, please feel free to
  make notes, corrections or additions for improvements, and post back
  to the squid list...

  There are probably many more tricks and tips which hopefully will be
  detailed here in the future, such as wpad.dat files being served from
  the proxy servers themselves, maybe with a round robin dns setup for
  the WPAD host.

  6.  Squid Log Files

  The logs are a valuable source of information about Squid workloads
  and performance. The logs record not only access information, but
  also system configuration errors and resource consumption (e.g.,
  memory, disk space). There are several log files maintained by Squid.
  Some have to be explicitly activated at compile time; others can
  safely be deactivated at run-time.

  There are a few basic points common to all log files. The time stamps
  logged into the log files are usually UTC seconds unless stated
  otherwise.  The initial time stamp usually contains a millisecond
  extension.

  The frequent time lookups on busy caches may have a performance
  impact on some systems. The compile-time configuration option
  --enable-time-hack makes Squid look up a new time only at one-second
  intervals. The implementation uses Unix's alarm() functionality.
  Note that the resolution of logged times is much coarser afterwards,
  and may not suffice for some log file analysis programs.  Usually
  there is no need to fiddle with the timestamp hack.

  6.1.  squid.out

  If you run your Squid from the RunCache script, a file squid.out
  contains the Squid startup times, and also all fatal errors, e.g. as
  produced by an assert() failure. If you are not using RunCache, you
  will not see such a file.

  6.2.  cache.log

  The cache.log file contains the debug and error messages that Squid
  generates. If you start your Squid using the default RunCache script,
  or start it with the -s command line option, a copy of certain
  messages will go into your syslog facilities. It is a matter of
  personal preference whether to use a separate file for the squid log
  data.  For automatic log file analysis, the cache.log file does not
  have much to offer. You will usually look into this file for error
  reports, when programming Squid, testing new features, or searching
  for the reasons of a perceived misbehaviour, etc.

  6.3.  useragent.log

  The user agent log file is only maintained if

  1. you configured the compile-time --enable-useragent-log option, and

  2. you pointed the useragent_log configuration option to a file.

  From the user agent log file you are able to find out about the
  distribution of browsers among your clients. Using this option in
  conjunction with a loaded production squid might not be the best of
  all ideas.
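
  For example (the path shown is just an illustration):

          useragent_log /usr/local/squid/logs/useragent.log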

  6.4.  store.log

  The store.log file covers the objects currently kept on disk or
  removed ones. As a kind of transaction log it is usually used for
  debugging purposes. A definitive statement on whether an object
  resides on your disks is only possible after analysing the complete
  log file; the release (deletion) of an object may be logged at a
  later time than the swap out (save to disk).

  The store.log file may be of interest to log file analysis which
  looks into the objects on your disks and the time they spend there,
  or how many times a hot object was accessed. The latter may be
  covered by another log file, too. With knowledge of the cache_dir
  configuration option, this log file allows for a URL to filename
  mapping without recursing through your cache disks. However, the
  Squid developers recommend treating store.log primarily as a debug
  file, and so should you, unless you know what you are doing.

  The print format for a store log entry (one line) consists of eleven
  space-separated columns, compare with the storeLog() function in file
  src/store_log.c:

      "%9d.%03d %-7s %08X %4d %9d %9d %9d %s %d/%d %s %s\n"

     time
        The timestamp when the line was logged in UTC with a millisecond
        fraction.

     action
        The action the object was submitted to, compare with
        src/store_log.c:

     o  CREATE Seems to be unused.

     o  RELEASE The object was removed from the cache (see also ``file
        number'').

     o  SWAPOUT The object was saved to disk.

     o  SWAPIN The object existed on disk and was read into memory.

     file number

        The file number for the object storage file. Please note that
        the path to this file is calculated according to your cache_dir
        configuration.

        A file number of FFFFFFFF denotes "memory only" objects. Any
        action code for such a file number refers to an object which
        existed only in memory, not on disk.  For instance, if a RELEASE
        code was logged with file number FFFFFFFF, the object existed
        only in memory, and was released from memory.

     status
        The HTTP reply status code.

     datehdr

        The value of the HTTP "Date: " reply header.

     lastmod
        The value of the HTTP "Last-Modified: " reply header.

     expires

        The value of the HTTP "Expires: " reply header.

     type
        The HTTP "Content-Type" major value, or "unknown" if it cannot
        be determined.

     sizes
        This column consists of two slash separated fields:

        1. The advertised content length from the HTTP "Content-Length:
           " reply header.

        2. The size actually read.

        If the advertised (or expected) length is missing, it will be
        set to zero. If the advertised length is not zero, but not equal
        to the real length, the object will be released from the cache.

     method
        The request method for the object, e.g. GET.

     key

        The key to the object, usually the URL.

  The timestamps in the columns ``datehdr'' through ``expires'' are all
  expressed in UTC seconds. The actual values are parsed from the HTTP
  reply headers. An unparsable header is represented by a value of -1,
  and a missing header is represented by a value of -2.

  The column ``key'' usually contains just the URL of the object. Some
  objects, though, will never become public; for those, the key
  includes a unique integer number and the request method in addition
  to the URL.

  6.5.  hierarchy.log

  This logfile exists for Squid-1.0 only.  The format is

          [date] URL peerstatus peerhost

  6.6.  access.log

  Most log file analysis programs are based on the entries in
  access.log.  Currently, there are two possible formats for the log
  file, depending on your configuration of the emulate_httpd_log
  option. By default, Squid will log in its native log file format. If
  the above option is enabled, Squid will log in the common log file
  format as defined by the CERN web daemon.

  The common log file format contains different (and less) information
  than the native log file format. The native format contains more
  information for the admin interested in cache evaluation.

  6.6.1.  The common log file format

  The Common Logfile Format
   is used by numerous HTTP servers. This format consists
  of the following seven fields:

          remotehost rfc931 authuser [date] "method URL" status bytes

  It is parsable by a variety of tools. The common format contains
  different information than the native log file format: for example,
  the HTTP version is logged here, but not in the native log file
  format.

  6.6.2.  The native log file format

  The native format is different for different major versions of Squid.
  For Squid-1.0 it is:

          time elapsed remotehost code/status/peerstatus bytes method URL

  For Squid-1.1, the information from the hierarchy.log was moved into
  access.log.  The format is:

          time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type

  For Squid-2 the columns stay the same, though the content within may
  change a little.

  The native log file format logs more and different information than
  the common log file format: the request duration, some timeout
  information, the next upstream server address, and the content type.

  There exist tools which convert one file format into the other.
  Please mind that even though the log formats share most information,
  both formats contain information which is not part of the other
  format, and thus that part of the information is lost when
  converting.  In particular, converting back and forth is not possible
  without loss.

  squid2common.pl is a conversion utility, which converts any of the
  squid log file formats into the old CERN proxy style output. There
  exist tools to analyse, evaluate and graph results from that format.

  6.6.3.  access.log native format in detail

  It is recommended, though, to use Squid's native log format due to
  the greater amount of information it makes available for later
  analysis. The print format line for native access.log entries looks
  like this:

      "%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"

  Therefore, an access.log entry usually consists of (at least) 10
  columns separated by one or more spaces:

     time
        A Unix timestamp as UTC seconds with a millisecond resolution.
        You can convert Unix timestamps into something more human
        readable using this short perl script:

                #! /usr/bin/perl -p
                s/^\d+\.\d+/localtime $&/e;
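                # Usage example (assuming the two lines above are
                # saved as conv.pl):  % perl conv.pl access.log | less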

     duration
        The elapsed time shows how many milliseconds the transaction
        busied the cache. It differs in interpretation between TCP and
        UDP:

     o  For HTTP/1.0, this is basically the time between accept() and
        close().

     o  For persistent connections, this ought to be the time between
        scheduling the reply and finishing sending it.

     o  For ICP, this is the time between scheduling a reply and
        actually sending it.

        Please note that the entries are logged after the reply finished
        being sent, not during the lifetime of the transaction.

     client address
        The IP address of the requesting instance, the client IP
        address. The client_netmask configuration option can mask
        client addresses for data protection reasons, but it makes
        analysis more difficult. Often it is better to use one of the
        log file anonymizers.

        Also, the log_fqdn configuration option may log the fully
        qualified domain name of the client instead of the dotted quad.
        The use of that option is discouraged due to its performance
        impact.

     result codes

        This column is made up of two entries separated by a slash. This
        column encodes the transaction result:

        1. The cache result of the request contains information on the
           kind of request, how it was satisfied, or in what way it
           failed. Please refer to section ``Squid result codes'' for
           valid symbolic result codes.

           Several codes from older versions are no longer available,
           were renamed, or split. Especially the ERR_ codes do not seem
           to appear in the log file any more. Also refer to section
           ``Squid result codes'' for details on the codes no longer
           available in Squid-2.

           The NOVM versions and Squid-2 also rely on the Unix buffer
           cache, thus you will see fewer TCP_MEM_HITs than with a
           Squid-1.  Basically, the NOVM feature relies on read() to
           obtain an object, but due to the kernel buffer cache, no disk
           activity is needed.  Only small objects (below 8KByte) are
           kept in Squid's part of main memory.

        2. The status part contains the HTTP result codes with some
           Squid specific extensions. Squid uses a subset of the RFC
           defined error codes for HTTP. Refer to section ``status
           codes'' for details of the status codes recognized by a
           Squid-2.

     bytes
        The size is the amount of data delivered to the client. Mind
        that this does not constitute the net object size, as headers
        are also counted. Also, failed requests may deliver an error
        page, the size of which is also logged here.

     request method
        The request method used to obtain an object. Please refer to
        section ``Request methods'' for available methods.  If you
        turned off log_icp_queries in your configuration, you will not
        see (and thus will be unable to analyse) ICP exchanges. The
        PURGE method is only available if you have an ACL for ``method
        purge'' enabled in your configuration file.

     URL
        This column contains the URL requested. Please note that the
        log file may contain whitespace within the URI. The default
        configuration for uri_whitespace denies whitespace, though.

     rfc931
        The eighth column may contain the ident lookup for the
        requesting client. Since ident lookups have a performance
        impact, the default configuration turns ident_lookups off. If
        turned off, or if no ident information is available, a ``-''
        will be logged.

     hierarchy code
        The hierarchy information consists of three items:

        1. Any hierarchy tag may be prefixed with TIMEOUT_, if the
           timeout occurred waiting for all ICP replies to return from
           the neighbours. The timeout is either dynamic, if the
           icp_query_timeout option was not set, or the time configured
           there has run out.

        2. A code that explains how the request was handled, e.g. by
           forwarding it to a peer, or going straight to the source.
           Refer to section ``Hierarchy Codes'' for details on hierarchy
           codes and removed hierarchy codes.

        3. The name of the host the object was requested from. This host
           may be the origin site, a parent or any other peer. Also note
           that the hostname may be numerical.

     type
        The content type of the object as seen in the HTTP reply header.
        Please note that ICP exchanges usually don't have any content
        type, and thus are logged ``-''. Also, some weird replies have
        content types ``:'' or even empty ones.

  There may be two more columns in the access.log if the (debug) option
  log_mime_headers is enabled.  In this case, the HTTP request headers
  are logged between a ``['' and a ``]'', and the HTTP reply headers
  are also logged between ``['' and ``]''. All control characters like
  CR and LF are URL-escaped, but spaces are not escaped! Parsers should
  watch out for this.
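
  As an illustration (this sketch is not part of Squid itself), the
  following Perl script splits native access.log lines into the ten
  columns described above.  It assumes log_mime_headers is disabled, so
  that every line has exactly ten fields:

          #!/usr/bin/perl
          # Split each native access.log line into its ten columns.
          while (<>) {
              my ($time, $duration, $client, $result, $bytes,
                  $method, $url, $ident, $hierarchy, $type) = split ' ';
              # The 4th and 9th columns are themselves slash-separated.
              my ($code, $status) = split '/', $result;     # e.g. TCP_MISS/200
              my ($peer, $host)   = split '/', $hierarchy;  # e.g. DIRECT/host
              print "$client $code $status $url\n";
          }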

  6.7.  Squid result codes

  The TCP_ codes refer to requests on the HTTP port (usually 3128). The
  UDP_ codes refer to requests on the ICP port (usually 3130). If ICP
  logging was disabled using the log_icp_queries option, no ICP replies
  will be logged.

  The following result codes were taken from a Squid-2, compare with the
  log_tags struct in src/access_log.c:

     TCP_HIT
        A valid copy of the requested object was in the cache.

     TCP_MISS
        The requested object was not in the cache.

     TCP_REFRESH_HIT
        The requested object was cached but STALE. The IMS query for the
        object resulted in "304 not modified".

     TCP_REF_FAIL_HIT
        The requested object was cached but STALE. The IMS query failed
        and the stale object was delivered.

     TCP_REFRESH_MISS
        The requested object was cached but STALE. The IMS query
        returned the new content.

     TCP_CLIENT_REFRESH_MISS

        The client issued a "no-cache" pragma, or some analogous cache
        control command along with the request. Thus, the cache has to
        refetch the object.

     TCP_IMS_HIT

        The client issued an IMS request for an object which was in the
        cache and fresh.

     TCP_SWAPFAIL_MISS

        The object was believed to be in the cache, but could not be
        accessed.

     TCP_NEGATIVE_HIT
        Request for a negatively cached object, e.g. "404 not found",
        for which the cache believes to know that it is inaccessible.
        Also refer to the explanations for negative_ttl in your
        squid.conf file.

     TCP_MEM_HIT
        A valid copy of the requested object was in the cache and it was
        in memory, thus avoiding disk accesses.

     TCP_DENIED
        Access was denied for this request.

     TCP_OFFLINE_HIT
        The requested object was retrieved from the cache during offline
        mode. The offline mode never validates any object, see
        offline_mode in squid.conf file.

     UDP_HIT
        A valid copy of the requested object was in the cache.
     UDP_MISS
        The requested object is not in this cache.

     UDP_DENIED
        Access was denied for this request.

     UDP_INVALID
        An invalid request was received.

     UDP_MISS_NOFETCH

        During "-Y" startup, or during frequent failures, a cache in hit
        only mode will return either UDP_HIT or this code. Neighbours
        will thus only fetch hits.

     NONE
        Seen with errors and cachemgr requests.

  The following codes are no longer available in Squid-2:

     ERR_*
        Errors are now contained in the status code.

     TCP_CLIENT_REFRESH
        See: ``TCP_CLIENT_REFRESH_MISS''.

     TCP_SWAPFAIL
        See: ``TCP_SWAPFAIL_MISS''.

     TCP_IMS_MISS
        Deleted, ``TCP_IMS_HIT'' used instead.

     UDP_HIT_OBJ
        Hit objects are no longer available.

     UDP_RELOADING
        See: ``UDP_MISS_NOFETCH''.

  6.8.  HTTP status codes

  These are taken from RFC 2616 and verified for Squid. Squid-2 uses
  almost all codes except 307 (Temporary Redirect), 416 (Request Range
  Not Satisfiable), and 417 (Expectation Failed). Extra codes include 0
  for a result code being unavailable, and 600 to signal an invalid
  header, a proxy error. Also, some definitions were added as per RFC
  2518 (WebDAV).  Yes, there are really two entries for status code
  424, compare with http_status in src/enums.h:

   000 Used mostly with UDP traffic.

   100 Continue
   101 Switching Protocols
  *102 Processing

   200 OK
   201 Created
   202 Accepted
   203 Non-Authoritative Information
   204 No Content
   205 Reset Content
   206 Partial Content
  *207 Multi Status

   300 Multiple Choices
   301 Moved Permanently
   302 Moved Temporarily
   303 See Other
   304 Not Modified
   305 Use Proxy
  [307 Temporary Redirect]

   400 Bad Request
   401 Unauthorized
   402 Payment Required
   403 Forbidden
   404 Not Found
   405 Method Not Allowed
   406 Not Acceptable
   407 Proxy Authentication Required
   408 Request Timeout
   409 Conflict
   410 Gone
   411 Length Required
   412 Precondition Failed
   413 Request Entity Too Large
   414 Request URI Too Large
   415 Unsupported Media Type
  [416 Request Range Not Satisfiable]
  [417 Expectation Failed]
  *424 Locked
  *424 Failed Dependency
  *433 Unprocessable Entity

   500 Internal Server Error
   501 Not Implemented
   502 Bad Gateway
   503 Service Unavailable
   504 Gateway Timeout
   505 HTTP Version Not Supported
  *507 Insufficient Storage

   600 Squid header parsing error

  6.9.  Request methods

  Squid recognizes several request methods as defined in RFC 2616.
  Newer versions of Squid (2.2.STABLE5 and above) also recognize the
  RFC 2518 ``HTTP Extensions for Distributed Authoring -- WEBDAV''
  extensions.

   method    defined    cachabil.  meaning
   --------- ---------- ---------- -------------------------------------------
   GET       HTTP/0.9   possibly   object retrieval and simple searches.
   HEAD      HTTP/1.0   possibly   metadata retrieval.
   POST      HTTP/1.0   CC or Exp. submit data (to a program).
   PUT       HTTP/1.1   never      upload data (e.g. to a file).
   DELETE    HTTP/1.1   never      remove resource (e.g. file).
   TRACE     HTTP/1.1   never      appl. layer trace of request route.
   OPTIONS   HTTP/1.1   never      request available comm. options.
   CONNECT   HTTP/1.1r3 never      tunnel SSL connection.

   ICP_QUERY Squid      never      used for ICP based exchanges.
   PURGE     Squid      never      remove object from cache.

   PROPFIND  rfc2518    ?          retrieve properties of an object.
   PROPPATCH rfc2518    ?          change properties of an object.
   MKCOL     rfc2518    never      create a new collection.
   MOVE      rfc2518    never      atomically move src to dst.
   COPY      rfc2518    never      create a duplicate of src in dst.
   LOCK      rfc2518    never      lock an object against modifications.
   UNLOCK    rfc2518    never      unlock an object.

  6.10.  Hierarchy Codes

  The following hierarchy codes are used with Squid-2:

     NONE
        For TCP HIT, TCP failures, cachemgr requests and all UDP
        requests, there is no hierarchy information.

     DIRECT
        The object was fetched from the origin server.

     SIBLING_HIT
        The object was fetched from a sibling cache which replied with
        UDP_HIT.

     PARENT_HIT
        The object was requested from a parent cache which replied with
        UDP_HIT.

     DEFAULT_PARENT
        No ICP queries were sent. This parent was chosen because it was
        marked ``default'' in the config file.

     SINGLE_PARENT
        The object was requested from the only parent appropriate for
        the given URL.

     FIRST_UP_PARENT
        The object was fetched from the first parent in the list of
        parents.

     NO_PARENT_DIRECT
        The object was fetched from the origin server, because no
        parents existed for the given URL.

     FIRST_PARENT_MISS
        The object was fetched from the parent with the fastest
        (possibly weighted) round trip time.

     CLOSEST_PARENT_MISS
        This parent was chosen, because it included the lowest RTT
        measurement to the origin server. See also the closest-only
        peer configuration option.

     CLOSEST_PARENT
        The parent selection was based on our own RTT measurements.

     CLOSEST_DIRECT
        Our own RTT measurements returned a shorter time than any
        parent.

     NO_DIRECT_FAIL
        The object could not be requested because of a firewall
        configuration, see also never_direct and related material, and
        no parents were available.

     SOURCE_FASTEST
        The origin site was chosen, because the source ping arrived
        fastest.

     ROUNDROBIN_PARENT
        No ICP replies were received from any parent. The parent was
        chosen, because it was marked for round robin in the config file
        and had the lowest usage count.

     CACHE_DIGEST_HIT
        The peer was chosen, because the cache digest predicted a hit.
        This option was later replaced in order to distinguish between
        parents and siblings.

     CD_PARENT_HIT
        The parent was chosen, because the cache digest predicted a hit.

     CD_SIBLING_HIT
        The sibling was chosen, because the cache digest predicted a
        hit.

     NO_CACHE_DIGEST_DIRECT
        This output seems to be unused?

     CARP
        The peer was selected by CARP.

     ANY_PARENT
        part of src/peer_select.c:hier_strings[].

     INVALID CODE
        part of src/peer_select.c:hier_strings[].

  Almost any of these may be preceded by 'TIMEOUT_' if the two-second
  (default) timeout occurs waiting for all ICP replies to arrive from
  neighbors, see also the icp_query_timeout configuration option.

  The following hierarchy codes were removed from Squid-2:

  code                  meaning
  --------------------  -------------------------------------------------
  PARENT_UDP_HIT_OBJ    hit objects are no longer available.
  SIBLING_UDP_HIT_OBJ   hit objects are no longer available.
  SSL_PARENT_MISS       SSL can now be handled by squid.
  FIREWALL_IP_DIRECT    No special logging for hosts inside the firewall.
  LOCAL_IP_DIRECT       No special logging for local networks.

  6.11.  cache/log  (Squid-1.x)

  This file has a rather unfortunate name.  It also is often called the
  swap log.  It is a record of every cache object written to disk.  It
  is read when Squid starts up to ``reload'' the cache.  If you remove
  this file when squid is NOT running, you will effectively wipe out
  your cache contents.  If you remove this file while squid IS running,
  you can easily recreate it.  The safest way is to simply shutdown the
  running process:

          % squid -k shutdown

  This will disrupt service, but at least you will have your swap log
  back.  Alternatively, you can tell squid to rotate its log files.
  This also causes a clean swap log to be written.

          % squid -k rotate

  For Squid-1.1, there are six fields:

  1. fileno: The swap file number holding the object data.  This is
     mapped to a pathname on your filesystem.

  2. timestamp: This is the time when the object was last verified to be
     current.  The time is a hexadecimal representation of Unix time.

  3. expires: This is the value of the Expires header in the HTTP reply.
     If an Expires header was not present, this will be -2 or fffffffe.
     If the Expires header was present, but invalid (unparsable), this
     will be -1 or ffffffff.

  4. lastmod: Value of the HTTP reply Last-Modified header.  If missing
     it will be -2, if invalid it will be -1.

  5. size: Size of the object, including headers.

  6. url: The URL naming this object.
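  As an illustration only (this entry is fabricated and exact column
  widths vary between releases), a cache/log line with the six fields
  described above might look like:

          000008a2 348577de fffffffe 3484fde6 10245 http://www.example.com/index.html

  Here the object has no Expires header (fffffffe) and occupies 10245
  bytes, including headers.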

  6.12.  swap.state  (Squid-2.x)

  In Squid-2, the swap log file is now called swap.state.  This is a
  binary file that includes MD5 checksums, and StoreEntry fields.
  Please see the Programmers Guide <../Prog-Guide/> for information on
  the contents and format of that file.

  If you remove swap.state while Squid is running, simply send Squid the
  signal to rotate its log files:

          % squid -k rotate

  Alternatively, you can tell Squid to shutdown and it will rewrite this
  file before it exits.

  If you remove the swap.state while Squid is not running, you will not
  lose your entire cache.  In this case, Squid will scan all of the
  cache directories and read each swap file to rebuild the cache.  This
  can take a very long time, so you'll have to be patient.

  By default the swap.state file is stored in the top-level of each
  cache_dir.  You can move the logs to a different location with the
  cache_swap_log option.
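  For example, to keep the swap logs on a dedicated disk, a line like
  this would do (the path is only illustrative):

          cache_swap_log /logdisk/squid/swap.state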

  6.13.  Which log files can I delete safely?

  You should never delete access.log, store.log, cache.log, or
  swap.state while Squid is running.  With Unix, you can delete a file
  when a process has the file opened.  However, the filesystem space is
  not reclaimed until the process closes the file.

  If you accidentally delete swap.state while Squid is running, you can
  recover it by following the instructions in the previous questions.
  If you delete the others while Squid is running, you can not recover
  them.

  The correct way to maintain your log files is with Squid's ``rotate''
  feature.  You should rotate your log files at least once per day.  The
  current log files are closed and then renamed with numeric extensions
  (.0, .1, etc).  If you want to, you can write your own scripts to
  archive or remove the old log files.  If not, Squid will only keep up
  to logfile_rotate versions of each log file.  The logfile rotation
  procedure also writes a clean swap.state file, but it does not leave
  numbered versions of the old files.

  To rotate Squid's logs, simply use this command:

          squid -k rotate

  For example, use this cron entry to rotate the logs at midnight:

          0 0 * * * /usr/local/squid/bin/squid -k rotate

  6.14.  How can I disable Squid's log files?

  To disable access.log:

          cache_access_log /dev/null

  To disable store.log:

          cache_store_log none

  It is a bad idea to disable the cache.log because this file contains
  many important status and debugging messages.  However, if you really
  want to, you can.  To disable cache.log:

          cache_log /dev/null

  6.15.  My log files get very big!

  You need to rotate your log files with a cron job.  For example:

          0 0 * * * /usr/local/squid/bin/squid -k rotate

  6.16.  Managing log files

  The preferred log file for analysis is the access.log file in native
  format. For long term evaluations, the log file should be obtained at
  regular intervals. Squid offers an easy to use API for rotating log
  files, in order that they may be moved (or removed) without disturbing
  the cache operations in progress. The procedures were described above.

  Depending on the disk space allocated for log file storage, it is
  recommended to set up a cron job which rotates the log files every
  24, 12, or 8 hours.  You will need to set your logfile_rotate to a
  sufficiently large number.  During a time of some idleness, you can
  safely transfer the log files to your analysis host in one burst.

  Before transport, the log files can be compressed during off-peak
  time.  On the analysis host, the log files are concatenated into one
  file, so the yield is one file for each 24 hours.  Also note that
  with log_icp_queries enabled, you might have around 1 GB of
  uncompressed log information per day and busy cache.  Look into your
  cache manager info page to make an educated guess on the size of your
  log files.

  The EU project DESIRE developed some basic rules to obey when
  handling and processing log files:

  o  Respect the privacy of your clients when publishing results.

  o  Keep logs unavailable unless anonymized. Most countries have laws
     on privacy protection, and some even on how long you are legally
     allowed to keep certain kinds of information.

  o  Rotate and process log files at least once a day.  Even if you
     don't process the log files, they will grow quite large; see
     section ``My log files get very big!''.  If you rely on processing
     the log files, reserve a large enough partition solely for log
     files.

  o  Keep the size in mind when processing. It might take longer to
     process log files than to generate them!

  o  Limit yourself to the numbers you are interested in.  There is
     data beyond your dreams available in your log file, some of it
     quite obvious, and more obtainable by combining different views.
     Here are some examples of figures to watch:

  o  The hosts using your cache.

  o  The elapsed time for HTTP requests - this is the latency the user
     sees. Usually, you will want to make a distinction for HITs and
     MISSes and overall times. Also, medians are preferred over
     averages.

  o  The requests handled per interval (e.g. second, minute or hour).

  6.17.  Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?

  This message means that the requested object was in ``Delete Behind''
  mode and the user aborted the transfer.  An object will go into
  ``Delete Behind'' mode if

  o  It is larger than maximum_object_size

  o  It is being fetched from a neighbor which has the proxy-only option
     set.

  6.18.  What does ERR_LIFETIME_EXP mean?

  This means that a timeout occurred while the object was being
  transferred.  Most likely the retrieval of this object was very slow
  (or it stalled before finishing) and the user aborted the request.
  However, depending on your settings for quick_abort, Squid may have
  continued to try retrieving the object.  Squid imposes a maximum
  amount of time on all open sockets, so after some amount of time the
  stalled request was aborted and logged with an ERR_LIFETIME_EXP
  message.

  6.19.  Retrieving ``lost'' files from the cache

       I've been asked to retrieve an object which was accidentally
       destroyed at the source for recovery.  So, how do I figure
       out where the things are so I can copy them out and strip
       off the headers?

  The following method applies only to the Squid-1.1 versions:

  Use grep to find the named object (URL) in the ``cache/log'' file.
  The first field in this file is an integer file number.

  Then, find the file fileno-to-pathname.pl from the ``scripts''
  directory of the Squid source distribution.  The usage is

          perl fileno-to-pathname.pl [-c squid.conf]

  File numbers are read on stdin, and pathnames are printed on stdout.
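  For example, the whole lookup can be done in one pipeline (the URL
  and the squid.conf path are made up for illustration):

          % grep 'http://www.example.com/lost.html' cache/log | \
                  awk '{print $1}' | \
                  perl fileno-to-pathname.pl -c /usr/local/squid/etc/squid.conf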

  7.  Operational issues

  7.1.  How do I see system level Squid statistics?

  The Squid distribution includes a CGI utility called cachemgr.cgi
  which can be used to view squid statistics with a web browser.  This
  document has a section devoted to cachemgr.cgi usage which you should
  consult for more information.

  7.2.  How can I find the biggest objects in my cache?

          sort -r -n +4 -5 access.log | awk '{print $5, $7}' | head -25

  7.3.  I want to restart Squid with a clean cache

  Note: The information here is current for version 2.2.

  First of all, you must stop Squid of course.  You can use the command:

          % squid -k shutdown

  The fastest way to restart with an entirely clean cache is to
  overwrite the swap.state files for each cache_dir in your config
  file.  Note, you can not just remove the swap.state file, or truncate
  it to zero size.  Instead, you should put just one byte of garbage
  there.
  For example:

          % echo "" > /cache1/swap.state

  Repeat that for every cache_dir, then restart Squid.  Be sure to leave
  the swap.state file with the same owner and permissions that it had
  before!

  Another way, which takes longer, is to have squid recreate all the
  cache_dir directories.  But first you must move the existing
  directories out of the way.  For example,  you can try this:

          % cd /cache1
          % mkdir JUNK
          % mv ?? swap.state* JUNK
          % rm -rf JUNK &

  Repeat this for your other cache_dir's, then tell Squid to create new
  directories:

          % squid -z

  7.4.  How can I proxy/cache Real Audio?

  by Rodney van den Oever and James R. Grinter

  o  Point the RealPlayer at your Squid server's HTTP port (e.g. 3128).

  o  Using the Preferences->Transport tab, select Use specified
     transports and with the Specified Transports button, select use
     HTTP Only.

  The RealPlayer (and RealPlayer Plus) manual states:

  Use HTTP Only
          Select this option if you are behind a firewall and cannot
          receive data through TCP.  All data will be streamed through
          HTTP.

          Note:  You may not be able to receive some content if you select
          this option.

  Again, from the documentation:

  RealPlayer 4.0 identifies itself to the firewall when making a
  request for content to a RealServer.  The following string is
  attached to any URL that the Player requests using HTTP GET:

          /SmpDsBhgRl

  Thus, to identify an HTTP GET request from the RealPlayer, look
  for:

          http://[^/]+/SmpDsBhgRl

  The Player can also be identified by the mime type in a POST to
  the RealServer.  The RealPlayer POST has the following mime
  type:

          "application/x-pncmd"

  Note that the first request is a POST, and the second has a '?' in the
  URL, so standard Squid configurations would treat it as non-cachable.
  It also looks rather ``magic.''

  HTTP is an alternative delivery mechanism introduced with version 3
  players, and it allows a reasonable approximation to ``streaming''
  data - that is playing it as you receive it.  For more details, see
  their notes on HTTP Pseudo-Streaming.

  It isn't available in the general case: it works only if someone has
  made the realaudio file available via an HTTP server, or if they're
  using a version 4 server, they've switched it on, and you're using a
  version 4 client.  If someone has made the file available via their
  HTTP server, then it'll be cachable.  Otherwise, it won't be (as far
  as we can tell.)

  The more common RealAudio link connects via their own pnm: method and
  is transferred using their proprietary protocol (via TCP or UDP) and
  not using HTTP. It can't be cached nor proxied by Squid, and requires
  something such as the simple proxy that Progressive Networks
  themselves have made available, if you're in a firewall/no direct
  route situation. Their product does not cache (and I don't know of any
  software available that does.)

  Some confusion arises because there is also a configuration option to
  use an HTTP proxy (such as Squid) with the Realaudio/RealVideo
  players. This is because the players can fetch the ``.ram'' file that
  contains the pnm: reference for the audio/video stream. They fetch
  that .ram file from an HTTP server, using HTTP.

  7.5.  How can I purge an object from my cache?

  Squid does not allow you to purge objects unless it is configured with
  access controls in squid.conf.  First you must add something like

          acl PURGE method PURGE
          acl localhost src 127.0.0.1
          http_access allow PURGE localhost
          http_access deny PURGE

  The above only allows purge requests which come from the local host
  and denies all other purge requests.

  To purge an object, you can use the client program:

          client -m PURGE http://www.miscreant.com/

  If the purge was successful, you will see a ``200 OK'' response:

          HTTP/1.0 200 OK
          Date: Thu, 17 Jul 1997 16:03:32 GMT
          Server: Squid/1.1.14

  If the object was not found in the cache, you will see a ``404 Not
  Found'' response:

          HTTP/1.0 404 Not Found
          Date: Thu, 17 Jul 1997 16:03:22 GMT
          Server: Squid/1.1.14

  7.6.  Using ICMP to Measure the Network

  As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time
  (RTT) measurements to select the optimal location to forward a cache
  miss.  Previously, cache misses would be forwarded to the parent cache
  which returned the first ICP reply message.  These were logged with
  FIRST_PARENT_MISS in the access.log file.  Now we can select the
  parent which is closest (RTT-wise) to the origin server.

  7.6.1.  Supporting ICMP in your Squid cache

  It is more important that your parent caches enable the ICMP features.
  If you are acting as a parent, then you may want to enable ICMP on
  your cache.  Also, if your cache makes RTT measurements, it will fetch
  objects directly if your cache is closer than any of the parents.

  If you want your Squid cache to measure RTT's to origin servers, Squid
  must be compiled with the USE_ICMP option.  This is easily
  accomplished by uncommenting "-DUSE_ICMP=1" in src/Makefile and/or
  src/Makefile.in.

  An external program called pinger is responsible for sending and
  receiving ICMP packets.  It must run with root privileges.  After
  Squid has been compiled, the pinger program must be installed
  separately.  A special Makefile target will install pinger with
  appropriate permissions.

          % make install
          % su
          # make install-pinger

  There are three configuration file options for tuning the measurement
  database on your cache.  netdb_low and netdb_high specify low and
  high water marks for keeping the database to a certain size (e.g.
  just like with the IP cache).  The netdb_ttl option specifies the
  minimum interval between pings to the same site.  If netdb_ttl is set
  to 300 seconds (5 minutes) then an ICMP packet will not be sent to
  the same site more than once every five minutes.  Note that a site is
  only pinged when an HTTP request for the site is received.

  Another option, minimum_direct_hops, can be used to try finding
  servers which are close to your cache.  If the measured hop count to
  the origin server is less than or equal to minimum_direct_hops, the
  request will be forwarded directly to the origin server.
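  Put together, the relevant squid.conf lines might look like this
  sketch (all values are illustrative, not recommendations):

          netdb_low   900
          netdb_high  1000
          netdb_ttl   300 seconds
          minimum_direct_hops 4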

  7.6.2.  Utilizing your parents database

  Your parent caches can be asked to include the RTT measurements in
  their ICP replies.  To do this, you must enable query_icmp in your
  config file:

          query_icmp on

  This causes a flag to be set in your outgoing ICP queries.

  If your parent caches return ICMP RTT measurements then the eighth
  column of your access.log will have lines similar to:

          CLOSEST_PARENT_MISS/it.cache.nlanr.net

  In this case, it means that it.cache.nlanr.net returned the lowest RTT
  to the origin server.  If your cache measured a lower RTT than any of
  the parents, the request will be logged with

          CLOSEST_DIRECT/www.sample.com

  7.6.3.  Inspecting the database

  The measurement database can be viewed from the cachemgr by selecting
  "Network Probe Database."  Hostnames are aggregated into /24 networks.
  All measurements made are averaged over time.  Measurements are made
  to specific hosts, taken from the URLs of HTTP requests.  The recv
  and sent fields are the number of ICMP packets received and sent.  At
  this time they are only informational.

  A typical database entry looks something like this:

      Network          recv/sent     RTT  Hops Hostnames
      192.41.10.0        20/  21    82.3   6.0 www.jisedu.org www.dozo.com
          bo.cache.nlanr.net        42.0   7.0
          uc.cache.nlanr.net        48.0  10.0
          pb.cache.nlanr.net        55.0  10.0
          it.cache.nlanr.net       185.0  13.0

  This means we have sent 21 pings to both www.jisedu.org and
  www.dozo.com.  The average RTT is 82.3 milliseconds.  The next four
  lines show the measured values from our parent caches.  Since
  bo.cache.nlanr.net has the lowest RTT, it would be selected as the
  location to forward a request for a www.jisedu.org or www.dozo.com
  URL.

  7.7.  Why are so few requests logged as TCP_IMS_MISS?

  When Squid receives an If-Modified-Since request, it will not forward
  the request unless the object needs to be refreshed according to the
  refresh_pattern rules.  If the request does need to be refreshed, then
  it will be logged as TCP_REFRESH_HIT or TCP_REFRESH_MISS.
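  For reference, the refresh_pattern fields are a regular expression, a
  minimum age (minutes), a percent of the object's age, and a maximum
  age (minutes).  This catch-all rule is a commonly used default:

          refresh_pattern . 0 20% 4320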

  If the request is not forwarded, Squid replies to the IMS request
  according to the object in its cache.  If the modification times are
  the same, then Squid returns TCP_IMS_HIT.  If the modification times
  are different, then Squid returns TCP_IMS_MISS.  In most cases, the
  cached object will not have changed, so the result is TCP_IMS_HIT.
  Squid will only return TCP_IMS_MISS if some other client causes a
  newer version of the object to be pulled into the cache.

  7.8.  How can I make Squid NOT cache some servers or URLs?

  In Squid-2, you use the no_cache option to specify uncachable
  requests.  For example, this makes all responses from origin servers
  in the 10.0.1.0/24 network uncachable:

          acl Local dst 10.0.1.0/24
          no_cache deny Local

  This example makes all URLs ending in '.html' uncachable:

          acl HTML url_regex \.html$
          no_cache deny HTML

  This example makes a specific URL uncachable:

          acl XYZZY url_regex ^http://www\.i\.suck\.com/foo\.html$
          no_cache deny XYZZY

  This example caches nothing between the hours of 8AM and 11AM:

          acl Morning time 08:00-11:00
          no_cache deny Morning

  In Squid-1.1, whether or not an object gets cached is controlled by
  the cache_stoplist, and cache_stoplist_pattern options.  So, you may
  add:

          cache_stoplist my.domain.com

  Specifying uncachable objects by IP address is harder.  The 1.1 patch
  page <../1.1/patches.html> includes a patch called
  no-cache-local.patch which changes the behaviour of the local_ip and
  local_domain options so that matching requests are NOT CACHED, in
  addition to being fetched directly.

  7.9.  How can I delete and recreate a cache directory?

  Deleting an existing cache directory is easy to do.  Unfortunately, it
  may require a brief interruption of service.

  1. Edit your squid.conf file and comment out, or delete the cache_dir
     line for the cache directory that you want to remove.

  2. You can not delete a cache directory from a running Squid process.
     Thus, you can not simply reconfigure squid.  You must shutdown
     Squid:

             squid -k shutdown

  3. Once Squid exits, you may immediately start it up again.  If you
     use the RunCache script, Squid should start up again automatically.

  4. Now Squid is no longer using the cache directory that you removed
     from the config file.  You can verify this by checking "Store
     Directory" information with the cache manager.  From the command
     line, type:

             client mgr:storedir

  Now that Squid is not using the cache directory, you can rm -rf it,
  format the disk, build a new filesystem, or whatever.

  The procedure for recreating a directory is similar.

  1. Edit squid.conf and add a new cache_dir line.

  2. Initialize the new directory by running

             % squid -z

  NOTE: it is safe to run this even if Squid is already running.  squid
  -z will harmlessly try to create all of the subdirectories that
  already exist.

  3. Reconfigure Squid

             squid -k reconfigure

  Unlike deleting, you can add new cache directories while Squid is
  already running.

  7.10.  Why can't I run Squid as root?

  by Dave J Woolley

  If someone were to discover a buffer overrun bug in Squid and it runs
  as a user other than root, they can only corrupt the files writeable
  by that user; but if it runs as root, they can take over the whole
  machine.  This applies to all programs that don't absolutely need
  root status, not just squid.

  7.11.  Can you tell me a good way to upgrade Squid with minimal down-
  time?

  Here is a technique that was described by Radu Greab.

  Start a second Squid server on an unused HTTP port (say 4128).  This
  instance of Squid probably doesn't need a large disk cache.  When
  this second server has finished reloading the disk store, change the
  http_port values in the two squid.conf files: move the original Squid
  to another unused port (say 5128), and set the second one to 3128.
  Next, run ``squid -k reconfigure'' for both Squids.  New requests
  will go to the second Squid, now on port 3128, and the first Squid
  will finish handling its current requests.  After a few minutes, it
  should be safe to fully shut down the first Squid and upgrade it.
  Later you can simply repeat this process in reverse.

  7.12.  Can Squid listen on more than one HTTP port?

  Note: The information here is current for version 2.3.

  Yes, you can specify multiple http_port lines in your squid.conf file.
  Squid attempts to bind() to each port that you specify.  Sometimes
  Squid may not be able to bind to a port, either because of permissions
  or because the port is already in use.  If Squid can bind to at least
  one port, then it will continue running.  If it can not bind to any of
  the ports, then Squid stops.

  With version 2.3 and later you can specify IP addresses and port
  numbers together (see the squid.conf comments).
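  For example (the address and ports are illustrative):

          http_port 3128
          http_port 8080
          http_port 192.168.1.1:8000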

  8.  Memory

  8.1.  Why does Squid use so much memory!?

  Squid uses a lot of memory for performance reasons.  It takes much,
  much longer to read something from disk than it does to read directly
  from memory.

  A small amount of metadata for each cached object is kept in memory.
  This is the StoreEntry data structure.  For Squid-2 this is 56 bytes
  on "small" pointer architectures (Intel, Sparc, MIPS, etc) and 88
  bytes on "large" pointer architectures (Alpha).  In addition, there
  is a 16-byte cache key (MD5 checksum) associated with each
  StoreEntry.  This means there are 72 or 104 bytes of metadata in
  memory for every object in your cache.  A cache with 1,000,000
  objects therefore requires 72 MB of memory for metadata only (104 MB
  on large-pointer machines).  In practice it requires much more than
  that.

  Squid-1.1 also uses a lot of memory to store in-transit objects.  This
  version stores incoming objects only in memory, until the transfer is
  complete.  At that point it decides whether or not to store the object
  on disk.  This means that when users download large files, your memory
  usage will increase significantly.  The squid.conf parameter
  maximum_object_size determines how much memory an in-transit object
  can consume before we mark it as uncachable.  When an object is marked
  uncachable, there is no need to keep all of the object in memory, so
  the memory is freed for the part of the object which has already been
  written to the client.  In other words, lowering maximum_object_size
  also lowers Squid-1.1 memory usage.

  Other uses of memory by Squid include:

  o  Disk buffers for reading and writing

  o  Network I/O buffers

  o  IP Cache contents

  o  FQDN Cache contents

  o  Netdb ICMP measurement database

  o  Per-request state information, including full request and reply
     headers

  o  Miscellaneous statistics collection.

  o  ``Hot objects'' which are kept entirely in memory.

  8.2.  How can I tell how much memory my Squid process is using?

  One way is to simply look at ps output on your system.  For BSD-ish
  systems, you probably want to use the -u option and look at the VSZ
  and RSS fields:

      wessels ~ 236% ps -axuhm
      USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED       TIME COMMAND
      squid     9631  4.6 26.4 141204 137852  ??  S    10:13PM   78:22.80 squid -NCYs

  For SYSV-ish systems, you probably want to use the -l option.  When
  interpreting the ps output, be sure to check your ps manual page.  It
  may not be obvious if the reported numbers are kbytes, or pages
  (usually 4 KB).

  A nicer way to check the memory usage is with a program called top:

              last pid: 20128;  load averages:  0.06,  0.12,  0.11                   14:10:58
      46 processes:  1 running, 45 sleeping
      CPU states:     % user,     % nice,     % system,     % interrupt,     % idle
      Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
      Swap: 1024M Total, 256K Used, 1024M Free

        PID USERNAME PRI NICE SIZE    RES STATE    TIME   WCPU    CPU COMMAND
       9631 squid     2   0   138M   135M select  78:45  3.93%  3.93% squid

  Finally, you can ask the Squid process to report its own memory usage.
  This is available on the Cache Manager info page.  Your output may
  vary depending upon your operating system and Squid version, but it
  looks similar to this:

      Resource usage for squid:
              Maximum Resident Size: 137892 KB
      Memory usage for squid via mstats():
              Total space in arena:  140144 KB
              Total free:              8153 KB 6%

  If your RSS (Resident Set Size) value is much lower than your process
  size, then your cache performance is most likely suffering due to
  ``paging''.

  8.3.  My Squid process grows without bounds.

  You might just have your cache_mem parameter set too high.  See the
  ````What can I do to reduce Squid's memory usage?''''  entry below.

  When a process continually grows in size, without levelling off or
  slowing down, it often indicates a memory leak.  A memory leak is when
  some chunk of memory is used, but not free'd when it is done being
  used.

  Memory leaks are a real problem for programs (like Squid) which do
  all of their processing within a single process.  Historically, Squid
  has had real memory leak problems.  But as the software has matured,
  we believe almost all of Squid's memory leaks have been eliminated,
  and new ones are at least easy to identify.

  Memory leaks may also be present in your system's libraries, such as
  libc.a or even libmalloc.a.  If you experience the ever-growing
  process size phenomenon, we suggest you first try an ``alternative
  malloc library''.

  8.4.  I set cache_mem  to XX, but the process grows beyond that!

  The cache_mem parameter does NOT specify the maximum size of the
  process.  It only specifies how much memory to use for caching
  ``hot'' (very popular) replies.  Squid's actual memory usage depends
  very strongly on your cache size (disk space) and your incoming
  request load.  Reducing cache_mem will usually also reduce the
  process size, but not necessarily, and there are other ways to reduce
  Squid's memory usage (see below).

  8.5.  How do I analyze memory usage from the cache manager output?

  Note: This information is specific to Squid-1.1 versions

  Look at your cachemgr.cgi Cache Information page.  For example:

          Memory usage for squid via mallinfo():
                 Total space in arena:   94687 KB
                 Ordinary blocks:        32019 KB 210034 blks
                 Small blocks:           44364 KB 569500 blks
                 Holding blocks:             0 KB   5695 blks
                 Free Small blocks:       6650 KB
                 Free Ordinary blocks:   11652 KB
                 Total in use:           76384 KB 81%
                 Total free:             18302 KB 19%

          Meta Data:
          StoreEntry                246043 x 64 bytes =  15377 KB
          IPCacheEntry              971 x   88 bytes  =     83 KB
          Hash link                 2 x   24 bytes    =      0 KB
          URL strings                                 =  11422 KB
          Pool MemObject structures 514 x  144 bytes  =     72 KB (    70 free)
          Pool for Request structur 516 x 4380 bytes  =   2207 KB (  2121 free)
          Pool for in-memory object 6200 x 4096 bytes =  24800 KB ( 22888 free)
          Pool for disk I/O         242 x 8192 bytes =   1936 KB (  1888 free)
          Miscellaneous                              =   2600 KB
          total Accounted                            =  58499 KB

  First note that mallinfo() reports 94M in ``arena.''  This is pretty
  close to what top says (97M).

  Of that 94M, 81% (76M) is actually being used at the moment.  The rest
  has been freed, or pre-allocated by malloc(3) and not yet used.

  Of the 76M in use, we can account for 58.5M (76%).  There are some
  calls to malloc(3) for which we can't account.

  The Meta Data list gives the breakdown of where the accounted memory
  has gone.  45% has gone to StoreEntry and URL strings.  Another 42%
  has gone to buffering objects held in VM while they are fetched and
  relayed to the clients (Pool for in-memory object).

  The pool sizes are specified by squid.conf parameters.  In version
  1.0, these pools are somewhat broken:  we keep a stack of unused pages
  instead of freeing the block.  In the Pool for in-memory object, the
  unused stack size is 1/2 of cache_mem.  The Pool for disk I/O is
  hardcoded at 200.  For MemObject and Request it's 1/8 of your system's
  FD_SETSIZE value.

  If you need to lower your process size, we recommend lowering the max
  object sizes in the 'http', 'ftp' and 'gopher' config lines.  You may
  also want to lower cache_mem to suit your needs. But if you make
  cache_mem too low, then some objects may not get saved to disk during
  high-load periods.  Newer Squid versions allow you to set memory_pools
  off to disable the free memory pools.

  8.6.  The ``Total memory accounted'' value is less than the size of my
  Squid process.

  We are not able to account for all memory that Squid uses.  This would
  require excessive amounts of code to keep track of every last byte.
  We do our best to account for the major uses of memory.

  Also, note that the malloc and free functions have their own overhead.
  Some additional memory is required to keep track of which chunks are
  in use, and which are free.  Additionally, most operating systems do
  not allow processes to shrink in size.  When a process gives up memory
  by calling free, the total process size does not shrink.  So the
  process size really represents the maximum size your Squid process has
  reached.

  8.7.  xmalloc: Unable to allocate 4096 bytes!

  by Henrik Nordstrom 

  Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1
  bytes!" appear when Squid can't allocate more memory, and on most
  operating systems (including BSD) there are only two possible
  reasons:

  1. The machine is out of swap

  2. The process' maximum data segment size has been reached

     The first case is detected using the normal swap monitoring tools
     available on the platform (pstat on SunOS, perhaps pstat is used on
     BSD as well).

  To tell if it is the second case, first rule out the first case and
  then monitor the size of the Squid process.  If it dies at a certain
  size with plenty of swap left, then the maximum data segment size has
  no doubt been reached.

  The data segment size can be limited by two factors:

  1. Kernel imposed maximum, which no user can go above

  2. The size set with ulimit, which the user can control.

  When squid starts it sets the data and file ulimits to the hard
  level.  If you manually tune ulimit before starting Squid, make sure
  that you set the hard limit and not only the soft limit (the default
  operation of ulimit is to change only the soft limit).  Only root is
  allowed to raise the hard limit.

  This command prints the hard limits:

          ulimit -aH

  This command sets the data size to unlimited:

          ulimit -HSd unlimited

  8.7.1.  BSD/OS

  by Arjan de Vet 

  The default kernel limit on BSD/OS for datasize is 64MB (at least on
  3.0 which I'm using).

  Recompile a kernel with larger datasize settings:

          maxusers        128
          # Support for large inpcb hash tables, e.g. busy WEB servers.
          options         INET_SERVER
          # support for large routing tables, e.g. gated with full Internet routing:
          options         "KMEMSIZE=\(16*1024*1024\)"
          options         "DFLDSIZ=\(128*1024*1024\)"
          options         "DFLSSIZ=\(8*1024*1024\)"
          options         "SOMAXCONN=128"
          options         "MAXDSIZ=\(256*1024*1024\)"

  See /usr/share/doc/bsdi/config.n for more info.

  In /etc/login.conf I have this:

          default:\
                  :path=/bin /usr/bin /usr/contrib/bin:\
                  :datasize-cur=256M:\
                  :openfiles-cur=1024:\
                  :openfiles-max=1024:\
                  :maxproc-cur=1024:\
                  :stacksize-cur=64M:\
                  :radius-challenge-styles=activ,crypto,skey,snk,token:\
                  :tc=auth-bsdi-defaults:\
                  :tc=auth-ftp-bsdi-defaults:

          #
          # Settings used by /etc/rc and root
          # This must be set properly for daemons started as root by inetd as well.
          # Be sure reset these values back to system defaults in the default class!
          #
          daemon:\
                  :path=/bin /usr/bin /sbin /usr/sbin:\
                  :widepasswords:\
                  :tc=default:
          #       :datasize-cur=128M:\
          #       :openfiles-cur=256:\
          #       :maxproc-cur=256:\

  This should give enough space for a 256MB squid process.

  8.7.2.  FreeBSD (2.2.X)

  by Duane Wessels

  The procedure is almost identical to that for BSD/OS above.  Increase
  the open filedescriptor limit in /sys/conf/param.c:

          int     maxfiles = 4096;
          int     maxfilesperproc = 1024;

  Increase the maximum and default data segment size in your kernel con-
  fig file, e.g. /sys/conf/i386/CONFIG:

          options         "MAXDSIZ=(512*1024*1024)"
          options         "DFLDSIZ=(128*1024*1024)"

  We also found it necessary to increase the number of mbuf clusters:

          options         "NMBCLUSTERS=10240"

  And, if you have more than 256 MB of physical memory, you probably
  have to disable BOUNCE_BUFFERS (whatever that is), so comment out this
  line:

          #options        BOUNCE_BUFFERS          #include support for DMA bounce buffers

  Also, update limits in /etc/login.conf:

          # Settings used by /etc/rc
          #
          daemon:\
                  :coredumpsize=infinity:\
                  :datasize=infinity:\
                  :maxproc=256:\
                  :maxproc-cur@:\
                  :memoryuse-cur=64M:\
                  :memorylocked-cur=64M:\
                  :openfiles=4096:\
                  :openfiles-cur@:\
                  :stacksize=64M:\
                  :tc=default:

  And don't forget to run ``cap_mkdb /etc/login.conf'' after editing
  that file.

  8.7.3.  OSF, Digital Unix

  by Ong Beng Hui 

  To increase the data size for Digital UNIX, edit the file
  /etc/sysconfigtab and add the entry...

          proc:
                  per-proc-data-size=1073741824

  Or, with csh, use the limit command, such as

          > limit datasize 1024M

  Editing /etc/sysconfigtab requires a reboot, but the limit command
  doesn't.

  8.8.  fork: (12) Cannot allocate memory

  When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1),
  some of the helper processes (dnsserver) must be killed and restarted.
  If your system does not have enough virtual memory, the Squid process
  may not be able to fork to start the new helper processes.  The best
  way to fix this is to increase your virtual memory by adding swap
  space.  Normally your system uses raw disk partitions for swap space,
  but most operating systems also support swapping on regular files
  (Digital Unix excepted).  See your system manual pages for swap,
  swapon, and mkfile.
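  On Solaris, for example, adding a swap file looks something like this
  (the path and size are illustrative):

          # mkfile 256m /usr/local/swapfile
          # swap -a /usr/local/swapfile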

  8.9.  What can I do to reduce Squid's memory usage?

  If your cache performance is suffering because of memory limitations,
  you might consider buying more memory.  But if that is not an option,
  there are a number of things to try:

  o  Try a ``different malloc library''.

  o  Reduce the cache_mem parameter in the config file.  This controls
     how many ``hot'' objects are kept in memory.  Reducing this
     parameter will not significantly affect performance, but you may
     receive some warnings in cache.log if your cache is busy.  (See
     the sketch after this list.)

  o  Turn the memory_pools off in the config file.  This causes Squid to
     give up unused memory by calling free() instead of holding on to
     the chunk for potential, future use.

  o  Reduce the cache_swap parameter in your config file.  This will
     reduce the number of objects Squid keeps.  Your overall hit ratio
     may go down a little, but your cache will perform significantly
     better.

  o  Reduce the maximum_object_size parameter (Squid-1.1 only).  You
     won't be able to cache the larger objects, and your byte volume hit
     ratio may go down, but Squid will perform better overall.

  o  If you are using Squid-1.1.x, try the ``NOVM'' version.
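  As a sketch, the config file suggestions above combine like this (the
  cache_mem value is illustrative, not a recommendation):

          # keep only a small hot-object cache, and return free
          # memory to the system instead of pooling it
          cache_mem 8 MB
          memory_pools off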

  8.10.  Using an alternate malloc  library.

  Many users have found improved performance and memory utilization when
  linking Squid with an external malloc library.  We recommend either
  GNU malloc, or dlmalloc.

  8.10.1.  Using GNU malloc

  To make Squid use GNU malloc follow these simple steps:

  1. Download the GNU malloc source, available from one of the GNU FTP
     mirror sites.

  2. Compile GNU malloc

             % gzip -dc malloc.tar.gz | tar xf -
             % cd malloc
             % vi Makefile     # edit as needed
             % make

  3. Copy the resulting library, malloc.a, to your system's library
     directory and be sure to name it libgnumalloc.a.

             % su
             # cp malloc.a /usr/lib/libgnumalloc.a

  4. (Optional) Copy the GNU malloc.h to your system's include
     directory and be sure to name it gnumalloc.h.  This step is not
     required, but if you do this, then Squid will be able to use the
     mstats() function to report memory usage statistics on the
     cachemgr info page.

             # cp malloc.h /usr/include/gnumalloc.h

  5. Reconfigure and recompile Squid

             % make realclean
             % ./configure ...
             % make
             % make install

  Note: in later distributions, 'realclean' has been changed to
  'distclean'.  As the configure script runs, watch its output.  You
  should find that it locates libgnumalloc.a and optionally gnumalloc.h.

  8.10.2.  dlmalloc

  dlmalloc was written by Doug Lea.  According to Doug:

       This is not the fastest, most space-conserving, most
       portable, or most tunable malloc ever written.  However it is
       among the fastest while also being among the most
       space-conserving, portable and tunable.

  dlmalloc is included with the Squid-2 source distribution.  To use
  this library, you simply give an option to the configure script:

          % ./configure --enable-dlmalloc ...

  9.  The Cache Manager

  by Jonathan Larmour 

  9.1.  What is the cache manager?

  The cache manager (cachemgr.cgi) is a CGI utility for displaying
  statistics about the squid process as it runs.  The cache manager is a
  convenient way to manage the cache and view statistics without logging
  into the server.

  9.2.  How do you set it up?

  That depends on which web server you're using.  Below you will find
  instructions for configuring the CERN and Apache servers to permit
  cachemgr.cgi usage.

  EDITOR'S NOTE: readers are encouraged to submit instructions for
  configuration of cachemgr.cgi on other web server platforms, such as
  Netscape.

  After you edit the server configuration files, you will probably need
  to either restart your web server or send it a SIGHUP signal to tell
  it to re-read its configuration files.

  When you're done configuring your web server, you'll connect to the
  cache manager with a web browser, using a URL such as:

          http://www.example.com/Squid/cgi-bin/cachemgr.cgi/

  9.3.  Cache manager configuration for CERN httpd 3.0

  First, you should ensure that only specified workstations can access
  the cache manager.  That is done in your CERN httpd.conf, not in
  squid.conf.

          Protection MGR-PROT {
                   Mask    @(workstation.example.com)
          }

  Wildcards and IP addresses are acceptable, and multiple entries can
  be given as a comma-separated list.  There are many more ways of
  protection.  Your server documentation has details.

  You also need to add:

          Protect         /Squid/*        MGR-PROT
          Exec            /Squid/cgi-bin/*.cgi    /usr/local/squid/bin/*.cgi

  This marks the script as executable to those in MGR-PROT.

  9.4.  Cache manager configuration for Apache

  First, make sure the cgi-bin directory you're using is listed with a
  ScriptAlias in your Apache srm.conf file like this:

          ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/

  It's probably a bad idea to ScriptAlias the entire
  /usr/local/squid/bin/ directory where all the Squid executables live.

  Next, you should ensure that only specified workstations can access
  the cache manager.  That is done in your Apache access.conf, not in
  squid.conf.  At the bottom of the access.conf file, insert:

          <Location /Squid/cgi-bin/cachemgr.cgi>
          order deny,allow
          deny from all
          allow from workstation.example.com
          </Location>

  You can have more than one allow line, and you can allow domains or
  networks.

  Alternately, cachemgr.cgi can be password-protected.  You'd add the
  following to access.conf:

          <Location /Squid/cgi-bin/cachemgr.cgi>
          AuthUserFile /path/to/password/file
          AuthGroupFile /dev/null
          AuthName User/Password Required
          AuthType Basic
          require user cachemanager
          </Location>

  Consult the Apache documentation for information on using htpasswd to
  set a password for this ``user.''

  9.5.  Cache manager ACLs in squid.conf

  The default cache manager access configuration in squid.conf is:

          acl manager proto cache_object
          acl localhost src 127.0.0.1/255.255.255.255
          acl all src 0.0.0.0/0.0.0.0

  With the following rules:

          http_access deny manager !localhost
          http_access allow all

  The first ACL is the most important as the cache manager program
  interrogates squid using a special cache_object protocol.  Try it
  yourself by doing:

          telnet mycache.example.com 3128
          GET cache_object://mycache.example.com/info HTTP/1.0

  The default ACLs say that if the request is for a cache_object, and it
  isn't the local host, then deny access; otherwise allow access.

  In fact, only allowing localhost access means that on the initial
  cachemgr.cgi form you can only specify the cache host as localhost. We
  recommend the following:

          acl manager proto cache_object
          acl localhost src 127.0.0.1/255.255.255.255
          acl example src 123.123.123.123/255.255.255.255
          acl all src 0.0.0.0/0.0.0.0

  Where 123.123.123.123 is the IP address of your web server.  Then
  modify the rules like this:

          http_access allow manager localhost
          http_access allow manager example
          http_access deny manager
          http_access allow all

  If you're using miss_access, then don't forget to also add a
  miss_access rule for the cache manager:

          miss_access allow manager

  The default ACLs assume that your web server is on the same machine
  as squid.  Remember that the connection from the cache manager
  program to squid originates at the web server, not the browser.  So
  if your web server lives somewhere else, you should make sure that
  the IP address of the web server that has cachemgr.cgi installed on
  it is in the example ACL above.

  Always be sure to send a SIGHUP signal to squid any time you change
  the squid.conf file.

  9.6.  Why does it say I need a password and a URL?

  If you ``drop'' the list box, and browse it, you will see that the
  password is only required to shutdown the cache, and the URL is
  required to refresh an object (i.e., retrieve it from its original
  source again).  Otherwise these fields can be left blank: a password
  is not required to obtain access to the informational aspects of
  cachemgr.cgi.

  9.7.  I want to shutdown the cache remotely. What's the password?

  See the cachemgr_passwd directive in squid.conf.

  9.8.  How do I make the cache host default to my  cache?

  Edit Makefile.in. Look at the line

          HOST_OPT        = # -DCACHEMGR_HOSTNAME="getfullhostname()"

  If the webserver that cachemgr.cgi runs from is the same machine as
  Squid runs on, just remove the #.  If your web server is somewhere
  else use:

          HOST_OPT        = -DCACHEMGR_HOSTNAME=\"mycache.example.com\"

  If you change this, you will need to recompile and reinstall
  cachemgr.cgi before the changes take effect.

  9.9.  What's the difference between Squid TCP connections and Squid
  UDP connections?

  Browsers and caches use TCP connections to retrieve web objects from
  web servers or caches.  UDP connections are used when another cache
  using you as a sibling or parent wants to find out if you have an
  object in your cache that it's looking for.  The UDP connections are
  ICP queries.

  9.10.  It says the storage expiration will happen in 1970!

  Don't worry. The default (and sensible) behavior of squid is to expire
  an object when it happens to overwrite it.  It doesn't explicitly
  garbage collect (unless you tell it to in other ways).

  9.11.  What do the Meta Data entries mean?

     StoreEntry
        Entry describing an object in the cache.

     IPCacheEntry
        An entry in the DNS cache.

     Hash link
        Link in the cache hash table structure.

     URL strings
        The strings of the URLs themselves that map to an object number
        in the cache, allowing access to the StoreEntry.

  Basically just like the log file in your cache directory.

     Pool MemObject structures
        Info about objects currently in memory (e.g., in the process of
        being transferred).

     Pool for Request structures
        Information about each request as it happens.

     Pool for in-memory object
        Space for object data as it is retrieved.

  If the squid process is much smaller than the total accounted memory,
  run for cover!  Something is very wrong, and you should probably
  restart squid.

  9.12.  In the utilization section, what is Other ?

  Other is a default category to track objects which don't fall into one
  of the defined categories.

  9.13.  In the utilization section, why is the Transfer KB/sec column
  always zero?

  This column contains gross estimations of data transfer rates averaged
  over the entire time the cache has been running.  These numbers are
  unreliable and mostly useless.

  9.14.  In the utilization section, what is the Object Count ?

  The number of objects of that type in the cache right now.

  9.15.  In the utilization section, what is the Max/Current/Min KB ?

  These refer to the largest size all the objects of this type have
  grown to, their current size, and the smallest size they have shrunk
  to.

  9.16.  What is the I/O  section about?

  These are histograms on the number of bytes read from the network per
  read(2) call.  Somewhat useful for determining maximum buffer sizes.

  9.17.  What is the Objects  section for?

  Warning: this will download to your browser a list of every URL in the
  cache and statistics about it. It can be very, very large.  Sometimes
  it will be larger than the amount of available memory in your client!
  You probably don't need this information anyway.

  9.18.  What is the VM Objects  section for?

  VM Objects are the objects which are in Virtual Memory.  These are
  objects which are currently being retrieved and those which were kept
  in memory for fast access (accelerator mode).

  9.19.  What does AVG RTT  mean?

  Average Round Trip Time. This is how long on average after an ICP ping
  is sent that a reply is received.

  9.20.  In the IP cache section, what's the difference between a hit, a
  negative hit and a miss?

  A HIT means that the document was found in the cache.  A MISS means
  that it wasn't found in the cache.  A negative hit means that it was
  found in the cache, but as a cached negative result: the name is
  known not to resolve.
  9.21.  What do the IP cache contents mean anyway?

  The hostname is the name that was requested to be resolved.

  For the Flags column:

  o  C Means positively cached.

  o  N Means negatively cached.

  o  P Means the request is pending being dispatched.

  o  D Means the request has been dispatched and we're waiting for an
     answer.

  o  L Means it is a locked entry because it represents a parent or
     sibling.

  The TTL column represents ``Time To Live'' (i.e., how long the cache
  entry is valid).  (May be negative if the document has expired.)

  The N column is the number of IP addresses from which the cache has
  documents.

  The rest of the line lists all the IP addresses that have been
  associated with that IP cache entry.

  9.22.  What is the fqdncache and how is it different from the ipcache?

  IPCache contains data for the Hostname to IP-Number mapping, and
  FQDNCache does it the other way round.  For example:

  IP Cache Contents:

          Hostname                      Flags lstref    TTL  N [IP-Number]
          gorn.cc.fh-lippe.de               C       0  21581 1 193.16.112.73
          lagrange.uni-paderborn.de         C       6  21594 1 131.234.128.245
          www.altavista.digital.com         C      10  21299 4 204.123.2.75  ...
          2/ftp.symantec.com                DL   1583 -772855 0

          Flags:  C --> Cached
                  D --> Dispatched
                  N --> Negative Cached
                  L --> Locked
          lstref: Time since last use
          TTL:    Time-To-Live until information expires
          N:      Count of addresses

  FQDN Cache Contents:

          IP-Number                    Flags    TTL N Hostname
          130.149.17.15                    C -45570 1 andele.cs.tu-berlin.de
          194.77.122.18                    C -58133 1 komet.teuto.de
          206.155.117.51                   N -73747 0

          Flags:  C --> Cached
                  D --> Dispatched
                  N --> Negative Cached
                  L --> Locked
          TTL:    Time-To-Live until information expires
          N:      Count of names

  9.23.  What does ``Page faults with physical i/o: 4897'' mean?

  This question was asked on the squid-users mailing list, to which
  there were three excellent replies.

  by Jonathan Larmour 

  You get a ``page fault'' when your OS tries to access something in
  memory which is actually swapped to disk.  The term ``page fault,''
  while correct at the kernel and CPU level, is a bit deceptive to a
  user, as there's no actual error - this is a normal feature of
  operation.

  Also, this doesn't necessarily mean your squid is swapping by that
  much.  Most operating systems also implement paging for executables,
  so that only sections of the executable which are actually used are
  read from disk into memory. Also, whenever squid needs more memory,
  the fact that the memory was allocated will show up in the page
  faults.

  However, if the number of faults is unusually high, and getting
  bigger, this could mean that squid is swapping. Another way to verify
  this is using a program called ``vmstat'' which is found on most UNIX
  platforms. If you run this as ``vmstat 5'' this will update a display
  every 5 seconds. This can tell you if the system as a whole is
  swapping a lot (see your local man page for vmstat for more
  information).

  It is very bad for squid to swap, as every single request will be
  blocked until the requested data is swapped in. It is better to tweak
  the cache_mem and/or memory_pools setting in squid.conf, or switch to
  the NOVM versions of squid, than allow this to happen.

  by Peter Wemm 

  There are two different operations at work, paging and swapping.
  Paging is when individual pages are shuffled (either discarded or
  swapped to/from disk), while ``swapping'' generally means the entire
  process got sent to/from disk.

  Needless to say, swapping a process is a pretty drastic event, and
  usually only reserved for when there's a memory crunch and paging out
  cannot free enough memory quickly enough.  Also, there's some
  variation on how swapping is implemented in OS's.  Some don't do it at
  all or do a hybrid of paging and swapping instead.

  As you say, paging out doesn't necessarily involve disk IO, eg: text
  (code) pages are read-only and can simply be discarded if they are not
  used (and reloaded if/when needed).  Data pages are also discarded if
  unmodified, and paged out if there's been any changes.  Allocated
  memory (malloc) is always saved to disk since there's no executable
  file to recover the data from.  mmap() memory is variable.  If it's
  backed from a file, it uses the same rules as the data segment of a
  file - ie: either discarded if unmodified or paged out.

  There's also ``demand zeroing'' of pages that causes faults.
  If you malloc memory and it calls brk()/sbrk() to allocate new pages,
  the chances are that you are allocated demand zero pages.  Ie: the
  pages are not ``really'' attached to your process yet, but when you
  access them for the first time, the page fault causes the page to be
  connected to the process address space and zeroed - this saves
  unnecessary zeroing of pages that are allocated but never used.

  The ``page faults with physical IO'' comes from the OS via
  getrusage(). It's highly OS dependent on what it means.  Generally, it
  means that the process accessed a page that was not present in memory
  (for whatever reason) and there was disk access to fetch it.  Many
  OS's load executables by demand paging as well, so the act of starting
  squid implicitly causes page faults with disk IO - however, many (but
  not all) OS's use ``read ahead'' and ``prefault'' heuristics to
  streamline the loading.  Some OS's maintain ``intent queues'' so that
  pages can be selected as pageout candidates ahead of time.  When (say)
  squid touches a freshly allocated demand zero page and one is needed,
  the OS can page out one of the candidates on the spot, causing a
  'fault with physical IO' with demand zeroing of allocated memory which
  doesn't happen on many other OS's.  (The other OS's generally put the
  process to sleep while the pageout daemon finds a page for it).

  The meaning of ``swapping'' varies.  On FreeBSD for example, swapping
  out is implemented as unlocking upages, kernel stack, PTD etc for
  aggressive pageout with the process.  The only thing left of the
  process in memory is the 'struct proc'.  The FreeBSD paging system is
  highly adaptive and can resort to paging in a way that is equivalent
  to the traditional swapping style operation (ie: entire process).
  FreeBSD also tries stealing pages from active processes in order to
  make space for disk cache.  I suspect this is why setting
  'memory_pools off' on the non-NOVM squids on FreeBSD is reported to
  work better - the VM/buffer system could be competing with squid to
  cache the same pages.  It's a pity that squid cannot use mmap() to do
  file IO on the 4K chunks in its memory pool (I can see that this is
  not a simple thing to do though, but that won't stop me wishing. :-).

  by John Line 

  The comments so far have been about what paging/swapping figures mean
  in a ``traditional'' context, but it's worth bearing in mind that on
  some systems (Sun's Solaris 2, at least), the virtual memory and
  filesystem handling are unified and what a user process sees as
  reading or writing a file, the system simply sees as paging something
  in from disk or a page being updated so it needs to be paged out. (I
  suppose you could view it as similar to the operating system memory-
  mapping the files behind-the-scenes.)

  The effect of this is that on Solaris 2, paging figures will also
  include file I/O. Or rather, the figures from vmstat certainly appear
  to include file I/O, and I presume (but can't quickly test) that
  figures such as those quoted by Squid will also include file I/O.

  To confirm the above (which represents an impression from what I've
  read and observed, rather than 100% certain facts...), using an
  otherwise idle Sun Ultra 1 system I just tried using cat (small,
  shouldn't need to page) to copy (a) one file to another, (b) a file
  to /dev/null, (c) /dev/zero to a file, and (d) /dev/zero to /dev/null
  (interrupting the last two with control-C after a while!), while
  watching with vmstat: 300-600 page-ins or page-outs per second when
  reading or writing a file (rather than a device), essentially zero in
  other cases (and when not cat-ing).

  So ... beware assuming that all systems are similar and that paging
  figures represent *only* program code and data being shuffled to/from
  disk - they may also include the work in reading/writing all those
  files you were accessing...

  9.23.1.  Ok, so what is unusually high?

  You'll probably want to compare the number of page faults to the
  number of HTTP requests.  If this ratio is close to or exceeds 1,
  then Squid is paging too much.
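
  One way to read both counters is from the cache manager ``info''
  page, for example with the client program included with Squid.  This
  sketch assumes Squid listens on the default port on localhost and
  that your cache manager ACLs permit requests from localhost:

          % client cache_object://localhost/info | egrep 'HTTP requests|Page faults'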

  9.24.  What does the IGNORED field mean in the 'cache server list'?

  This refers to ICP replies which Squid ignored, for one of these
  reasons:

  o  The URL in the reply could not be found in the cache at all.

  o  The URL in the reply was already being fetched.  Probably this ICP
     reply arrived too late.

  o  The URL in the reply did not have a MemObject associated with it.
     Either the request is already finished, or the user aborted before
     the ICP arrived.

  o  The reply came from a multicast-responder, but the
     cache_peer_access configuration does not allow us to forward this
     request to that neighbor.

  o  Source-Echo replies from known neighbors are ignored.

  o  ICP_OP_DENIED replies are ignored after the first 100.

  10.  Access Controls

  10.1.  How do I implement an ACL ban list?

  As an example, we will assume that you would like to prevent users
  from accessing cooking recipes.

  One way to implement this would be to deny access to any URLs that
  contain the words ``cooking'' or ``recipe.''  You would use these
  configuration lines:

          acl Cooking1 url_regex cooking
          acl Recipe1 url_regex recipe
          http_access deny Cooking1
          http_access deny Recipe1
          http_access allow all

  The url_regex means to search the entire URL for the regular expres-
  sion you specify.  Note that these regular expressions are case-sen-
  sitive, so a URL containing ``Cooking'' would not be denied.
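
  If your version of Squid supports it, the -i flag makes a regular
  expression ACL case-insensitive, so a single entry covers both
  spellings:

          acl Cooking1 url_regex -i cooking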

  Another way is to deny access to specific servers which are known to
  hold recipes.  For example:

          acl Cooking2 dstdomain gourmet-chef.com
          http_access deny Cooking2
          http_access allow all

  The dstdomain means to search the hostname in the URL for the string
  ``gourmet-chef.com.''  Note that when IP addresses are used in URLs
  (instead of domain names), Squid-1.1 implements relaxed access con-
  trols.  If a domain name for the IP address has been saved in Squid's
  ``FQDN cache,'' then Squid can compare the destination domain against
  the access controls.  However, if the domain is not immediately
  available, Squid allows the request and makes a lookup for the IP
  address so that it may be available for future requests.

  10.2.  How do I block specific users or groups from accessing my
  cache?

  10.2.1.  Ident

  You can use ident lookups to allow specific users access to your
  cache.  This requires that an ident server process runs on the user's
  machine(s).  In your squid.conf configuration file you would write
  something like this:

          ident_lookup on
          acl friends user kim lisa frank joe
          http_access allow friends
          http_access deny all

  10.2.2.  Proxy Authentication

  Another option is to use proxy-authentication.  In this scheme, you
  assign usernames and passwords to individuals.  When they first use
  the proxy they are asked to authenticate themselves by entering their
  username and password.

  In Squid v2 this authentication is handled via external processes.
  For information on how to configure this, please see ``Configuring
  Proxy Authentication''.

  10.3.  Do you have a CGI program which lets users change their own
  proxy passwords?

  Pedro L Orso has adapted Apache's htpasswd into a CGI program called
  chpasswd.cgi.

  10.4.  Is there a way to do ident lookups only for a certain host and
  compare the result with a userlist in squid.conf?

  Sort of.

  If you use a user ACL in squid.conf, then Squid will perform an ident
  lookup for every client request.  In other words, Squid-1.1 will
  perform ident lookups for all requests or no requests.  Defining a
  user ACL enables ident lookups, regardless of the ident_lookup
  setting.

  However, even though ident lookups are performed for every request,
  Squid does not wait for the lookup to complete unless the ACL rules
  require it.  Consider this configuration:

          acl host1 src 10.0.0.1
          acl host2 src 10.0.0.2
          acl pals  user kim lisa frank joe
          http_access allow host1
          http_access allow host2 pals

  Requests coming from 10.0.0.1 will be allowed immediately because
  there are no user requirements for that host.  However, requests from
  10.0.0.2 will be allowed only after the ident lookup completes, and if
  the username is in the set kim, lisa, frank, or joe.

  10.5.  Common Mistakes

  10.5.1.  And/Or logic

  You've probably noticed (and been frustrated by) the fact that you
  cannot combine access controls with terms like ``and'' or ``or.''
  These operations are already built in to the access control scheme in
  a fundamental way which you must understand.

  o  All elements of an acl entry are OR'ed together.

  o  All elements of an access entry are AND'ed together.  e.g.
     http_access and icp_access.

  For example, the following access control configuration will never
  work:

          acl ME src 10.0.0.1
          acl YOU src 10.0.0.2
          http_access allow ME YOU

  In order for the request to be allowed, it must match the ``ME'' acl
  AND the ``YOU'' acl.  This is impossible because any IP address could
  only match one or the other.  This should instead be rewritten as:

          acl ME src 10.0.0.1
          acl YOU src 10.0.0.2
          http_access allow ME
          http_access allow YOU

  Or, alternatively, this would also work:

          acl US src 10.0.0.1 10.0.0.2
          http_access allow US

  10.5.2.  allow/deny mixups

  I have read through my squid.conf numerous times, spoken to my
  neighbors, read the FAQ and Squid Docs and cannot for the life of me
  work out why the following will not work.

  I can successfully access cachemgr.cgi from our web server machine
  here, but I would like to use MRTG to monitor various aspects of our
  proxy.  When I try to use 'client' or GET cache_object from the
  machine the proxy is running on, I always get access denied.

          acl manager proto cache_object
          acl localhost src 127.0.0.1/255.255.255.255
          acl server    src 1.2.3.4/255.255.255.255
          acl all src 0.0.0.0/0.0.0.0
          acl ourhosts src 1.2.0.0/255.255.0.0

          http_access deny manager !localhost !server
          http_access allow ourhosts
          http_access deny all

  The intent here is to allow cache manager requests from the localhost
  and server addresses, and deny all others.  This policy has been
  expressed here:

          http_access deny manager !localhost !server

  The problem here is that for allowable requests, this access rule is
  not matched.  For example, if the source IP address is localhost, then
  ``!localhost'' is false and the access rule is not matched, so Squid
  continues checking the other rules.  Cache manager requests from the
  server address work because server is a subset of ourhosts and the
  second access rule will match and allow the request.  Also note that
  this means any cache manager request from ourhosts would be allowed.

  To implement the desired policy correctly, the access rules should be
  rewritten as

          http_access allow manager localhost
          http_access allow manager server
          http_access deny manager
          http_access allow ourhosts
          http_access deny all

  If you're using miss_access, then don't forget to also add a
  miss_access rule for the cache manager:

          miss_access allow manager

  You may be concerned that having five access rules instead of three
  will impact cache performance.  In our experience this is not the
  case.  Squid is able to handle a moderate amount of access control
  checking without degrading overall performance.  You may like to
  verify that for yourself, however.

  10.5.3.  Differences between src  and srcdomain  ACL types.

  For the srcdomain ACL type, Squid does a reverse lookup of the
  client's IP address and checks the result with the domains given on
  the acl line.  With the src ACL type, Squid converts hostnames to IP
  addresses at startup and then only compares the client's IP address.
  The src ACL is preferred over srcdomain because it does not require
  address-to-name lookups for each request.
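
  For example, assuming clients in 172.16.0.0/16 whose addresses
  reverse-resolve to example.com hostnames (both values here are only
  placeholders), these two ACLs match the same clients, but the second
  costs a reverse DNS lookup for each request:

          acl mynet    src 172.16.0.0/255.255.0.0
          acl mynetdom srcdomain example.com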

  10.6.  I set up my access controls, but they don't work!  why?

  You can debug your access control configuration by setting the
  debug_options parameter in squid.conf and watching cache.log as
  requests are made.  The access control routines correspond to debug
  section 28, so you might enter:

          debug_options ALL,1 28,9
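
  Squid reads squid.conf only at startup or reconfigure, so after
  editing debug_options, apply the change with:

          squid -k reconfigure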

  10.7.  Proxy-authentication and neighbor caches

  The problem...

                              [ Parents ]
                              /         \
                             /           \
                      [ Proxy A ] --- [ Proxy B ]
                          |
                          |
                         USER

       Proxy A sends an ICP query to Proxy B about an object,
       Proxy B replies with an ICP_HIT.  Proxy A forwards the HTTP
       request to Proxy B, but does not pass on the authentication
       details, therefore the HTTP GET from Proxy A fails.

  Only ONE proxy cache in a chain is allowed to ``use'' the Proxy-
  Authentication request header.  Once the header is used, it must not
  be passed on to other proxies.

  Therefore, you must allow the neighbor caches to request from each
  other without proxy authentication.  This is simply accomplished by
  listing the neighbor ACL's first in the list of http_access lines.
  For example:

          acl proxy-A src 10.0.0.1
          acl proxy-B src 10.0.0.2
          acl user_passwords proxy_auth /tmp/user_passwds

          http_access allow proxy-A
          http_access allow proxy-B
          http_access allow user_passwords
          http_access deny all

  10.8.  Is there an easy way of banning all Destination addresses
  except one?

          acl GOOD dst 10.0.0.1
          acl BAD dst 0.0.0.0/0.0.0.0
          http_access allow GOOD
          http_access deny BAD
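
  Since the BAD ACL matches every address, an equivalent form is to
  simply deny everything that is not GOOD:

          acl GOOD dst 10.0.0.1
          acl all src 0.0.0.0/0.0.0.0
          http_access allow GOOD
          http_access deny all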

  10.9.  Does anyone have a ban list of porn sites and such?

  o  Pedro Lineu Orso's List

  o  Linux Center Hong Kong's List

  o  Snerpa, an ISP in Iceland, operates a DNS-database of IP-addresses
     of blacklisted sites containing porn, violence, etc., which is
     utilized using a small perl-script redirector.  Information on
     this is available on the INfilter webpage.

  10.10.  Squid doesn't match my subdomains

  There is a subtle problem with domain-name based access controls when
  a single ACL element has an entry that is a subdomain of another
  entry.  For example, consider this list:

          acl FOO dstdomain boulder.co.us vail.co.us co.us

  In the first place, the above list is simply wrong because the first
  two (boulder.co.us and vail.co.us) are unnecessary.  Any domain name
  that matches one of the first two will also match the last one
  (co.us).  Ok, but why does this happen?

  The problem stems from the data structure used to index domain names
  in an access control list.  Squid uses Splay trees for lists of
  domain names.  As with other tree-based data structures, the
  searching algorithm requires a comparison function that returns -1,
  0, or +1 for any pair of keys (domain names).  This is similar to the
  way that strcmp() works.

  The problem is that it is wrong to say that co.us is greater-than,
  equal-to, or less-than boulder.co.us.

  For example, if you said that co.us is LESS than fff.co.us, then the
  Splay tree searching algorithm might never discover co.us as a match
  for kkk.co.us.

  Similarly, if you said that co.us is GREATER than fff.co.us, then the
  Splay tree searching algorithm might never discover co.us as a match
  for bbb.co.us.

  The bottom line is that you can't have one entry that is a subdomain
  of another.  Squid-2.2 will warn you if it detects this condition.
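
  For the example above, the fix is to keep only the broadest entry:

          acl FOO dstdomain co.us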

  10.11.  Why does Squid deny some port numbers?

  It is dangerous to allow Squid to connect to certain port numbers.
  For example, it has been demonstrated that someone can use Squid as an
  SMTP (email) relay.  As I'm sure you know, SMTP relays are one of the
  ways that spammers are able to flood our mailboxes.  To prevent mail
  relaying, Squid denies requests when the URL port number is 25.  Other
  ports should be blocked as well, as a precaution.

  There are two ways to filter by port number: either allow specific
  ports, or deny specific ports.  By default, Squid does the first.
  This is the ACL entry that comes in the default squid.conf:

          acl Safe_ports port 80 21 443 563 70 210 1025-65535
          http_access deny !Safe_ports

  The above configuration denies requests when the URL port number is
  not in the list.  The list allows connections to the standard ports
  for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged ports.

  Another approach is to deny dangerous ports.  The dangerous port list
  should look something like:

          acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
          http_access deny Dangerous_ports

  Please consult the /etc/services file on your system for a list of
  known ports and protocols.
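
  If you write your own port rules, it is also worth keeping the
  related entries from the default squid.conf that restrict the CONNECT
  method to the SSL ports:

          acl CONNECT method CONNECT
          acl SSL_ports port 443 563
          http_access deny CONNECT !SSL_ports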

  10.12.  Does Squid support the use of a database such as mySQL for
  storing the ACL list?

  Note: The information here is current for version 2.2.

  No, it does not.

  10.13.  How can I allow a single address to access a specific URL?

  This example allows only the special_client to access the special_url.
  Any other client that tries to access the special_url is denied.

          acl special_client src 10.1.2.3
          acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
          http_access allow special_client special_url
          http_access deny special_url

  10.14.  How can I allow some clients to use the cache at specific
  times?

  Let's say you have two workstations that should only be allowed access
  to the Internet during working hours (8:30 - 17:30).  You can use
  something like this:

          acl FOO src 10.1.2.3 10.1.2.4
          acl WORKING time MTWHF 08:30-17:30
          http_access allow FOO WORKING
          http_access deny FOO
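
  The day letters in a time ACL are S (Sunday), M, T, W, H (Thursday),
  F, and A (Saturday).  For example, to also allow the same hosts all
  day on weekends, you might add something like:

          acl WEEKEND time SA 00:00-23:59
          http_access allow FOO WEEKEND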

  10.15.  Problems with IP ACL's that have complicated netmasks

  Note: The information here is current for version 2.3.

  The following ACL entry gives inconsistent or unexpected results:

          acl restricted  src 10.0.0.128/255.0.0.128 10.85.0.0/16

  The reason is that IP access lists are stored in ``splay'' tree data
  structures.  These trees require the keys to be sortable.  When you
  use a complicated, or non-standard, netmask (255.0.0.128), it confuses
  the function that compares two address/mask pairs.

  The best way to fix this problem is to use separate ACL names for each
  ACL value.  For example, change the above to:

          acl restricted1 src 10.0.0.128/255.0.0.128
          acl restricted2 src 10.85.0.0/16

  Then, of course, you'll have to rewrite your http_access lines as
  well.

  11.  Troubleshooting

  11.1.  Why am I getting ``Proxy Access Denied?''

  You may need to set up the http_access option to allow requests from
  your IP addresses.  Please see ``the Access Controls section'' for
  information about that.

  If squid is in httpd-accelerator mode, it will accept normal HTTP
  requests and forward them to an HTTP server, but it will not honor
  proxy requests.  If you want your cache to also accept proxy-HTTP
  requests then you must enable this feature:

          http_accel_with_proxy on

  Alternately, you may have misconfigured one of your ACLs.  Check the
  access.log and squid.conf files for clues.

  11.2.  I can't get local_domain  to work; Squid  is caching the
  objects from my local servers.

  The local_domain directive does not prevent local objects from being
  cached.  It prevents the use of sibling caches when fetching local
  objects.  If you want to prevent objects from being cached, use the
  cache_stoplist or http_stop configuration options (depending on your
  version).
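
  For example, in Squid-2 the same effect can be achieved with an ACL
  and the no_cache directive (the hostname here is only a placeholder):

          acl LOCAL dstdomain www.example.com
          no_cache deny LOCAL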

  11.3.  I get Connection Refused  when the cache tries to retrieve an
  object located on a sibling, even though the sibling thinks it deliv-
  ered the object to my cache.

  If the HTTP port number is wrong, but the ICP port is correct, then
  you will send ICP queries correctly, and the ICP replies will fool
  your cache into thinking the configuration is correct.  Large
  objects will fail, however, since you don't have the correct HTTP
  port for the sibling in your squid.conf file.  If your sibling
  changed their http_port, you could have this problem for some time
  before noticing.
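
  It is worth double-checking that the HTTP and ICP ports in your
  configuration match what the sibling actually uses.  A typical
  Squid-2 entry looks like this (the hostname and ports are examples
  only):

          cache_peer sibling.example.com sibling 3128 3130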

  11.4.  Running out of filedescriptors

  If you see the Too many open files error message, you are most likely
  running out of file descriptors.  This may be due to running Squid on
  an operating system with a low filedescriptor limit.  This limit is
  often configurable in the kernel or with other system tuning tools.
  There are two ways to run out of file descriptors:  first, you can hit
  the per-process limit on file descriptors.  Second, you can hit the
  system limit on total file descriptors for all processes.

  11.4.1.  Linux

  Start with Dancer's mini-HOWTO, 'Adding File-descriptors-to-linux for
  squid', but realize that this information is specific to the Linux
  2.0.36 kernel.

  You also might want to have a look at the filehandle patch by Michael
  O'Reilly.

  If your kernel version is 2.2.x or greater, you can read and write the
  maximum number of file handles and/or inodes simply by accessing the
  special files:

          /proc/sys/fs/file-max
          /proc/sys/fs/inode-max

  So, to increase your file descriptor limit:

          echo 3072 > /proc/sys/fs/file-max

  If your kernel version is between 2.0.35 and 2.1.x (?), you can read
  and write the maximum number of file handles and/or inodes simply by
  accessing the special files:

          /proc/sys/kernel/file-max
          /proc/sys/kernel/inode-max

  While this does increase the current number of file descriptors,
  Squid's configure script probably won't figure out the new value
  unless you also update the include files, specifically the value of
  OPEN_MAX in /usr/include/linux/limits.h.

  11.4.2.  Solaris

  Add the following to your /etc/system file to increase your maximum
  file descriptors per process:

          set rlim_fd_max = 4096

  You should also #define SQUID_FD_SETSIZE in include/config.h to
  whatever you set rlim_fd_max to.

  If you have a very old version of Squid (1.1.X), and you want to use
  more than 1024 descriptors, then you must edit src/Makefile and enable
  $(USE_POLL_OPT).  Then recompile squid.

  Jens-S. Voeckler advises that you should NOT change the soft limit
  (rlim_fd_cur) to anything larger than 256.  It will break other
  programs, such as the license manager needed for the SUN workshop
  compiler.  Jens-S. also says that it should be safe to raise the hard
  limit as high as 16,384.
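
  A complete /etc/system fragment following that advice might look like
  this (the values are examples only):

          set rlim_fd_max = 4096
          set rlim_fd_cur = 256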

  11.4.3.  IRIX

  For some hints, please see SGI's Tuning IRIX 6.2 for a Web Server
  document.

  11.4.4.  FreeBSD

  by Torsten Sturm 

  1. How do I check my maximum filedescriptors?

     Do sysctl -a and look for the value of kern.maxfilesperproc.

  2. How do I increase them?

             sysctl -w kern.maxfiles=XXXX
             sysctl -w kern.maxfilesperproc=XXXX

  Warning: You probably want maxfiles > maxfilesperproc if you're going
  to be pushing the limit.
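
  If your FreeBSD version reads /etc/sysctl.conf at boot, you can make
  the new values permanent by adding lines such as:

          kern.maxfiles=XXXX
          kern.maxfilesperproc=XXXX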

  3. What is the upper limit?

     I don't think there is a formal upper limit inside the kernel.  All
     the data structures are dynamically allocated.  In practice there
     might be unintended metaphenomena (kernel spending too much time
     searching tables, for example).

  11.4.5.  General BSD

  For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD, NetBSD,
  BSD/OS, 386BSD, Ultrix) you can also use the ``brute force'' method to
  increase these values in the kernel (requires a kernel rebuild):

  1. How do I check my maximum filedescriptors?

     Do pstat -T and look for the files value, typically expressed as
     the ratio current/maximum.

  2. How do I increase them the easy way?

     One way is to increase the value of the maxusers variable in the
     kernel configuration file and build a new kernel.  This method is
     quick and easy but also has the effect of increasing a wide variety
     of other variables that you may not need or want increased.

  3. Is there a more precise method?

     Another way is to find the param.c file in your kernel build area
     and change the arithmetic behind the relationship between maxusers
     and the maximum number of open files.

     Here are a few examples which should lead you in the right
     direction:

  1. SunOS

     Change the value of nfile in /usr/kvm/sys/conf.common/param.c by
     altering this equation:

             int     nfile = 16 * (NPROC + 16 + MAXUSERS) / 10 + 64;

  Where NPROC is defined by:

          #define NPROC (10 + 16 * MAXUSERS)
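
  To see the effect, suppose MAXUSERS is 64.  Then NPROC works out to
  10 + 16 * 64 = 1034, and nfile becomes 16 * (1034 + 16 + 64) / 10 +
  64 = 1846 open files (using integer arithmetic, as in the C code).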

  2. FreeBSD (from the 2.1.6 kernel)

     Very similar to SunOS, edit /usr/src/sys/conf/param.c and alter the
     relationship between maxusers and the maxfiles and maxfilesperproc
     variables:

             int     maxfiles = NPROC*2;
             int     maxfilesperproc = NPROC*2;

  Where NPROC is defined by:

          #define NPROC (20 + 16 * MAXUSERS)

  The per-process limit can also be adjusted directly in the kernel
  configuration file with the following directive:

          options OPEN_MAX=128

  3. BSD/OS (from the 2.1 kernel)

     Edit /usr/src/sys/conf/param.c and adjust the maxfiles math here:

             int     maxfiles = 3 * (NPROC + MAXUSERS) + 80;

  Where NPROC is defined by:

          #define NPROC (20 + 16 * MAXUSERS)

  You should also set the OPEN_MAX value in your kernel configuration
  file to change the per-process limit.

  11.4.6.  Reconfigure afterwards

  NOTE: After you rebuild/reconfigure your kernel with more
  filedescriptors, you must then recompile Squid.  Squid's configure
  script determines how many filedescriptors are available, so you must
  make sure the configure script runs again as well.  For example:

          cd squid-1.1.x
          make realclean
          ./configure --prefix=/usr/local/squid
          make

  11.5.  What are these strange lines about removing objects?

  For example:

          97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
          97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
          97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391

  These log entries are normal, and do not indicate that squid has
  reached cache_swap_high.

  Consult your cache information page in cachemgr.cgi for a line like
  this:

         Storage LRU Expiration Age:     364.01 days

  Objects which have not been used for that amount of time are removed
  as a part of the regular maintenance.  You can set an upper limit on
  the LRU Expiration Age value with reference_age in the config file.

  11.6.  Can I change a Windows NT FTP server to list directories in
  Unix format?

  Why, yes you can!  Select the following menus:

  o  Start

  o  Programs

  o  Microsoft Internet Server (Common)

  o  Internet Service Manager

  This will bring up a box with icons for your various services. One of
  them should be a little ftp ``folder.'' Double click on this.

  You will then have to select the server (there should only be one).
  Select it, then choose ``Properties'' from the menu and choose the
  ``directories'' tab along the top.

  There will be an option at the bottom saying ``Directory listing
  style.''  Choose the ``Unix'' type, not the ``MS-DOS'' type.

       --Oskar Pearson 

  11.7.  Why am I getting ``Ignoring MISS from non-peer x.x.x.x?''

  You are receiving ICP MISSes (via UDP) from a parent or sibling cache
  whose IP address your cache does not know about.  This may happen in
  two situations.

  1. If the peer is multihomed, it is sending packets out an interface
     which is not advertised in the DNS.  Unfortunately, this is a
     configuration problem at the peer site.  You can tell them to
     either add the IP address interface to their DNS, or use Squid's
     udp_outgoing_address option to force the replies out a specific
     interface.  For example, on your parent's squid.conf:

             udp_outgoing_address proxy.parent.com

     and on your squid.conf:

             cache_host proxy.parent.com parent 3128 3130

  2. You can also see this warning when sending ICP queries to multicast
     addresses.  For security reasons, Squid requires your configuration
     to list all other caches listening on the multicast group address.
     If an unknown cache listens to that address and sends replies, your
     cache will log the warning message.  To fix this situation, either
     tell the unknown cache to stop listening on the multicast address,
     or if they are legitimate, add them to your configuration file.

  11.8.  DNS lookups for domain names with underscores (_) always fail.

  The standards for naming hosts (RFC 952, RFC 1101) do not allow
  underscores in domain names:

       A "name" (Net, Host, Gateway, or Domain name) is a text
       string up to 24 characters drawn from the alphabet (A-Z),
       digits (0-9), minus sign (-), and period (.).

  The resolver library that ships with recent versions of BIND enforces
  this restriction, returning an error for any host with underscore in
  the hostname.  The best solution is to complain to the hostmaster of
  the offending site, and ask them to rename their host.

  Some people have noticed that RFC 1033 implies that underscores are
  allowed.  However, this is an informational RFC with a poorly chosen
  example, and not a standard by any means.

  11.9.  Why does Squid say: ``Illegal character in hostname; under-
  scores are not allowed?''

  See the above question.  The underscore character is not valid for
  hostnames.

  Some DNS resolvers allow the underscore, so yes, the hostname might
  work fine when you don't use Squid.

  To make Squid allow underscores in hostnames, add this line to
  src/squid.h:

          #define ALLOW_HOSTNAME_UNDERSCORES 1

  and then recompile.

  11.10.  Why am I getting access denied from a sibling cache?

  The answer to this is somewhat complicated, so please hold on.  NOTE:
  most of this text is taken from ICP and the Squid Web Cache.

  An ICP query does not include any parent or sibling designation, so
  the receiver really has no indication of how the peer cache is
  configured to use it.  This issue becomes important when a cache is
  willing to serve cache hits to anyone, but only handle cache misses
  for its paying users or customers.  In other words, whether or not to
  allow the request depends on if the result is a hit or a miss.  To
  accomplish this, Squid acquired the miss_access feature in October of
  1996.

  The necessity of ``miss access'' makes life a little bit complicated,
  and not only because it was awkward to implement.  Miss access means
  that the ICP query reply must be an extremely accurate prediction of
  the result of a subsequent HTTP request.  Ascertaining this result is
  actually very hard, if not impossible to do, since the ICP request
  cannot convey the full HTTP request.  Additionally, there are more
  types of HTTP request results than there are for ICP.  The ICP query
  reply will either be a hit or miss.  However, the HTTP request might
  result in a ``304 Not Modified'' reply sent from the origin server.
  Such a reply is not strictly a hit since the peer needed to forward a
  conditional request to the source.  At the same time, it's not
  strictly a miss either since the local object data is still valid,
  and the Not-Modified reply is quite small.

  One serious problem for cache hierarchies is mismatched freshness
  parameters.  Consider a cache C using ``strict'' freshness parameters
  so its users get maximally current data.  C has a sibling S with less
  strict freshness parameters.  When an object is requested at C, C
  might find that S already has the object via an ICP query and ICP HIT
  response.  C then retrieves the object from S.

  In an HTTP/1.0 world, C (and C's client) will receive an object that
  was never subject to its local freshness rules.  Neither HTTP/1.0 nor
  ICP provides any way to ask only for objects less than a certain age.
  If the retrieved object is stale by C's rules, it will be removed
  from C's cache, but it will subsequently be fetched from S so long as
  it remains fresh there.  This configuration miscoupling problem is a
  significant deterrent to establishing both parent and sibling
  relationships.

  HTTP/1.1 provides numerous request headers to specify freshness
  requirements, which actually introduces a different problem for cache
  hierarchies:  ICP still does not include any age information, neither
  in query nor reply.  So S may return an ICP HIT if its copy of the
  object is fresh by its configuration parameters, but the subsequent
  HTTP request may result in a cache miss due to any Cache-control:
  headers originated by C or by C's client.  Situations now emerge where
  the ICP reply no longer matches the HTTP request result.

  In the end, the fundamental problem is that the ICP query does not
  provide enough information to accurately predict whether the HTTP
  request will be a hit or miss.   In fact, the current ICP Internet
  Draft is very vague on this subject.  What does ICP HIT really mean?
  Does it mean ``I know a little about that URL and have some copy of
  the object?''  Or does it mean ``I have a valid copy of that object
  and you are allowed to get it from me?''

  So, what can be done about this problem?  We really need to change ICP
  so that freshness parameters are included.  Until that happens, the
  members of a cache hierarchy have only two options to totally
  eliminate the ``access denied'' messages from sibling caches:

  1. Make sure all members have the same refresh_rules parameters.

  2. Do not use miss_access at all.  Promise your sibling cache
     administrator that your cache is properly configured and that you
     will not abuse their generosity.  The sibling cache administrator
     can check his log files to make sure you are keeping your word.

     If neither of these is realistic, then the sibling relationship
     should not exist.

  11.11.  Cannot bind socket FD NN to *:8080 (125) Address already in
  use

  This means that another process is already listening on port 8080
  (or whatever you're using).  It could mean that you have a Squid
  process already running, or it could be from another program.  To
  verify, use the netstat command:

          netstat -naf inet | grep LISTEN

  That will show all sockets in the LISTEN state.  You might also try

          netstat -naf inet | grep 8080

  If you find that some process has bound to your port, but you're not
  sure which process it is, you might be able to use the excellent lsof
  program.  It will show you which processes own every open file
  descriptor on your system.
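
  For example, assuming lsof is installed and the contested port is
  8080:

          # lsof -i :8080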

  11.12.  icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe

  This means that the client socket was closed by the client before
  Squid was finished sending data to it.  Squid detects this by trying
  to read(2) some data from the socket.  If the read(2) call fails, then
  Squid knows the socket has been closed.  Normally the read(2) call
  returns ECONNRESET: Connection reset by peer and these are NOT logged.
  Any other error messages (such as EPIPE: Broken pipe) are logged to
  cache.log.  See the ``intro'' of section 2 of your Unix manual for a
  list of all error codes.

  11.13.  icpDetectClientClose: FD 135, 255 unexpected bytes

  These are caused by misbehaving Web clients attempting to use
  persistent connections.  Squid-1.1 does not support persistent
  connections.

  11.14.  How come Squid doesn't work with NTLM Authorization.

  We are not sure. We were unable to find any detailed information on
  NTLM (thanks Microsoft!), but here is our best guess:

  Squid transparently passes the NTLM request and response headers
  between clients and servers.  The encrypted challenge and response
  strings most likely encode the IP address of the client.  Because the
  proxy is passing these strings and is connected with a different IP
  address, the authentication scheme breaks down.  This implies that if
  NTLM authentication works at all with proxy caches, the proxy would
  need to intercept the NTLM headers and process them itself.

  Henrik Nordstrom adds the following information about NTLM:

  NTLM authentication is carried entirely inside the HTTP protocol, but
  is different from Basic authentication in many ways.

  1. It is dependent on the IP addresses of both the server and the
     client, and thus cannot be proxied by an application level proxy
     (not even Microsoft Proxy Server).

  2. It takes place only once per connection, not per request.  Once
     the connection is authenticated, all future requests on the same
     connection inherit the authentication.  The connection must be
     reestablished to set up other authentication.

  The reasons why it is not implemented in Netscape are probably:

  o  It is very specific for the Windows platform

  o  It is not defined in any RFC or even internet draft.

  o  The protocol has several shortcomings, where the most apparent one
     is that it cannot be proxied.

  o  There exists an open internet standard which does mostly the same
     but without the shortcomings or platform dependencies: Digest
     authentication.

  11.15.  The default  parent option isn't working!

  This message was received at squid-bugs:

       If you have only one parent, configured as:

               cache_host xxxx parent 3128 3130 no-query default

       nothing is sent to the parent; neither UDP packets, nor TCP
       connections.

  Simply adding default to a parent does not force all requests to be
  sent to that parent.  The term default is perhaps a poor choice of
  words.  A default parent is only used as a last resort.  If the cache
  is able to make direct connections, direct will be preferred over
  default.  If you want to force all requests to your parent cache(s),
  use the never_direct option:

          acl all src 0.0.0.0/0.0.0.0
          never_direct allow all

  11.16.  ``Hot Mail'' complains about: Intrusion Logged. Access denied.

  ``Hot Mail'' is proxy-unfriendly and requires all requests to come
  from the same IP address.  You can fix this by adding to your
  squid.conf:

          hierarchy_stoplist hotmail.com

  11.17.  My Squid becomes very slow after it has been running for some
  time.

  This is most likely because Squid is using more memory than it should
  be for your system.  When the Squid process becomes large, it
  experiences a lot of paging.  This will very rapidly degrade the
  performance of Squid.  Memory usage is a complicated problem.  There
  are a number of things to consider.

  First, examine the Cache Manager Info output and look at these two
  lines:

          Number of HTTP requests received:  121104
          Page faults with physical i/o:      16720

  Note, if your system does not have the getrusage() function, then you
  will not see the page faults line.

  Divide the number of page faults by the number of connections.  In
  this case 16720/121104 = 0.14.  Ideally this ratio should be in the
  0.0 - 0.1 range.  It may be acceptable to be in the 0.1 - 0.2 range.
  Above that, however, and you will most likely find that Squid's
  performance is unacceptably slow.

  If the ratio is too high, you will need to make some changes to
  ``lower the amount of memory Squid uses''.

  11.18.  WARNING: Failed to start 'dnsserver'

  This could be a permission problem.  Does the Squid userid have
  permission to execute the dnsserver program?

  You might also try testing dnsserver from the command line:

          > echo oceana.nlanr.net | ./dnsserver

  Should produce something like:

          $name oceana.nlanr.net
          $h_name oceana.nlanr.net
          $h_len 4
          $ipcount 1
          132.249.40.200
          $aliascount 0
          $ttl 82067
          $end

  11.19.  Sending in Squid bug reports

  Bug reports for Squid should be sent to the squid-bugs alias.  Any
  bug report must include

  o  The Squid version

  o  Your Operating System type and version

  11.19.1.  crashes and core dumps

  There are two conditions under which squid will exit abnormally and
  generate a coredump.  First, a SIGSEGV or SIGBUS signal will cause
  Squid to exit and dump core.  Second, many functions include
  consistency checks.  If one of those checks fails, Squid calls
  abort() to generate a core dump.

  Many people report that Squid doesn't leave a coredump anywhere.  This
  may be due to one of the following reasons:

  o  Resource Limits.  The shell has limits on the size of a coredump
     file.  You may need to increase the limit.

  o  No debugging symbols.  The Squid binary must have debugging symbols
     in order to get a meaningful coredump.

  o  Threads and Linux.  On Linux, threaded applications do not generate
     core dumps.  When you use --enable-async-io, it uses threads and
     you can't get a coredump.

  o  It did leave a coredump file, you just can't find it.

  Resource Limits: These limits can usually be changed in shell scripts.
  The command to change the resource limits is usually either limit or
  limits.  Sometimes it is a shell-builtin function, and sometimes it is
  a regular program.  Also note that you can set resource limits in the
  /etc/login.conf file on FreeBSD and maybe other BSD systems.

  To change the coredumpsize limit you might use a command like:

          limit coredumpsize unlimited

  or

          limits coredump unlimited

  Debugging Symbols: To see if your Squid binary has debugging symbols,
  use this command:

          % nm /usr/local/squid/bin/squid | head

  The binary has debugging symbols if you see gobbledegook like this:

          0812abec B AS_tree_head
          080a7540 D AclMatchedName
          080a73fc D ActionTable
          080908a4 r B_BYTES_STR
          080908bc r B_GBYTES_STR
          080908ac r B_KBYTES_STR
          080908b4 r B_MBYTES_STR
          080a7550 D Biggest_FD
          08097c0c R CacheDigestHashFuncCount
          08098f00 r CcAttrs

  There are no debugging symbols if you see this instead:

          /usr/local/squid/bin/squid: no symbols

  Debugging symbols may have been removed by your install program.  If
  you look at the squid binary from the source directory, then it might
  have the debugging symbols.

  Coredump Location: The core dump file will be left in one of the
  following locations:

  1. The coredump_dir directory, if you set that option.

  2. The first cache_dir directory if you have used the
     cache_effective_user option.

  3. The current directory when Squid was started

     Recent versions of Squid report their current directory after
     starting, so look there first:

             2000/03/14 00:12:36| Set Current Directory to /usr/local/squid/cache

  If you cannot find a core file, then either Squid does not have per-
  mission to write in its current directory, or perhaps your shell lim-
  its (csh and clones) are preventing the core file from being written.

  Often you can get a coredump if you run Squid from the command line
  like this:

          % limit core un
          % /usr/local/squid/bin/squid -NCd1

  Once you have located the core dump file, use a debugger such as dbx
  or gdb to generate a stack trace:

  tirana-wessels squid/src 270% gdb squid /T2/Cache/core
  GDB is free software and you are welcome to distribute copies of it
   under certain conditions; type "show copying" to see the conditions.
  There is absolutely no warranty for GDB; type "show warranty" for details.
  GDB 4.15.1 (hppa1.0-hp-hpux10.10), Copyright 1995 Free Software Foundation, Inc...
  Core was generated by `squid'.
  Program terminated with signal 6, Aborted.

  [...]

  (gdb) where
  #0  0xc01277a8 in _kill ()
  #1  0xc00b2944 in _raise ()
  #2  0xc007bb08 in abort ()
  #3  0x53f5c in __eprintf (string=0x7b037048 "", expression=0x5f,
      line=8, filename=0x6b)
  #4  0x29828 in fd_open (fd=10918, type=3221514150,
      desc=0x95e4 "HTTP Request") at fd.c:71
  #5  0x24f40 in comm_accept (fd=2063838200, peer=0x7b0390b0, me=0x6b)
      at comm.c:574
  #6  0x23874 in httpAccept (sock=33, notused=0xc00467a6)
      at client_side.c:1691
  #7  0x25510 in comm_select_incoming () at comm.c:784
  #8  0x25954 in comm_select (sec=29) at comm.c:1052
  #9  0x3b04c in main (argc=1073745368, argv=0x40000dd8) at main.c:671

  If possible, you might keep the coredump file around for a day or
  two.  It is often helpful if we can ask you to send additional
  debugger output, such as the contents of some variables.

  11.20.  Debugging Squid

  If you believe you have found a non-fatal bug (such as incorrect HTTP
  processing) please send us a section of your cache.log with debugging
  to demonstrate the problem.  The cache.log file can become very
  large, so alternatively, you may want to copy it to an FTP or HTTP
  server where we can download it.

  It is very simple to enable full debugging on a running squid
  process.  Simply use the -k debug command line option:

          % ./squid -k debug

  This causes every debug() statement in the source code to write a
  line in the cache.log file.  You also use the same command to restore
  Squid to normal debugging.

  To enable selective debugging (e.g. for one source file only), you
  need to edit squid.conf and add to the debug_options line.  Every
  Squid source file is assigned a different debugging section.  The
  debugging section assignments can be found by looking at the top of
  individual source files, or by reading the file doc/debug-levels.txt
  (correctly renamed to debug-sections.txt for Squid-2).

  You also specify the debugging level to control the amount of
  debugging.  Higher levels result in more debugging messages.  For
  example, to enable full debugging of Access Control functions, you
  would use:

          debug_options ALL,1 28,9

  Then you have to restart or reconfigure Squid.

  Once you have the debugging captured to cache.log, take a look at it
  yourself and see if you can make sense of the behaviour which you
  see.  If not, please feel free to send your debugging output to the
  squid-users or squid-bugs lists.

  11.21.  FATAL: ipcache_init: DNS name lookup tests failed

  Squid normally tests your system's DNS configuration before it starts
  server requests.  Squid tries to resolve some common DNS names, as
  defined in the dns_testnames configuration directive.  If Squid
  cannot resolve these names, it could mean:

  1. your DNS nameserver is unreachable or not running.

  2. your /etc/resolv.conf file may contain incorrect information.

  3. your /etc/resolv.conf file may have incorrect permissions, and may
     be unreadable by Squid.

  To disable this feature, use the -D command line option.

  Note, Squid does NOT use the dnsservers to test the DNS.  The test is
  performed internally, before the dnsservers start.

  11.22.  FATAL: Failed to make swap directory /var/spool/cache: (13)
  Permission denied

  Starting with version 1.1.15, we have required that you first run

          squid -z

  to create the swap directories on your filesystem.  If you have set
  the cache_effective_user option, then the Squid process takes on the
  given userid before making the directories.  If the cache_dir
  directory (e.g. /var/spool/cache) does not exist, and the Squid
  userid does not have permission to create it, then you will get the
  ``permission denied'' error.  This can be simply fixed by manually
  creating the cache directory.
          # mkdir /var/spool/cache
          # chown <userid> <groupid> /var/spool/cache
          # squid -z

  Alternatively, if the directory already exists, then your operating
  system may be returning ``Permission Denied'' instead of ``File
  Exists'' on the mkdir() system call.  This patch by Miquel van
  Smoorenburg should fix it.

  11.23.  FATAL: Cannot open HTTP Port

  Either (1) the Squid userid does not have permission to bind to the
  port, or (2) some other process has bound itself to the port.
  Remember that root privileges are required to open port numbers less
  than 1024.  If you see this message when using a high port number, or
  even when starting Squid as root, then the port has already been
  opened by another process.  Maybe you are running in the HTTP
  Accelerator mode and there is already a HTTP server running on port
  80?  If you're really stuck, install the way cool lsof utility to
  show you which process has your port in use.

  11.24.  FATAL: All redirectors have exited!

  This is explained in the ``Redirector section''.

  11.25.  FATAL: file_map_allocate: Exceeded filemap limit

  See the next question.

  11.26.  FATAL: You've run out of swap file numbers.

  Note: The information here applies to version 2.2 and earlier.

  Squid keeps an in-memory bitmap of disk files that are available for
  use, or are being used.  The size of this bitmap is determined at run
  time, based on two things: the size of your cache, and the average
  (mean) cache object size.

  The size of your cache is specified in squid.conf, on the cache_dir
  lines.  The mean object size can also be specified in squid.conf,
  with the 'store_avg_object_size' directive.  By default, Squid uses
  13 Kbytes as the average size.

  When allocating the bitmaps, Squid allocates this many bits:

          2 * cache_size / store_avg_object_size

  So, if you exactly specify the correct average object size, Squid
  should have 50% filemap bits free when the cache is full.  You can
  see how many filemap bits are being used by looking at the 'storedir'
  cache manager page.  It looks like this:

          Store Directory #0: /usr/local/squid/cache
          First level subdirectories: 4
          Second level subdirectories: 4
          Maximum Size: 1024000 KB
          Current Size: 924837 KB
          Percent Used: 90.32%
          Filemap bits in use: 77308 of 157538 (49%)
          Flags:

  (Note how the numbers work out: this cache_dir holds 1024000 KB, and
  2 * 1024000 / 13 = 157538, the total number of filemap bits shown.)

  Now, if you see the ``You've run out of swap file numbers'' message,
  then it means one of two things:

  1. You've found a Squid bug.

  2. Your cache's average file size is much smaller than the
     'store_avg_object_size' value.

  To check the average file size of objects currently in your cache,
  look at the cache manager 'info' page, and you will find a line like:

          Mean Object Size:       11.96 KB

  To make the warning message go away, set 'store_avg_object_size' to
  that value (or lower) and then restart Squid.

  11.27.  I am using up over 95% of the filemap bits?!!

  Note: The information here is current for version 2.3.

  Calm down, this is now normal.  Squid now dynamically allocates
  filemap bits based on the number of objects in your cache.  You won't
  run out of them, we promise.

  11.28.  FATAL: Cannot open /usr/local/squid/logs/access.log: (13)
  Permission denied

  In Unix, things like processes and files have an owner.  For Squid,
  the process owner and file owner should be the same.  If they are not
  the same, you may get messages like ``permission denied.''

  To find out who owns a file, use the ls -l command:

          % ls -l /usr/local/squid/logs/access.log

  A process is normally owned by the user who starts it.  However, Unix
  sometimes allows a process to change its owner.
  If you specified a value for the cache_effective_user option in
  squid.conf, then that will be the process owner.  The files must be
  owned by this same userid.

  If all this is confusing, then you probably should not be running
  Squid until you learn some more about Unix.  As a reference, I
  suggest Learning the UNIX Operating System, 4th Edition.

  11.29.  When using a username and password, I can not access some
  files.

  If I try by way of a test, to access

          ftp://username:password@ftpserver/somewhere/foo.tar.gz

  I get

          somewhere/foo.tar.gz: Not a directory.

  Use this URL instead:

          ftp://username:password@ftpserver/%2fsomewhere/foo.tar.gz

  11.30.  pingerOpen: icmp_sock: (13) Permission denied

  This means your pinger program does not have root privileges.  You
  should either do this:

          % su
          # make install-pinger

  or

          # chown root /usr/local/squid/bin/pinger
          # chmod 4755 /usr/local/squid/bin/pinger

  11.31.  What is a forwarding loop?

  A forwarding loop is when a request passes through one proxy more
  than once.  You can get a forwarding loop if

  o  a cache forwards requests to itself.  This might happen with
     transparent caching (or server acceleration) configurations.

  o  a pair or group of caches forward requests to each other.  This
     can happen when Squid uses ICP, Cache Digests, or the ICMP RTT
     database to select a next-hop cache.

  Forwarding loops are detected by examining the Via request header.
  Each cache which "touches" a request must add its hostname to the
  Via header.  If a cache notices its own hostname in this header for
  an incoming request, it knows there is a forwarding loop somewhere.

  NOTE: A pair of caches which have the same visible_hostname value
  will report forwarding loops.

  When Squid detects a forwarding loop, it is logged to the cache.log
  file with the received Via header.  From this header you can
  determine which cache (the last in the list) forwarded the request
  to you.

  One way to reduce forwarding loops is to change a parent relationship
  to a sibling relationship.  Another way is to use cache_peer_access
  rules.  For example:

          # Our parent caches
          cache_peer A.example.com parent 3128 3130
          cache_peer B.example.com parent 3128 3130
          cache_peer C.example.com parent 3128 3130

          # An ACL list
          acl PEERS src A.example.com
          acl PEERS src B.example.com
          acl PEERS src C.example.com

          # Prevent forwarding loops
          cache_peer_access A.example.com allow !PEERS
          cache_peer_access B.example.com allow !PEERS
          cache_peer_access C.example.com allow !PEERS

  The above configuration instructs squid to NOT forward a request to
  parents A, B, or C when a request is received from any one of those
  caches.

  11.32.  accept failure: (71) Protocol error

  This error message is seen mostly on Solaris systems.  Mark Kennedy
  gives a great explanation:

       Error 71 [EPROTO] is an obscure way of reporting that
       clients made it onto your server's TCP incoming connection
       queue but the client tore down the connection before the
       server could accept it.  I.e. your server ignored its
       clients for too long.  We've seen this happen when we ran
       out of file descriptors.  I guess it could also happen if
       something made squid block for a long time.

  11.33.  storeSwapInFileOpened: ... Size mismatch

  Got these messages in my cache log - I guess it means that the index
  contents do not match the contents on disk.

          1998/09/23 09:31:30| storeSwapInFileOpened: /var/cache/00/00/00000015: Size mismatch: 776(fstat) != 3785(object)
          1998/09/23 09:31:31| storeSwapInFileOpened: /var/cache/00/00/00000017: Size mismatch: 2571(fstat) != 4159(object)

  What does Squid do in this case?
  NOTE, these messages are specific to Squid-2.  These happen when
  Squid reads an object from disk for a cache hit.  After it opens the
  file, Squid checks to see if the size is what it expects it should
  be.  If the size doesn't match, the error is printed.  In this case,
  Squid does not send the wrong object to the client.  It will
  re-fetch the object from the source.

  11.34.  Why do I get fwdDispatch: Cannot retrieve
  'https://www.buy.com/corp/ordertracking.asp'

  These messages are caused by buggy clients, mostly Netscape
  Navigator.  What happens is, Netscape sends an HTTPS/SSL request
  over a persistent HTTP connection.  Normally, when Squid gets an SSL
  request, it looks like this:

          CONNECT www.buy.com:443 HTTP/1.0

  Then Squid opens a TCP connection to the destination host and port,
  and the real request is sent encrypted over this connection.  That's
  the whole point of SSL, that all of the information must be sent
  encrypted.

  With this client bug, however, Squid receives a request like this:

          GET https://www.buy.com/corp/ordertracking.asp HTTP/1.0
          Accept: */*
          User-agent: Netscape ...
          ...

  Now, all of the headers, and the message body have been sent,
  unencrypted to Squid.  There is no way for Squid to somehow turn
  this into an SSL request.  The only thing we can do is return the
  error message.

  Note, this browser bug does represent a security risk because the
  browser is sending sensitive information unencrypted over the
  network.

  11.35.  Squid can't access URLs like
  http://3626046468/ab2/cybercards/moreinfo.html

  by Dave J Woolley (DJW at bts dot co dot uk)

  These are illegal URLs, generally only used by illegal sites;
  typically the web site that supports a spammer and is expected to
  survive a few hours longer than the spamming account.  Their
  intention is to:

  o  confuse content filtering rules on proxies, and possibly some
     browsers' idea of whether they are trusted sites on the local
     intranet;

  o  confuse whois (?);

  o  make people think they are not IP addresses and unknown domain
     names, in an attempt to stop them trying to locate and complain
     to the ISP.

  Any browser or proxy that works with them should be considered a
  security risk.

  RFC 1738 has this to say about the hostname part of a URL:

       The fully qualified domain name of a network host, or its IP
       address as a set of four decimal digit groups separated by
       ".".  Fully qualified domain names take the form as described
       in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
       [5]: a sequence of domain labels separated by ".", each domain
       label starting and ending with an alphanumerical character and
       possibly also containing "-" characters.  The rightmost domain
       label will never start with a digit, though, which
       syntactically distinguishes all domain names from the IP
       addresses.

  11.36.  I get a lot of ``URI has whitespace'' error messages in my
  cache log, what should I do?

  Whitespace characters (space, tab, newline, carriage return) are not
  allowed in URI's and URL's.  Unfortunately, a number of Web services
  generate URL's with whitespace.  Of course your favorite browser
  silently accommodates these bad URL's.  The servers (or people) that
  generate these URL's are in violation of Internet standards.  The
  whitespace characters should be encoded.

  If you want Squid to accept URL's with whitespace, you have to
  decide how to handle them.  There are four choices that you can set
  with the uri_whitespace option:

  1. DENY: The request is denied with an ``Invalid Request'' message.
     This is the default.

  2. ALLOW: The request is allowed and the URL remains unchanged.
  3. ENCODE: The whitespace characters are encoded according to RFC
     1738.  This can be considered a violation of the HTTP
     specification.

  4. CHOP: The URL is chopped at the first whitespace character and
     then processed normally.  This also can be considered a violation
     of HTTP.

  11.37.  commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't
  assign requested address

  This likely means that your system does not have a loopback network
  device, or that device is not properly configured.  All Unix systems
  should have a network device named lo0, and it should be configured
  with the address 127.0.0.1.  If not, you may get the above error
  message.  To check your system, run:

          % ifconfig lo0

  The result should look something like:

          lo0: flags=8049 mtu 16384
                  inet 127.0.0.1 netmask 0xff000000

  If you use FreeBSD, see ``this''.

  11.38.  Unknown cache_dir type '/var/squid/cache'

  The format of the cache_dir option changed with version 2.3.  It now
  takes a type argument.  All you need to do is insert ufs in the
  line, like this:

          cache_dir ufs /var/squid/cache ...

  11.39.  unrecognized: 'cache_dns_program
  /usr/local/squid/bin/dnsserver'

  As of Squid 2.3, the default is to use internal DNS lookup code.
  The cache_dns_program and dns_children options are not known
  squid.conf directives in this case.  Simply comment out these two
  options.

  If you want to use external DNS lookups, with the dnsserver program,
  then add this to your configure command:

          --disable-internal-dns

  12.  How does Squid work?

  12.1.  What are cachable objects?

  An Internet Object is a file, document or response to a query for an
  Internet service such as FTP, HTTP, or gopher.  A client requests an
  Internet object from a caching proxy; if the object is not already
  cached, the proxy server fetches the object (either from the host
  specified in the URL or from a parent or sibling cache) and delivers
  it to the client.

  12.2.  What is the ICP protocol?

  ICP is a protocol used for communication among squid caches.  The
  ICP protocol is defined in two Internet RFC's.  RFC 2186 describes
  the protocol itself, while RFC 2187 describes the application of ICP
  to hierarchical Web caching.

  ICP is primarily used within a cache hierarchy to locate specific
  objects in sibling caches.  If a squid cache does not have a
  requested document, it sends an ICP query to its siblings, and the
  siblings respond with ICP replies indicating a ``HIT'' or a
  ``MISS.''  The cache then uses the replies to choose from which
  cache to resolve its own MISS.

  ICP also supports multiplexed transmission of multiple object
  streams over a single TCP connection.  ICP is currently implemented
  on top of UDP.  Current versions of Squid also support ICP via
  multicast.

  12.3.  What is the dnsserver?

  The dnsserver is a process forked by squid to resolve IP addresses
  from domain names.  This is necessary because the gethostbyname(3)
  function blocks the calling process until the DNS query is
  completed.

  Squid must use non-blocking I/O at all times, so DNS lookups are
  implemented external to the main process.  The dnsserver processes
  do not cache DNS lookups, that is implemented inside the squid
  process.

  12.4.  What is the ftpget program for?

  ftpget exists only in Squid 1.1 and Squid 1.0 versions.  The ftpget
  program is an FTP client used for retrieving files from FTP servers.
  Because the FTP protocol is complicated, it is easier to implement
  it separately from the main squid code.

  12.5.  FTP PUT's don't work!

  FTP PUT should work with Squid-2.0 and later versions.  If you are
  using Squid-1.1, then you need to upgrade before PUT will work.
12.6. What is a cache hierarchy? What are parents and siblings?

A cache hierarchy is a collection of caching proxy servers organized in a logical parent/child and sibling arrangement so that caches closest to Internet gateways (closest to the backbone transit entry-points) act as parents to caches at locations farther from the backbone. The parent caches resolve ``misses'' for their children. In other words, when a cache requests an object from its parent, and the parent does not have the object in its cache, the parent fetches the object, caches it, and delivers it to the child. This ensures that the hierarchy achieves the maximum reduction in bandwidth utilization on the backbone transit links, helps reduce load on Internet information servers outside the network served by the hierarchy, and builds a rich cache on the parents so that the other child caches in the hierarchy will obtain better ``hit'' rates against their parents.

In addition to the parent-child relationships, squid supports the notion of siblings: caches at the same level in the hierarchy, provided to distribute cache server load. Each cache in the hierarchy independently decides whether to fetch the reference from the object's home site or from parent or sibling caches, using a simple resolution protocol. Siblings will not fetch an object for another sibling to resolve a cache ``miss.''

12.7. What is the Squid cache resolution algorithm?

o Send ICP queries to all appropriate siblings.

o Wait for all replies to arrive, with a configurable timeout (the default is two seconds).

o Begin fetching the object upon receipt of the first HIT reply, or

o Fetch the object from the first parent which replied with MISS (subject to weighting values), or

o Fetch the object from the source.

The algorithm is somewhat more complicated when firewalls are involved. The single_parent_bypass directive can be used to skip the ICP queries if the only appropriate sibling is a parent cache (i.e., if there's only one place you'd fetch the object from, why bother querying?).

12.8. What features are Squid developers currently working on?

There are several open issues for the caching project, namely more automatic load balancing and (both configured and dynamic) selection of parents, routing, multicast cache-to-cache communication, and better recognition of URLs that are not worth caching. For our other to-do list items, please see our ``TODO'' file in the recent source distributions. Prospective developers should review the resources available at the Squid developers corner.

12.9. Tell me more about Internet traffic workloads

Workload can be characterized as the burden a client or group of clients imposes on a system. Understanding the nature of workloads is important to managing system capacity. If you are interested in Internet traffic workloads then NLANR's Network Analysis activities are a good place to start.

12.10. What are the tradeoffs of caching with the NLANR cache system?

The NLANR root caches are at the NSF supercomputer centers (SCCs), which are interconnected via NSF's high speed backbone service (vBNS). So inter-cache communication between the NLANR root caches does not cross the Internet.

The benefits of hierarchical caching (namely, reduced network bandwidth consumption, reduced access latency, and improved resiliency) come at a price. Caches higher in the hierarchy must field the misses of their descendants.
If the equilibrium hit rate of a leaf cache is 50%, half of all leaf references have to be resolved through a second level cache rather than directly from the object's source. If this second level cache has most of the documents, it is usually still a win, but if higher level caches often don't have the document, or become overloaded, then they could actually increase access latency, rather than reduce it.

12.11. Where can I find out more about firewalls?

Please see the Firewalls mailing list and FAQ information site.

12.12. What is the ``Storage LRU Expiration Age?''

For example:

    Storage LRU Expiration Age: 4.31 days

The LRU expiration age is a dynamically-calculated value. Any objects which have not been accessed for this amount of time will be removed from the cache to make room for new, incoming objects. Another way of looking at this is that it would take your cache approximately this many days to go from empty to full at your current traffic levels.

As your cache becomes more busy, the LRU age becomes lower so that more objects will be removed to make room for the new ones. Ideally, your cache will have an LRU age value of at least 3 days. If the LRU age is lower than 3 days, then your cache is probably not big enough to handle the volume of requests it receives. By adding more disk space you could increase your cache hit ratio.

The configuration parameter reference_age places an upper limit on your cache's LRU expiration age.

12.13. What is ``Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes''?

Consider a pair of caches named A and B. It may be the case that A can reach B, and vice-versa, but B has poor reachability to the rest of the Internet. In this case, we would like B to recognize that it has poor reachability and somehow convey this fact to its neighbor caches.

Squid will track the ratio of failed-to-successful requests over short time periods. A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR. When the failed-to-successful ratio exceeds 1.0, then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors. Note, Squid will still return ICP_HIT for cache hits.

12.14. Does squid periodically re-read its configuration file?

No, you must send a HUP signal to have Squid re-read its configuration file, including access control lists. An easy way to do this is with the -k command line option:

    squid -k reconfigure

12.15. How does unlinkd work?

unlinkd is an external process used for unlinking unused cache files. Performing the unlink operation in an external process opens up some race-condition problems for Squid. If we are not careful, the following sequence of events could occur:

1. An object with swap file number S is removed from the cache.

2. We want to unlink file F which corresponds to swap file number S, so we write pathname F to the unlinkd socket. We also mark S as available in the filemap.

3. We have a new object to swap out. It is allocated to the first available file number, which happens to be S. Squid opens file F for writing.

4. The unlinkd process reads the request to unlink F and issues the actual unlink call.

So, the problem is, how can we guarantee that unlinkd will not remove a cache file that Squid has recently allocated to a new object? The approach we have taken is to have Squid keep a stack of unused (but not deleted!) swap file numbers. The stack size is hard-coded at 128 entries. We only give unlink requests to unlinkd when the unused file number stack is full.
Thus, if we ever have to start unlinking files, we have a pool of 128 file numbers to choose from which we know will not be removed by unlinkd.

In terms of implementation, the only way to send unlink requests to the unlinkd process is via the storePutUnusedFileno function.

Unfortunately there are times when Squid can not use the unlinkd process but must call unlink(2) directly. One of these times is when the cache swap size is over the high water mark. If we push the released file numbers onto the unused file number stack, and the stack is not full, then no files will be deleted, and the actual disk usage will remain unchanged. So, when we exceed the high water mark, we must call unlink(2) directly.

12.16. What is an icon URL?

One of the most unpleasant things Squid must do is generate HTML pages of Gopher and FTP directory listings. For some strange reason, people like to have little icons next to each listing entry, denoting the type of object to which the link refers (image, text file, etc.).

In Squid 1.0 and 1.1, we used internal browser icons with names like gopher-internal-image. Unfortunately, these were not very portable. Not all browsers had internal icons, or even used the same names. Perhaps only Netscape and Mosaic used these names.

For Squid 2 we include a set of icons in the source distribution. These icon files are loaded by Squid as cached objects at runtime. Thus, every Squid cache now has its own icons to use in Gopher and FTP listings. Just like other objects available on the web, we refer to the icons with Uniform Resource Locators, or URLs.

12.17. Can I make my regular FTP clients use a Squid cache?

Nope, it's not possible. Squid only accepts HTTP requests. It speaks FTP on the server-side, but not on the client-side.

The very cool wget will download FTP URLs via Squid (and probably any other proxy cache).

12.18. Why is the select loop average time so high? Is there any way to speed up the time spent dealing with select?

Cachemgr shows:

    Select loop called: 885025 times, 714.176 ms avg

This number is NOT how much time it takes to handle filedescriptor I/O. We simply count the number of times select was called, and divide the total process running time by the number of select calls. This means, on average it takes your cache .714 seconds to check all the open file descriptors once. But this also includes time select() spends in a wait state when there is no I/O on any file descriptors.

My relatively idle workstation cache has similar numbers:

    Select loop called: 336782 times, 715.938 ms avg

But my busy caches have much lower times:

    Select loop called: 16940436 times, 10.427 ms avg
    Select loop called: 80524058 times, 10.030 ms avg
    Select loop called: 10590369 times, 8.675 ms avg
    Select loop called: 84319441 times, 9.578 ms avg

12.19. How does Squid deal with Cookies?

The presence of Cookies headers in requests does not affect whether or not an HTTP reply can be cached. Similarly, the presence of Set-Cookie headers in replies does not affect whether the reply can be cached.

The proper way to deal with Set-Cookie reply headers, according to RFC 2109, is to cache the whole object, EXCEPT the Set-Cookie header lines.

With Squid-1.1, we can not filter out specific HTTP headers, so Squid-1.1 does not cache any response which contains a Set-Cookie header.

With Squid-2, however, we can filter out specific HTTP headers. But instead of filtering them on the receiving-side, we filter them on the sending-side.
Thus, Squid-2 does cache replies with Set-Cookie headers, but it filters out the Set-Cookie header itself for cache hits.

12.20. How does Squid decide when to refresh a cached object?

When checking the object freshness, we calculate these values:

o OBJ_DATE is the time when the object was given out by the origin server. This is taken from the HTTP Date reply header.

o OBJ_LASTMOD is the time when the object was last modified, given by the HTTP Last-Modified reply header.

o OBJ_AGE is how much the object has aged since it was retrieved:

    OBJ_AGE = NOW - OBJ_DATE

o LM_AGE is how old the object was when it was retrieved:

    LM_AGE = OBJ_DATE - OBJ_LASTMOD

o LM_FACTOR is the ratio of OBJ_AGE to LM_AGE:

    LM_FACTOR = OBJ_AGE / LM_AGE

o CLIENT_MAX_AGE is the (optional) maximum object age the client will accept, as taken from the HTTP/1.1 Cache-Control request header.

o EXPIRES is the (optional) expiry time from the server reply headers.

These values are compared with the parameters of the refresh_pattern rules. The refresh parameters are:

o URL regular expression

o CONF_MIN: The time (in minutes) an object without an explicit expiry time should be considered fresh. The recommended value is 0; any higher value may cause dynamic applications to be erroneously cached unless the application designer has taken the appropriate actions.

o CONF_PERCENT: A percentage of the object's age (time since last modification) for which an object without an explicit expiry time will be considered fresh.

o CONF_MAX: An upper limit (in minutes) on how long objects without an explicit expiry time will be considered fresh.

The URL regular expressions are checked in the order listed until a match is found. Then the algorithms below are applied for determining if an object is fresh or stale. (A sample set of refresh_pattern lines appears after section 12.20.2 below.)

12.20.1. Squid-1.1 and Squid-1.NOVM algorithm

    if (CLIENT_MAX_AGE)
        if (OBJ_AGE > CLIENT_MAX_AGE)
            return STALE
    if (OBJ_AGE <= CONF_MIN)
        return FRESH
    if (EXPIRES) {
        if (EXPIRES <= NOW)
            return STALE
        else
            return FRESH
    }
    if (OBJ_AGE > CONF_MAX)
        return STALE
    if (LM_FACTOR < CONF_PERCENT)
        return FRESH
    return STALE

Kolics Bertold has made an excellent flow chart diagram showing this process.

12.20.2. Squid-2 algorithm

For Squid-2 the refresh algorithm has been slightly modified to give the EXPIRES value a higher precedence, and the CONF_MIN value lower precedence:

    if (CLIENT_MAX_AGE)
        if (OBJ_AGE > CLIENT_MAX_AGE)
            return STALE
    if (EXPIRES) {
        if (EXPIRES <= NOW)
            return STALE
        else
            return FRESH
    }
    if (OBJ_AGE > CONF_MAX)
        return STALE
    if (OBJ_DATE > OBJ_LASTMOD) {
        if (LM_FACTOR < CONF_PERCENT)
            return FRESH
        else
            return STALE
    }
    if (OBJ_AGE <= CONF_MIN)
        return FRESH
    return STALE
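To make the refresh parameters concrete, here is a hypothetical set of refresh_pattern lines, similar in spirit to the defaults shipped in squid.conf. The three numeric fields are CONF_MIN (minutes), CONF_PERCENT, and CONF_MAX (minutes); the values shown are illustrative, not recommendations:

    #                        CONF_MIN  CONF_PERCENT  CONF_MAX
    refresh_pattern ^ftp:        1440           20%     10080
    refresh_pattern ^gopher:     1440            0%      1440
    refresh_pattern .               0           20%      4320

The last line matches every URL and so acts as the catch-all default; since patterns are tested in the order listed, it must come last.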
12.21. What exactly is a deferred read?

The cachemanager I/O page lists deferred reads for various server-side protocols.

Sometimes reading on the server-side gets ahead of writing to the client-side, especially if your cache is on a fast network and your clients are connected at modem speeds. Squid-1.1 will read up to 256k (per request) ahead before it starts to defer the server-side reads.

12.22. Why is my cache's inbound traffic equal to the outbound traffic?

I've been monitoring the traffic on my cache's ethernet adapter and found a behavior I can't explain: the inbound traffic is equal to the outbound traffic. The differences are negligible. The hit ratio reports 40%. Shouldn't the outbound be at least 40% greater than the inbound?

by David J N Begley

I can't account for the exact behavior you're seeing, but I can offer this advice: whenever you start measuring raw Ethernet or IP traffic on interfaces, you can forget about getting all the numbers to exactly match what Squid reports as the amount of traffic it has sent/received.

Why? Squid is an application - it counts whatever data is sent to, or received from, the lower-level networking functions; at each successively lower layer, additional traffic is involved (such as header overhead, retransmits and fragmentation, unrelated broadcasts/traffic, etc.). The additional traffic is never seen by Squid and thus isn't counted - but if you run MRTG (or any SNMP/RMON measurement tool) against a specific interface, all this additional traffic will "magically appear".

Also remember that an interface has no concept of upper-layer networking (so an Ethernet interface doesn't distinguish between IP traffic that's entirely internal to your organization, and traffic that's to/from the Internet); this means that when you start measuring an interface, you have to be aware of *what* you are measuring before you can start comparing numbers elsewhere.

It is possible (though by no means guaranteed) that you are seeing roughly equivalent input/output because you're measuring an interface that both retrieves data from the outside world (Internet), *and* serves it to end users (internal clients). That wouldn't be the whole answer, but hopefully it gives you a few ideas to start applying to your own circumstance.

To interpret any statistic, you have to first know what you are measuring; for example, an interface counts inbound and outbound bytes - that's it. The interface doesn't distinguish between inbound bytes from external Internet sites or from internal (to the organization) clients (making requests). If you want that, try looking at RMON2.

Also, if you're talking about a 40% hit rate in terms of object requests/counts then there's absolutely no reason why you should expect a 40% reduction in traffic; after all, not every request/object is going to be the same size so you may be saving a lot in terms of requests but very little in terms of actual traffic.

12.23. How come some objects do not get cached?

To determine whether a given object may be cached, Squid takes many things into consideration. The current algorithm (for Squid-2) goes something like this:

1. Responses with Cache-Control: Private are NOT cachable.

2. Responses with Cache-Control: No-Cache are NOT cachable.

3. Responses with Cache-Control: No-Store are NOT cachable.

4. Responses for requests with an Authorization header are cachable ONLY if the response includes Cache-Control: Public.

5. Responses with Vary headers are NOT cachable because Squid does not yet support Vary features.

6. The following HTTP status codes are cachable:

   o 200 OK
   o 203 Non-Authoritative Information
   o 300 Multiple Choices
   o 301 Moved Permanently
   o 410 Gone

   However, if Squid receives one of these responses from a neighbor cache, it will NOT be cached if ALL of the Date, Last-Modified, and Expires reply headers are missing. This prevents such objects from bouncing back-and-forth between siblings forever.

7. A 302 Moved Temporarily response is cachable ONLY if the response also includes an Expires header.
8. The following HTTP status codes are ``negatively cached'' for a short amount of time (configurable):

   o 204 No Content
   o 305 Use Proxy
   o 400 Bad Request
   o 403 Forbidden
   o 404 Not Found
   o 405 Method Not Allowed
   o 414 Request-URI Too Large
   o 500 Internal Server Error
   o 501 Not Implemented
   o 502 Bad Gateway
   o 503 Service Unavailable
   o 504 Gateway Time-out

9. All other HTTP status codes are NOT cachable, including:

   o 206 Partial Content
   o 303 See Other
   o 304 Not Modified
   o 401 Unauthorized
   o 407 Proxy Authentication Required

12.24. What does keep-alive ratio mean?

The keep-alive ratio shows up in the server_list cache manager page for Squid 2.

This is a mechanism to try to detect neighbor caches which might not be able to deal with HTTP/1.1 persistent connections. Every time we send a proxy-connection: keep-alive request header to a neighbor, we count how many times the neighbor sent us a proxy-connection: keep-alive reply header. Thus, the keep-alive ratio is the ratio of these two counters.

If the ratio stays above 0.5, then we continue to assume the neighbor properly implements persistent connections. Otherwise, we will stop sending the keep-alive request header to that neighbor.

12.25. How does Squid's cache replacement algorithm work?

Squid uses an LRU (least recently used) algorithm to replace old cache objects. This means objects which have not been accessed for the longest time are removed first. In the source code, the StoreEntry->lastref value is updated every time an object is accessed.

Objects are not necessarily removed ``on-demand.'' Instead, a regularly scheduled event runs to periodically remove objects. Normally this event runs every second.

Squid keeps the cache disk usage between the low and high water marks. By default the low mark is 90%, and the high mark is 95% of the total configured cache size. When the disk usage is close to the low mark, the replacement is less aggressive (fewer objects removed). When the usage is close to the high mark, the replacement is more aggressive (more objects removed).

When selecting objects for removal, Squid examines some number of objects and determines which can be removed and which cannot. A number of factors determine whether or not any given object can be removed. If the object is currently being requested, or retrieved from an upstream site, it will not be removed. If the object is ``negatively-cached,'' it will be removed. If the object has a private cache key, it will be removed (there would be no reason to keep it -- because the key is private, it can never be ``found'' by subsequent requests). Finally, if the time since last access is greater than the LRU threshold, the object is removed.

The LRU threshold value is dynamically calculated based on the current cache size and the low and high marks. The LRU threshold is scaled exponentially between the high and low water marks. When the store swap size is near the low water mark, the LRU threshold is large. When the store swap size is near the high water mark, the LRU threshold is small. The threshold automatically adjusts to the rate of incoming requests. In fact, when your cache size has stabilized, the LRU threshold represents how long it takes to fill (or fully replace) your cache at the current request rate. Typical values for the LRU threshold are 1 to 10 days.

Back to selecting objects for removal. Obviously it is not possible to check every object in the cache every time we need to remove some of them. We can only check a small subset each time.
The way in which this is implemented is very different between Squid-1.1 and Squid-2.

12.25.1. Squid 1.1

The Squid cache storage is implemented as a hash table with some number of "hash buckets." Squid-1.1 scans one bucket at a time and sorts all the objects in the bucket by their LRU age. Objects with an LRU age over the threshold are removed. The scan rate is adjusted so that it takes approximately 24 hours to scan the entire cache. The store buckets are randomized so that we don't always scan the same buckets at the same time of the day.

This algorithm has some flaws. Because we only scan one bucket, there are going to be better candidates for removal in some of the other 16,000 or so buckets. Also, the qsort() function might take a non-trivial amount of CPU time, depending on how many entries are in each bucket.

12.25.2. Squid 2

For Squid-2 we eliminated the need to use qsort() by indexing cached objects into an automatically sorted linked list. Every time an object is accessed, it gets moved to the top of the list. Over time, the least used objects migrate to the bottom of the list. When looking for objects to remove, we only need to check the last 100 or so objects in the list.

Unfortunately this approach increases our memory usage because of the need to store three additional pointers per cache object. But for Squid-2 we're still ahead of the game because we also replaced plain-text cache keys with MD5 hashes.

12.26. What are private and public keys?

keys refers to the database keys which Squid uses to index cache objects. Every object in the cache--whether saved on disk or currently being downloaded--has a cache key. For Squid-1.0 and Squid-1.1 the cache key was basically the URL. Squid-2 uses MD5 checksums for cache keys. (A sketch of how such a key can be derived appears after section 12.28 below.)

The Squid cache uses the notions of private and public cache keys. An object can start out as being private, but may later be changed to public status. Private objects are associated with only a single client whereas a public object may be sent to multiple clients at the same time. In other words, public objects can be located by any cache client. Private keys can only be located by a single client--the one who requested it.

Objects are changed from private to public after all of the HTTP reply headers have been received and parsed. In some cases, the reply headers will indicate the object should not be made public. For example, if the no-cache Cache-Control directive is used.

12.27. What is FORW_VIA_DB for?

We use it to collect data for Plankton.

12.28. Does Squid send packets to port 7 (echo)? If so, why?

It may. This is an old feature from the Harvest cache software. The cache would send ICP ``SECHO'' messages to the echo ports of origin servers. If the SECHO message came back before any of the other ICP replies, then it meant the origin server was probably closer than any neighbor cache. In that case Harvest/Squid sent the request directly to the origin server.

With more attention focused on security, many administrators filter UDP packets to port 7. The Computer Emergency Response Team (CERT) once issued an advisory note (CA-96.01: UDP Port Denial-of-Service Attack) that says UDP echo and chargen services can be used for a denial of service attack. This made admins extremely nervous about any packets hitting port 7 on their systems, and they complained.

The source_ping feature has been disabled in Squid-2. If you're seeing packets to port 7 that are coming from a Squid cache (remote port 3130), then it's probably a very old version of Squid.
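As mentioned in section 12.26, Squid-2 derives cache keys with MD5. The sketch below is illustrative only: it hashes a one-byte request method code together with the URL, which is the general idea behind Squid-2's public keys, but the exact byte layout here is an assumption, and the OpenSSL routines are simply one convenient way to compute MD5 outside of Squid:

    /* Illustrative sketch: derive a 128-bit cache key from a request
     * method and URL, in the spirit of Squid-2's public cache keys.
     * This is NOT Squid's actual implementation. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>

    static void make_public_key(unsigned char key[MD5_DIGEST_LENGTH],
                                unsigned char method, const char *url)
    {
        MD5_CTX ctx;
        MD5_Init(&ctx);
        MD5_Update(&ctx, &method, 1);       /* method code first */
        MD5_Update(&ctx, url, strlen(url)); /* then the URL text */
        MD5_Final(key, &ctx);
    }

    int main(void)
    {
        unsigned char key[MD5_DIGEST_LENGTH];
        int i;
        make_public_key(key, 1 /* hypothetical code for GET */,
                        "http://www.squid-cache.org/");
        for (i = 0; i < MD5_DIGEST_LENGTH; i++)
            printf("%02x", key[i]);
        putchar('\n');
        return 0;
    }

Roughly speaking, a private key mixes additional per-request data into the hash as well (compare the reqnum discussion in section 12.32), which is why no other client's request can locate it.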
12.29. What does ``WARNING: Reply from unknown nameserver [a.b.c.d]'' mean?

It means Squid sent a DNS query to one IP address, but the response came back from a different IP address. By default Squid checks that the addresses match. If not, Squid ignores the response.

There are a number of reasons why this would happen:

1. Your DNS name server just works this way, either because it's been configured to, or because it's stupid and doesn't know any better.

2. You have a weird broadcast address, like 0.0.0.0, in your /etc/resolv.conf file.

3. Somebody is trying to send spoofed DNS responses to your cache.

If you recognize the IP address in the warning as one of your name server hosts, then it's probably number (1) or (2). You can make these warnings stop, and allow responses from ``unknown'' name servers, by setting this configuration option:

    ignore_unknown_nameservers off

12.30. How does Squid distribute cache files among the available directories?

Note: The information here is current for version 2.2. See storeDirMapAllocate() in the source code.

When Squid wants to create a new disk file for storing an object, it first selects which cache_dir the object will go into. This is done with the storeDirSelectSwapDir() function. If you have N cache directories, the function identifies the 3N/4 (75%) of them with the most available space. These directories are then used, in order of having the most available space. When Squid has stored one URL to each of the 3N/4 cache_dir's, the process repeats and storeDirSelectSwapDir() finds a new set of 3N/4 cache directories with the most available space.

Once the cache_dir has been selected, the next step is to find an available swap file number. This is accomplished by checking the file map, with the file_map_allocate() function. Essentially the swap file numbers are allocated sequentially. For example, if the last number allocated happens to be 1000, then the next one will be the first number after 1000 that is not already being used.

12.31. Why do I see negative byte hit ratio?

Byte hit ratio is calculated a bit differently than Request hit ratio. Squid counts the number of bytes read from the network on the server-side, and the number of bytes written to the client-side. The byte hit ratio is calculated as

    (client_bytes - server_bytes) / client_bytes

If server_bytes is greater than client_bytes, you end up with a negative value.

The server_bytes may be greater than client_bytes for a number of reasons, including:

o Cache Digests and other internally generated requests. Cache Digest messages are quite large. They are counted in the server_bytes, but since they are consumed internally, they do not count in client_bytes.

o User-aborted requests. If your quick_abort setting allows it, Squid sometimes continues to fetch aborted requests from the server-side, without sending any data to the client-side.

12.32. What does ``Disabling use of private keys'' mean?

First you need to understand the ``difference between public and private keys''.

When Squid sends ICP queries, it uses the ICP reqnum field to hold the private key data. In other words, when Squid gets an ICP reply, it uses the reqnum value to build the private cache key for the pending object.

Some ICP implementations always set the reqnum field to zero when they send a reply. Squid can not use private cache keys with such neighbor caches because Squid will not be able to locate cache keys for those ICP replies.
Thus, if Squid detects a neighbor cache that sends zero reqnum's, it disables the use of private cache keys.

Not having private cache keys has some important privacy implications. Two users could receive one response that was meant for only one of the users. This response could contain personal, confidential information. You will need to disable the ``zero reqnum'' neighbor if you want Squid to use private cache keys.

12.33. What is a half-closed filedescriptor?

TCP allows connections to be in a ``half-closed'' state. This is accomplished with the shutdown(2) system call. In Squid, this means that a client has closed its side of the connection for writing, but leaves it open for reading.

Half-closed connections are tricky because Squid can't tell the difference between a half-closed connection, and a fully closed one. If Squid tries to read a connection, and read() returns 0, and Squid knows that the client doesn't have the whole response yet, Squid marks the filedescriptor as half-closed. Most likely the client has aborted the request and the connection is really closed. However, there is a slight chance that the client is using the shutdown() call, and that it can still read the response.

To disable half-closed connections, simply put this in squid.conf:

    half_closed_clients off

Then, Squid will always close its side of the connection instead of marking it as half-closed.

12.34. What does --enable-heap-replacement do?

Squid has traditionally used an LRU replacement algorithm. As of version 2.3, you can use some other replacement algorithms by using the --enable-heap-replacement configure option. Currently, the heap replacement code supports two additional algorithms: LFUDA and GDS.

The heap replacement code was contributed by John Dilley and others from Hewlett-Packard. Their work is described in these papers:

1. Enhancement and Validation of Squid's Cache Replacement Policy (HP Tech Report).

2. Enhancement and Validation of the Squid Cache Replacement Policy (WCW 1999 paper).

12.35. Why is actual filesystem space used greater than what Squid thinks?

If you compare df output and cachemgr storedir output, you will notice that actual disk usage is greater than what Squid reports. This may be due to a number of reasons:

o Squid doesn't keep track of the size of the swap.state file, which normally resides on each cache_dir.

o Directory entries also take up filesystem space.

o Other applications might be using the same disk partition.

o Your filesystem block size might be larger than what Squid thinks. When calculating total disk usage, Squid rounds file sizes up to a whole number of 1024 byte blocks. If your filesystem uses larger blocks, then some "wasted" space is not accounted.

12.36. How do positive_dns_ttl and negative_dns_ttl work?

positive_dns_ttl is how long Squid caches a successful DNS lookup. Similarly, negative_dns_ttl is how long Squid caches a failed DNS lookup.

positive_dns_ttl is not always used. It is NOT used in the following cases:

o Squid-2.3 and later versions with internal DNS lookups. Internal lookups are the default for Squid-2.3 and later.

o If you applied the ``DNS TTL'' ``patch'' for BIND.

o If you are using FreeBSD, then it already has the DNS TTL patch built in.

Let's say you have the following settings:

    positive_dns_ttl 1 hours
    negative_dns_ttl 1 minutes

When Squid looks up a name like www.squid-cache.org, it gets back an IP address like 204.144.128.89. The address is cached for the next hour.
That means, when Squid needs to know the address for www.squid-cache.org again, it uses the cached answer for the next hour. After one hour, the cached information expires, and Squid makes a new query for the address of www.squid-cache.org.

If you have the DNS TTL patch, or are using internal lookups, then each hostname has its own TTL value, which was set by the domain name administrator. You can see these values in the 'ipcache' cache manager page. For example:

    Hostname               Flags  lstref     TTL  N
    www.squid-cache.org        C   73043   12784  1( 0)  204.144.128.89-OK
    www.ircache.net            C   73812   10891  1( 0)  192.52.106.12-OK
    polygraph.ircache.net      C  241768 -181261  1( 0)  192.52.106.12-OK

The TTL field shows how many seconds until the entry expires. Negative values mean the entry is already expired, and will be refreshed upon next use.

The negative_dns_ttl specifies how long to cache failed DNS lookups. When Squid fails to resolve a hostname, you can be pretty sure that it is a real failure, and you are not likely to get a successful answer within a short time period. Squid retries its lookups many times before declaring a lookup has failed. If you like, you can set negative_dns_ttl to zero.

13. Multicast

13.1. What is Multicast?

Multicast is essentially the ability to send one IP packet to multiple receivers. Multicast is often used for audio and video conferencing systems.

You often hear about the Mbone in reference to Multicast. The Mbone is essentially a ``virtual backbone'' which exists in the Internet itself. If you want to send and/or receive Multicast, you need to be ``on the Mbone.''

13.2. How do I know if I'm on the Mbone?

One way is to ask someone who manages your network. If your network manager doesn't know, or looks at you funny, then you are very likely NOT on the Mbone.

Another way is to use the mtrace program, which can be found on the Xerox PARC FTP site. Mtrace is similar to traceroute. It will tell you about the multicast path between your site and another. For example:

    > mtrace mbone.ucar.edu
    mtrace: WARNING: no multicast group specified, so no statistics printed
    Mtrace from 128.117.64.29 to 192.172.226.25 via group 224.2.0.1
    Querying full reverse path...
    * switching to hop-by-hop:
      0  oceana-ether.nlanr.net (192.172.226.25)
     -1  avidya-ether.nlanr.net (192.172.226.57)  DVMRP  thresh^ 1
     -2  mbone.sdsc.edu (198.17.46.39)  DVMRP  thresh^ 1
     -3  * nccosc-mbone.dren.net (138.18.5.224)  DVMRP  thresh^ 48
     -4  * * FIXW-MBONE.NSN.NASA.GOV (192.203.230.243)  PIM/Special  thresh^ 64
     -5  dec3800-2-fddi-0.SanFrancisco.mci.net (204.70.158.61)  DVMRP  thresh^ 64
     -6  dec3800-2-fddi-0.Denver.mci.net (204.70.152.61)  DVMRP  thresh^ 1
     -7  mbone.ucar.edu (192.52.106.7)  DVMRP  thresh^ 64
     -8  mbone.ucar.edu (128.117.64.29)
    Round trip time 196 ms; total ttl of 68 required.

If you think you need to be on the Mbone, this is how you can join.

13.3. Should I be using Multicast ICP?

Short answer: No, probably not.

Reasons why you SHOULD use Multicast:

1. It reduces the number of times Squid calls sendto() to put a UDP packet onto the network.

2. It's trendy and cool to use Multicast.

Reasons why you SHOULD NOT use Multicast:

1. Multicast tunnels/configurations/infrastructure are often unstable. You may lose multicast connectivity but still have unicast connectivity.

2. Multicast does not simplify your Squid configuration file. Every trusted neighbor cache must still be specified.

3. Multicast does not reduce the number of ICP replies being sent around. It does reduce the number of ICP queries sent, but not the number of replies.
4. Multicast exposes your cache to some privacy issues. There are no special permissions required to join a multicast group. Anyone may join your group and eavesdrop on ICP query messages. However, the scope of your multicast traffic can be controlled such that it does not exceed certain boundaries.

We only recommend using Multicast ICP over network infrastructure which you have close control over. In other words, only use Multicast over your local area network, or maybe your wide area network if you are an ISP. We think it is probably a bad idea to use Multicast ICP over congested links or commodity backbones.

13.4. How do I configure Squid to send Multicast ICP queries?

To configure Squid to send ICP queries to a Multicast address, you need to create another neighbour cache entry specified as multicast. For example:

    cache_host 224.9.9.9 multicast 3128 3130 ttl=64

224.9.9.9 is a sample multicast group address. multicast indicates that this is a special type of neighbour. The HTTP-port argument (3128) is ignored for multicast peers, but the ICP-port (3130) is very important. The final argument, ttl=64, specifies the multicast TTL value for queries sent to this address. It is probably a good idea to increment the minimum TTL by a few to provide a margin for error and changing conditions.

You must also specify which of your neighbours will respond to your multicast queries, since it would be a bad idea to implicitly trust any ICP reply from an unknown address. Note that ICP replies are sent back to unicast addresses; they are NOT multicast, so Squid has no indication whether a reply is from a regular query or a multicast query. To configure your multicast group neighbours, use the cache_host directive and the multicast-responder option:

    cache_host cache1 sibling 3128 3130 multicast-responder
    cache_host cache2 sibling 3128 3130 multicast-responder

Here all fields are relevant. The ICP port number (3130) must be the same as in the cache_host line defining the multicast peer above. The third field must either be parent or sibling to indicate how Squid should treat replies. With the multicast-responder flag set for a peer, Squid will NOT send ICP queries to it directly (i.e. unicast).

13.5. How do I know what Multicast TTL to use?

The Multicast TTL (which is specified on the cache_host line of your multicast group) determines how ``far'' your ICP queries will go. In the Mbone, there is a certain TTL threshold defined for each network interface or tunnel. A multicast packet's TTL must be larger than the defined TTL for that packet to be forwarded across that link. For example, the mrouted manual page recommends:

    32   for links that separate sites within an organization.
    64   for links that separate communities or organizations, and are
         attached to the Internet MBONE.
    128  for links that separate continents on the MBONE.

A good way to determine the TTL you need is to run mtrace as shown above and look at the last line. It will show you the minimum TTL required to reach the other host.

If you set your TTL too high, then your ICP messages may travel ``too far'' and will be subject to eavesdropping by others. If you're only using multicast on your LAN, as we suggest, then your TTL will be quite small, for example ttl=4.

13.6. How do I configure Squid to receive and respond to Multicast ICP?

You must tell Squid to join a multicast group address with the mcast_groups directive.
For example:

    mcast_groups 224.9.9.9

Of course, all members of your Multicast ICP group will need to use the exact same multicast group address.

NOTE: Choose a multicast group address with care! If two organizations happen to choose the same multicast address, then they may find that their groups ``overlap'' at some point. This will be especially true if one of the querying caches uses a large TTL value. There are two ways to reduce the risk of group overlap:

1. Use a unique group address.

2. Limit the scope of multicast messages with TTLs or administrative scoping.

Using a unique address is a good idea, but not without some potential problems. If you choose an address randomly, how do you know that someone else will not also randomly choose the same address? NLANR has been assigned a block of multicast addresses by the IANA for use in situations such as this. If you would like to be assigned one of these addresses, please write to us. However, note that neither NLANR nor IANA has any authority to prevent anyone else from using an address assigned to you.

Limiting the scope of your multicast messages is probably a better solution. They can be limited with the TTL value discussed above, or with some newer techniques known as administratively scoped addresses. Here you can configure well-defined boundaries for the traffic to a specific address. The Administratively Scoped IP Multicast RFC describes this.

14. System-Dependent Weirdnesses

14.1. Solaris

14.1.1. select()

select(3c) won't handle more than 1024 file descriptors. The configure script should enable poll() by default for Solaris. poll() allows you to use many more filedescriptors, probably 8192 or more.

For older Squid versions you can enable poll() manually by changing HAVE_POLL in include/autoconf.h, or by adding -DUSE_POLL=1 to the DEFINES in src/Makefile.

14.1.2. malloc

libmalloc.a is leaky. Squid's configure does not use -lmalloc on Solaris.

14.1.3. DNS lookups and nscd

by David J N Begley

DNS lookups can be slow because of some mysterious thing called nscd. You should edit /etc/nscd.conf and make it say:

    enable-cache hosts no

Apparently nscd serializes DNS queries thus slowing everything down when an application (such as Squid) hits the resolver hard. You may notice something similar if you run a log processor executing many DNS resolver queries - the resolver starts to slow.. right.. down.. . . .

According to Andres Kroonmaa, users of Solaris starting from version 2.6 and up should NOT completely disable the nscd daemon. nscd should be running and caching passwd and group files, although it is suggested to disable hosts caching as it may interfere with DNS lookups.

Several library calls rely on available free FILE descriptors FD < 256. Systems running without nscd may fail on such calls if the first 256 files are all in use.

Since Solaris 2.6, Sun has changed the way some system calls work and is using the nscd daemon as an implementor of them. To communicate with nscd, Solaris uses undocumented door calls. Basically nscd is used to reduce the memory usage of user-space system libraries that use passwd and group files. Before 2.6, Solaris cached the full passwd file in library memory on first use, but as this was considered to use too much RAM on large multiuser systems, Sun decided to move the implementation of these calls out of the libraries and into a single dedicated daemon.

14.1.4. DNS lookups and /etc/nsswitch.conf

by Jason Armistead

The /etc/nsswitch.conf file determines the order of searches for lookups (amongst other things).
You might only have it set up to allow NIS and HOSTS files to work. You definitely want the "hosts:" line to include the word dns, e.g.:

    hosts: nis dns [NOTFOUND=return] files

14.1.5. DNS lookups and NIS

by Chris Tilbury

Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute authentication and local hosts information around and, in common with our multiuser systems, we run a slave NIS server on it to help the response of NIS queries.

We were seeing very high name-ip lookup times (avg ~2sec) and ip->name lookup times (avg ~8 sec), although there didn't seem to be that much of a problem with response times for valid sites until the cache was being placed under high load. Then, performance went down the toilet.

After some time, and a bit of detective work, we found the problem. On Solaris 2.6, if you have a local NIS server running (ypserv) and you have NIS in your /etc/nsswitch.conf hosts entry, then check the flags it is being started with. The 2.6 ypstart script checks to see if there is a resolv.conf file present when it starts ypserv. If there is, then it starts it with the -d option.

This has the same effect as putting the YP_INTERDOMAIN key in the hosts table -- namely, that failed NIS host lookups are tried against the DNS by the NIS server.

This is a bad thing(tm)! If NIS itself tries to resolve names using the DNS, then the requests are serialised through the NIS server, creating a bottleneck (This is the same basic problem that is seen with nscd). Thus, one failing or slow lookup can, if you have NIS before DNS in the service switch file (which is the most common setup), hold up every other lookup taking place.

If you're running in this kind of setup, then you will want to make sure that:

1. ypserv doesn't start with the -d flag.

2. you don't have the YP_INTERDOMAIN key in the hosts table (find the B=-b line in the yp Makefile and change it to B=).

We changed these here, and saw our average lookup times drop by up to an order of magnitude (~150msec for name-ip queries and ~1.5sec for ip-name queries, the latter still so high, I suspect, because more of these fail and timeout since they are not made so often and the entries are frequently non-existent anyway).

14.1.6. Tuning Solaris 2.x - tuning your TCP/IP stack and more

by Jens-S. Vöckler

14.1.7. disk write error: (28) No space left on device

You might get this error even if your disk is not full, and is not out of inodes. Check your syslog logs (/var/adm/messages, normally) for messages like either of these:

    NOTICE: realloccg /proxy/cache: file system full
    NOTICE: alloc: /proxy/cache: file system full

In a nutshell, the UFS filesystem used by Solaris can't cope very well with the workload squid presents to it. The filesystem will end up becoming highly fragmented, until it reaches a point where there are insufficient free blocks left to create files with, and only fragments available. At this point, you'll get this error and squid will revise its idea of how much space is actually available to it. You can do a "fsck -n raw_device" (no need to unmount, this checks in read only mode) to look at the fragmentation level of the filesystem. It will probably be quite high (>15%).

Sun suggest two solutions to this problem. One costs money, the other is free but may result in a loss of performance (although Sun do claim it shouldn't, given the already highly random nature of squid disk access).

The first is to buy a copy of VxFS, the Veritas Filesystem.
This is an extent-based filesystem and it's capable of having online defragmentation performed on mounted filesystems. This costs money, however (VxFS is not very cheap!).

The second is to change certain parameters of the UFS filesystem. Unmount your cache filesystems and use tunefs to change optimization to "space" and to reduce the "minfree" value to 3-5% (under Solaris 2.6 and higher, very large filesystems will almost certainly have a minfree of 2% already and you shouldn't increase this). You should be able to get fragmentation down to around 3% by doing this, with an accompanied increase in the amount of space available.

Thanks to Chris Tilbury.

14.1.8. Solaris X86 and IPFilter

by Jeff Madison

Important update regarding Squid running on Solaris x86. I have been working for several months to resolve what appeared to be a memory leak in squid when running on Solaris x86, regardless of the malloc that was used. I have made 2 discoveries that anyone running Squid on this platform may be interested in.

Number 1: There is not a memory leak in Squid, even though after the system runs for some amount of time (this varies depending on the load the system is under) Top reports that there is very little memory free. True to the claims of the Sun engineer I spoke to, this statistic from Top is incorrect. The odd thing is that you do begin to see performance suffer substantially as time goes on, and the only way to correct the situation is to reboot the system. This leads me to discovery number 2.

Number 2: There is some type of resource problem, memory or other, with IPFilter on Solaris x86. I have not taken the time to investigate what the problem is because we are no longer using IPFilter. We have switched to an Alteon ACE 180 Gigabit switch which will do the trans-proxy for you. After moving the trans-proxy redirection process out to the Alteon switch, Squid has run for 3 days straight under a huge load with no problem whatsoever. We currently have 2 boxes with 40 GB of cached objects on each box. This 40 GB was accumulated in the 3 days, so from this you can see what type of load these boxes are under. Prior to this change we were never able to operate for more than 4 hours.

Because the problem appears to be with IPFilter, I would guess that you would only run into this issue if you are trying to run Squid as a transparent proxy using IPFilter. That makes sense. If there is anyone with information that would indicate my findings are incorrect, I am willing to investigate further.

14.1.9. Changing the directory lookup cache size

by Mike Batchelor

On Solaris, the kernel variable for the directory name lookup cache size is ncsize. In /etc/system, you might want to try

    set ncsize = 8192

or even higher. The kernel variable ufs_inode - which is the size of the inode cache itself - scales with ncsize in Solaris 2.5.1 and later. Previous versions of Solaris required both to be adjusted independently, but now it is not recommended to adjust ufs_inode directly on 2.5.1 and later. You can set ncsize quite high, but at some point - dependent on the application - a too-large ncsize will increase the latency of lookups. Defaults are:

    Solaris 2.5.1         : (max_nprocs + 16 + maxusers) + 64
    Solaris 2.6/Solaris 7 : 4 * (max_nprocs + maxusers) + 320

14.1.10. The priority_paging algorithm

by Mike Batchelor

Another new tuneable (actually a toggle) in Solaris 2.5.1, 2.6 or Solaris 7 is the priority_paging algorithm. This is actually a complete rewrite of the virtual memory system on Solaris.
It will page out application data last, and filesystem pages first, if you turn it on (set priority_paging = 1 in /etc/system). As you may know, the Solaris buffer cache grows to fill available pages, and under the old VM system, applications could get paged out to make way for the buffer cache, which can lead to swap thrashing and degraded application performance. The new priority_paging helps keep application and shared library pages in memory, preventing the buffer cache from paging them out, until memory gets REALLY short. Solaris 2.5.1 requires patch 103640-25 or higher and Solaris 2.6 requires 105181-10 or higher to get priority_paging. Solaris 7 needs no patch, but all versions have it turned off by default.

14.2. FreeBSD

14.2.1. T/TCP bugs

We have found that with FreeBSD-2.2.2-RELEASE, there are some bugs with T/TCP. FreeBSD will try to use T/TCP if you've enabled the ``TCP Extensions.'' To disable T/TCP, use sysinstall to disable TCP Extensions, or edit /etc/rc.conf and set

    tcp_extensions="NO"     # Allow RFC1323 & RFC1544 extensions (or NO).

or add this to your /etc/rc files:

    sysctl -w net.inet.tcp.rfc1644=0

14.2.2. mbuf size

We noticed an odd thing with some of Squid's interprocess communication. Often, output from the dnsserver processes would NOT be read in one chunk. With full debugging, it looks like this:

    1998/04/02 15:18:48| comm_select: FD 46 ready for reading
    1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (100 bytes)
    1998/04/02 15:18:48| ipcache_dnsHandleRead: Incomplete reply
    1998/04/02 15:18:48| comm_select: FD 46 ready for reading
    1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (9 bytes)
    1998/04/02 15:18:48| ipcache_parsebuffer: parsing:
    $name www.karup.com
    $h_name www.karup.inter.net
    $h_len 4
    $ipcount 2
    38.15.68.128
    38.15.67.128
    $ttl 2348
    $end

Interestingly, it is very common to get only 100 bytes on the first read. When two read() calls are required, this adds additional latency to the overall request. On our caches running Digital Unix, the median dnsserver response time was measured at 0.01 seconds. On our FreeBSD cache, however, the median latency was 0.10 seconds.

Here is a simple patch to fix the bug:

    ===================================================================
    RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
    retrieving revision 1.40
    retrieving revision 1.41
    diff -p -u -r1.40 -r1.41
    --- src/sys/kern/uipc_socket.c  1998/05/15 20:11:30  1.40
    +++ /home/ncvs/src/sys/kern/uipc_socket.c  1998/07/06 19:27:14  1.41
    @@ -31,7 +31,7 @@
      * SUCH DAMAGE.
      *
      *      @(#)uipc_socket.c       8.3 (Berkeley) 4/15/94
    - * $Id: FAQ.sgml,v 1.89 2000/03/24 20:02:01 wessels Exp $
    + * $Id: FAQ.sgml,v 1.89 2000/03/24 20:02:01 wessels Exp $
      */
     #include
    @@ -491,6 +491,7 @@ restart:
                mlen = MCLBYTES;
                len = min(min(mlen, resid), space);
            } else {
    +           atomic = 1;
     nopages:
                len = min(min(mlen, resid), space);
                /*

Another technique which may help, but does not fix the bug, is to increase the kernel's mbuf size. The default is 128 bytes. The MSIZE symbol is defined in /usr/include/machine/param.h. However, to change it we added this line to our kernel configuration file:

    options MSIZE="256"

14.2.3. Dealing with NIS

/var/yp/Makefile has the following section:

    # The following line encodes the YP_INTERDOMAIN key into the hosts.byname
    # and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
    # hosts not in the current domain. Commenting this line out will disable
    # the DNS lookups.
    B=-b

You will want to comment out the B=-b line so that ypserv does not do DNS lookups.
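For clarity, here is what that part of /var/yp/Makefile looks like after the edit (afterwards, rebuild and push the maps in the usual way, e.g. cd /var/yp && make - assuming a standard NIS setup):

    # The following line encodes the YP_INTERDOMAIN key into the hosts.byname
    # and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
    # hosts not in the current domain. Commenting this line out will disable
    # the DNS lookups.
    #B=-b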
14.2.4. FreeBSD 3.3: The lo0 (loop-back) device is not configured on startup

Squid requires the loopback interface to be up and configured. If it is not, you will get errors such as ``commBind''.

From FreeBSD 3.3 Errata Notes:

Fix: Assuming that you experience this problem at all, edit /etc/rc.conf and search for where the network_interfaces variable is set. In its value, change the word auto to lo0 since the auto keyword doesn't bring the loop-back device up properly, for reasons yet to be adequately determined. Since your other interface(s) will already be set in the network_interfaces variable after initial installation, it's reasonable to simply s/auto/lo0/ in rc.conf and move on.

Thanks to Robert Lister.

14.2.5. FreeBSD 3.x or newer: Speed up disk writes using Softupdates

by Andre Albsmeier

FreeBSD 3.x and newer support Softupdates. This is a mechanism to speed up disk writes, comparable to what is possible by mounting ufs volumes async. However, Softupdates does this in a way that achieves performance similar to, or better than, async without losing safety in the case of a system crash. For more detailed information and the copyright terms see /sys/contrib/softupdates/README and /sys/ufs/ffs/README.softupdate.

To build a system supporting softupdates, you have to build a kernel with options SOFTUPDATES set (see LINT for a commented out example). After rebooting with the new kernel, you can enable softupdates on a per-filesystem basis with the command:

    $ tunefs -n enable /mountpoint

The filesystem in question MUST NOT be mounted at this time. After that, softupdates are permanently enabled and the filesystem can be mounted normally. To verify that the softupdates code is running, simply issue a mount command and an output similar to the following will appear:

    $ mount
    /dev/da2a on /usr/local/squid/cache (ufs, local, noatime, soft-updates, writes: sync 70 async 225)

14.3. OSF1/3.2

If you compile both libgnumalloc.a and Squid with cc, the mstats() function returns bogus values. However, if you compile libgnumalloc.a with gcc, and Squid with cc, the values are correct.

14.4. BSD/OS

14.4.1. gcc/yacc

Some people report ``difficulties compiling squid on BSD/OS''.

14.4.2. process priority

I've noticed that my Squid process seems to stick at a nice value of four, and clicks back to that even after I renice it to a higher priority. However, looking through the Squid source, I can't find any instance of a setpriority() call, or anything else that would seem to indicate Squid's adjusting its own priority.

by Bill Bogstad

BSD Unices traditionally have auto-niced non-root processes to 4 after they used a lot (4 minutes???) of CPU time. My guess is that it's BSD/OS, not Squid, that is doing this. I don't know offhand if there is a way to disable this on BSD/OS.

by Arjan de Vet

You can get around this by starting Squid with nice-level -4 (or another negative value).

by Bert Driehuis

The autonice behavior is a leftover from the history of BSD as a university OS. It penalises CPU bound jobs by nicing them after using 600 CPU seconds. Adding

    sysctl -w kern.autonicetime=0

to /etc/rc.local will disable the behavior systemwide.

14.5. Linux

14.5.1. Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address

Try a different version of Linux. We have received many reports of this ``bug'' from people running Linux 2.0.30. The bind(2) system call should NEVER give this error when binding to port 0.

14.5.2. FATAL: Don't run Squid as root, set 'cache_effective_user'!
Some users have reported that setting cache_effective_user to nobody under Linux does not work. However, it appears that using any cache_effective_user other than nobody will succeed. One solution is to create a user account for Squid and set cache_effective_user to that. Alternately you can change the UID for the nobody account from 65535 to 65534.

Another problem is that RedHat 5.0 Linux seems to have a broken setresuid() function. There are two ways to fix this. Before running configure:

    % setenv ac_cv_func_setresuid no
    % ./configure ...
    % make clean
    % make install

Or after running configure, manually edit include/autoconf.h and change the HAVE_SETRESUID line to:

    #define HAVE_SETRESUID 0

Also, some users report this error is due to a NIS configuration problem. By adding compat to the passwd and group lines of /etc/nsswitch.conf, the problem goes away. (Ambrose Li).

Russ Mellon notes that these problems with cache_effective_user are fixed in version 2.2.x of the Linux kernel.

14.5.3. Large ACL lists make Squid slow

The regular expression library which comes with Linux is known to be very slow. Some people report it entirely fails to work after long periods of time.

To fix, use the GNUregex library included with the Squid source code. With Squid-2, use the --enable-gnuregex configure option.

14.5.4. gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.

by Radu Greab

The gethostbyname() function leaks memory in RedHat 6.0 with glibc 2.1.1. The quick fix is to delete the nisplus service from the hosts entry in /etc/nsswitch.conf. In my tests dnsserver memory use remained stable after I made the above change.

See RedHat bug id 3919.

14.5.5. assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.

by Jamie Raymond

Some early versions of Linux have a kernel bug that causes this. All that is needed is a recent kernel that doesn't have the mentioned bug.

14.5.6. tools.c:605: storage size of `rl' isn't known

There is an inconsistency between your "/usr/include" headers and your C library (glibc). The Linux 2.3.34 kernel is known to be broken, so try an earlier or a later version. If you upgrade or install a new version of glibc, be sure that the /usr/include headers get updated appropriately.

14.6. HP-UX

14.6.1. StatHist.c:74: failed assertion `statHistBin(H, min) == 0'

This was a very mysterious and unexplainable bug with GCC on HP-UX. Certain functions, when specified as static, would cause math bugs. The compiler also failed to handle implied int-double conversions properly. These bugs should all be handled correctly in Squid version 2.2.

14.7. IRIX

14.7.1. dnsserver always returns 255.255.255.255

There is a problem with GCC (2.8.1 at least) on Irix 6 which causes it to always return the string 255.255.255.255 for _ANY_ address when calling inet_ntoa(). If this happens to you, compile Squid with the native C compiler instead of GCC.

14.8. SCO-UNIX

by F.J. Bosscha

To make squid run comfortably on SCO-unix, you need to increase the NOFILES parameter and the NUMSP parameter and recompile squid. Although squid reported in the cache.log file that it had 3000 filedescriptors, I had problems with messages saying that no more filedescriptors were available. After I also increased the NUMSP value, the problems were gone.

One thing left is the number of tcp-connections the system can handle. The default is 256, but I increased that as well because of the number of clients we have.

15. Redirectors

15.1. What is a redirector?
Squid has the ability to rewrite requested URLs. The rewriting is implemented as an external redirector process (similar to a dnsserver): Squid can be configured to pass every incoming URL through the redirector, which returns either a new URL, or a blank line to indicate no change.

The redirector program is NOT a standard part of the Squid package. However, some examples are provided below, and in the "contrib/" directory of the source distribution. Since everyone has different needs, it is up to the individual administrators to write their own implementation.

15.2. Why use a redirector?

A redirector allows the administrator to control the locations to which their users go. Using this in conjunction with transparent proxies allows simple but effective porn control.

15.3. How does it work?

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program must not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision. The input line consists of four fields:

     URL ip-address/fqdn ident method

Do you have any examples?

A simple, very fast redirector called SQUIRM is a good place to start; it uses the regex library to allow pattern matching. Also see jesred.

The following Perl script may also be used as a template for writing your own redirector:

     #!/usr/local/bin/perl
     $|=1;          # force unbuffered output
     while (<>) {
         s@http://fromhost.com@http://tohost.org@;
         print;
     }

15.4. Can I use the redirector to return HTTP redirect messages?

Normally, the redirector feature is used to rewrite requested URLs. Squid then transparently requests the new URL. However, in some situations, it may be desirable to return an HTTP "301" or "302" redirect message to the client. This is now possible with Squid version 1.1.19. Simply modify your redirector program to append either "301:" or "302:" before the new URL. For example, the following script might be used to direct external clients to a secure Web server for internal documents:

     #!/usr/local/bin/perl
     $|=1;          # force unbuffered output
     while (<>) {
         @X = split;
         $url = $X[0];          # first field is the URL
         if ($url =~ /^http:\/\/internal\.foo\.com/) {
             $url =~ s/^http/https/;
             $url =~ s/internal/secure/;
             print "302:$url\n";        # tell Squid to send an HTTP redirect
         } else {
             print "$url\n";
         }
     }

Please see sections 10.3.2 and 10.3.3 of RFC 2068 for an explanation of the 301 and 302 HTTP reply codes.

15.5. FATAL: All redirectors have exited!

A redirector process must never exit (stop running). If you see the ``All redirectors have exited'' message, it probably means your redirector program has a bug. Maybe it runs out of memory or has memory access errors. You may want to test your redirector program outside of squid with a big input list, taken from your access.log perhaps. Also, check for ``coredump'' files from the redirector program.

16. Cache Digests

Cache Digest FAQs compiled by Niall Doherty.

16.1. What is a Cache Digest?

A Cache Digest is a summary of the contents of an Internet Object Caching Server. It contains, in a compact (i.e. compressed) format, an indication of whether or not particular URLs are in the cache.

A "lossy" technique is used for compression, which means that very high compression factors can be achieved at the expense of not having 100% correct information.

16.2. How and why are they used?

Cache servers periodically exchange their digests with each other.

When a request for an object (URL) is received from a client, a cache can use digests from its peers to find out which of its peers (if any) have that object.
The cache can then request the object from the closest peer (Squid uses the NetDB database to determine this).

Note that Squid will only make digest queries in those digests that are enabled. It will disable a peer's digest IFF it cannot fetch a valid digest for that peer. It will enable that peer's digest again when a valid one is fetched.

The checks in the digest are very fast and they eliminate the need for per-request queries to peers. Hence:

o Latency is eliminated and client response time should be improved.

o Network utilisation may be improved.

Note that the use of Cache Digests (for querying the cache contents of peers) and the generation of a Cache Digest (for retrieval by peers) are independent. So it is possible for a cache to make a digest available for peers without using the functionality itself, and vice versa.

16.3. What is the theory behind Cache Digests?

Cache Digests are based on Bloom Filters - a method for representing a set of keys with lookup capability, where lookup means "is the key in the filter or not?".

In building a cache digest:

o A vector (1-dimensional array) of m bits is allocated, with all bits initially set to 0.

o A number, k, of independent hash functions is chosen: h1, h2, ..., hk (i.e. a key hashed with any of these functions gives a value between 1 and m inclusive).

o The set of n keys to be operated on is denoted by: A = { a1, a2, a3, ..., an }.

16.3.1. Adding a Key

To add a key the value of each hash function for that key is calculated. So, if the key was denoted by a, then h1(a), h2(a), ..., hk(a) are calculated.

The value of each hash function for that key represents an index into the array and the corresponding bits are set to 1. So, a digest with 6 hash functions would set 6 bits to 1 for each key added.

Note that the addition of a number of different keys could cause one particular bit to be set to 1 multiple times.

16.3.2. Querying a Key

To query for the existence of a key the indices into the array are calculated from the hash functions as above.

o If any of the corresponding bits in the array are 0 then the key is not present.

o If all of the corresponding bits in the array are 1 then the key is likely to be present.

Note the term likely. It is possible that a collision in the digest can occur, whereby the digest incorrectly indicates a key is present. This is the price paid for the compact representation. While the probability of a collision can never be reduced to zero it can be controlled. Larger values for the ratio of the digest size to the number of entries added lower the probability. The number of hash functions chosen also influences the probability.

16.3.3. Deleting a Key

To delete a key, it is not possible to simply set the associated bits to 0 since any one of those bits could have been set to 1 by the addition of a different key!

Therefore, to support deletions a counter is required for each bit position in the array. The procedures to follow would be (see the sketch below):

o When adding a key, set appropriate bits to 1 and increment the corresponding counters.

o When deleting a key, decrement the appropriate counters (while > 0), and if a counter reaches 0 then the corresponding bit is set to 0.
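To make the procedures in sections 16.3.1 - 16.3.3 concrete, here is a minimal counting-filter sketch in Perl. It is purely illustrative and not Squid's code (the real routines live in CacheDigest.c, and as section 16.7 explains, Squid does not actually implement deletions); the filter size and key names are made up, and the index derivation imitates the MD5-splitting scheme described in section 16.5.

     #!/usr/local/bin/perl
     # Illustrative counting Bloom filter -- NOT Squid's actual implementation.
     use Digest::MD5 qw(md5);

     my $m = 6144000;   # number of bit positions (counters); arbitrary here
     my @count;         # one counter per position; a "bit" is on when > 0

     # Derive four indices from a single MD5 hash of the key,
     # imitating the scheme described in section 16.5.
     sub indices {
         my ($key) = @_;
         return map { $_ % $m } unpack("N4", md5($key));
     }

     # Section 16.3.1: adding a key sets (here: counts) each indexed bit.
     sub add_key {
         $count[$_]++ for indices($_[0]);
     }

     # Section 16.3.2: absent if any indexed bit is 0; otherwise only
     # "likely" present, because collisions are possible.
     sub query_key {
         for (indices($_[0])) { return 0 unless $count[$_]; }
         return 1;
     }

     # Section 16.3.3: decrement counters; a bit drops back to 0 only
     # when no remaining key uses it.
     sub del_key {
         for (indices($_[0])) { $count[$_]-- if $count[$_] > 0; }
     }

     add_key("GET http://www.example.com/");
     print query_key("GET http://www.example.com/"), "\n";   # 1 (likely present)
     del_key("GET http://www.example.com/");
     print query_key("GET http://www.example.com/"), "\n";   # 0 (not present)

Note how a shared bit is only cleared once its counter reaches zero, which is exactly why plain (counter-less) digests cannot support deletions.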
16.4. How is the size of the Cache Digest in Squid determined?

Upon initialisation, the capacity is set to the number of objects that can be (are) stored in the cache. Note that there are upper and lower limits here.

An arbitrary constant, bits_per_entry (currently set to 5), is used to calculate the size of the array using the following formula:

     number of bits in array = capacity * bits_per_entry + 7

The size of the digest, in bytes, is therefore:

     digest size = int (number of bits in array / 8)

When a digest rebuild occurs, the change in the cache size (capacity) is measured. If the capacity has changed by a large enough amount (10%) then the digest array is freed and reallocated; otherwise the same digest is re-used.

16.5. What hash functions (and how many of them) does Squid use?

The protocol design allows for a variable number of hash functions (k). However, Squid employs a very efficient method using a fixed number - four. Rather than computing a number of independent hash functions over a URL, Squid uses a 128-bit MD5 hash of the key (actually a combination of the URL and the HTTP retrieval method) and then splits this into four equal chunks. Each chunk, modulo the digest size (m), is used as the value for one of the hash functions - i.e. an index into the bit array.

Note: As Squid retrieves objects and stores them in its cache on disk, it adds them to the in-RAM index using a lookup key which is an MD5 hash - the very one discussed above. This means that the values for the Cache Digest hash functions are already available and consequently the operations are extremely efficient!

Obviously, modifying the code to support a variable number of hash functions would prove a little more difficult and would most likely reduce efficiency.

16.6. How are objects added to the Cache Digest in Squid?

Every object referenced in the index in RAM is checked to see if it is suitable for addition to the digest. A number of objects are not suitable, e.g. those that are private, not cachable, negatively cached etc., and these are skipped immediately.

A freshness test is next made in an attempt to guess if the object will expire soon, since if it does, it is not worthwhile adding it to the digest. The object is checked against the refresh patterns for staleness...

Since Squid stores references to objects in its index using the MD5 key discussed earlier there is no URL actually available for each object - which means that the pattern used will fall back to the default pattern, ".". This is an unfortunate state of affairs, but little can be done about it. A cd_refresh_pattern option will be added to the configuration file soon which will at least make the confusion a little clearer :-)

Note that it is best to be conservative with your refresh pattern for the Cache Digest, i.e. do not add objects if they might become stale soon. This will reduce the number of False Hits.

16.7. Does Squid support deletions in Cache Digests? What are diffs/deltas?

Squid does not support deletions from the digest. Because of this the digest must periodically be rebuilt from scratch to erase stale bits and prevent digest pollution.

A more sophisticated option is to use diffs or deltas. These would be created by building a new digest and comparing it with the current/old one. They would essentially consist of aggregated deletions and additions since the previous digest. Since less bandwidth should be required using these it would be possible to have more frequent updates (and hence, more accurate information).

Costs:

o RAM - extra RAM needed to hold two digests while the comparisons take place.

o CPU - probably a negligible amount.

16.8. When and how often is the local digest built?
The local digest is built:

o when store_rebuild completes after startup (the cache contents have been indexed in RAM), and

o periodically thereafter. Currently, it is rebuilt every hour (more data and experience are required before other periods, whether fixed or dynamically varying, can "intelligently" be chosen). The good thing is that the local cache decides on the expiry time and peers must obey (see later).

While the [new] digest is being built in RAM the old version (stored on disk) is still valid, and will be returned to any peer requesting it. When the digest has finished building it is then swapped out to disk, overwriting the old version.

The rebuild is CPU intensive, but not overly so. Since Squid is programmed using an event-handling model, the approach taken is to split the digest building task into chunks (i.e. chunks of entries to add) and to register each chunk as an event. If CPU load is overly high, it is possible to extend the build period - as long as it is finished before the next rebuild is due!

It may prove more efficient to implement the digest building as a separate process/thread in the future...

16.9. How are Cache Digests transferred between peers?

Cache Digests are fetched from peers using the standard HTTP protocol (note that a pull rather than push technique is used).

After the first access to a peer, a peerDigestValidate event is queued (this event decides if it is time to fetch a new version of a digest from a peer). The queuing delay depends on the number of peers already queued for validation - so that digests from different peers are not all fetched simultaneously.

A peer answering a request for its digest will specify an expiry time for that digest by using the HTTP Expires header. The requesting cache thus knows when it should request a fresh copy of that peer's digest.

Note: requesting caches use an If-Modified-Since request in case the peer has not rebuilt its digest for some reason since the last time it was fetched.

16.10. How and where are Cache Digests stored?

16.10.1. Cache Digest built locally

Since the local digest is generated purely for the benefit of its neighbours, keeping it in RAM is not strictly required. However, it was decided to keep the local digest in RAM partly because of the following:

o Approximately the same amount of memory will be (re-)allocated on every rebuild of the digest,

o the memory requirements are probably quite small (when compared to other requirements of the cache server),

o if ongoing updates of the digest are to be supported (e.g. additions/deletions) it will be necessary to perform these operations on a digest in RAM, and

o if diffs/deltas are to be supported the "old" digest would have to be swapped into RAM anyway for the comparisons.

When the digest is built in RAM, it is then swapped out to disk, where it is stored as a "normal" cache item - which is how peers request it.

16.10.2. Cache Digest fetched from peer

When a query from a client arrives, fast lookups are required to decide if a request should be made to a neighbour cache. It is therefore required to keep all peer digests in RAM.

Peer digests are also stored on disk for the following reasons:

o Recovery - If stopped and restarted, peer digests can be reused from the local on-disk copy (they will soon be validated using an HTTP IMS request to the appropriate peers as discussed earlier), and

o Sharing - peer digests are stored as normal objects in the cache. This allows them to be given to neighbour caches.
16.11. How are the Cache Digest statistics in the Cache Manager to be interpreted?

Cache Digest statistics can be seen from the Cache Manager or through the client utility. The following examples show how to use the client utility to request the list of possible operations from the localhost, local digest statistics from the localhost, refresh statistics from the localhost, and local digest statistics from another cache, respectively:

     ./client mgr:menu
     ./client mgr:store_digest
     ./client mgr:refresh
     ./client -h peer mgr:store_digest

The available statistics provide a lot of useful debugging information. The refresh statistics include a section for Cache Digests which explains why items were added (or not) to the digest.

The following example shows local digest statistics for a 16GB cache in a corporate intranet environment (it may be a useful reference for the discussion below):

     store digest: size: 768000 bytes
     entries: count: 588327 capacity: 1228800 util: 48%
     deletion attempts: 0
     bits: per entry: 5 on: 1953311 capacity: 6144000 util: 32%
     bit-seq: count: 2664350 avg.len: 2.31
     added: 588327 rejected: 528703 ( 47.33 %) del-ed: 0
     collisions: on add: 0.23 % on rej: 0.23 %

entries:capacity is a measure of how many items "are likely" to be added to the digest. It represents the number of items that were in the local cache at the start of digest creation - however, upper and lower limits currently apply. This value is multiplied by bits: per entry (an arbitrary constant) to give bits:capacity, which is the size of the cache digest in bits. Dividing this by 8 will give store digest: size, which is the size in bytes.

The number of items represented in the digest is given by entries:count. This should be equal to added minus deletion attempts. Since (currently) no modifications are made to the digest after the initial build (no additions are made and deletions are not supported) deletion attempts will always be 0 and entries:count should simply be equal to added.

entries:util is not really a significant statistic. At most it gives a measure of how many of the items in the store were deemed suitable for entry into the digest compared to how many were "prepared" for.

rej shows how many objects were rejected. Objects will not be added for a number of reasons, the most common being refresh pattern settings. Remember that (currently) the default refresh pattern will be used for checking for entry here, and also note that changing this pattern can significantly affect the number of items added to the digest! Too relaxed and False Hits increase; too strict and False Misses increase. Remember also that at time of validation (on the peer) the "real" refresh pattern will be used - so it is wise to keep the default refresh pattern conservative.

bits: on indicates the number of bits in the digest that are set to 1. bits: util gives this figure as a percentage of the total number of bits in the digest. As we saw earlier, a figure of 50% represents the optimal trade-off. Values too high (say > 75%) would cause a larger number of collisions, and hence False Hits, while lower values mean the digest is under-utilised (using unnecessary RAM). Note that low values are normal for caches that are starting to fill up.

A bit sequence is an uninterrupted sequence of bits with the same value. bit-seq: avg.len gives some insight into the quality of the hash functions. Long values indicate a problem, even if bits:util is 50% (> 3 = suspicious, > 10 = very suspicious).
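As a quick sanity check, the figures in this example are mutually consistent under the formulas from section 16.4 (the formula's "+ 7" disappears in the rounding here):

     bits:capacity     = 1228800 entries * 5 bits/entry = 6144000 bits
     store digest size = 6144000 bits / 8 = 768000 bytes
     entries:util      = 588327 / 1228800 = 48%
     bits:util         = 1953311 / 6144000 = 32%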
16.12. What are False Hits and how should they be handled?

A False Hit occurs when a cache believes a peer has an object and asks the peer for it, but the peer is not able to satisfy the request.

Expiring or stale objects on the peer are frequent causes of False Hits. At the time of the query actual refresh patterns are used on the peer and stale entries are marked for revalidation. However, revalidation is prohibited unless the peer is behaving as a parent, or miss_access is enabled. Thus, clients can receive error messages instead of revalidated objects!

The frequency of False Hits can be reduced but never eliminated completely, therefore there must be a robust way of handling them when they occur. The philosophy behind the design of Squid is to use lightweight techniques, optimise for the common case, and robustly handle the unusual case (False Hits).

Squid will soon support the HTTP only-if-cached header. Requests for objects made to a peer will use this header and if the objects are not available, the peer can reply appropriately, allowing Squid to recognise the situation. The following describes what Squid is aiming towards:

o Cache Digests used to obtain good estimates of where a requested object is located in a Cache Hierarchy.

o Persistent HTTP Connections between peers. There will be no TCP startup overhead and both latency and network load will be similar to ICP (i.e. fast).

o HTTP False Hit Recognition using the only-if-cached HTTP header - allowing fallback to another peer or, if no other peers are available with the object, then going direct (or through a parent if behind a firewall).

16.13. How can Cache Digest related activity be traced/debugged?

16.13.1. Enabling Cache Digests

If you wish to use Cache Digests (available in Squid version 2) you need to add a configure option, so that the relevant code is compiled in:

     ./configure --enable-cache-digests ...

16.13.2. What do the access.log entries look like?

If a request is forwarded to a neighbour due to a HIT in that neighbour's Cache Digest, the hierarchy (9th) field of the access.log file for the local cache will look like CACHE_DIGEST_HIT/neighbour. The Log Tag (4th field) should obviously show a MISS.

On the peer cache the request should appear as a normal HTTP request from the first cache.

16.13.3. What does a False Hit look like?

The easiest situation to analyse is when two caches (say A and B) are involved, neither of which uses the other as a parent. In this case, a False Hit would show up as a CACHE_DIGEST_HIT on A and NOT as a TCP_HIT on B (or vice versa). If B does not fetch the object for A then the hierarchy field will look like NONE/- (and A will have received an Access Denied or Forbidden message). This will happen if the object is not "available" on B and B does not have miss_access enabled for A (or is not acting as a parent for A).

16.13.4. How is the cause of a False Hit determined?

Assume A requests a URL from B and receives a False Hit.

o Using the client utility, PURGE the URL from A, e.g.:

     ./client -m PURGE 'URL'

o Using the client utility, request the object from A, e.g.:

     ./client 'URL'

The HTTP headers of the reply are available. Two header types are of particular interest:

o X-Cache - this shows whether an object is available or not.

o X-Cache-Lookup - this keeps the result of a store table lookup before refresh-causing rules are checked (i.e. it indicates if the object is available before any validation would be attempted).

The X-Cache and X-Cache-Lookup headers from A should both show MISS.
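For illustration, the reply headers from A might look something like the following (the hostname and port are hypothetical, and the exact wording can vary between Squid versions):

     X-Cache: MISS from cacheA.example.com
     X-Cache-Lookup: MISS from cacheA.example.com:3128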
If A requests the object from B (which it will if the digest lookup indicates B has it - assuming B is the closest peer of course :-) then there will be another set of these headers from B.

If the X-Cache header from B shows a MISS, a False Hit has occurred. This means that A thought B had an object but B tells A it does not have it available for retrieval. The reason why it is not available for retrieval is indicated by the X-Cache-Lookup header. If:

o X-Cache-Lookup = MISS then either A's (version of B's) digest is out-of-date or corrupt, OR a collision occurred in the digest (very small probability), OR B recently purged the object.

o X-Cache-Lookup = HIT then B had the object, but refresh rules (or A's max-age requirements) prevent A from getting a HIT (validation failed).

16.13.5. Use The Source

If there is something else you need to check you can always look at the source code. The main Cache Digest functionality is organised as follows:

o CacheDigest.c (debug section 70) Generic Cache Digest routines

o store_digest.c (debug section 71) Local Cache Digest routines

o peer_digest.c (debug section 72) Peer Cache Digest routines

Note that in the source the term Store Digest refers to the digest created locally. The Cache Digest code is fairly self-explanatory (once you understand how Cache Digests work).

16.14. What about ICP?

COMING SOON!

16.15. Is there a Cache Digest Specification?

There is now, thanks to Martin Hamilton and Alex Rousskov. Cache Digests, as implemented in Squid 2.1.PATCH2, are described in cache-digest-v5.txt. You'll notice the format is similar to an Internet Draft. We decided not to submit this document as a draft because Cache Digests will likely undergo some important changes before we want to try to make it a standard.

16.16. Would it be possible to stagger the timings when cache_digests are retrieved from peers?

Note: The information here is current for version 2.2.

Squid already has code to spread the digest updates. The algorithm is currently controlled by a few hard-coded constants in peer_digest.c. For example, the GlobDigestReqMinGap variable determines the minimum interval between two requests for a digest. You may want to try increasing the value of GlobDigestReqMinGap from 60 seconds to whatever you feel comfortable with (but it should be smaller than hour/number_of_peers, of course).

Note that whatever you do, you still need to give Squid enough time and bandwidth to fetch all the digests. Depending on your environment, that bandwidth may be more or less than ICP would require. Upcoming digest deltas (x10 smaller than the digests themselves) may be the only way to solve the ``big scale'' problem.

17. Transparent Caching/Proxying

How can I make my users' browsers use my cache without configuring the browsers for proxying?

First, it is critical to read the full comments in the squid.conf file! That is the only authoritative source for configuration information. However, the following instructions are correct as of this writing (July 1999).

Getting transparent caching to work requires four distinct steps:

1. Get the packets to your cache server.

There are several ways to do this. First, if your proxy machine is already in the path of the packets (i.e. it is routing between your proxy users and the Internet) then you don't have to worry about this step. This would be true if you install Squid on a firewall machine, or on a UNIX-based router.
If the cache is not in the natural path of the connections, then you have to divert the packets from the normal path to your cache host using a router or switch. You may be able to do this with a Cisco router using their "route maps" feature, depending on your IOS version. You might also use a so-called layer-4 switch, such as the Alteon ACE-director or the Foundry Networks ServerIron. Finally, you might be able to use a stand-alone router/load-balancer type product, or the routing capabilities of an access server.

2. Get your cache server to accept the packets.

You have to configure your cache host to accept the redirected packets - any IP address, on port 80 - and deliver them to your cache application. This is typically done with IP filtering/forwarding features built into the kernel. On Linux this is called ipfwadm. On FreeBSD and other *BSD systems it is called ip filter or ipnat; on many systems, it may require rebuilding the kernel or adding a new loadable kernel module.

3. Compile and run a version of Squid which accepts connections for other addresses.

For some operating systems, you need to have configured and built a version of Squid which can recognize the hijacked connections and discern the destination addresses. For Linux this seems to work automatically. For *BSD-based systems, you probably have to configure squid with the --enable-ipf-transparent option. (Do a make clean if you previously configured without that option, or the correct settings may not be present.)

4. Configure Squid to accept and process the connections.

Finally, you have to change the Squid configuration settings to recognize the hijacked connections and discern the destination addresses. Here are the important settings in squid.conf:

     http_port 8080
     httpd_accel_host virtual
     httpd_accel_port 80
     httpd_accel_with_proxy on
     httpd_accel_uses_host_header on

Notes:

o The http_port 8080 in this example assumes you will redirect incoming port 80 packets to port 8080 on your cache machine. If you are running Squid on port 3128 (for example) you can leave it there via http_port 3128, and redirect to that port via your IP filtering or forwarding commands.

o In the httpd_accel_host option, virtual is the magic word!

o The httpd_accel_with_proxy on is required to enable transparent proxy mode; essentially in transparent proxy mode Squid thinks it is acting both as an accelerator (hence accepting packets for other IPs on port 80) and a caching proxy (hence serving files out of its cache).

o You must use httpd_accel_uses_host_header on to get the cache to work properly in transparent mode. This enables the cache to index its stored objects under the true hostname, as is done in a normal proxy, rather than under the IP address. This is especially important if you want to use a parent cache hierarchy, or to share cache data between transparent proxy users and non-transparent proxy users, which you can do with Squid in this configuration.

17.1. Transparent caching for Solaris, SunOS, and BSD systems

17.1.1. Install IP Filter

First, get and install the IP Filter package.

17.1.2. Configure ipnat

Put these lines in /etc/ipnat.rules:

     # Redirect direct web traffic to local web server.
     rdr de0 1.2.3.4/32 port 80 -> 1.2.3.4 port 80 tcp

     # Redirect everything else to squid on port 8080
     rdr de0 0.0.0.0/0 port 80 -> 1.2.3.4 port 8080 tcp

Modify your startup scripts to enable ipnat.
For example, on FreeBSD it looks something like this:

     /sbin/modload /lkm/if_ipl.o
     /sbin/ipnat -f /etc/ipnat.rules
     chgrp nobody /dev/ipnat
     chmod 644 /dev/ipnat

17.1.3. Configure Squid

17.1.3.1. Squid-2

Squid-2 (after version beta25) has IP filter support built in. Simply enable it when you run configure:

     ./configure --enable-ipf-transparent

Add these lines to your squid.conf file:

     http_port 8080
     httpd_accel_host virtual
     httpd_accel_port 80
     httpd_accel_with_proxy on
     httpd_accel_uses_host_header on

Note, you don't have to use port 8080, but it must match whatever you used in the /etc/ipnat.rules file.

17.1.3.2. Squid-1.1

Patches for Squid-1.X are available from Quinton Dolan's Squid page. Add these lines to squid.conf:

     http_port 8080
     httpd_accel virtual 80
     httpd_accel_with_proxy on
     httpd_accel_uses_host_header on

Thanks to Quinton Dolan.

17.2. Transparent caching with Linux

by Rodney van den Oever

Note: Transparent proxying does NOT work with Linux 2.0.30! Linux 2.0.29 is known to work well. If you're using a more recent kernel, like 2.2.X, then you should probably use an ipchains configuration, ``as described below''.

Warning: this technique has some shortcomings.

1. This method only supports the HTTP protocol, not gopher or FTP.

2. Since the browser wasn't set up to use a proxy server, it uses the FTP protocol (with destination port 21) for FTP URLs, and not the required HTTP protocol. You can't set up a redirection rule to the proxy server since the browser is speaking the wrong protocol. A similar problem occurs with gopher. Normally all proxy requests are translated by the client into the HTTP protocol, but since the client isn't aware of the redirection, this never happens.

If you can live with the side-effects, go ahead and compile your kernel with firewalling and redirection support. Here are the important parameters from /usr/src/linux/.config:

     #
     # Code maturity level options
     #
     CONFIG_EXPERIMENTAL=y
     #
     # Networking options
     #
     CONFIG_FIREWALL=y
     # CONFIG_NET_ALIAS is not set
     CONFIG_INET=y
     CONFIG_IP_FORWARD=y
     # CONFIG_IP_MULTICAST is not set
     CONFIG_IP_FIREWALL=y
     # CONFIG_IP_FIREWALL_VERBOSE is not set
     CONFIG_IP_MASQUERADE=y
     CONFIG_IP_TRANSPARENT_PROXY=y
     CONFIG_IP_ALWAYS_DEFRAG=y
     # CONFIG_IP_ACCT is not set
     CONFIG_IP_ROUTER=y

You may also need to enable IP Forwarding. One way to do it is to add this line to your startup scripts:

     echo 1 > /proc/sys/net/ipv4/ip_forward

Go to the Linux IP Firewall and Accounting page, obtain the source distribution to ipfwadm and install it. Older versions of ipfwadm may not work. You might need at least version 2.3.0. You'll use ipfwadm to set up the redirection rules. I added this rule to the script that runs from /etc/rc.d/rc.inet1 (Slackware) which sets up the interfaces at boot-time. The redirection should be done before any other Input-accept rule. To really make sure it worked I disabled the forwarding (masquerading) I normally do.
/etc/rc.d/rc.firewall:

     #!/bin/sh
     # rc.firewall   Linux kernel firewalling rules

     FW=/sbin/ipfwadm

     # Flush rules, for testing purposes
     for i in I O F   # A    # If we enabled accounting too
     do
             ${FW} -$i -f
     done

     # Default policies:
     ${FW} -I -p rej         # Incoming policy: reject (quick error)
     ${FW} -O -p acc         # Output policy: accept
     ${FW} -F -p den         # Forwarding policy: deny

     # Input Rules:

     # Loopback-interface (local access, eg, to local nameserver):
     ${FW} -I -a acc -S localhost/32 -D localhost/32

     # Local Ethernet-interface:

     # Redirect to Squid proxy server:
     ${FW} -I -a acc -P tcp -D default/0 80 -r 8080

     # Accept packets from local network:
     ${FW} -I -a acc -P all -S localnet/8 -D default/0 -W eth0

     # Only required for other types of traffic (FTP, Telnet):

     # Forward localnet with masquerading (udp and tcp, no icmp!):
     ${FW} -F -a m -P tcp -S localnet/8 -D default/0
     ${FW} -F -a m -P udp -S localnet/8 -D default/0

Here all traffic from the local LAN with any destination gets redirected to the local port 8080. Rules can be viewed like this:

     IP firewall input rules, default policy: reject
     type  prot source               destination          ports
     acc   all  127.0.0.1            127.0.0.1            n/a
     acc/r tcp  10.0.0.0/8           0.0.0.0/0            * -> 80 => 8080
     acc   all  10.0.0.0/8           0.0.0.0/0            n/a
     acc   tcp  0.0.0.0/0            0.0.0.0/0            * -> *

I did some testing on Windows 95 with both Microsoft Internet Explorer 3.01 and Netscape Communicator pre-release and it worked with both browsers with the proxy settings disabled.

At one time squid seemed to get in a loop when I pointed the browser to the local port 80. But this could be avoided by adding a reject rule for clients to this address:

     ${FW} -I -a rej -P tcp -S localnet/8 -D hostname/32 80

     IP firewall input rules, default policy: reject
     type  prot source               destination          ports
     acc   all  127.0.0.1            127.0.0.1            n/a
     rej   tcp  10.0.0.0/8           10.0.0.1             * -> 80
     acc/r tcp  10.0.0.0/8           0.0.0.0/0            * -> 80 => 8080
     acc   all  10.0.0.0/8           0.0.0.0/0            n/a
     acc   tcp  0.0.0.0/0            0.0.0.0/0            * -> *

NOTE on resolving names: Instead of just passing the URLs to the proxy server, the browser itself has to resolve the URLs. Make sure the workstations are set up to query a local nameserver, to minimize outgoing traffic. If you're already running a nameserver at the firewall or proxy server (which is a good idea anyway IMHO) let the workstations use this nameserver.

Additional notes from Richard Ayres:

I'm using such a setup. The only issues so far have been that:

1. It's fairly useless to use my service provider's parent caches (cache-?.www.demon.net) because by proxying squid only sees IP addresses, not host names, and demon aren't generally asked for IP addresses by other users;

2. Linux kernel 2.0.30 is a no-no as transparent proxying is broken (I use 2.0.29);

3. Client browsers must do host name lookups themselves, as they don't know they're using a proxy;

4. The Microsoft Network won't authorize its users through a proxy, so I have to specifically *not* redirect those packets (my company is an MSN content provider).

Aside from this, I get a 30-40% hit rate on a 50MB cache for 30-40 users and am quite pleased with the results.

See also Daniel Kiracofe's page.

17.3. Transparent caching with Cisco routers

by John Saunders

This works with at least IOS 11.1 and later, I guess. Possibly earlier, as I'm no CISCO expert I can't say for sure. If your router is doing anything more complicated than shuffling packets between an ethernet interface and either a serial port or BRI port, then you should work through whether this will work for you.
First define a route map with a name of proxy-redirect (name doesn't matter) and specify the next hop to be the machine Squid runs on.

     !
     route-map proxy-redirect permit 10
      match ip address 110
      set ip next-hop 203.24.133.2
     !

Define an access list to trap HTTP requests. The second line allows the Squid host direct access so a routing loop is not formed. By carefully writing your access list as shown below, common cases are found quickly and this can greatly reduce the load on your router's processor.

     !
     access-list 110 deny tcp any any neq www
     access-list 110 deny tcp host 203.24.133.2 any
     access-list 110 permit tcp any any
     !

Apply the route map to the ethernet interface.

     !
     interface Ethernet0
      ip policy route-map proxy-redirect
     !

17.3.1. possible bugs

Bruce Morgan notes that there is a Cisco bug relating to transparent proxying using IP policy route maps, that causes NFS and other applications to break. Apparently there are two bug reports raised in Cisco, but they are not available for public dissemination.

The problem occurs with packets of more than 1472 data bytes. If you try to ping a host with more than 1472 data bytes across a Cisco interface with the access-lists and ip policy route map, the icmp request will fail. The packet will be fragmented, and the first fragment is checked against the access-list and rejected - it goes the "normal path" as it is an icmp packet - however when the second fragment is checked against the access-list it is accepted (it isn't regarded as an icmp packet), and goes to the action determined by the policy route map!

John notes that you may be able to get around this bug by carefully writing your access lists. If the last/default rule is to permit then this bug would be a problem, but if the last/default rule was to deny then it won't be a problem. I guess fragments, other than the first, don't have the information available to properly policy route them. Normally TCP packets should not be fragmented; at least my network runs an MTU of 1500 everywhere to avoid fragmentation. So this would affect UDP and ICMP traffic only.

Basically, you will have to pick between living with the bug or better performance. This set has better performance, but suffers from the bug:

     access-list 110 deny tcp any any neq www
     access-list 110 deny tcp host 10.1.2.3 any
     access-list 110 permit tcp any any

Conversely, this set has worse performance, but works for all protocols:

     access-list 110 deny tcp host 10.1.2.3 any
     access-list 110 permit tcp any any eq www
     access-list 110 deny tcp any any

17.4. Transparent caching with LINUX 2.0.29 and CISCO IOS 11.1

Just for kicks, here's an email message posted to squid-users on how to make transparent proxying work with a Cisco router and Squid running on Linux.

by Brian Feeny

Here is how I have transparent proxying working for me, in an environment where my router is a Cisco 2501 running IOS 11.1, and the Squid machine is running Linux 2.0.33.

Many thanks to the following individuals and the squid-users list for helping me get redirection and transparent proxying working on my Cisco/Linux box.

o Lincoln Dale

o Riccardo Vratogna

o Mark White

o Henrik Nordstrom

First, here is what I added to my Cisco, which is running IOS 11.1. In IOS 11.1 the route-map command is "process switched" as opposed to the faster "fast-switched" route-map which is found in IOS 11.2 and later. You may wish to be running IOS 11.2. I am running 11.1, and have had no problems with my current load of about 150 simultaneous connections to squid:
     !
     interface Ethernet0
      description To Office Ethernet
      ip address 208.206.76.1 255.255.255.0
      no ip directed-broadcast
      no ip mroute-cache
      ip policy route-map proxy-redir
     !
     access-list 110 deny tcp host 208.206.76.44 any eq www
     access-list 110 permit tcp any any eq www
     route-map proxy-redir permit 10
      match ip address 110
      set ip next-hop 208.206.76.44

So basically from above you can see I added the "route-map" declaration and an access-list, and then turned the route-map on under int e0 with "ip policy route-map proxy-redir".

OK, so the Cisco is taken care of at this point. The host above, 208.206.76.44, is the IP number of my squid host.

My squid box runs Linux, so I had to do the following on it. My kernel (2.0.33) config looks like this:

     #
     # Networking options
     #
     CONFIG_FIREWALL=y
     # CONFIG_NET_ALIAS is not set
     CONFIG_INET=y
     CONFIG_IP_FORWARD=y
     CONFIG_IP_MULTICAST=y
     CONFIG_SYN_COOKIES=y
     # CONFIG_RST_COOKIES is not set
     CONFIG_IP_FIREWALL=y
     # CONFIG_IP_FIREWALL_VERBOSE is not set
     CONFIG_IP_MASQUERADE=y
     # CONFIG_IP_MASQUERADE_IPAUTOFW is not set
     CONFIG_IP_MASQUERADE_ICMP=y
     CONFIG_IP_TRANSPARENT_PROXY=y
     CONFIG_IP_ALWAYS_DEFRAG=y
     # CONFIG_IP_ACCT is not set
     CONFIG_IP_ROUTER=y

You will need Firewalling and Transparent Proxy turned on at a minimum. Then some ipfwadm stuff:

     # Accept all on loopback
     ipfwadm -I -a accept -W lo
     # Accept my own IP, to prevent loops (repeat for each interface/alias)
     ipfwadm -I -a accept -P tcp -D 208.206.76.44 80
     # Send all traffic destined to port 80 to Squid on port 3128
     ipfwadm -I -a accept -P tcp -D 0/0 80 -r 3128

This accepts packets on port 80 (redirected from the Cisco) and redirects them to 3128, which is the port my squid process is sitting on. I put all this in /etc/rc.d/rc.local.

I am using v1.1.20 of Squid with Henrik's patch installed. You will want to install this patch if using a setup similar to mine.

17.5. The cache is trying to connect to itself...

by Henrik Nordstrom

I think almost everyone who has tried to build a transparent proxy setup has been bitten by this one. Measures you can take:

o Deny Squid from fetching objects from itself (using ACL lists).

o Apply a small patch that prevents Squid from looping infinitely (available from Henrik's Squid Patches).

o Don't run Squid on port 80, and redirect port 80 not destined for the local machine to Squid (redirection == ipfilter/ipfw/ipfwadm). This avoids the most common loops.

o If you are using ipfilter then you should also use transproxyd in front of Squid. Squid does not yet know how to interface to ipfilter (patches are welcome: squid-bugs@ircache.net).

17.6. Transparent caching with FreeBSD

by Duane Wessels

I set out yesterday to make transparent caching work with Squid and FreeBSD. It was, uh, fun. It was relatively easy to configure a cisco to divert port 80 packets to my FreeBSD box. Configuration goes something like this:

     access-list 110 deny tcp host 10.0.3.22 any eq www
     access-list 110 permit tcp any any eq www
     route-map proxy-redirect permit 10
      match ip address 110
      set ip next-hop 10.0.3.22
     int eth2/0
      ip policy route-map proxy-redirect

Here, 10.0.3.22 is the IP address of the FreeBSD cache machine.

Once I have packets going to the FreeBSD box, I need to get the kernel to deliver them to Squid. I started on FreeBSD-2.2.7, and then downloaded IPFilter. This was a dead end for me. The IPFilter distribution includes patches to the FreeBSD kernel sources, but many of these had conflicts. Then I noticed that the IPFilter page says ``It comes as a part of [FreeBSD-2.2 and later].'' Fair enough.
Unfortunately, you can't hijack connections with the FreeBSD-2.2.X IPFIREWALL code (ipfw), and you can't (or at least I couldn't) do it with natd either.

FreeBSD-3.0 has much better support for connection hijacking, so I suggest you start with that. You need to build a kernel with the following options:

     options IPFIREWALL
     options IPFIREWALL_FORWARD

Next, it's time to configure the IP firewall rules with ipfw. By default, there are no "allow" rules and all packets are denied. I added these commands to /etc/rc.local just to be able to use the machine on my network:

     ipfw add 60000 allow all from any to any

But we're still not hijacking connections. To accomplish that, add these rules:

     ipfw add 49 allow tcp from 10.0.3.22 to any
     ipfw add 50 fwd 127.0.0.1 tcp from any to any 80

The second line (rule 50) is the one which hijacks the connection. The first line makes sure we never hit rule 50 for traffic originated by the local machine. This prevents forwarding loops.

Note that I am not changing the port number here. That is, port 80 packets are simply diverted to Squid on port 80. My Squid configuration is:

     http_port 80
     httpd_accel_host virtual
     httpd_accel_port 80
     httpd_accel_with_proxy on
     httpd_accel_uses_host_header on

If you don't want Squid to listen on port 80 (because that requires root privileges) then you can use another port. In that case your ipfw redirect rule looks like:

     ipfw add 50 fwd 127.0.0.1,3128 tcp from any to any 80

and the squid.conf lines are:

     http_port 3128
     httpd_accel_host virtual
     httpd_accel_port 80
     httpd_accel_with_proxy on
     httpd_accel_uses_host_header on

17.7. Transparent caching with Linux and ipchains

by Martin Lyons

The following shows important kernel features to include:

     [*] Network firewalls
     [ ] Socket Filtering
     [*] Unix domain sockets
     [*] TCP/IP networking
     [ ] IP: multicasting
     [ ] IP: advanced router
     [ ] IP: kernel level autoconfiguration
     [*] IP: firewalling
     [ ] IP: firewall packet netlink device
     [*] IP: always defragment (required for masquerading)
     [*] IP: transparent proxy support

You must include the IP: always defragment option, otherwise it prevents you from using the REDIRECT chain. The following script is used to configure ipchains:

     # Send all traffic destined to port 80 to squid on port 8081
     /sbin/ipchains -A input -p tcp -s 10.0.3.22/16 -d 0/0 80 -j REDIRECT 8081

Also, Andrew Shipton notes that with 2.0.x kernels you don't need to enable packet forwarding, but with the 2.1.x and 2.2.x kernels using ipchains you do. Packet forwarding is enabled with the following command:

     echo 1 > /proc/sys/net/ipv4/ip_forward

17.8. Transparent caching with ACC Tigris digital access server

by John Saunders

This is to do with configuring transparent proxy for an ACC Tigris digital access server (like a CISCO 5200/5300 or an Ascend MAX 4000). I've found that doing this in the NAS reduces traffic on the LAN and reduces processing load on the CISCO. The Tigris has ample CPU for filtering.

Step 1 is to create filters that allow local traffic to pass. Add as many as needed for all of your address ranges.

     ADD PROFILE IP FILTER ENTRY local1 INPUT 10.0.3.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
     ADD PROFILE IP FILTER ENTRY local2 INPUT 10.0.4.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL

Step 2 is to create a filter to trap port 80 traffic.

     ADD PROFILE IP FILTER ENTRY http INPUT 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 = 0x6 D= 80 NORMAL

Step 3 is to set the "APPLICATION_ID" on port 80 traffic to 80. This causes all packets matching this filter to have ID 80 instead of the default ID of 0.
     SET PROFILE IP FILTER APPLICATION_ID http 80

Step 4 is to create a special route that is used for packets with "APPLICATION_ID" set to 80. The routing engine uses the ID to select which routes to use.

     ADD IP ROUTE ENTRY 0.0.0.0 0.0.0.0 PROXY-IP 1
     SET IP ROUTE APPLICATION_ID 0.0.0.0 0.0.0.0 PROXY-IP 80

Step 5 is to bind everything to a filter ID called transproxy. List all local filters first and the http one last.

     ADD PROFILE ENTRY transproxy local1 local2 http

With this in place use your RADIUS server to send back the ``Framed-Filter-Id = transproxy'' key/value pair to the NAS.

You can check if the filter is being assigned to logins with the following command:

     display profile port table

17.9. ``Connection reset by peer'' and Cisco policy routing

Fyodor has tracked down the cause of unusual ``connection reset by peer'' messages when using Cisco policy routing to hijack HTTP requests.

When the network link between router and the cache goes down for just a moment, the packets that are supposed to be redirected are instead sent out the default route. If this happens, a TCP ACK from the client host may be sent to the origin server, instead of being diverted to the cache. The origin server, upon receiving an unexpected ACK packet, sends a TCP RESET back to the client, which aborts the client's request.

To work around this problem, you can install a static route to the null0 interface for the cache address with a higher metric (lower precedence), such as 250. Then, when the link goes down, packets from the client just get dropped instead of sent out the default route. For example, if 1.2.3.4 is the IP address of your Squid cache, you may add:

     ip route 1.2.3.4 255.255.255.255 Null0 250

This appears to cause the correct behaviour.

17.10. WCCP - Web Cache Coordination Protocol

Contributors: Glenn Chisholm and Lincoln Dale.

17.10.1. Does Squid support WCCP?

CISCO's Web Cache Coordination Protocol V1.0 is supported in squid 2.3 and later. Due to licensing requirements squid is not able to support WCCP V2.0. If CISCO chooses to open WCCP V2 and relax the licensing terms, Squid may be able to support it in the future.

17.10.2. Configuring your Router

There are two different methods of configuring WCCP on CISCO routers. The first method is for routers that only support V1.0 of the protocol. The second is for routers that support both.

17.10.2.1. IOS Version 11.x

It is possible that later versions of IOS 11.x will support V2.0 of the protocol. If that is the case follow the 12.x instructions. Several people have reported that the squid implementation of WCCP does not work with their 11.x routers. If you experience this please mail the debug output from your router to squid-bugs.

     conf t
     wccp enable
     !
     interface [Interface Carrying Outgoing Traffic]x/x
     !
     ip wccp web-cache redirect
     !
     CTRL Z
     write mem

17.10.2.2. IOS Version 12.x

Some of the early versions of 12.x do not have the 'ip wccp version' command. You will need to upgrade your IOS version to use V1.0. You will need to be running at least IOS Software Release 12.0(5)T if you're running the 12.0 T-train. IOS Software Releases 12.0(3)T and 12.0(4)T do not have WCCPv1, but 12.0(5)T does.

     conf t
     ip wccp version 1
     ip wccp web-cache
     !
     interface [Interface Carrying Outgoing/Incoming Traffic]x/x
     ip wccp web-cache redirect out|in
     !
     CTRL Z
     write mem

17.10.3. Configuring FreeBSD

FreeBSD first needs to be configured to receive and strip the GRE encapsulation from the packets from the router. To do this you will need to patch and recompile your kernel.
First, a patch needs to be applied to your kernel for GRE support. Apply this patch <../../WCCP-support/FreeBSD/gre.patch> to your 3.1 kernel. If you use a version other than 3.1 it may not apply; you will need to manually change the appropriate files.

Secondly you will need to download gre.c <../../WCCP-support/FreeBSD/gre.c> and copy it to /usr/src/sys/netinet/gre.c.

Finally add "OPTION GRE" to your kernel config file and rebuild your kernel. Note, the opt_gre.h file is created when you run config. Once your kernel is installed you will need to ``configure FreeBSD for transparent proxying''.

17.10.4. Configuring Linux 2.2

There are currently two methods for supporting WCCP with Linux 2.2: a special-purpose module, or the standard Linux GRE tunneling driver. People have reported difficulty with the standard GRE tunneling driver; however, it does allow GRE functionality other than WCCP. You should choose the method that suits your environment.

17.10.4.1. Standard Linux GRE Tunnel

Linux 2.2 kernels already support GRE, as long as the GRE module is compiled into the kernel. You will need to patch the ip_gre.c code that comes with your Linux kernel with this patch supplied by Jan Haluza. Ensure that the GRE code is either built as static or as a module by choosing the appropriate option in your kernel config. Then rebuild your kernel. If it is a module you will need to:

     modprobe ip_gre

The next step is to tell Linux to establish an IP tunnel between the router and your host:

     iptunnel add gre1 mode gre remote <Router-IP> local <Host-IP> dev <interface>
     ifconfig gre1 up

<Router-IP> is the IP address of your router that is intercepting the HTTP packets. <Host-IP> is the IP address of your cache, and <interface> is the network interface that receives those packets (probably eth0).

17.10.4.2. WCCP Specific Module

This module is not part of the standard Linux distribution. It needs to be compiled as a module and loaded on your system to function. Do not attempt to build this in as a static part of your kernel.

Download the Linux WCCP module <../../WCCP-support/Linux/ip_wccp.c> and compile it as you would any Linux network module. Copy the module to /lib/modules/kernel-version/ipv4/ip_wccp.o. Edit /lib/modules/kernel-version/modules.dep and add:

     /lib/modules/kernel-version/ipv4/ip_wccp.o:

Finally you will need to load the module:

     modprobe ip_wccp

17.10.4.3. Common Steps

The machine should now be stripping the GRE encapsulation from any packets received and re-queuing them. The system will also need to be configured for transparent proxying, either with ``ipfwadm'' or with ``ipchains''.

17.10.5. Configuring Others

If you have managed to configure your operating system to support WCCP with Squid please contact us with the details so we may share them with others.

17.11. Can someone tell me what version of cisco IOS WCCP is added in?

IOS releases:

o 11.1(19?)CA/CC or later

o 11.2(14)P or later

o 12.0(anything) or later

17.12. Transparent caching with Foundry L4 switches

by Brian Feeny

First, configure Squid for transparent caching as detailed at the ``beginning of this section''.

Next, configure the Foundry layer 4 switch to transparently redirect traffic to your Squid box or boxes. By default, the Foundry redirects to port 80 of your squid box. This can be changed to a different port if needed, but won't be covered here.

In addition, the switch does a "health check" of the port to make sure your squid is answering. If your squid does not answer, the switch defaults to sending traffic directly through instead of redirecting it.
When the Squid comes back up, it begins redirecting once again.

This example assumes you have two squid caches:

     squid1.foo.com  192.168.1.10
     squid2.foo.com  192.168.1.11

We will assume you have various workstations, customers, etc., plugged into the switch that you want to be transparently proxied. The squid caches themselves should be plugged into the switch as well. Only the interface that the router is connected to is important; where you put the squid caches or other connections does not matter.

This example assumes your router is plugged into interface 17 of the switch. If not, adjust the following commands accordingly.

1. Enter configuration mode:

     telnet@ServerIron#conf t

2. Configure each squid on the Foundry:

     telnet@ServerIron(config)# server cache-name squid1 192.168.1.10
     telnet@ServerIron(config)# server cache-name squid2 192.168.1.11

3. Add the squids to a cache-group:

     telnet@ServerIron(config)#server cache-group 1
     telnet@ServerIron(config-tc-1)#cache-name squid1
     telnet@ServerIron(config-tc-1)#cache-name squid2

4. Create a policy for caching http on a local port:

     telnet@ServerIron(config)# ip policy 1 cache tcp http local

5. Enable that policy on the port connected to your router:

     telnet@ServerIron(config)#int e 17
     telnet@ServerIron(config-if-17)# ip-policy 1

Since all outbound traffic to the Internet goes out interface 17 (the router), and interface 17 has the caching policy applied to it, HTTP traffic is going to be intercepted and redirected to the caches you have configured.

The default port to redirect to can be changed. The load balancing algorithm used can be changed (Least Used, Round Robin, etc). Ports can be exempted from caching if needed. Access Lists can be applied so that only certain source IP Addresses are redirected, etc. These details were left out of this document since this is just a quick howto that applies for most people, not a comprehensive manual of how to configure a Foundry switch. I can however revise this with any information necessary if people feel it should be included.

18. SNMP

Contributors: Glenn Chisholm.

18.1. Does Squid support SNMP?

True SNMP support is available in squid 2 and above. A significant change in the implementation occurred starting with the development 2.2 code, so there are two sets of instructions on how to configure SNMP in squid; please make sure that you follow the correct one.

18.2. Enabling SNMP in Squid

To use SNMP, it must first be enabled with the configure script, and squid rebuilt. To enable it, first run the script:

     ./configure --enable-snmp [ ... other configure options ]

Next, recompile after cleaning the source tree:

     make clean
     make all
     make install

Once the compile is completed and the new binary is installed, the squid.conf file needs to be configured to allow access; the default is to deny all requests. The instructions on how to do this have been broken into two parts: the first for all versions of Squid from 2.2 onwards and the second for 2.1 and below.

18.3. Configuring Squid 2.2

To configure SNMP, first specify a list of communities that you would like to allow access by using a standard acl of the form:

     acl aclname snmp_community string

For example:

     acl snmppublic snmp_community public
     acl snmpjoebloggs snmp_community joebloggs

This creates two acl's, with two different communities, public and joebloggs. You can name the acl's and the community strings anything that you like.
To specify the port that the agent will listen on, modify the "snmp_port" parameter; it defaults to 3401. The port to which the agent will forward requests that it cannot fulfil itself is set by "forward_snmpd_port"; it defaults to off, and must be configured for forwarding to work. Remember that as the forwarded requests will be originating from this agent, you will need to make sure that you configure your access accordingly.

To allow access to Squid's SNMP agent, define an snmp_access ACL with the community strings that you previously defined. For example:

     snmp_access allow snmppublic localhost
     snmp_access deny all

The above will allow anyone on the localhost who uses the community public to access the agent. It will deny all others access. If you do not define any snmp_access ACL's, then SNMP access is denied by default.

Finally, squid allows you to configure the addresses that the agent will bind to for incoming and outgoing traffic. These default to 0.0.0.0; changing them will cause the agent to bind to a specific address on the host, rather than the default, which is all.

     snmp_incoming_address 0.0.0.0
     snmp_outgoing_address 0.0.0.0

18.4. Configuring Squid 2.1

Prior to Squid 2.1 the SNMP code had a number of issues with the ACL's. If you are a frequent user of SNMP with Squid, please upgrade to 2.2 or higher. A sort of default, working configuration is:

     snmp_port 3401
     snmp_mib_path /local/squid/etc/mib.txt
     snmp_agent_conf view all .1.3.6 included
     snmp_agent_conf view squid .1.3.6 included
     snmp_agent_conf user squid - all all public
     snmp_agent_conf user all all all all squid
     snmp_agent_conf community public squid squid
     snmp_agent_conf community readwrite all all

Note that for security you are advised to restrict SNMP access to your caches. You can do this easily as follows:

     acl snmpmanagementhosts 1.2.3.4/255.255.255.255 1.2.3.0/255.255.255.0
     snmp_acl public deny all !snmpmanagementhosts
     snmp_acl readwrite deny all

You must follow these instructions for 2.1 and below exactly or you are likely to have problems. The parser has some issues which have been corrected in 2.2.

18.5. How can I query the Squid SNMP Agent

You can test if your Squid supports SNMP with the snmpwalk program (snmpwalk is a part of the UCD-SNMP project). Note that you have to specify the SNMP port, which in Squid defaults to 3401.

     snmpwalk -p 3401 hostname communitystring .1.3.6.1.4.1.3495.1.1

If it gives output like:

     enterprises.nlanr.squid.cacheSystem.cacheSysVMsize = 7970816
     enterprises.nlanr.squid.cacheSystem.cacheSysStorage = 2796142
     enterprises.nlanr.squid.cacheSystem.cacheUptime = Timeticks: (766299) 2:07:42.99

then it is working ok, and you should be able to make nice statistics out of it. For an explanation of what every string (OID) does, you should refer to the Cache SNMP web pages.

18.6. What can I use SNMP and Squid for?

There are a lot of things you can do with SNMP and Squid. It can be useful to some extent for a longer-term overview of how your proxy is doing. It can also be used as a problem solver. For example: how is it going with your file descriptor usage? Or how much does your LRU vary over a day? These are things you can't monitor very well normally, aside from clicking at the cachemgr frequently. Why not let MRTG do it for you?

18.7. How can I use SNMP with Squid?

There are a number of tools that you can use to monitor Squid via SNMP. A very popular one is MRTG; there are, however, a number of others.
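As an illustration only, an MRTG target for Squid might look something like the sketch below. The OID suffixes, hostname, and community are assumptions modelled on the snmpwalk example above (cacheSysVMsize and cacheSysStorage live under enterprises.3495.1.1); verify the exact OIDs against your own snmpwalk output before relying on this.

     Target[squid-sys]: 1.3.6.1.4.1.3495.1.1.1.0&1.3.6.1.4.1.3495.1.1.2.0:public@proxy.example.com:3401
     MaxBytes[squid-sys]: 1000000000
     Title[squid-sys]: Squid memory and disk usage
     Options[squid-sys]: gauge, nopercent
     PageTop[squid-sys]: <H1>Squid memory and disk usage</H1>

The gauge option is used because these values are absolute levels rather than ever-increasing counters.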
To learn what these tools are and to get additional documentation, please visit the Cache SNMP web pages.

18.7.1. MRTG

We use MRTG to query Squid through its SNMP interface. For instructions on using MRTG with Squid, please visit the Cache SNMP web pages.

18.8. Where can I get more information/discussion about Squid and SNMP?

General discussion takes place on cache-snmp@ircache.net. These messages are archived. Subscription requests should be sent to: cache-snmp-request@ircache.net.

19. Squid version 2

19.1. What are the new features?

o HTTP/1.1 persistent connections.
o Lower VM usage; in-transit objects are not held fully in memory.
o Totally independent swap directories.
o Customizable error texts.
o FTP supported internally; no more ftpget.
o Asynchronous disk operations (optional, requires pthreads library).
o Internal icons for FTP and gopher directories.
o snprintf() used everywhere instead of sprintf().
o SNMP.
o URN support.
o Routing requests based on AS numbers.
o Cache Digests.

19.2. How do I configure 'ssl_proxy' now?

By default, Squid connects directly to origin servers for SSL requests. But if you must force SSL requests through a parent, first tell Squid it cannot go direct for SSL:

    acl SSL method CONNECT
    never_direct allow SSL

With this in place, Squid should pick one of your parents to use for SSL requests. If you want it to pick a particular parent, you must use the cache_peer_access configuration:

    cache_peer parent1 parent 3128 3130
    cache_peer parent2 parent 3128 3130
    cache_peer_access parent2 allow !SSL

The above lines tell Squid NOT to use parent2 for SSL, so it should always use parent1.

19.3. Logfile rotation doesn't work with Async I/O

This is a known limitation when using Async I/O on Linux. The Linux threads package steals (uses internally) the SIGUSR1 signal that Squid uses to rotate logs. In order not to disturb the threads package, SIGUSR1 use is disabled in Squid when threads are enabled on Linux.

19.4. Adding a new cache disk

Simply add your new cache_dir line to squid.conf, then run squid -z again. Squid will create swap directories on the new disk and leave the existing ones in place.
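For example (a sketch only; the mount points are hypothetical, and the arguments follow the Squid-2 cache_dir syntax of directory, size in megabytes, and the number of first- and second-level subdirectories):

    # existing disk
    cache_dir /cache1 2048 16 256
    # newly added disk
    cache_dir /cache2 4096 16 256

Then run squid -z to create the swap directories on the new disk only.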
19.5. Squid 2 performs badly on Linux

by Henrik Nordstrom

You may have enabled Asynchronous I/O with the --enable-async-io configure option. Be careful when using threads on Linux. Most versions of libc5 and early versions of glibc have problems with threaded applications. I would not recommend --enable-async-io on Linux unless your system uses a recent version of glibc. You should also know that --enable-async-io is not optimal unless you have a very busy cache. For low loads the cache performs slightly better without --enable-async-io.

Try recompiling Squid without --enable-async-io. If a non-threaded Squid performs better, then your libc probably can't handle threads correctly. (Don't forget ``make clean'' after running configure.)

19.6. How do I configure proxy authentication with Squid-2?

For Squid-2, the implementation and configuration have changed. Authentication is now handled via external processes. Arjan's proxy auth page describes how to set it up. Some simple instructions are given below as well.

1. We assume you have configured an ACL entry with proxy_auth, for example:

    acl foo proxy_auth REQUIRED
    http_access allow foo

2. You will need to compile and install an external authenticator program. Most people will want to use ncsa_auth. The source for this program is included in the source distribution, in the auth_modules/NCSA directory.

    % cd auth_modules/NCSA
    % make
    % make install

You should now have an ncsa_auth program in the same directory where your squid binary lives.

3. You may need to create a password file. If you have been using proxy authentication before, you probably already have such a file. You can get Apache's htpasswd program from our server; a usage example follows this list. Pick a pathname for your password file. We will assume you want to put it in the same directory as your squid.conf.

4. Configure the external authenticator in squid.conf. For ncsa_auth you need to give the pathname to the executable, with the password file as an argument. For example:

    authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd

After all that, you should be able to start up Squid. If we left something out, or haven't been clear enough, please let us know (squid-faq@ircache.net).
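For example, to create the password file and add a first user with htpasswd (a sketch; the path matches the authenticate_program example above, and ``myuser'' is a hypothetical username; use -c only the first time, since it creates a new file):

    % htpasswd -c /usr/local/squid/etc/passwd myuser
    New password:
    Re-type new password: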
19.7. Why does proxy-auth reject all users with Squid-2.2?

The ACL for proxy-authentication has changed from:

    acl foo proxy_auth timeout

to:

    acl foo proxy_auth username

Please update your ACL appropriately; a username of REQUIRED will permit all valid usernames. The timeout is now specified with the configuration option:

    authenticate_ttl timeout

19.8. Delay Pools

by David Luyer.

The information here is current for version 2.2. It is strongly recommended that you use at least Squid 2.2 if you wish to use delay pools.

Delay pools provide a way to limit the bandwidth of certain requests based on any list of criteria. The idea came from a Western Australian university which wanted to restrict student traffic costs (without affecting staff traffic, and while still getting cache and local peering hits at full speed). There was some early Squid 1.0 code by Central Network Services at Murdoch University, which I then developed (at the University of Western Australia) into a much more complex patch for Squid 1.0 called ``DELAY_HACK.'' I then tried to code it in a much cleaner style and with slightly more generic options than I personally needed, and called this ``delay pools'' in Squid 2. I almost completely recoded this in Squid 2.2 to provide the greater flexibility requested by people using the feature.

To enable the delay pools feature in Squid 2.2, you must use the --enable-delay-pools configure option before compilation.

Terminology for this FAQ entry:

pool: a collection of bucket groups as appropriate to a given class.

bucket group: a group of buckets within a pool, such as the per-host bucket group, the per-network bucket group or the aggregate bucket group (the aggregate bucket group is actually a single bucket).

bucket: an individual delay bucket; it represents a traffic allocation which is replenished at a given rate (up to a given limit) and causes traffic to be delayed when empty.

class: the class of a delay pool determines how the delay is applied, i.e., whether the different client IPs are treated separately or as a group (or both).

class 1: a class 1 delay pool contains a single unified bucket which is used for all requests from hosts subject to the pool.

class 2: a class 2 delay pool contains one unified bucket and 255 buckets, one for each host on an 8-bit network (IPv4 class C).

class 3: contains 255 buckets for the subnets in a 16-bit network, and individual buckets for every host on these networks (IPv4 class B).

Delay pools allow you to limit traffic for clients or client groups, with various features:

o You can specify peer hosts which aren't affected by delay pools, i.e., local peering or other ``free'' traffic (with the no-delay peer option).

o Delay behavior is selected by ACLs (low and high priority traffic, staff vs. students, student vs. authenticated student, and so on).

o Each group of users has a number of buckets; a bucket has an amount coming into it each second and a maximum amount it can grow to. When it reaches zero, object reads are deferred until one of the object's clients has some traffic allowance.

o Any number of pools can be configured with a given class, and any set of limits within a pool can be disabled; for example, you might only want to use the aggregate and per-host bucket groups of class 3, not the per-network one.

This allows options such as creating a number of class 1 delay pools and allowing a certain amount of bandwidth to given object types (by using URL regular expressions or similar), and many other uses I'm sure I haven't even thought of beyond the original fair balancing of a relatively small traffic allocation across a large number of users.

There are some limitations of delay pools:

o Delay pools are incompatible with slow aborts; quick abort should be set fairly low to prevent objects being retrieved at full speed once there are no clients requesting them (since the traffic allocation is based on the current clients, and when there are no clients attached to the object there is no way to determine the traffic allocation).

o Delay pools only limit the actual data transferred; they do not account for overheads such as TCP overheads, ICP, DNS, ICMP pings, etc.

o It is possible for one connection, or a small number of connections, to take all the bandwidth from a given bucket and for the other connections to be starved completely, which can be a major problem if there are a number of large objects being transferred and the parameters are set in a way that a few large objects will cause all clients to be starved (potentially fixed by a currently experimental patch).

19.8.1. How can I limit Squid's total bandwidth to, say, 512 Kbps?

    acl all src 0.0.0.0/0.0.0.0     # might already be defined
    delay_pools 1
    delay_class 1 1
    delay_access 1 allow all
    delay_parameters 1 64000/64000  # 512 kbits == 64 kbytes per second

For an explanation of these tags please see the configuration file. The 1-second buffer (max = restore = 64 kbytes/sec) is used because only a limit is requested, with no responsiveness to bursts. If you want the pool to be able to respond to a burst, increase the aggregate maximum to a larger value, and traffic bursts will be handled. It is recommended that the maximum be at least twice the restore value; if there is only a single object being downloaded, the download rate will sometimes fall below the requested throughput, because the bucket is not empty when it comes to be replenished.

19.8.2. How to limit a single connection to 128 Kbps?

You can not limit a single HTTP request's connection speed. You can limit individual hosts to some bandwidth rate. To limit a specific host, define an acl for that host and use the example above. To limit a group of hosts, you must use a delay pool of class 2 or 3. For example:

    acl only128kusers src 192.168.1.0/255.255.192.0
    acl all src 0.0.0.0/0.0.0.0
    delay_pools 1
    delay_class 1 3
    delay_access 1 allow only128kusers
    delay_access 1 deny all
    delay_parameters 1 64000/64000 -1/-1 16000/64000

For an explanation of these tags please see the configuration file. The above gives a solution where a cache is given a total of 512 kbits to operate in, and each IP address gets only 128 kbits out of that pool.
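As another sketch, if you instead wanted to give every host on a (hypothetical) class C network its own strict 128 Kbps allowance with no aggregate limit, a class 2 pool would look like this (16000 bytes/sec = 128 kbits/sec; -1 means unlimited, as above):

    acl mynet src 192.168.1.0/255.255.255.0
    delay_pools 1
    delay_class 1 2
    delay_access 1 allow mynet
    delay_access 1 deny all
    # class 2 parameters: aggregate, then per-host individual
    delay_parameters 1 -1/-1 16000/16000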
19.8.3. How do you personally use delay pools?

We have six local cache peers, all with the options 'proxy-only no-delay', since they are fast machines connected via a fast ethernet and microwave (ATM) network. For our local access we use a dstdomain ACL, and for delay pool exceptions we use a dst ACL as well, since the delay pool ACL processing will perform an address lookup if it needs one.

Our proxy has two virtual interfaces: one which requires student authentication to connect from machines where a department is not paying for traffic, and one which uses delay pools. Also, users of the main Unix system are allowed to choose slow or fast traffic, but must pay for any traffic they do using the fast cache. Ident lookups are disabled for accesses through the slow cache since they aren't needed. Slow accesses are delayed using a class 3 delay pool to give fairness between departments as well as between users. We recognize that users of Lynx on the main host are grouped together in one delay bucket, but they are mostly viewing text pages anyway, so this isn't considered a serious problem. If it were, we could take those hosts into a class 1 delay pool and give it a larger allocation.

I prefer using a slow restore rate and a large maximum rate, to give preference to people who are looking at web pages: their individual bucket fills while they are reading, while those downloading large objects are disadvantaged. Which approach is better depends on which clients you believe are more important. Also, one individual 8-bit network (a residential college) has paid extra to get more bandwidth.

The relevant parts of my configuration file are (IP addresses, etc., all changed):

    # ACL definitions
    # Local network definitions, domains a.net, b.net
    acl LOCAL-NET dstdomain a.net b.net
    # Local network; nets 64 - 127. Also nearby network class A, 10.
    acl LOCAL-IP dst 192.168.64.0/255.255.192.0 10.0.0.0/255.0.0.0
    # Virtual i/f used for slow access
    acl virtual_slowcache myip 192.168.100.13/255.255.255.255
    # All permitted slow access, nets 96 - 127
    acl slownets src 192.168.96.0/255.255.224.0
    # Special 'fast' slow access, net 123
    acl fast_slow src 192.168.123.0/255.255.255.0
    # User hosts
    acl my_user_hosts src 192.168.100.2/255.255.255.254
    # "All" ACL
    acl all src 0.0.0.0/0.0.0.0

    # Don't need ident lookups for billing on (free) slow cache
    ident_lookup_access allow my_user_hosts !virtual_slowcache
    ident_lookup_access deny all

    # Security access checks
    http_access [...]

    # These people get in for slow cache access
    http_access allow virtual_slowcache slownets
    http_access deny virtual_slowcache

    # Access checks for main cache
    http_access [...]

    # Delay definitions (read config file for clarification)
    delay_pools 2
    delay_initial_bucket_level 50

    delay_class 1 3
    delay_access 1 allow virtual_slowcache !LOCAL-NET !LOCAL-IP !fast_slow
    delay_access 1 deny all
    delay_parameters 1 8192/131072 1024/65536 256/32768

    delay_class 2 2
    delay_access 2 allow virtual_slowcache !LOCAL-NET !LOCAL-IP fast_slow
    delay_access 2 deny all
    delay_parameters 2 2048/65536 512/32768

The same code is also used by some departments, with class 2 delay pools to give them more flexibility in giving different performance to different labs or students.

19.8.4. Where else can I find out about delay pools?

This is also pretty well documented in the configuration file, with examples. Since people seem to lose their config files, here's a copy of the relevant section.
    # DELAY POOL PARAMETERS (all require DELAY_POOLS compilation option)
    # -------------------------------------------------------------------

    # TAG: delay_pools
    #   This represents the number of delay pools to be used. For example,
    #   if you have one class 2 delay pool and one class 3 delay pool, you
    #   have a total of 2 delay pools.
    #
    #   To enable this option, you must use --enable-delay-pools with the
    #   configure script.
    #delay_pools 0

    # TAG: delay_class
    #   This defines the class of each delay pool. There must be exactly one
    #   delay_class line for each delay pool. For example, to define two
    #   delay pools, one of class 2 and one of class 3, the settings above
    #   and here would be:
    #
    #delay_pools 2        # 2 delay pools
    #delay_class 1 2      # pool 1 is a class 2 pool
    #delay_class 2 3      # pool 2 is a class 3 pool
    #
    #   The delay pool classes are:
    #
    #     class 1   Everything is limited by a single aggregate
    #               bucket.
    #
    #     class 2   Everything is limited by a single aggregate
    #               bucket as well as an "individual" bucket chosen
    #               from bits 25 through 32 of the IP address.
    #
    #     class 3   Everything is limited by a single aggregate
    #               bucket as well as a "network" bucket chosen
    #               from bits 17 through 24 of the IP address and an
    #               "individual" bucket chosen from bits 17 through
    #               32 of the IP address.
    #
    #   NOTE: If an IP address is a.b.c.d
    #     -> bits 25 through 32 are "d"
    #     -> bits 17 through 24 are "c"
    #     -> bits 17 through 32 are "c * 256 + d"

    # TAG: delay_access
    #   This is used to determine which delay pool a request falls into.
    #   The first matched delay pool is always used, i.e., if a request falls
    #   into delay pool number one, no more delay pools are checked; otherwise
    #   the rest are checked in order of their delay pool number until they
    #   have all been checked. For example, if you want some_big_clients in
    #   delay pool 1 and lotsa_little_clients in delay pool 2:
    #
    #delay_access 1 allow some_big_clients
    #delay_access 1 deny all
    #delay_access 2 allow lotsa_little_clients
    #delay_access 2 deny all

    # TAG: delay_parameters
    #   This defines the parameters for a delay pool. Each delay pool has
    #   a number of "buckets" associated with it, as explained in the
    #   description of delay_class. For a class 1 delay pool, the syntax is:
    #
    #delay_parameters pool aggregate
    #
    #   For a class 2 delay pool:
    #
    #delay_parameters pool aggregate individual
    #
    #   For a class 3 delay pool:
    #
    #delay_parameters pool aggregate network individual
    #
    #   The variables here are:
    #
    #     pool        a pool number - i.e., a number between 1 and the
    #                 number specified in delay_pools as used in
    #                 delay_class lines.
    #
    #     aggregate   the "delay parameters" for the aggregate bucket
    #                 (class 1, 2, 3).
    #
    #     individual  the "delay parameters" for the individual
    #                 buckets (class 2, 3).
    #
    #     network     the "delay parameters" for the network buckets
    #                 (class 3).
    #
    #   A pair of delay parameters is written restore/maximum, where restore
    #   is the number of bytes (not bits - modem and network speeds are
    #   usually quoted in bits) per second placed into the bucket, and
    #   maximum is the maximum number of bytes which can be in the bucket at
    #   any time.
    #
    #   For example, if delay pool number 1 is a class 2 delay pool as in the
    #   above example, and is being used to strictly limit each host to
    #   64kbps (plus overheads), with no overall limit, the line is:
    #
    #delay_parameters 1 -1/-1 8000/8000
    #
    #   Note that the figure -1 is used to represent "unlimited".
    #
    #   And, if delay pool number 2 is a class 3 delay pool as in the above
    #   example, and you want to limit it to a total of 256kbps (strict limit)
    #   with each 8-bit network permitted 64kbps (strict limit) and each
    #   individual host permitted 4800bps with a bucket maximum size of 64kb
    #   to permit a decent web page to be downloaded at a decent speed
    #   (if the network is not being limited due to overuse) but slow down
    #   large downloads more significantly:
    #
    #delay_parameters 2 32000/32000 8000/8000 600/64000
    #
    #   There must be one delay_parameters line for each delay pool.

    # TAG: delay_initial_bucket_level (percent, 0-100)
    #   The initial bucket percentage is used to determine how much is put
    #   in each bucket when squid starts, is reconfigured, or first notices
    #   a host accessing it (in class 2 and class 3, individual hosts and
    #   networks only have buckets associated with them once they have been
    #   "seen" by squid).
    #
    #delay_initial_bucket_level 50

19.9. Can I preserve my cache when upgrading from 1.1 to 2?

At the moment we do not have a script which will convert your cache contents from the 1.1 to the Squid-2 format. If enough people ask for one, then somebody will probably write such a script.

If you like, you can configure a new Squid-2 cache with your old Squid-1.1 cache as a sibling. After a few days, weeks, or however long you want to wait, shut down the old Squid cache. If you want to force-load your new cache with the objects from the old cache, you can try something like this:

1. Install Squid-2 and configure it to have the same amount of disk space as your Squid-1 cache, even if there is not currently that much space free.

2. Configure Squid-2 with Squid-1 as a parent cache. You might want to enable never_direct on the Squid-2 cache so that all of Squid-2's requests go through Squid-1.

3. Enable the ``PURGE method'' on Squid-1.

4. Set the refresh rules on Squid-1 to be very liberal so that it does not generate IMS requests for cached objects.

5. Create a list of all the URLs in the Squid-1 cache. These can be extracted from the access.log, store.log and swap logs.

6. For every URL in the list, request the URL from Squid-2, and then immediately send a PURGE request to Squid-1.

7. Eventually Squid-2 will have all the objects, and Squid-1 will be empty.

19.10. Customizable Error Messages

Squid-2 lets you customize your error messages. The source distribution includes error messages in different languages. You can select the language with the configure option:

    --enable-err-language=lang

Furthermore, you can rewrite the error message template files if you like. This list describes the tags which Squid will insert into the messages:

    %B   URL with FTP %2f hack
    %c   Squid error code
    %d   seconds elapsed since request received
    %e   errno
    %E   strerror()
    %f   FTP request line
    %F   FTP reply line
    %g   FTP server message
    %h   cache hostname
    %H   server host name
    %i   client IP address
    %I   server IP address
    %L   contents of err_html_text config option
    %M   Request Method
    %p   URL port
    %P   Protocol
    %R   Full HTTP Request
    %S   squid signature from ERR_SIGNATURE
    %s   caching proxy software with version
    %t   local time
    %T   UTC
    %U   URL without password
    %u   URL without password, %2f added to path
    %w   cachemgr email address
    %z   dns server error message
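As an illustration, a stripped-down template might look like the sketch below. This is not one of the shipped templates, just a minimal example of how the tags above are used; Squid substitutes the % codes when it generates the error page:

    <HTML><HEAD><TITLE>ERROR: The requested URL could not be retrieved</TITLE></HEAD>
    <BODY>
    <H1>ERROR</H1>
    <P>While trying to retrieve the URL <A HREF="%U">%U</A>,
    the following error was encountered: %E</P>
    <P>Generated %t by %h (%s)</P>
    </BODY></HTML>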
19.11. My squid.conf from version 1.1 doesn't work!

Yes, a number of configuration directives have been renamed. Here are some of them:

cache_host: This is now called cache_peer. The old term does not really describe what you are configuring, but the new name tells you that you are configuring a peer for your cache.

cache_host_domain: Renamed to cache_peer_domain.

local_ip, local_domain: The functionality provided by these directives is now implemented as access control lists. You will use the always_direct and never_direct options. The new squid.conf file has some examples.

cache_stoplist: This directive has also been reimplemented with access control lists. You will use the no_cache option. For example:

    acl Uncachable url_regex cgi \?
    no_cache deny Uncachable

cache_swap: This option used to specify the cache disk size. Now you specify the disk size on each cache_dir line.

cache_host_acl: This option has been renamed to cache_peer_access, and the syntax has changed. Now this option is a true access control list, and you must include an allow or deny keyword. For example:

    acl that-AS dst_as 1241
    cache_peer_access thatcache.thatdomain.net allow that-AS
    cache_peer_access thatcache.thatdomain.net deny all

This example sends requests to your peer thatcache.thatdomain.net only for origin servers in Autonomous System Number 1241.

units: In Squid-1.1 many of the configuration options had implied units associated with them. For example, the connect_timeout value may have been in seconds, but the read_timeout value had to be given in minutes. With Squid-2, these directives take units after the numbers, and you will get a warning if you leave off the units. For example, you should now write:

    connect_timeout 120 seconds
    read_timeout 15 minutes

20. httpd-accelerator mode

20.1. What is the httpd-accelerator mode?

Occasionally people have trouble understanding accelerators and proxy caches, usually resulting from mixed-up interpretations of ``incoming'' and ``outgoing'' data. I think in terms of requests (i.e., an outgoing request is from the local site out to the big bad Internet). The data received in reply is incoming, of course. Others think in the opposite sense of ``a request for incoming data.''

An accelerator caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. You move the server away from port 80 (or whatever your published port is), and substitute the accelerator, which then pulls the HTTP data from the ``real'' HTTP server (only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed, with luck).

Quite apart from taking the load off a site's normal web server, accelerators can also sit outside firewalls or other network bottlenecks and talk to HTTP servers inside, reducing traffic across the bottleneck and simplifying the configuration. Two or more accelerators communicating via ICP can increase the speed and resilience of a web service to any single failure.

The Squid redirector can make one accelerator act as a single front-end for multiple servers. If you need to move parts of your filesystem from one server to another, or if separately administered HTTP servers should logically appear under a single URL hierarchy, the accelerator makes the right thing happen.

If you wish only to cache the ``rest of the world'' to improve local users' browsing performance, then accelerator mode is irrelevant. Sites which own and publish a URL hierarchy use an accelerator to improve other sites' access to it. Sites wishing to improve their local users' access to other sites' URLs use proxy caches.
Many sites, like us, do both and hence run both.

Measurement of the Squid cache and its Harvest counterpart suggests an order of magnitude performance improvement over CERN or other widely available caching software. This order of magnitude performance improvement on hits suggests that the cache can serve as an httpd accelerator: a cache configured to act as a site's primary httpd server (on port 80), forwarding references that miss to the site's real httpd (on port 81). In such a configuration, the web administrator renames all non-cachable URLs to the httpd's port (81). The cache serves references to cachable objects, such as HTML pages and GIFs, and the true httpd (on port 81) serves references to non-cachable objects, such as queries and cgi-bin programs. If a site's usage characteristics tend toward cachable objects, this configuration can dramatically reduce the site's web workload.

Note that it is best not to run a single squid process as both an httpd-accelerator and a proxy cache, since these two modes will have different working sets. You will get better performance by running two separate caches on separate machines. However, for compatibility with how administrators are accustomed to running other servers that provide both proxy and Web serving capability (e.g., CERN), Squid supports operation as both a proxy and an accelerator if you set the httpd_accel_with_proxy variable to on inside your squid.conf configuration file.

20.2. How do I set it up?

First, you have to tell Squid to listen on port 80 (usually), so set the 'http_port' option:

    http_port 80

Next, you need to move your normal HTTP server to another port and/or another machine. If you want to run your HTTP server on the same machine, then it can not also use port 80 (but see the next FAQ entry below). A common choice is port 81. Configure squid as follows:

    httpd_accel_host localhost
    httpd_accel_port 81

Alternatively, you could move the HTTP server to another machine and leave it on port 80:

    httpd_accel_host otherhost.foo.com
    httpd_accel_port 80

You should now be able to start Squid and it will serve requests as an HTTP server. If you are using Squid as an accelerator for a virtual host system, then you need to specify

    httpd_accel_host virtual

Finally, if you want Squid to also accept proxy requests (like it used to before you turned it into an accelerator), then you need to enable this option:

    httpd_accel_with_proxy on

20.3. When using an httpd-accelerator, the port number for redirects is wrong

Yes, this is because you probably moved your real httpd to port 81. When your httpd issues a redirect message (e.g. 302 Moved Temporarily), it knows it is not running on the standard port (80), so it inserts :81 in the redirected URL. Then, when the client requests the redirected URL, it bypasses the accelerator.

How can you fix this? One way is to leave your httpd running on port 80, but bind the httpd socket to a specific interface, namely the loopback interface. With Apache you can do it like this in httpd.conf:

    Port 80
    BindAddress 127.0.0.1

Then, in your squid.conf file, you must specify the loopback address as the accelerator:

    httpd_accel_host 127.0.0.1
    httpd_accel_port 80

Note, you probably also need to add an /etc/hosts entry of 127.0.0.1 for your server hostname. Otherwise, Squid may get stuck in a forwarding loop.

21. Related Software

21.1. Clients

21.1.1. Wget

Wget is a command-line Web client. It supports recursive retrievals and HTTP proxies.
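For example, to fetch a page through your cache (a sketch; wget reads the proxy location from the standard http_proxy environment variable, and the host names here are hypothetical):

    % http_proxy=http://cache.example.com:3128/ wget http://www.example.com/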
21.1.2. echoping

If you want to test your Squid cache in batch (from a cron command, for instance), you can use the echoping program, which will tell you (in plain text or via an exit code) if the cache is up or not, and will indicate the response times.

21.2. Logfile Analysis

David Marwood and Brad Duska have some cool Squid Proxy Analysis software.

21.2.1. SQUIDTIMES

SQUIDTIMES, by Nico Tranquilli.

21.3. Configuration Tools

21.3.1. 3Dhierarchy.pl

Kenichi Matsui has a simple perl script, 3Dhierarchy.pl, which generates a 3D hierarchy map (in VRML) from squid.conf.

21.4. Squid add-ons

21.4.1. transproxy

transproxy is a program used in conjunction with the Linux Transparent Proxy networking feature, and ipfwadm, to transparently proxy HTTP and other requests. Transproxy is written by John Saunders.

21.4.2. Iain's redirector package

A redirector package from Iain Lea to allow Intranet (restricted) or Internet (full) access, with URL deny and redirection for sites that are not deemed acceptable for a userbase, all via a single proxy port.

21.4.3. Junkbusters

Junkbusters Corp has a copyleft privacy-enhancing, ad-blocking proxy server which you can use in conjunction with Squid.

21.4.4. Squirm

Squirm is a configurable, efficient redirector for Squid by Chris Foote. Features:

o Very fast.
o Virtually no memory usage.
o It can re-read its config files while running by sending it a HUP signal.
o Interactive test mode for checking new configs.
o Full regular expression matching and replacement.
o Config files for patterns and IP addresses.
o If you mess up the config file, Squirm runs in Dodo Mode so your squid keeps working :-)

21.4.5. chpasswd.cgi

Pedro L Orso has adapted Apache's htpasswd into a CGI program called chpasswd.cgi.

21.4.6. jesred

jesred, by Jens Elkner.

21.4.7. squidGuard

squidGuard is a free (GPL), flexible and efficient filter and redirector program for squid. It lets you define multiple access rules with different restrictions for different user groups on a squid cache. squidGuard uses the standard squid redirector interface.

21.4.8. Central Squid Server

The Smart Neighbour (or 'Central Squid Server' - CSS) is a cut-down version of Squid without HTTP or object caching functionality. The CSS deals only with ICP messages. Instead of caching objects, the CSS records the availability of objects in each of its neighbour caches. Caches that have smart neighbours update each smart neighbour with the status of their cache by sending ICP_STORE_NOTIFY/ICP_RELEASE_NOTIFY messages upon storing/releasing an object from their cache. The CSS maintains an up-to-date 'object map' recording the availability of objects in its neighbouring caches.

21.5. Ident Servers

For Windows NT, Windows 95/98, and Unix.

22. DISKD

22.1. What is DISKD?

DISKD refers to some features in Squid-2.4 to improve Disk I/O performance. The basic idea is that each cache_dir has its own diskd child process. The diskd process performs all disk I/O operations (open, close, read, write, unlink) for the cache_dir. Message queues are used to send requests and responses between the Squid and diskd processes. Shared memory is used for chunks of data to be read and written.

22.2. Does it perform better?

Yes. We benchmarked Squid-2.4 with DISKD at the Second IRCache Bake-Off. The results are also described here. At the bakeoff, we got 160 req/sec with diskd. Without diskd, we'd have gotten about 40 req/sec.

22.3. What do I need to use it?

1. Squid-2.4.

2. Your operating system must support message queues.
3. Your operating system must support shared memory.

22.4. If I use DISKD, do I have to wipe out my current cache?

No. Diskd uses the same storage scheme as the standard ``UFS'' type. It only changes how I/O is performed.

22.5. How do I configure message queues?

Most Unix operating systems have message queue support by default. One way to check is to see if you have an ipcs command. However, you will likely need to increase the message queue parameters for Squid. Message queue implementations normally have the following parameters:

    MSGMNB   Maximum number of bytes in a single queue.
    MSGMNI   Maximum number of message queue identifiers.
    MSGSEG   Maximum number of message segments.
    MSGMAX   Maximum size of a message segment.
    MSGTQL   Maximum number of messages in the whole system.

The messages between Squid and diskd are 32 bytes, so MSGMAX should be 32 or greater. You may want to set it to a larger value, just to be safe. There will be two queues for each cache_dir, one in each direction, so MSGMNI needs to be at least two times the number of cache_dir's. MSGMNB and MSGTQL affect how many messages can be in the queues at one time. I've found that 75 messages per queue is about the limit of decent performance. Thus, MSGMNB must be at least 75*MSGMAX, and MSGTQL must be at least 75 times the number of cache_dir's.

22.5.1. FreeBSD

Your kernel must have

    options SYSVMSG

You can set the parameters in the kernel as follows. This is just an example; make sure the values are appropriate for your system:

    options MSGMNB=16384   # max # of bytes in a queue
    options MSGMNI=41      # number of message queue identifiers
    options MSGSEG=2049    # number of message segments
    options MSGSSZ=64      # size of a message segment
    options MSGTQL=512     # max messages in system

22.5.2. Digital Unix

Message queue support seems to be in the kernel by default. Setting the options is as follows:

    options MSGMNB="8192"  # max # bytes on queue
    options MSGMNI="31"    # number of message queue identifiers
    options MSGMAX="2049"  # max message size
    options MSGTQL="1024"  # number of system message headers

by Brenden Phillips

If you have a newer version (DU64), then you can probably use sysconfig instead. To see what the current IPC settings are, run

    # sysconfig -q ipc

To change them, make a file like this called ipc.stanza:

    ipc:
        msg-max = 2049
        msg-mni = 31
        msg-tql = 1024
        msg-mnb = 8192

then run

    # sysconfigdb -a -f ipc.stanza

You have to reboot for the change to take effect.

22.5.3. Linux

In my limited browsing on Linux, I didn't see any way to change message queue parameters except to modify the include files and build a new kernel. On my system, the file is /usr/src/linux/include/linux/msg.h.

22.5.4. Solaris

Refer to Demangling Message Queues in Sunworld Magazine. I don't think that article really tells you how to set the parameters. You do it in /etc/system with lines like this:

    set msgsys:msginfo_msgmax=2049
    set msgsys:msginfo_msgmnb=8192
    set msgsys:msginfo_msgmni=31
    set msgsys:msginfo_msgssz=64
    set msgsys:msginfo_msgtql=1024

Of course, you must reboot after modifying /etc/system before the changes take effect.
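Once Squid is running with diskd enabled, you can sanity-check the setup with the ipcs command mentioned above (a sketch; the -q and -m flags, which list active message queues and shared memory segments, are standard on most implementations):

    % ipcs -q   # active message queues; expect two per cache_dir
    % ipcs -m   # active shared memory segments (next section); one per cache_dir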
22.6. How do I configure shared memory?

Shared memory uses a set of parameters similar to the ones for message queues. The Squid DISKD implementation uses one shared memory area for each cache_dir. Each shared memory area is about 800 kilobytes in size. You may need to modify your system's shared memory parameters:

    SHMSEG   Maximum number of shared memory segments per process.
    SHMMNI   Maximum number of shared memory segments for the whole system.
    SHMMAX   Largest shared memory segment size allowed.
    SHMALL   Total amount of shared memory that can be used.

For Squid and DISKD, SHMSEG and SHMMNI must be greater than or equal to the number of cache_dir's that you have. SHMMAX must be at least 800 kilobytes. SHMALL must be at least 800 kilobytes multiplied by the number of cache_dir's.

22.6.1. FreeBSD

Your kernel must have

    options SYSVSHM

You can set the parameters in the kernel as follows. This is just an example; make sure the values are appropriate for your system:

    options SHMSEG=16        # max shared mem id's per process
    options SHMMNI=32        # max shared mem id's per system
    options SHMMAX=2097152   # max shared memory segment size (bytes)
    options SHMALL=4096      # max amount of shared memory (pages)

22.6.2. Digital Unix

Shared memory support seems to be in the kernel by default. Setting the options is as follows:

    options SHMSEG="16"        # max shared mem id's per process
    options SHMMNI="32"        # max shared mem id's per system
    options SHMMAX="2097152"   # max shared memory segment size (bytes)
    options SHMALL=4096        # max amount of shared memory (pages)

by Brenden Phillips

If you have a newer version (DU64), then you can probably use sysconfig instead. To see what the current IPC settings are, run

    # sysconfig -q ipc

To change them, make a file like this called ipc.stanza:

    ipc:
        shm-seg = 16
        shm-mni = 32
        shm-max = 2097152
        shm-all = 4096

then run

    # sysconfigdb -a -f ipc.stanza

You have to reboot for the change to take effect.

22.6.3. Linux

In my limited browsing on Linux, I didn't see any way to change shared memory parameters except to modify the include files and build a new kernel. On my system, the file is /usr/src/linux/include/asm-i386/shmparam.h. However, it looks like you can change SHMMAX at run time by writing to /proc/sys/kernel/shmmax.

22.6.4. Solaris

Refer to Shared memory uncovered in Sunworld Magazine. To set the values, you can put these lines in /etc/system:

    set shmsys:shminfo_shmmax=2097152
    set shmsys:shminfo_shmmni=32
    set shmsys:shminfo_shmseg=16

23. Terms and Definitions

23.1. Neighbor

In Squid, neighbor usually means the same thing as peer. A neighbor cache is one that you have defined with the cache_host configuration option. Neighbor refers to either a parent or a sibling.

In Harvest 1.4, neighbor referred to what Squid calls a sibling. That is, Harvest had parents and neighbors. For backward compatibility, the term neighbor is still accepted in some Squid configuration options.

$Id: FAQ.sgml,v 1.89 2000/03/24 20:02:01 wessels Exp $