Determining the Country from an IP Address

When gathering statistics on web page visits, one often collects the number of visitors from different countries. As a rule, the country is determined from the top-level domain. But this information does not always match reality, especially given the current tendency to use national domains such as co and tv for other than their intended purpose. Besides, what should be done with generic domains such as net, org, com and so on? Or with IP addresses that have no record in the reverse zone? And finally, resolving a domain name takes a noticeable amount of time.


The code in this article is distributed under a Python-style license: it may be used for any purpose (including commercial) provided the copyright notice is preserved: Copyright © 2002, Denis S. Otkidach <ods@ods.pp.ru>.


Data on registered IP address ranges is stored in whois databases. To let the public analyze traffic, RIPE NCC, ARIN and APNIC publish abbreviated snapshots of their databases at least once a month. We will build a local database from this data.


But first we need to organize efficient storage for IP address ranges so that they can be looked up quickly. As the basis we take the BTree database of Berkeley DB, which is accessed through the btopen() function from the standard bsddb module. The keys are the start of each IP address range; the values hold its end plus additional information. Keys and values in bsddb must be strings. In addition, the keys must be ordered. The inet_aton and inet_ntoa functions from the socket module suit this purpose very well: inet_aton packs an address into a 4-byte string in network byte order, so comparing keys as strings orders them the same way as comparing the addresses numerically.
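The ordering property of packed addresses can be checked with a short sketch. It is written for Python 3 (an assumption; the article's own code targets Python 2), and the sample addresses are made up for illustration.

```python
import socket

# Made-up sample addresses, deliberately out of order.
addresses = ["10.0.0.0", "9.255.255.255", "192.168.0.1", "10.0.0.255"]

# Sorting the dotted strings compares characters, not numbers.
text_order = sorted(addresses)

# inet_aton packs each address into 4 bytes in network (big-endian) order,
# so byte-wise comparison agrees with numeric comparison.
packed = sorted(socket.inet_aton(a) for a in addresses)
packed_order = [socket.inet_ntoa(p) for p in packed]

print(text_order)
print(packed_order)
```

Here the packed form sorts 9.255.255.255 before 10.0.0.0, while the plain text form puts it last.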



from bsddb import btopen
from socket import inet_aton, inet_ntoa


class IPRangeDB:

    def __init__(self, filename, mode='r'):
        self.__db = btopen(filename, mode)

    def close(self):
        self.__db.close()

    def _locate(self, ip):
        # Find the record with the largest key that is <= ip.
        db = self.__db
        try:
            # Positions the cursor at the first key >= ip.
            first, record = db.set_location(ip)
        except KeyError:
            # No key >= ip: the last record is the only candidate.
            try:
                first, record = db.last()
            except KeyError:
                raise KeyError(inet_ntoa(ip))
        else:
            if first != ip:
                first, record = db.previous()
        assert first <= ip
        return first, record

    def __getitem__(self, ip_str):
        ip = inet_aton(ip_str)
        first, record = self._locate(ip)
        # The first 4 bytes of a record hold the end of the range.
        last = record[:4]
        assert last >= first
        if ip <= last:
            return self.unpack(record[4:])
        else:
            raise KeyError(ip_str)

    def add(self, first_str, last_str, info):
        first = inet_aton(first_str)
        last = inet_aton(last_str)
        try:
            db_first, record = self._locate(last)
        except KeyError:
            pass
        else:
            db_last = record[:4]
            if first <= db_last:
                raise ValueError(
                    'Range %s-%s intersects ' % (first_str, last_str) +
                    'with existing entry %s-%s' %
                    (inet_ntoa(db_first), inet_ntoa(db_last)))
        self.__db[first] = last + self.pack(info)

    def pack(self, info):
        return info

    def unpack(self, info):
        return info


The _locate() method finds the record with the largest lower bound that is less than or equal to the IP address passed as the argument. The __getitem__() method lets instances of IPRangeDB be used like dictionaries: db[ip] returns the information for the range that contains the address ip. Using the dictionary interface for writing is hardly a good idea, since a single record is created for a whole range. To avoid confusion, records are added through the add() method. Finally, the pair of methods pack() and unpack() is defined so that a derived class can easily be adapted to store arbitrary information; pack() must convert an object to a string.
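The same lookup idea can be tried without Berkeley DB. The following is only a sketch for Python 3: the class name MemoryRangeTable and its internals are hypothetical, and a sorted list searched with the standard bisect module stands in for the BTree file.

```python
from bisect import bisect_right
from socket import inet_aton


class MemoryRangeTable:
    """Hypothetical in-memory stand-in for the BTree file (illustration only)."""

    def __init__(self):
        self._firsts = []   # packed range starts, kept sorted
        self._records = []  # (packed range end, info), parallel to _firsts

    def add(self, first_str, last_str, info):
        first, last = inet_aton(first_str), inet_aton(last_str)
        # The range with the largest start <= last is the only one
        # that could overlap the new range.
        pos = bisect_right(self._firsts, last)
        if pos and first <= self._records[pos - 1][0]:
            raise ValueError("range overlaps an existing entry")
        pos = bisect_right(self._firsts, first)
        self._firsts.insert(pos, first)
        self._records.insert(pos, (last, info))

    def __getitem__(self, ip_str):
        ip = inet_aton(ip_str)
        # Index of the largest range start that is <= ip, plus one.
        pos = bisect_right(self._firsts, ip)
        if pos:
            last, info = self._records[pos - 1]
            if ip <= last:
                return info
        raise KeyError(ip_str)


table = MemoryRangeTable()
table.add("10.0.0.0", "10.255.255.255", "our local area network")
table.add("192.168.0.0", "192.168.255.255", "another private range")
print(table["10.1.2.3"])
```

An address outside every stored range, such as 123.45.67.89 here, raises KeyError, just as with the dictionary-style interface described above.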



>>> db = IPRangeDB('test.db', 'c')
>>> db.add('10.0.0.0', '10.255.255.255', 'our local area network')
>>> print db['10.1.2.3']
our local area network
>>> print db['123.45.67.89']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "ip2cc.py", line 38, in __getitem__
    raise KeyError(ip_str)
KeyError: 123.45.67.89


Only a small matter remains: defining the methods that fill the database.



from urllib import urlopen
from xreadlines import xreadlines
from time import strptime
import struct


class CountryByIP(IPRangeDB):

    # Registry name -> (directory URL, file name pattern for strptime).
    sources = {
        'arin': ('ftp://ftp.arin.net/pub/stats/arin/', 'arin.%Y%m%d'),
        'ripencc': ('ftp://ftp.ripe.net/ripe/stats/', 'ripencc.%Y%m%d'),
        'apnic': ('ftp://ftp.apnic.net/pub/stats/apnic/', 'apnic-%Y-%m-%d')
    }

    def fetch(self):
        for name in self.sources:
            fp = self.__openRecent(name)
            for line in xreadlines(fp):
                parts = line.strip().split('|')
                if (len(parts) == 7 and parts[2] == 'ipv4' and
                        parts[6] in ('allocated', 'assigned') and
                        name == parts[0]):
                    first = parts[3]
                    # parts[4] is the number of addresses in the range.
                    first_int = struct.unpack('!i', inet_aton(first))[0]
                    last_int = first_int + int(parts[4]) - 1
                    last = inet_ntoa(struct.pack('!i', last_int))
                    try:
                        self.add(first, last, parts[1].upper())
                    except ValueError:
                        # Skip ranges that overlap an existing entry.
                        pass

    def __openRecent(self, name):
        uri, format = self.sources[name]
        files = []
        for line in xreadlines(urlopen(uri)):
            # The file name is the last field of a directory listing line.
            file = line.split()[-1]
            try:
                dt = strptime(file, format)
            except ValueError:
                pass
            else:
                files.append((dt, file))
        files.sort()
        return urlopen(uri + files[-1][1])


The __openRecent method finds the most recent snapshot and returns a file object. The date of a snapshot is determined from the file name, using the pattern from the sources dictionary. The fetch method parses the data, selects the relevant records and adds them to the database. Using the xreadlines module allows the data to be processed as it arrives.
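The two steps just described, picking the newest snapshot by file name and turning one record into a range, can be tried in isolation. This is a sketch for Python 3 that makes no network access; the helper names snapshot_date and parse_record are hypothetical, and the file names and the sample record are made up in the shape the code above expects.

```python
import ipaddress
from datetime import datetime


def snapshot_date(file_name, pattern):
    """Return the date encoded in a file name, or None if it does not match."""
    try:
        return datetime.strptime(file_name, pattern).date()
    except ValueError:
        return None


def parse_record(line, registry):
    """Return (first, last, country) for an IPv4 record, otherwise None."""
    parts = line.strip().split("|")
    if (len(parts) != 7 or parts[0] != registry or parts[2] != "ipv4"
            or parts[6] not in ("allocated", "assigned")):
        return None
    first = ipaddress.IPv4Address(parts[3])
    # The fifth field is the number of addresses in the range.
    last = first + (int(parts[4]) - 1)
    return str(first), str(last), parts[1].upper()


# Made-up directory listing: only names matching the pattern carry a date.
names = ["README", "arin.20020201", "arin.20020301"]
dated = [(snapshot_date(n, "arin.%Y%m%d"), n) for n in names]
newest = max(d for d in dated if d[0] is not None)[1]
print(newest)

# Made-up record in the seven-field, pipe-separated layout.
sample = "ripencc|nl|ipv4|192.0.2.0|256|20020101|allocated"
print(parse_record(sample, "ripencc"))
```

Using unsigned address arithmetic through the ipaddress module avoids having to think about the sign of addresses above 127.255.255.255; that is a choice of this sketch, not of the original code.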


Now we can fill the database:



>>> db = CountryByIP('test.db', 'n')
>>> db.fetch()


And use it:



>>> from socket import gethostbyname
>>> db[gethostbyname('python.org')]
'NL'


Converting the two-letter (A2) code to a country name using the ISO 3166 table is left to you as an exercise.
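As a starting point for that exercise, here is a minimal Python 3 sketch. The table holds only a handful of entries typed by hand with informal English names (an assumption, not the official ISO 3166 short names), and the names COUNTRY_NAMES and country_name are hypothetical; a real solution would load the full table.

```python
# Hand-typed excerpt; informal names, not the official ISO 3166 list.
COUNTRY_NAMES = {
    "DE": "Germany",
    "JP": "Japan",
    "NL": "Netherlands",
    "RU": "Russia",
    "US": "United States",
}


def country_name(code):
    """Return a readable name for a two-letter code, or the code itself."""
    return COUNTRY_NAMES.get(code.upper(), code)


print(country_name("NL"))
```

Falling back to the code itself keeps statistics usable for countries missing from the table.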