Monday, October 24, 2011

Free Software / Open Source Business Models

The 2nd Technical Conference on Free Software / Open Source was held on Sep 8-9, 2011 in Zanjan, Iran. I gave a talk there on Free Software and Open Source business models. I've uploaded my presentation, and I hope it helps newcomers understand Open Source business models.

Download Free Software / Open Source Business Models (PDF, 2.3 MB)

Monday, March 14, 2011

How to find IP locations using MySQL

Here we learn how to use MySQL to locate an IP address. First we create a table that holds each country's name and its IP address ranges. Then we define a function that returns the country name for a given IP address.

What's an IP address?

Quoting from the "IP address" page in Wikipedia:

An Internet Protocol address (IP address) is a usually numerical label assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing. Its role has been characterized as follows: "A name indicates what we seek. An address indicates where it is. A route indicates how to get there."
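To make the "numerical label" concrete: a dotted IPv4 address such as 192.0.2.1 is just a 32-bit number written one byte at a time, which is what lets a table store address ranges as plain integers. The sketch below shows that conversion in Java. The class and method names (IpToNumber, ipToLong) are made up for illustration and are not part of this post's MySQL code.

```java
class IpToNumber {
    // Converts a dotted IPv4 string to its 32-bit value. A long is used so that
    // addresses above 127.255.255.255 stay positive (they overflow a signed int).
    static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long total = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            // Same result as a*256^3 + b*256^2 + c*256 + d, built up one octet at a time.
            total = total * 256 + octet;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("192.0.2.1")); // prints 3221225985
    }
}
```

The example address comes from a block reserved for documentation, so it does not belong to any real host.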

Table (database)
I created the ip_location table to store countries and their IP address ranges. Run the following statement:
CREATE TABLE `ip_location` (
  -- unsigned, because addresses above 127.255.255.255 do not fit in a signed INT
  `from_ip` int unsigned DEFAULT NULL,
  `to_ip` int unsigned DEFAULT NULL,
  `country` varchar(32) DEFAULT NULL,
  KEY `from_ip` (`from_ip`,`country`),
  KEY `to_ip` (`to_ip`,`country`)
);

Then you need to import the ip_location data. I exported my table, including its data, with mysqldump. Download the dump and restore it with the mysql command.
Download (165KB)

getIpCountry() function
Now you need to create the function that returns the country name:
DELIMITER $$
CREATE FUNCTION getIpCountry(ip varchar(15)) RETURNS varchar(64)
READS SQL DATA
BEGIN
  -- a, b, c and d hold the four octets of the dotted address
  declare a tinyint unsigned;
  declare b tinyint unsigned;
  declare c tinyint unsigned;
  declare d tinyint unsigned;
  declare total bigint;
  declare result varchar(64);
  select substring_index(ip, '.', 1 ) into a;
  select substring_index(substring_index(ip , '.', 2 ),'.',-1) into b;
  select substring_index(substring_index(ip , '.', -2 ),'.',1) into c;
  select substring_index(ip, '.', -1 ) into d;
  -- combine the octets into a single number that can be compared with the ranges
  set total := (a*256*256*256) + (b*256*256) + (c*256) + d;
  -- the SQL_CACHE hint is left out here because MySQL 8.0 removed the query cache
  select country into result from ip_location where total between from_ip and to_ip limit 1;
  if (result is null) or (result = '') then
    set result := 'unknown';
  end if;
  return result;
END$$
DELIMITER ;
And done! Now you just need to call it with a SELECT statement in the MySQL CLI:
mysql> SELECT getIpCountry('');
| getIpCountry('') |
1 row in set (0.03 sec)

mysql> SELECT getIpCountry('');
| getIpCountry('') |
| UNITED STATES           |
1 row in set (0.00 sec)
Let me know if you know another way :)
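One other way, sketched here in Java rather than MySQL, is to load the ranges into memory and look them up in a sorted map. This is only an illustration under stated assumptions: the class name IpCountryLookup, its methods, and the sample rows ("EXAMPLE-A", "EXAMPLE-B") are invented, the ranges are assumed not to overlap, and the address is assumed to be already converted to a number.

```java
import java.util.Map;
import java.util.TreeMap;

class IpCountryLookup {
    // One row of a table like ip_location: the inclusive end of a range and its country.
    private static final class Range {
        final long toIp;
        final String country;

        Range(long toIp, String country) {
            this.toIp = toIp;
            this.country = country;
        }
    }

    // Ranges keyed by their first address, so floorEntry finds the candidate range.
    private final TreeMap<Long, Range> ranges = new TreeMap<>();

    void addRange(long fromIp, long toIp, String country) {
        ranges.put(fromIp, new Range(toIp, country));
    }

    // Returns the country whose range contains ip, or "unknown" if none does.
    String countryOf(long ip) {
        Map.Entry<Long, Range> entry = ranges.floorEntry(ip);
        if (entry == null || ip > entry.getValue().toIp) {
            return "unknown";
        }
        return entry.getValue().country;
    }

    public static void main(String[] args) {
        IpCountryLookup lookup = new IpCountryLookup();
        // Made-up sample rows built from documentation address blocks.
        lookup.addRange(3221225984L, 3221226239L, "EXAMPLE-A"); // 192.0.2.0 to 192.0.2.255
        lookup.addRange(3325256704L, 3325256959L, "EXAMPLE-B"); // 198.51.100.0 to 198.51.100.255

        System.out.println(lookup.countryOf(3221225985L)); // 192.0.2.1, prints EXAMPLE-A
        System.out.println(lookup.countryOf(167772161L));  // 10.0.0.1, prints unknown
    }
}
```

The lookup takes logarithmic time in the number of ranges, at the cost of holding the whole table in memory.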

Wednesday, January 19, 2011

How to get pure content from HTML page in Java via Regex

I wrote a web crawler a few weeks ago while developing a search engine. It extracts the content of each page and saves it to the database. HTML tags don't matter much to most search engines, so I strip them out. To do the same, follow the steps below:
1- Remove the script tags together with their content:
// htmlContent is the full content of the page, including the HTML markup.

String content;
Pattern pattern;

pattern = Pattern.compile("<script[^>]*>.*?</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
content = pattern.matcher(htmlContent).replaceAll("");
Note: In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

2- Remove the style tags together with their content:
// content and pattern are the variables declared in step 1.
pattern = Pattern.compile("<style[^>]*>.*?</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
content = pattern.matcher(content).replaceAll("");

3- Remove all remaining HTML tags, keeping the text between them:
pattern = Pattern.compile("<[^>]*>");
content = pattern.matcher(content).replaceAll("");
4- Replace new lines, tabs and multiple spaces with a single space:
content = content.replaceAll("\n+", " ");
content = content.replaceAll("\t+", " ");
content = content.replaceAll(" {2,}", " ");

And now you have the pure content :)
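The four steps above can be sketched as one self-contained method. The class and method names (HtmlToText, toText) and the exact patterns are illustrative assumptions rather than the crawler's original code. Two choices differ slightly from the steps as listed: tags are replaced with a space so that words from adjacent elements do not run together, and a single \s+ pattern handles new lines, tabs and repeated spaces in one pass.

```java
import java.util.regex.Pattern;

class HtmlToText {
    // DOTALL lets '.' match line terminators, so multi-line script and style bodies are removed too.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>.*?</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*>.*?</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toText(String htmlContent) {
        String content = SCRIPT.matcher(htmlContent).replaceAll("");  // step 1
        content = STYLE.matcher(content).replaceAll("");              // step 2
        content = TAG.matcher(content).replaceAll(" ");               // step 3
        return WHITESPACE.matcher(content).replaceAll(" ").trim();    // step 4
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
                + "<script>var a = 1;\nvar b = 2;</script></head>"
                + "<body><h1>Title</h1>\n<p>Hello\t\tworld</p></body></html>";
        System.out.println(toText(html)); // prints: Title Hello world
    }
}
```

Regex-based stripping like this is good enough for feeding text to an indexer, but it can be confused by unusual markup, for example a ">" inside an attribute value.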

Regular expression
How to Write an HTML Parser in Java