<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/net/inet_hashtables.h, branch linux-3.0.y</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-3.0.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-3.0.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2010-10-21T11:06:43Z</updated>
<entry>
<title>tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()</title>
<updated>2010-10-21T11:06:43Z</updated>
<author>
<name>Balazs Scheidler</name>
<email>bazsi@balabit.hu</email>
</author>
<published>2010-10-21T11:06:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=093d282321daeb19c107e5f1f16d7f68484f3ade'/>
<id>urn:sha1:093d282321daeb19c107e5f1f16d7f68484f3ade</id>
<content type='text'>
When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the child socket ends up on the same port (and thus in
the same bind bucket) as the listener. Unfortunately, if you're using the
TPROXY target to redirect skbs to a transparent proxy, that assumption is no
longer true and things break.

This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.
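
As a rough illustration (not the verbatim patch), a caller such as
tcp_v4_syn_recv_sock() can check the new return value and drop the child
socket on failure; the exact cleanup path here is an assumption:

	/* Hypothetical caller-side handling, assuming __inet_inherit_port()
	 * now returns an int and a negative value means failure. */
	if (__inet_inherit_port(sk, newsk) &lt; 0) {
		sock_put(newsk);
		goto exit;
	}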

Reported by and original patch from Stephen Buck &lt;stephen.buck@exinda.com&gt;.
See http://marc.info/?t=128169268200001&amp;r=1&amp;w=2 for the original discussion.

Signed-off-by: KOVACS Krisztian &lt;hidden@balabit.hu&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
</entry>
<entry>
<title>tcp: Fix a connect() race with timewait sockets</title>
<updated>2009-12-09T04:17:51Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-12-04T03:46:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9327f7053e3993c125944fdb137a0618319ef2a0'/>
<id>urn:sha1:9327f7053e3993c125944fdb137a0618319ef2a0</id>
<content type='text'>
The first patch changes __inet_hash_nolisten() and __inet6_hash() to take
a timewait parameter, so that the timewait socket can be unhashed from
ehash at the same time the new socket is inserted into the hash.

This makes sure the timewait socket won't be found by a concurrent
writer in __inet_check_established().
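
A minimal sketch of the idea (illustrative only, not the exact diff): the
insert of the new socket and the removal of the timewait socket happen under
the same ehash bucket lock:

	spin_lock(lock);				/* ehash bucket lock */
	__sk_nulls_add_node_rcu(sk, &amp;head-&gt;chain);	/* insert new socket */
	if (tw)
		/* remove the matching timewait entry from the same chain,
		 * so concurrent lookups never see both entries */
		sk_nulls_del_node_init_rcu((struct sock *)tw);
	spin_unlock(lock);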

Reported-by: kapil dakhane &lt;kdakhane@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: fix inet_bind_bucket_for_each</title>
<updated>2009-11-14T04:46:56Z</updated>
<author>
<name>Lucian Adrian Grijincu</name>
<email>lgrijincu@ixiacom.com</email>
</author>
<published>2009-11-12T05:07:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5256f2ef3a40d784b8369035bff3f4dc637a9801'/>
<id>urn:sha1:5256f2ef3a40d784b8369035bff3f4dc637a9801</id>
<content type='text'>
The first "node" is supposed to be the cursor used in the for_each.

The second "node" is ment literally and should not be macro expanded:
it's the name of the hlist_node field from the inet_bind_bucket.

This currently works because when inet_bind_bucket_for_each is called,
its argument is still "node".
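
For illustration, the distinction looks roughly like this in the header (a
sketch, assuming the cursor parameter is simply renamed):

/* "pos" is the caller-supplied cursor; the trailing "node" is the literal
 * name of the hlist_node field inside struct inet_bind_bucket. */
#define inet_bind_bucket_for_each(tb, pos, head) \
	hlist_for_each_entry(tb, pos, head, node)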

Signed-off-by: Lucian Adrian Grijincu &lt;lgrijincu@ixiacom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: rename some inet_sock fields</title>
<updated>2009-10-19T01:52:53Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-10-15T06:30:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c720c7e8383aff1cb219bddf474ed89d850336e3'/>
<id>urn:sha1:c720c7e8383aff1cb219bddf474ed89d850336e3</id>
<content type='text'>
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

The goal is to transfer the fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and to move sk_refcnt
to a separate cache line (only written by the rx path).

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
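
For example, a typical call site changes like this under the new naming
(sketch, surrounding code is illustrative):

	struct inet_sock *inet = inet_sk(sk);
	__be32 peer  = inet-&gt;inet_daddr;	/* was inet-&gt;daddr */
	__be16 lport = inet-&gt;inet_sport;	/* was inet-&gt;sport */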

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: replace ehash_size by ehash_mask</title>
<updated>2009-10-13T10:44:02Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-10-09T00:16:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f373b53b5fe67aa4a6f28f921a529cc90f88e79b'/>
<id>urn:sha1:f373b53b5fe67aa4a6f28f921a529cc90f88e79b</id>
<content type='text'>
Storing the mask (size - 1) instead of the size allows the fast path to be
a bit faster.
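
A minimal sketch of the fast-path bucket selection with the precomputed mask
(variable names are illustrative):

	struct inet_ehash_bucket *head;

	head = &amp;hashinfo-&gt;ehash[hash &amp; hashinfo-&gt;ehash_mask];
	/* was: &amp;hashinfo-&gt;ehash[hash &amp; (hashinfo-&gt;ehash_size - 1)] */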

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: skb-&gt;dst accessors</title>
<updated>2009-06-03T09:51:04Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-06-02T05:19:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=adf30907d63893e4208dfe3f5c88ae12bc2f25d5'/>
<id>urn:sha1:adf30907d63893e4208dfe3f5c88ae12bc2f25d5</id>
<content type='text'>
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)

This one should replace occurrences of:
dst_release(skb-&gt;dst)
skb-&gt;dst = NULL;

Delete skb-&gt;dst field
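
A sketch of the typical conversion at a call site (the surrounding code is
illustrative, not taken from the patch):

	struct rtable *rt;

	skb_dst_set(skb, dst_clone(dst));	/* was: skb-&gt;dst = dst_clone(dst); */
	rt = (struct rtable *)skb_dst(skb);	/* was: (struct rtable *)skb-&gt;dst */
	skb_dst_drop(skb);			/* was: dst_release(skb-&gt;dst); skb-&gt;dst = NULL; */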

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: move bsockets outside of read only beginning of struct inet_hashinfo</title>
<updated>2009-02-01T20:31:33Z</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2009-02-01T20:31:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=24dd1fa184595ff095a92de807fdf029b2632673'/>
<id>urn:sha1:24dd1fa184595ff095a92de807fdf029b2632673</id>
<content type='text'>
And switch bsockets to atomic_t since it might be changed in parallel.
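
Illustrative sketch of the resulting update pattern (not the verbatim diff):

	atomic_inc(&amp;hashinfo-&gt;bsockets);	/* was: hashinfo-&gt;bsockets++; */
	atomic_dec(&amp;hashinfo-&gt;bsockets);	/* was: hashinfo-&gt;bsockets--; */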

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6</title>
<updated>2009-01-30T22:31:07Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2009-01-30T22:31:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=05bee4737774881e027bfd9a8b5c40a7d68f6325'/>
<id>urn:sha1:05bee4737774881e027bfd9a8b5c40a7d68f6325</id>
<content type='text'>
Conflicts:
	drivers/net/e1000/e1000_main.c
</content>
</entry>
<entry>
<title>net: wrong test in inet_ehash_locks_alloc()</title>
<updated>2009-01-28T01:45:10Z</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2009-01-28T01:45:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=94cd3e6cbebf85903b4d53ed2147bdb4c6e08625'/>
<id>urn:sha1:94cd3e6cbebf85903b4d53ed2147bdb4c6e08625</id>
<content type='text'>
In commit 9db66bdcc83749affe61c61eb8ff3cf08f42afec (net: convert
TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one
occurrence of rwlock_t to spinlock_t

I believe sizeof(raw_spinlock_t) might be &gt; 0 on !CONFIG_SMP if
CONFIG_DEBUG_SPINLOCK is set, while sizeof(raw_rwlock_t) should be 0 in
this case.

Fortunately, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and
rwlock_t, but this might change in the future (being able to debug
spinlocks but not rwlocks, for example), so it is better to be safe.
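
A sketch of the corrected test in inet_ehash_locks_alloc(), shown in
isolation rather than as the full function:

	/* The lock array holds spinlocks now, so the "is this lock type
	 * zero-sized?" check must use spinlock_t as well. */
	if (sizeof(spinlock_t) != 0) {	/* was: sizeof(rwlock_t) */
		/* allocate the per-bucket ehash lock array */
	}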

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: Allowing more than 64k connections and heavily optimize bind(0) time.</title>
<updated>2009-01-21T22:34:31Z</updated>
<author>
<name>Evgeniy Polyakov</name>
<email>zbr@ioremap.net</email>
</author>
<published>2009-01-20T00:46:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a9d8f9110d7e953c2f2b521087a4179677843c2a'/>
<id>urn:sha1:a9d8f9110d7e953c2f2b521087a4179677843c2a</id>
<content type='text'>
With the simple extension to the binding mechanism, which allows binding more
than 64k sockets (or a smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find an empty bucket.
And while this is not a problem for, say, 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one more bucket to find an empty one) even if we start
each time from a random offset inside the hash table.

So, when the hash table is full and we want to add another socket, we have
to traverse the whole table no matter what, so effectively this will be
the worst-case performance and it will be constant.

The attached picture shows bind() time depending on the number of already
bound sockets.

The green area corresponds to the usual bind-to-port-zero process, which
turns on the kernel port selection described above. The red area is the bind
process when the number of reuse-bound sockets is not limited to 64k (or by
the sysctl parameters); it shows the same exponential growth (hidden by the
green area) before the number of ports reaches the sysctl limit.

At this point the bind hash table has exactly one reuse-enabled socket per
bucket, but it is possible that they have different addresses. The kernel
actually selects the first port to try at random, so at the beginning bind
will take roughly constant time, but over time the number of ports to check
after the random start increases. That growth is exponential, but because of
the random selection above, not every port selection will necessarily take
longer than the previous one. So we have to consider the area below in the
graph (if you could zoom in, you would find many different times placed
there), so one area can hide another.

The blue area corresponds to the port selection optimization.

This is a rather simple design approach: the hash table now maintains an
(imprecise and racily updated) count of currently bound sockets, and when
that count becomes greater than a predefined value (I use the maximum port
range defined by the sysctls), we stop traversing the whole bind hash table
and just stop at the first matching bucket after the random start. The limit
above roughly corresponds to the case when the bind hash table is full and we
have turned on the mechanism allowing more reuse-enabled sockets to be bound,
so it does not change the behaviour of other sockets.
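
A simplified sketch of the early exit described above (variable names follow
inet_csk_get_port(), but this is illustrative rather than the verbatim patch):

	/* Once the approximate count of bound sockets exceeds the sysctl
	 * port range, reuse the best matching bucket seen so far instead
	 * of walking every remaining bucket. */
	if (atomic_read(&amp;hashinfo-&gt;bsockets) &gt; (high - low) + 1) {
		snum = smallest_rover;
		goto have_snum;
	}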

Signed-off-by: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Tested-by: Denys Fedoryschenko &lt;denys@visp.net.lb&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
