Archive for the ‘Uncategorized’ Category
VIM segfaulting, I don’t see that every day
Haha!?

Just a little Test for Flattr!
no content here
Test for Facebook
Is WordPress sharing working?
golang package cappedqueue
To quote the README.rst file:
cappedqueue
cappedqueue is a simple package to make sure you can always submit to a queue. Losing items is deliberate, so that you can rely on never accidentally blocking or filling up your memory.
See the cappedqueue_test.go file for details on the usage…
/serverhorror
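The README doesn’t show the package’s API, so what follows is only a minimal sketch of the idea and not cappedqueue’s actual interface. It uses a buffered channel whose Put never blocks and simply drops the item when the buffer is full. The names CappedQueue, New, Put, Get and Len are hypothetical.

```go
package main

import "fmt"

// CappedQueue is a hypothetical bounded queue that never blocks the
// producer: when the buffer is full, new items are dropped on purpose.
type CappedQueue struct {
	items chan interface{}
}

// New creates a queue that holds at most capacity items.
func New(capacity int) *CappedQueue {
	return &CappedQueue{items: make(chan interface{}, capacity)}
}

// Put tries to enqueue v without blocking. It reports whether v was
// accepted; false means the queue was full and v was discarded.
func (q *CappedQueue) Put(v interface{}) bool {
	select {
	case q.items <- v:
		return true
	default:
		return false
	}
}

// Get removes the oldest item without blocking; ok is false if empty.
func (q *CappedQueue) Get() (v interface{}, ok bool) {
	select {
	case v = <-q.items:
		return v, true
	default:
		return nil, false
	}
}

// Len returns the number of items currently queued.
func (q *CappedQueue) Len() int { return len(q.items) }

func main() {
	q := New(2)
	for i := 1; i <= 3; i++ {
		fmt.Println("put", i, q.Put(i))
	}
	v, ok := q.Get()
	fmt.Println("got", v, ok)
}
```

Dropping at Put time keeps the producer’s latency bounded. The price is that callers must treat a false return as “lost on purpose” rather than as an error.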
The Importance of API “Transports”
SQL Databases are dead
There, I said it. I jumped on the NoSQL bandwagon and I refuse to use anything that is even remotely uncool and old.
Of course the above is a lie. Well, kind of. In reality I do think that the interface through which you communicate with a SQL database is dead (or rather should be, but so should FTP, which is another story, and there are still a few people left using it).
I’m not talking about the query language itself; I’m talking about the “transport” (for lack of a better word).
So is (nearly) any other protocol
I’ll leave the NoSQL Kool-Aid road now and talk about the transport stuff a bit.
My problem with APIs today is that using them is quite annoying. Let’s look at the products people use all over the web.
Say you go to Facebook or Google and want to use their API to do something with the social networks represented by their data. On Facebook’s API page you’ll find a source package that implements a transport protocol, a domain-specific language and, to complicate things a bit further, a particular data representation.
The reality is that right now there is only a single transport protocol for using a service on the Internet: HTTP. Maybe with SSL on top, but the transport is HTTP. I’m not saying that HTTP is the ultimate protocol; its shortcomings are well known.
The point is that with one common transport you can solve a whole set of problems in one go and be done with it.
Back to the SQL Example
So you have this cool new project and have decided to use a relational database system. Now you also decide to use a framework that relies on interactions with other systems being non-blocking. So you run around the web looking for a non-blocking driver for your RDBMS of choice.
Then you run around the web looking for a non-blocking driver for your storage layer, then you run around the web… (you get the idea).
Now, if all your satellite systems were exposed over HTTP, you’d… run around the web, look for a non-blocking HTTP client and be done with this whole category of problems.
Just don’t use HTTP…please!
Please, please, please…don’t just use HTTP because I used it as an example here. A unified (or generalized) transport really needs some serious thinking.
Of course you can just run away now and scream what kind of idiot I am, but think of this the next time you run around the web looking for a driver because the transport of a service isn’t generalized enough to be reusable.
Also, please don’t confuse the transport with the representation, which should be considered equally important and would solve another stack of problems.
And if you go down that last route, that still doesn’t mean there should be only a single language per domain; having a single language per domain would not be that good, but I won’t waste further thoughts on those last two statements here.
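One way to keep transport and representation apart in code is to put each behind its own interface, so that any codec can be paired with any transport. The following is a sketch under that assumption. Codec, Transport, JSONCodec, memTransport and Client are made-up names. The in-memory transport merely stands in for HTTP, AMQP or whatever the transport ends up being.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Codec is the representation: how a value becomes bytes and back.
type Codec interface {
	Encode(v interface{}) ([]byte, error)
	Decode(data []byte, v interface{}) error
}

// Transport moves opaque bytes and knows nothing about their format.
type Transport interface {
	Send(payload []byte) error
	Receive() ([]byte, error)
}

// JSONCodec implements Codec with encoding/json.
type JSONCodec struct{}

func (JSONCodec) Encode(v interface{}) ([]byte, error)   { return json.Marshal(v) }
func (JSONCodec) Decode(data []byte, v interface{}) error { return json.Unmarshal(data, v) }

// memTransport is a toy loopback Transport backed by a slice.
type memTransport struct{ queue [][]byte }

func (m *memTransport) Send(p []byte) error {
	m.queue = append(m.queue, p)
	return nil
}

func (m *memTransport) Receive() ([]byte, error) {
	if len(m.queue) == 0 {
		return nil, fmt.Errorf("nothing to receive")
	}
	p := m.queue[0]
	m.queue = m.queue[1:]
	return p, nil
}

// Client combines any Codec with any Transport.
type Client struct {
	Codec     Codec
	Transport Transport
}

// Send encodes v with the codec and hands the bytes to the transport.
func (c Client) Send(v interface{}) error {
	data, err := c.Codec.Encode(v)
	if err != nil {
		return err
	}
	return c.Transport.Send(data)
}

// Receive takes bytes from the transport and decodes them into v.
func (c Client) Receive(v interface{}) error {
	data, err := c.Transport.Receive()
	if err != nil {
		return err
	}
	return c.Codec.Decode(data, v)
}

func main() {
	c := Client{Codec: JSONCodec{}, Transport: &memTransport{}}
	if err := c.Send(map[string]string{"query": "SELECT 1"}); err != nil {
		panic(err)
	}
	var got map[string]string
	if err := c.Receive(&got); err != nil {
		panic(err)
	}
	fmt.Println(got["query"])
}
```

Swapping JSON for another representation, or the loopback for a network transport, touches only one side of the Client.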
/serverhorror
What is big data?
It’s actually quite simple: You have big data whenever a single host isn’t enough to either store or process your data.
What does it mean?
Suppose you have a PostgreSQL database and you run into scaling problems. You now have a choice: either get better hardware so that you can keep working on a single host, or split the database so that it spans multiple hosts.
Suppose you have a file store and all disks are full. You can either buy larger disks or use some distributed storage system where you just add hosts to expand the total storage capacity.
In both cases you are dealing with big data.
Big data (for me) isn’t a matter of X MB of data. It’s simply the point at which you have to use a distributed system to handle your data.
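Splitting a database across hosts needs a rule that decides which host owns which record. Below is a minimal sketch of one common rule, hash-based sharding with FNV-1a. The host names are invented. Real systems often prefer consistent hashing, because plain modulo hashing moves most keys whenever a host is added.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor picks the host responsible for key by hashing it (FNV-1a)
// and taking the hash modulo the number of hosts.
func shardFor(key string, hosts []string) string {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return hosts[h.Sum32()%uint32(len(hosts))]
}

func main() {
	hosts := []string{"db1.example.com", "db2.example.com", "db3.example.com"}
	for _, key := range []string{"user:1", "user:2", "user:3"} {
		fmt.Println(key, "->", shardFor(key, hosts))
	}
}
```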
Everything is a fucking event stream!
I’ve grown quite fed up with monitoring systems lately. I no longer see a need for such systems, and I hate to introduce yet another component that doesn’t actually add any value!
Now go away, turn around and read something else if you think this is utter bullshit, but I recently took some time to think about what a monitoring system actually does.
In my opinion a monitoring system is nothing more than a logging system. You generate some log message (which actually isn’t a log message) and send it off to a central host (possibly through several levels of intermediate hosts) just to fire off a notification about your log message.
So what do we have? A log stream. But since I don’t really care about log messages as such, let’s generalize a bit and call everything an event!
Everything is a fucking event stream!
I do mean everything!
Let’s begin with syslog (yup, classic syslog, no bells and whistles). Have you ever thought about what the separation marker for a syslog event is? I don’t know for sure. For a long time I thought the only true answer was a newline.
Hey, who the hell wants to read Java or Python stack traces, or any other “multi-line message”, in a log file?
I do!
Of course I don’t want to use grep or less or anything like that to work through those files. I want a tool that understands those messages. Better yet, I want a tool that is usable and that lets me define reusable rules to tag log messages (events).
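A tool that “understands” multi-line messages first has to decide where one event ends. Here is a sketch of one common heuristic. It is not something syslog itself defines. The rule is that a line starting with a space or tab continues the previous event, as Java stack trace frames do. Real traces break this rule (Java’s “Caused by:” lines, the final exception line of a Python traceback), so treat the hypothetical groupEvents as a starting point.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// groupEvents joins continuation lines (lines starting with a space or
// tab) onto the preceding line, so that each returned string is one event.
func groupEvents(input string) []string {
	var events []string
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := scanner.Text()
		isContinuation := strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t")
		if isContinuation && len(events) > 0 {
			events[len(events)-1] += "\n" + line
			continue
		}
		events = append(events, line)
	}
	return events
}

func main() {
	raw := "Jan 10 12:00:01 app1 java[42]: java.lang.NullPointerException\n" +
		"\tat com.example.Foo.bar(Foo.java:10)\n" +
		"\tat com.example.Main.main(Main.java:5)\n" +
		"Jan 10 12:00:02 app1 cron[43]: job finished\n"
	for i, ev := range groupEvents(raw) {
		fmt.Printf("event %d:\n%s\n", i+1, ev)
	}
}
```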
Think of Nagios:
You write some plug-in. The plug-in has a log, it reports Nagios messages on stdout, it writes to stderr to tell about errors, there are different levels of verbosity for debugging, and it has at least 2 different return values.
Why on earth is there:
A stream of plugin results (usually exit codes), a stream of messages on stdout (meta results?), a stream of log messages (meta2 results?), a stream of messages on stderr (is that logging, monitoring, meta3 results, ignorable?).
On top of that, all those kinds of messages are incompatible with each other. There’s no such thing as structured logging!
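Folding all of those streams into one record isn’t hard. Below is a sketch, assuming a Unix system with /bin/sh. It runs a plugin-style command and captures its exit code, stdout and stderr in a single JSON-serializable event. PluginEvent, runPlugin and the JSON field names are made up for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// PluginEvent holds everything one plugin run produced.
type PluginEvent struct {
	Command  string    `json:"command"`
	ExitCode int       `json:"exit_code"`
	Stdout   string    `json:"stdout"`
	Stderr   string    `json:"stderr"`
	Started  time.Time `json:"started"`
	Duration float64   `json:"duration_seconds"`
}

// runPlugin executes command via /bin/sh and records all of its output.
// The error is only set if the command could not be run at all; a
// non-zero exit status is data, not an error.
func runPlugin(command string) (PluginEvent, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("/bin/sh", "-c", command)
	cmd.Stdout, cmd.Stderr = &stdout, &stderr

	ev := PluginEvent{Command: command, Started: time.Now()}
	err := cmd.Run()
	ev.Duration = time.Since(ev.Started).Seconds()
	ev.Stdout, ev.Stderr = stdout.String(), stderr.String()

	var exitErr *exec.ExitError
	switch {
	case err == nil:
		ev.ExitCode = 0
	case errors.As(err, &exitErr):
		ev.ExitCode = exitErr.ExitCode()
	default:
		return ev, err
	}
	return ev, nil
}

func main() {
	ev, err := runPlugin("echo 'load ok'; echo 'debug: 3 samples' >&2; exit 1")
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(ev)
	fmt.Println(string(out))
}
```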
Please, please just let me send everything to some remote place where it will be persisted, so that I have a central view of all my events.
Add as much meta data as possible.
Source host, receiving host (or hosts, if there were several in between), reception times, timestamps, and please: do add fractions of a second, host names and IP addresses, and a way to extend the metadata.
Once I have a place where I can look at all my events, then and only then do I want to decide what I’m interested in. I’m nowhere near deciding whether any of this is an alert.
After the fact tagging
Nagios, Icinga and Zabbix all force me to make something up: some test, a probe or whatnot, where I have to write some script (don’t get me wrong: I like scripting; scripting takes away those repetitive tasks I hate!) and make up arbitrary values that represent a certain level of goodness or badness. OK, CRITICAL, WARNING?
WTF? Who said I need three of them?
Just let me define some criteria that match events. Please note that I am not restricting the criteria to regular expressions. Something like “Has the meta data field X”, “Does not have meta data field Y”, “After January 1st 1992 but not before May 3rd 1982”, “Only between 13:00h and 13:15h when the load was higher than 3 on systems with only 2 cores” and so on. Those are equally important.
Uh-oh! I just defined some plug-ins, so now I’m back at monitoring! No, I’m not!
I ran a cron job (in the simplest case) that generated an event telling me the load, and another cron job that told me the number of cores in the system. Those events were sent to my event sink for later processing!
I need to be able to save those collections of criteria as filters, views or whatever you want to call them, and I need to be able to name them so that I can find them later on. I simply want to label my events.
I need to be able to attach as many labels to events as I see fit. Also I need the ability to find unlabeled events.
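Saving criteria under a name and using that name as the label could look roughly like this. It is only a sketch. Filter, Label and Unlabeled are hypothetical names, and an event is reduced to a map of string fields.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is reduced to its metadata fields for this example.
type Event map[string]string

// Filter is a saved, named criterion; its name doubles as the label.
type Filter struct {
	Name  string
	Match func(Event) bool
}

// Label returns, for each event index, the sorted names of all matching filters.
func Label(events []Event, filters []Filter) map[int][]string {
	labels := make(map[int][]string)
	for i, ev := range events {
		for _, f := range filters {
			if f.Match(ev) {
				labels[i] = append(labels[i], f.Name)
			}
		}
		sort.Strings(labels[i])
	}
	return labels
}

// Unlabeled returns the indexes of events that no filter matched.
func Unlabeled(events []Event, labels map[int][]string) []int {
	var out []int
	for i := range events {
		if len(labels[i]) == 0 {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	events := []Event{
		{"program": "cron", "msg": "job done"},
		{"program": "sshd", "msg": "Failed password for root"},
		{"program": "kernel", "msg": "eth0 link up"},
	}
	filters := []Filter{
		{"auth", func(e Event) bool { return e["program"] == "sshd" }},
		{"scheduled", func(e Event) bool { return e["program"] == "cron" }},
	}
	labels := Label(events, filters)
	fmt.Println(labels, Unlabeled(events, labels))
}
```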
Which brings us to alerts!
So now that I know what’s interesting, and have what I need to make educated decisions about it, I can decide when it’s worth raising something that will wake me up in the middle of the night.
I do want to be able to generate alerts and send them off to some other system…
Hey look! Another event stream!
…and then I’d rather not specify when things go bad. I’d rather specify when things are good. Everything else is just badness enumeration.
I’d rather triple the amount of time I invest in such a system than create yet another monitoring system that doesn’t use what’s already there.
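Specifying what is good instead of what is bad turns alerting into a whitelist. Every event that no “good” rule accounts for becomes an alert, which is itself just another event. Below is a minimal sketch. GoodRule, Alert, unexplained and the example rules are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a minimal event with a source and a message.
type Event struct {
	Source  string
	Message string
}

// GoodRule describes one kind of event that is known to be fine.
type GoodRule func(Event) bool

// Alert is just another event, ready to be sent to some other system.
type Alert struct {
	Cause Event
}

// unexplained returns an alert for every event that no good rule accepts.
func unexplained(events []Event, good []GoodRule) []Alert {
	var alerts []Alert
	for _, ev := range events {
		ok := false
		for _, rule := range good {
			if rule(ev) {
				ok = true
				break
			}
		}
		if !ok {
			alerts = append(alerts, Alert{Cause: ev})
		}
	}
	return alerts
}

func main() {
	good := []GoodRule{
		func(e Event) bool { return e.Source == "cron" && strings.HasSuffix(e.Message, "finished") },
		func(e Event) bool { return strings.HasPrefix(e.Message, "Accepted publickey") },
	}
	events := []Event{
		{"cron", "backup finished"},
		{"sshd", "Accepted publickey for deploy"},
		{"sshd", "Failed password for root"},
	}
	for _, a := range unexplained(events, good) {
		fmt.Printf("ALERT: %s: %s\n", a.Cause.Source, a.Cause.Message)
	}
}
```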
But today’s alerts aren’t worth anything tomorrow!
Any system that silently throws away data is useless (I’m looking at you, Munin and friends). I’m not saying RRDtool is a bad thing. I love it; the problem is how people use it.
Throw away data? Come on, who wants that? I want the finest possible resolution; we have Hadoop, GlusterFS and Ceph. Storage shouldn’t be in the way. I’d rather have only 7 days of data than a year of useless junk.
Of course there are trends over long periods of time, but those shouldn’t be the default; they should be added on top of the existing data!
How are today’s alerts helpful if I can’t possibly tell what happened yesterday between 15:03h and 15:07h?
Yes, this actually is basic stuff
But it’s just a syslog server and some scripts!
Yup, you are absolutely right, and that’s why companies like Splunk and Loggly make money, right? Because everyone already has stuff like that. It’s a default, no more to-dos, nothing to see here, move along!
Ah, so you don’t actually have it? Neither do I. But I’d love to! Please, someone skilled, create such a system and make it open source!
On top of that: make it near real time!
/serverhorror
Awk min/max/avg
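#!/usr/bin/awk -f
# Print the average, maximum and minimum of the third column.
# Note: awk requires an action's opening brace on the same line as its
# pattern (BEGIN, END or an expression), otherwise it is a syntax error.
NR == 1 {
    # Seed min and max with the first value; starting both at 0 reports
    # a wrong minimum for all-positive input (and a wrong maximum for
    # all-negative input).
    minimum = $3
    maximum = $3
}
{
    if ($3 > maximum) maximum = $3
    if ($3 < minimum) minimum = $3
    sum += $3
}
END {
    if (NR == 0) {
        print "No input"
        exit
    }
    print "Average =", sum / NR
    print "Max =", maximum
    print "Min =", minimum
}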
NON-RFC953 Hostname Service
/etc/xinetd.d/hostname:
service hostname
{
    disable     = no
    type        = UNLISTED
    id          = hostname
    port        = 101
    socket_type = stream
    protocol    = tcp
    user        = nobody
    wait        = no
    server      = /bin/hostname
    server_args = -f
}
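With the service above enabled, a client just opens a TCP connection to port 101 and reads until the server closes it. Here is a Go sketch. queryHostname and fakeServer are made-up helpers, and the demo starts a local stand-in listener instead of assuming xinetd is installed.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"strings"
	"time"
)

// queryHostname connects to addr (e.g. "server.example.com:101"), reads
// everything the server sends and returns it without surrounding whitespace.
func queryHostname(addr string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if err := conn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	data, err := io.ReadAll(conn)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

// fakeServer mimics the xinetd service: it writes name and closes the connection.
func fakeServer(name string) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			fmt.Fprintln(c, name)
			c.Close()
		}
	}()
	return ln, nil
}

func main() {
	ln, err := fakeServer("web1.example.com")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	name, err := queryHostname(ln.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```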
Monitoring Thoughts
How would you scale monitoring and how would you ensure that with hundreds of thousands of events per minute you’ll still get the important ones?
A lot of stuff is missing here. This is merely a note on how I think a scalable monitoring architecture should look. Also, one should be able to do math on the events!
On Agents
RULE: events generated by Agents are stateless
- Run a monitoring agent on each node!
- Each agent performs a number of tasks. These are specifically called tasks because they are not necessarily checks; also, I associate checks with Nagios checks, and that’s not what we want to do!
A task does one thing, and one thing only:
- do not create tasks the way NimSoft does (CMD: CPU/Memory/Disk)
- Each task generates an event
- Everything is an event!
- A successful task is just a task that ran without (programmatic) errors
- A failed task is one where a programmatic error occurred!
- Create a JSON string from the event
- Submit the JSON string to some messaging middleware (preferably RabbitMQ)
On Middleware
- Messages must be persistent
- It is safe to restart the server!
- What are just messages to the middleware are the guts of the system: the events generated by agents
On Servers
There are 2 kinds of servers:
- Persistence Servers: these run somewhere in a rack. They grab one event after another from a queue and store them in a safe place for later reference.
Once a Persistence Server has grabbed an event from the queue, it is no longer visible to other servers. Each event resides on exactly one Persistence Server.
- Notification Servers: these run on physical servers in a rack and just grab one notification at a time from the messaging middleware.
- All Notification Servers can retrieve all events.
- Notification Servers can subscribe to a certain subset of topics.
- There may be a lot of servers. We don’t want our monitoring failing.
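The two kinds of servers differ only in who gets a copy of an event. In RabbitMQ terms, persistence servers would be competing consumers on one shared queue, while each notification server would bind its own queue to an exchange. The sketch below imitates both patterns with Go channels instead of a broker. competingConsumers and fanOut are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// competingConsumers delivers each event to exactly one of n workers,
// like persistence servers sharing a single queue. It returns which
// worker stored which event.
func competingConsumers(events []string, n int) map[string]int {
	queue := make(chan string)
	owner := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for ev := range queue {
				mu.Lock()
				owner[ev] = id
				mu.Unlock()
			}
		}(w)
	}
	for _, ev := range events {
		queue <- ev
	}
	close(queue)
	wg.Wait()
	return owner
}

// fanOut copies every event to each of n subscribers,
// like notification servers that each see all events.
func fanOut(events []string, n int) [][]string {
	subs := make([]chan string, n)
	received := make([][]string, n)
	var wg sync.WaitGroup
	for i := range subs {
		subs[i] = make(chan string, len(events))
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for ev := range subs[i] {
				received[i] = append(received[i], ev) // only goroutine i writes here
			}
		}(i)
	}
	for _, ev := range events {
		for _, s := range subs {
			s <- ev
		}
	}
	for _, s := range subs {
		close(s)
	}
	wg.Wait()
	return received
}

func main() {
	events := []string{"e1", "e2", "e3", "e4"}
	owner := competingConsumers(events, 2)
	keys := make([]string, 0, len(owner))
	for k := range owner {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s stored by persistence server %d\n", k, owner[k])
	}
	for i, got := range fanOut(events, 3) {
		fmt.Printf("notification server %d saw %v\n", i, got)
	}
}
```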
On Persistence Servers
- Subscribe to the global queue
- Start grabbing events
- Store the event on disk (what exactly “storing” means is yet to be determined!)
- Start over again
On Notification Servers
- Subscribe to the notification queue or a topic queue
- Start grabbing events
- Display the event (what exactly “displaying” means is yet to be determined!)
- Start over again
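Subscribing to a topic queue presumably means something like RabbitMQ’s topic exchanges. There, routing keys are dot-separated words and binding patterns may use “*” (exactly one word) and “#” (zero or more words). Below is a small matcher written from those documented rules, not taken from RabbitMQ itself. The routing keys are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// topicMatches reports whether a dot-separated routing key matches a
// binding pattern, where "*" stands for exactly one word and "#" for
// zero or more words.
func topicMatches(pattern, key string) bool {
	return match(strings.Split(pattern, "."), strings.Split(key, "."))
}

// match compares pattern words p against key words k recursively.
func match(p, k []string) bool {
	if len(p) == 0 {
		return len(k) == 0
	}
	switch p[0] {
	case "#":
		// Let "#" swallow 0, 1, 2, ... words and try the rest of the pattern.
		for i := 0; i <= len(k); i++ {
			if match(p[1:], k[i:]) {
				return true
			}
		}
		return false
	case "*":
		return len(k) > 0 && match(p[1:], k[1:])
	default:
		return len(k) > 0 && p[0] == k[0] && match(p[1:], k[1:])
	}
}

func main() {
	keys := []string{"web1.cpu.load", "web1.disk", "db1.cpu.load"}
	for _, k := range keys {
		fmt.Println(k, topicMatches("*.cpu.load", k), topicMatches("web1.#", k))
	}
}
```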
