AltJ
(7 comments, 204 posts)
Home page: http://altj.org
Posts by AltJ
OfficeMax coupon code
I got one of these in the mail and won’t be using it. It’s good for $30 off your online purchase of $150 or more. Expires 7/2/2011.
JCV5P8J4BDJVT2ZG
Please post a comment here if you use it so that others won’t waste their time trying.
OfficeMax Coupon Code
I got one of these in the mail and won’t be using it. It’s good for $30 off your online purchase of $150 or more. Expires 2/19/2011.
JCU5F5TCXC95ED56
Please post a comment here if you use it so that others won’t waste their time trying.
Error/Fix: Disable Drag to Top to Maximize in KDE
Error:
Dragging a window to the top of the screen in KDE maximizes it (this is another personal preference).
Fix:
Go to System Settings, Desktop, Screen Edges and uncheck “Maximize windows by dragging them to the top of the screen”, then click “Apply”.
Error/Fix: Changing the default editor from nano to vi
Error:
The default editor for common commands is nano (yes, this is an error in my eyes).
Fix:
sudo update-alternatives --config editor
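If you'd rather skip the interactive menu (in a setup script, say), update-alternatives also accepts the path directly with `--set`. A minimal sketch, assuming vim is installed, in which case Ubuntu registers /usr/bin/vim.basic as an `editor` alternative (vim-tiny registers /usr/bin/vim.tiny instead). The `set_default_editor` name is made up for this example.

```shell
#!/bin/sh
# Non-interactive version of `update-alternatives --config editor`.
# Hypothetical helper; assumes the target path is already registered as an
# "editor" alternative (e.g. /usr/bin/vim.basic once vim is installed).
set_default_editor() {
    target="$1"
    # Bail out early if the editor binary isn't there.
    if [ ! -x "$target" ]; then
        echo "editor not found: $target" >&2
        return 1
    fi
    sudo update-alternatives --set editor "$target"
}

# Usage:
#   set_default_editor /usr/bin/vim.basic
#   update-alternatives --query editor   # the "Value:" line shows the current choice
```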
Error/Fix: Unable to open env file: /etc/default/locale: No such file or directory
Error:
Unable to open env file: /etc/default/locale: No such file or directory
Fix:
update-locale
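Running update-locale on its own was the fix here. If that isn't enough on your system, you can name the locale explicitly. This is a sketch under a few assumptions: en_US.UTF-8 is the locale you want (override with LOCALE=...), Ubuntu's locale-gen accepts a locale name as an argument, and update-locale may reject a locale that hasn't been generated yet. The function name is hypothetical, and its path argument only controls the existence check; update-locale itself writes /etc/default/locale.

```shell
#!/bin/sh
# Sketch: create /etc/default/locale only when it is missing.
# Assumption: en_US.UTF-8 by default; set LOCALE=... to use another locale.
ensure_locale_file() {
    file="${1:-/etc/default/locale}"   # path checked for existence
    if [ -f "$file" ]; then
        echo "$file already exists"
        return 0
    fi
    loc="${LOCALE:-en_US.UTF-8}"
    sudo locale-gen "$loc"             # generate the locale first
    sudo update-locale LANG="$loc"     # writes LANG=... to /etc/default/locale
}

# Usage: ensure_locale_file
```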
OfficeMax coupon code
I got one of these in the mail and won’t be using it. It’s good for $30 off your online purchase of $150 or more. Expires 1/08/2011
JCU4B7RHLKJ3UZAK
Please post a comment here if you use it so that others won’t waste their time trying.
How I installed Kubuntu 10.04 (Lucid Lynx) on an HP Mini 110 with full disk encryption
Get a USB drive and make a boot disk from the Kubuntu 10.04 CD image as detailed here: https://help.ubuntu.com/community/Installation/FromUSBStick
Be sure to select persistent storage (“When starting up from this disk, documents and settings will be: ‘Stored in reserved extra space’”). I allocated approximately 1GB for this; you’ll need more than the default 128MB.
Connect the netbook to the internet using its Ethernet port. Power it on and, as soon as you see the first screen, hit <esc>. Select your USB drive as the boot device.
Once booted, run:
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install cryptsetup lvm2
Then reboot the netbook (from the USB drive again).
Create the following partitions (I prefer cfdisk for partitioning):
- /dev/sda1, 512MB, ext2 (primary partition, bootable)
- /dev/sda2, remainder of the space (extended partition)
- /dev/sda5, all of the space in sda2 (logical partition)
Set up your encrypted volumes:
cryptsetup -y --cipher aes-xts-plain --key-size 512 luksFormat /dev/sda5
cryptsetup luksOpen /dev/sda5 pvcrypt
pvcreate /dev/mapper/pvcrypt
vgcreate laptop-vg /dev/mapper/pvcrypt
lvcreate -n swap -L 3G laptop-vg
lvcreate -n root -l 100%FREE laptop-vg
mkswap /dev/mapper/laptop--vg-swap
mkfs.ext3 /dev/mapper/laptop--vg-root
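Here are the same steps as a script, mainly to show the order and one naming detail: device-mapper doubles any hyphen inside a VG or LV name, so LV `root` in VG `laptop-vg` shows up as /dev/mapper/laptop--vg-root (LVM also creates /dev/laptop-vg/root). This is a sketch: the DRY_RUN, DEVICE and VG variables and the helper names are hypothetical, and it only prints the commands unless you run it as root with DRY_RUN=0, because luksFormat destroys whatever is on the partition.

```shell
#!/bin/sh
# Sketch of the encrypted-LVM setup. Dry run by default: commands are only
# printed. Run as root with DRY_RUN=0 to execute (this ERASES $DEVICE).
set -eu

DRY_RUN="${DRY_RUN:-1}"
DEVICE="${DEVICE:-/dev/sda5}"   # logical partition created with cfdisk
VG="${VG:-laptop-vg}"           # volume group name used in this post

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# device-mapper escapes '-' in VG and LV names by doubling it.
mapper_path() {
    vg="$(printf '%s' "$1" | sed 's/-/--/g')"
    lv="$(printf '%s' "$2" | sed 's/-/--/g')"
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

setup_encrypted_lvm() {
    run cryptsetup -y --cipher aes-xts-plain --key-size 512 luksFormat "$DEVICE"
    run cryptsetup luksOpen "$DEVICE" pvcrypt
    run pvcreate /dev/mapper/pvcrypt
    run vgcreate "$VG" /dev/mapper/pvcrypt
    run lvcreate -n swap -L 3G "$VG"
    run lvcreate -n root -l 100%FREE "$VG"
    run mkswap "$(mapper_path "$VG" swap)"
    run mkfs.ext3 "$(mapper_path "$VG" root)"
}

setup_encrypted_lvm
```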
Start the installer (from the icon on the desktop) and choose to set up the partitions manually:
- set /dev/sda1 to be /boot
- set /dev/mapper/laptop--vg-root to be /
- set /dev/mapper/laptop--vg-swap to be swap space
After the installation is complete, do the following before rebooting:
mkdir /mnt/newroot
mount /dev/mapper/laptop--vg-root /mnt/newroot
mount /dev/sda1 /mnt/newroot/boot
mount --bind /dev /mnt/newroot/dev
chroot /mnt/newroot
mount -t proc proc /proc
mount -t sysfs sys /sys
apt-get update
apt-get install cryptsetup lvm2
Then edit /etc/crypttab and add the following line to the end of the file:
pvcrypt /dev/sda5 none luks,retry=1,lvm=laptop-vg
Next, edit /etc/initramfs-tools/modules and add the following lines:
dm-crypt
aes-i586
xts
sha512_generic
ahci
Then run:
update-initramfs -u
and reboot.
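Collected into one script, the post-install steps look roughly like this. It's a sketch: it prints the commands by default and only executes them when run as root with DRY_RUN=0, and the NEWROOT variable and helper names are hypothetical. One deliberate difference from the steps above: it mounts /proc and /sys from outside the chroot and runs each command with `chroot "$NEWROOT" ...` instead of working in an interactive chroot shell, which should have the same effect.

```shell
#!/bin/sh
# Sketch of the post-install steps, run from the live USB session.
# Dry run by default (commands are only printed); run as root with DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
NEWROOT="${NEWROOT:-/mnt/newroot}"   # same mount point as in the post

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Append one line to a file (printed instead of written in dry-run mode).
append_line() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ append to $1: $2"
    else
        printf '%s\n' "$2" >> "$1"
    fi
}

post_install() {
    run mkdir -p "$NEWROOT"
    run mount /dev/mapper/laptop--vg-root "$NEWROOT"
    run mount /dev/sda1 "$NEWROOT/boot"
    run mount --bind /dev "$NEWROOT/dev"
    run mount -t proc proc "$NEWROOT/proc"
    run mount -t sysfs sys "$NEWROOT/sys"
    run chroot "$NEWROOT" apt-get update
    run chroot "$NEWROOT" apt-get install -y cryptsetup lvm2
    append_line "$NEWROOT/etc/crypttab" "pvcrypt /dev/sda5 none luks,retry=1,lvm=laptop-vg"
    for m in dm-crypt aes-i586 xts sha512_generic ahci; do
        append_line "$NEWROOT/etc/initramfs-tools/modules" "$m"
    done
    run chroot "$NEWROOT" update-initramfs -u
}

post_install
```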
To get my mic and external speakers working (headphones worked fine), I had to:
apt-get install linux-backports-modules-alsa-lucid-generic
Why I prefer Ubuntu
Yet another draft post that I found recently. Yes, I’m still a fan of Ubuntu.
As many of my colleagues know, I’m a huge Debian fan. I love the stability and ease of package management. The primary thing I don’t like about Debian is its lack of a release schedule. Their attitude is, “We’ll release it when it’s ready.”
I don’t disagree with that attitude; Debian is a very impressive community-driven development project. If I were programming for such a project, I’d probably prefer the “release it when it’s ready” method.
Enter Ubuntu. Ubuntu is based on Debian, which means it shares Debian’s ease of package management, but it also has a release schedule. They plan to release a new version every 6 months and support it for 18 months (security patches, fixes for critical bugs that could cause data loss, and extra translations). They also plan to have an Enterprise release every 12 to 24 months, which receives additional testing. These Enterprise releases (Long Term Support, or LTS) are supported for a longer period of time; the current LTS version is supported until 2015.
OCR processing millions of images on Amazon’s EC2
Note: I wrote this post nearly 2 years ago and recently discovered it in my drafts; some of the info is outdated.
Recently, I was tasked with running OCR on a huge set of images (3.4 million). I’m going to post some brief details on how we got through all of them in about a week.
Initially, we uploaded all of the images (~1.5 TB of data) to S3 using s3sync from a server in our local colocation facility. This took a long while.
Once the images were all stored in S3, I retrieved all of the metadata and stored it in a MySQL database running on a small EC2 instance. This host became the queue manager.
Since some images contained non-English text, I went through and recorded the language of each non-English image in the database.
I wrote a simple Perl script that would:
- retrieve the next image to be processed from the queue manager
- retrieve the corresponding image from S3
- run OCR on the image (passing the language option if it wasn’t in English)
- store the OCR output in the database on the queue manager and mark the image as processed
For cost reasons (and because the OCR output was adequate), we used Tesseract to process the images. It did a good job (depending on the image quality) and handled foreign languages very well.
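The original worker was a Perl script that isn't reproduced here; the following is a shell sketch of the same loop under some assumptions. The hooks `next_job`, `fetch_image` and `store_result` are hypothetical stand-ins for the MySQL queue and S3 calls and must be defined before calling `ocr_worker`; if several workers share the queue, `next_job` also has to claim its row atomically. The tesseract call itself follows the real CLI: `tesseract IMAGE OUTBASE -l LANG` writes the text to OUTBASE.txt.

```shell
#!/bin/sh
# Sketch of one OCR worker: pull jobs until the queue is empty.
# Hypothetical hooks, to be defined before calling ocr_worker:
#   next_job      prints "<image_id> <s3_key> [<lang>]", or nothing when done
#   fetch_image   downloads S3 key $1 to local file $2
#   store_result  saves the text in file $2 for image $1 and marks it processed
# Jobs that fail to download or OCR are simply skipped in this sketch.
ocr_worker() {
    workdir="$(mktemp -d)"
    while job="$(next_job)" && [ -n "$job" ]; do
        set -- $job                           # split "<id> <key> [<lang>]"
        id="$1"; key="$2"; lang="${3:-eng}"   # English unless a language is given
        if fetch_image "$key" "$workdir/$id.img"; then
            # Writes the recognized text to $workdir/$id.txt
            if tesseract "$workdir/$id.img" "$workdir/$id" -l "$lang"; then
                store_result "$id" "$workdir/$id.txt"
            fi
        fi
        rm -f "$workdir/$id".*                # clean up image and text files
    done
    rmdir "$workdir"
}
```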
The OCR processing instances were High-CPU Extra Large (c1.xlarge) servers, which have 8 virtual cores, so to get the most bang for our buck I whipped up a hack of a script to keep at least 8 processes running on each one.
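One simple way to get the same effect (not the script actually used here) is to start N copies of a worker that loops until the queue is empty, then wait for all of them. A sketch; `run_workers` is a hypothetical helper and the worker command is whatever you pass it.

```shell
#!/bin/sh
# Start N copies of a worker command in the background and wait for all of
# them. With a worker that loops until the queue is empty, this keeps N OCR
# processes busy on the instance until the queue drains.
run_workers() {
    n="$1"
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait
}

# Usage (hypothetical worker script): run_workers 8 ./ocr-worker.sh
```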
From there, I dumped the contents of the database and handed them off to our indexing expert. Some of the content is currently posted on www.worldvitalrecords.com and the rest is in the works.
Lessons learned (things I would have done differently)
- Find a way to ship the hard drives containing the images to Amazon instead of uploading the entire 1.5 TB from our datacenter. I’ve heard rumors of them doing this for large datasets, but haven’t verified it.
- Create a better script to manage running jobs (I’d probably use a multi-threaded perl script)
- Start processing images as soon as they were successfully uploaded. For simplicity’s sake, I uploaded all of the images first, then processed them all in one batch.
- Get increased resource allocations ahead of time. I started out with a 100-instance limit on EC2 and quickly saturated it. Around the middle of the week, I was finally able to get that limit increased to 200 instances.
- I’d consider using Amazon’s SimpleDB and Simple Queue Service in lieu of the MySQL database I used for the queue manager and for storing the OCR output.
Migrate from Drupal 6 to WordPress 3
Since I just made the switch back, here’s the script that made it easy…
http://blog.room34.com/archives/4530
Be sure to read through the script and edit or comment out sections so that it meets the needs of your site. The only real problem I’m seeing so far is that URLs aren’t always coming up as links in the posts, so I may have to do some manual editing.
It was less than 2 years ago that I switched from WordPress to Drupal, so why did I switch back? I’m lazy:
- I already have multiple websites that I host on this server that run WordPress. It’s easy to throw it into the mix and keep it up to date.
- I don’t anticipate ever doing much more than a blog here (that was one of my big reasons for switching to Drupal before: I was planning to do a bunch of non-blog stuff on this site and on others running Drupal)
- WordPress “just works” for blog sites. It’s brain-dead easy to set up compared to Drupal, and I like the edit/compose interface.